ROC curve

		
Task 1: HMM profile of the target family

The peptidyl-tRNA hydrolase motif (PTH, Pfam AC PF01195) was chosen, and the target family was defined by it: proteins carrying this motif in proteobacteria (phylum Proteobacteria).

During the workshop, an Excel file was created containing the "correct" SwissProt hits, the hits found by the HMM profile, a histogram of the hit scores, and the ROC and PRC curves. The alignment used to build the HMM profile. The HMM profile produced by the hmm2build command.
		
		

Threshold search:

Histogram of hit scores


ROC curve


PRC curve


From the graphs above, it can be concluded that this method is well suited to finding sequences that belong to proteobacteria: the ROC curve deviates strongly from the diagonal toward the "better" side, and the PRC curve behaves similarly. The best threshold is 250 (score > 250): at this point the specificity is still 1 and the sensitivity is about 81%. Lowering the threshold further could recover only the remaining hits, fewer than 20% of the true positives, so there is little left to gain.
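The threshold choice described above can be reproduced outside Excel. The following is a minimal sketch, not the procedure actually used in the workshop: it assumes the hits are available as (score, is_true_member) pairs, and the names `sweep_thresholds`, `lowest_threshold_with_full_specificity` and the toy data are hypothetical.

```python
def sweep_thresholds(hits):
    """Return (threshold, sensitivity, specificity) for every distinct score.

    hits: list of (score, is_true_member) pairs. Assumes at least one
    true member and at least one non-member are present.
    """
    positives = sum(1 for _, is_true in hits if is_true)
    negatives = len(hits) - positives
    rows = []
    for threshold in sorted({score for score, _ in hits}):
        # A hit counts as predicted positive when score > threshold.
        tp = sum(1 for s, t in hits if s > threshold and t)
        fp = sum(1 for s, t in hits if s > threshold and not t)
        sensitivity = tp / positives
        specificity = (negatives - fp) / negatives
        rows.append((threshold, sensitivity, specificity))
    return rows


def lowest_threshold_with_full_specificity(rows):
    """Pick the lowest threshold at which no false positives remain."""
    perfect = [row for row in rows if row[2] == 1.0]
    return min(perfect, key=lambda row: row[0]) if perfect else None


# Hypothetical toy data, NOT the real search results.
toy_hits = [(400, True), (320, True), (260, True), (240, False),
            (200, True), (150, False), (90, False)]

rows = sweep_thresholds(toy_hits)
best = lowest_threshold_with_full_specificity(rows)
print(best)
```

Plotting 1 - specificity against sensitivity for each row gives the points of a ROC curve; the selected row is the point where the curve leaves the vertical axis.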

Table with search results at threshold > 250:
		
                        Positive (SwissProt)    Negative (SwissProt)
Positive (predicted)    314                     0
Negative (predicted)    74                      335

Sensitivity: 0.81    Specificity: 1    Precision: 1
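As a quick check, the three summary values can be recomputed from the four counts in the table. This is a small sketch using the standard definitions; the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute summary metrics from the four cells of a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true members that were found
        "specificity": tn / (tn + fp),  # share of non-members correctly rejected
        "precision": tp / (tp + fp),    # share of predicted members that are true
    }


# Counts taken from the table at threshold > 250.
metrics = confusion_metrics(tp=314, fp=0, fn=74, tn=335)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```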


© Popov Nikita 2016