ROC-curve




Task 1

I chose the same protein family as in pr.7, but restricted it to a single domain (the first one) and to the taxon Eukaryota.
AC: PF01237
ID: Oxysterol_BP
Domain function: lipid binding; implicated in many cellular processes related to oxysterol, including signaling, vesicular trafficking, lipid metabolism, and nonvesicular sterol transport.
Pfam link
Using Jalview, I took the Pfam seed alignment and removed some odd sequences. The result is available here: new_pr8.fasta


Fig. 1. A part of my alignment

Task 2

My next step was to build and calibrate an HMM profile. Using hmm2build and hmm2calibrate, I obtained an .out file containing a table, which I imported into my Excel file.
It is also available for download: lets_find.out

Task 3

Then I used hmm2search to search uniprot_sprot.fasta with the profile. The sequences found were added to sheet 2 ('hmm2search') of my Excel file.
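The three HMMER2 steps from Tasks 2 and 3 can be sketched as command lines. This is only an illustration: the profile file name pr8.hmm is hypothetical (the page does not name it), and the helper hmmer2_commands is mine, not part of HMMER. No options are passed, since the ones actually used are not stated here.

```python
import shlex


def hmmer2_commands(alignment, profile, database):
    """Return the three HMMER2 command lines as argument lists.

    `profile` is a hypothetical file name chosen for illustration.
    """
    return [
        # Build an HMM profile from the multiple alignment.
        ["hmm2build", profile, alignment],
        # Calibrate the profile so that E-values can be estimated.
        ["hmm2calibrate", profile],
        # Search the sequence database with the calibrated profile.
        ["hmm2search", profile, database],
    ]


commands = hmmer2_commands("new_pr8.fasta", "pr8.hmm", "uniprot_sprot.fasta")
for cmd in commands:
    # Print each command as it would be typed in a shell.
    print(shlex.join(cmd))
```

hmm2search writes its report to standard output, so in a shell the last command would normally be redirected to a file.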

Task 4

The next step was to build a histogram and a ROC curve from the sequences found and their scores.



Fig. 2. Created histogram



Fig. 3. ROC-curve
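For readers who prefer code to a spreadsheet, here is a minimal sketch of how a ROC curve and its area can be computed from scores. The scores and labels below are made-up toy values, not my real data; label 1 means the sequence really contains the domain, 0 means it does not. The functions roc_points and auc are illustrative names.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points while the threshold sweeps from high to low score."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Sequences with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (assumed for illustration only).
scores = [120.5, 98.0, 75.2, 40.1, 12.3, -5.0]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print(round(auc(points), 4))
```

An area of 1 would mean that every true member of the family scores higher than every non-member; any true member ranked below a non-member lowers the area.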




Fig. 4. Data for chosen threshold
As we can see, the AUC (Area Under the Curve) isn't equal to 1, which means my prediction isn't perfect, but it is acceptable.
Precision and Recall are both 1 (and that's great: everything I found belongs to the family), but the Accuracy is only 67%.
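The three measures can be written down explicitly. The sketch below uses the standard textbook definitions and the same kind of made-up toy data (not the values from my Excel file); the threshold of 90 and the function names are assumptions for illustration.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when sequences scoring >= threshold are called hits."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) using the standard definitions."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy


# Toy data (assumed for illustration only).
scores = [120.5, 98.0, 75.2, 40.1, 12.3, -5.0]
labels = [1, 1, 0, 1, 0, 0]

tp, fp, tn, fn = confusion(scores, labels, threshold=90)
precision, recall, accuracy = metrics(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(precision, recall, accuracy)
```

With these definitions, precision and recall both equal to 1 imply no false positives and no false negatives, and therefore an accuracy of 1. The 67% above therefore presumably comes from a different formula or a different set of sequences in the spreadsheet.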

Excel file: pr8.xlsx



© Sophia Veselova, 2017.