Practice 10. BLAST

1. Searching for protein homologues in Swiss-Prot

The following search parameters were set in BLAST:

Database: UniProtKB/Swiss-Prot (swissprot)

Algorithm: blastp (by default)

Maximum number of displayed sequences: 100

E-value threshold: 0.05

Word length: 6

Alignment parameters: BLOSUM62 matrix; gap costs Existence: 11, Extension: 1 (by default)

Low-complexity region filter: enabled (by default)
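For reproducibility, the same settings can be written as a command line for the standalone BLAST+ `blastp` program. The sketch below only builds the argument list; it assumes a local BLAST+ installation with a locally available `swissprot` database (the search in this practice was run through web BLAST), and the query file name is hypothetical. "Maximum number of displayed sequences" is mapped to `-max_target_seqs`, which is only an approximate equivalent of the web setting.

```python
import shlex


def blastp_command(query_path: str) -> list[str]:
    """Build a BLAST+ blastp argument list mirroring the web-BLAST settings above."""
    return [
        "blastp",
        "-query", query_path,        # FASTA file with the query protein (hypothetical path)
        "-db", "swissprot",          # UniProtKB/Swiss-Prot database (assumed installed locally)
        "-max_target_seqs", "100",   # maximum number of reported target sequences
        "-evalue", "0.05",           # E-value threshold
        "-word_size", "6",           # word length
        "-matrix", "BLOSUM62",       # substitution matrix
        "-gapopen", "11",            # gap existence cost
        "-gapextend", "1",           # gap extension cost
        "-seg", "yes",               # low-complexity (SEG) filter
    ]


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run() to execute it.
    print(shlex.join(blastp_command("query.fasta")))
```

Keeping the arguments as a list (rather than one string) lets it be passed directly to `subprocess.run` without shell quoting issues.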

Program text output

Next, I selected 5 random hits, kept only the proteins with the lowest E-values, and built a multiple alignment of them.

Multiple alignment

I consider the selected proteins to be homologous: this is supported by their low E-values and the rather large number of matching (conserved) positions in the alignment.
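One way to put a number on the shared positions in a multiple alignment is to count the columns in which every sequence carries the same residue. The sketch below assumes the alignment is given as equal-length gapped strings with `-` for gaps; the function name and the toy sequences are hypothetical and are not taken from the actual alignment.

```python
def conserved_columns(msa: list[str]) -> int:
    """Count alignment columns where all sequences have the same residue and no gap."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    count = 0
    for column in zip(*msa):  # iterate over alignment columns
        residues = set(column)
        # A column is fully conserved if it holds a single residue type that is not a gap
        if len(residues) == 1 and "-" not in residues:
            count += 1
    return count


if __name__ == "__main__":
    toy = ["MKT-AYIA", "MKTLA-IG", "MRT-AYIA"]  # hypothetical aligned fragments
    n = conserved_columns(toy)
    print(f"{n} of {len(toy[0])} columns fully conserved")
```

Full conservation is a strict criterion; counting columns with similar (not only identical) residues would give a higher figure.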

2. Search in Swiss-Prot for homologues of a mature viral protein excised from a polyprotein

Selected polyprotein:

ID: POL2_BBWV2

AC: P03599

Organism: Broad bean wilt virus 2

Program text output

Selected protein: Large capsid protein

Coordinates: 467–868
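Excising the mature protein from the polyprotein amounts to taking a substring by its coordinates. UniProt coordinates are 1-based and inclusive, while Python slices are 0-based and end-exclusive, so only the start needs shifting. The helper below is a hypothetical sketch; the P03599 sequence itself would have to be downloaded from UniProt and is not reproduced here.

```python
def excise(polyprotein: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based inclusive coordinates (UniProt style)."""
    if not (1 <= start <= end <= len(polyprotein)):
        raise ValueError("coordinates out of range")
    # Shift the start to 0-based; the inclusive end already equals the exclusive slice end
    return polyprotein[start - 1:end]


if __name__ == "__main__":
    pol2_seq = "A" * 466 + "C" * 402 + "G" * 100  # placeholder, not the real P03599 sequence
    capsid = excise(pol2_seq, 467, 868)
    print(len(capsid))
```

For coordinates 467–868 the excised fragment is 868 − 467 + 1 = 402 residues long.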

Alignment

Since the program returned only 7 hits, I initially took all of them. I then removed proteins P13561.1 and P03599.1 because their E-values were relatively high and their percentage of identity to the original sequence was low.

3. Dependence of the E-value on database size

After repeating the search with the results restricted to viruses, the number of hits did not change, probably because capsid proteins are found only in viruses.

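According to the Karlin–Altschul formula, $E = K m n e^{-\lambda S}$, where $m$ is the query length, $n$ is the size of the database (total number of residues), $S$ is the alignment score, and $K$ and $\lambda$ are constants determined by the scoring system. In our case all quantities except the database size remain unchanged, so the ratio of the E-values of the same hit after and before applying the filter equals the ratio of the database sizes, $E_{\text{after}}/E_{\text{before}} = n_{\text{viral}}/n_{\text{Swiss-Prot}}$. This gives an estimate of about 6% for the share of viral proteins in Swiss-Prot (strictly, the share of residues, since $n$ is measured in residues).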
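The cancellation can be checked numerically. The sketch below evaluates $E = K m n e^{-\lambda S}$ for two database sizes and recovers their ratio from the E-values alone. The values of `K`, `lam`, `m`, `S` and both database sizes are illustrative assumptions, not numbers from this search. Real BLAST uses effective (edge-corrected) query and database lengths, so in practice the ratio is only approximately equal to the size ratio.

```python
import math


def karlin_altschul_evalue(K: float, lam: float, m: int, n: int, S: float) -> float:
    """Expected number of chance hits with score >= S: E = K * m * n * exp(-lam * S)."""
    return K * m * n * math.exp(-lam * S)


def database_fraction(e_before: float, e_after: float) -> float:
    """Fraction of the database kept by a filter, from the E-values of one and the same hit."""
    # BLAST prints very small E-values as 0.0, so a hit with a nonzero E-value is required.
    if e_before <= 0:
        raise ValueError("e_before must be positive")
    # K, lambda, m and S cancel, leaving n_after / n_before
    return e_after / e_before


if __name__ == "__main__":
    # Illustrative values only
    K, lam, m, S = 0.041, 0.267, 402, 50.0
    e_before = karlin_altschul_evalue(K, lam, m, 100_000_000, S)
    e_after = karlin_altschul_evalue(K, lam, m, 6_000_000, S)
    print(f"estimated fraction: {database_fraction(e_before, e_after):.2%}")
```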