Database: UniProtKB/Swiss-Prot (swissprot)
Algorithm: blastp (by default)
Maximum number of displayed sequences: 100
E-value threshold: 0.05
Word length: 6
Alignment parameters: BLOSUM62, gap costs Existence: 11, Extension: 1 (by default)
The low-complexity region filter is enabled by default
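To reproduce this search outside the web interface, the parameters above can be mapped onto the standalone BLAST+ `blastp` program. This is a minimal sketch, assuming BLAST+ is installed and a local BLAST database named `swissprot` exists; the query file name `query.fasta` is hypothetical, and the web version may apply defaults that differ slightly from the command-line tool.

```python
import shlex


def build_blastp_command(query_path: str, db: str = "swissprot") -> list[str]:
    """Build a BLAST+ blastp argument list mirroring the search settings above."""
    return [
        "blastp",
        "-query", query_path,        # FASTA file with the query (hypothetical name)
        "-db", db,                   # local copy of UniProtKB/Swiss-Prot (assumed to exist)
        "-evalue", "0.05",           # E-value threshold
        "-word_size", "6",           # word length
        "-matrix", "BLOSUM62",       # substitution matrix
        "-gapopen", "11",            # gap existence cost
        "-gapextend", "1",           # gap extension cost
        "-seg", "yes",               # low-complexity filter switched on
        "-max_target_seqs", "100",   # maximum number of sequences reported
        "-outfmt", "6",              # tabular output, easy to parse later
    ]


cmd = build_blastp_command("query.fasta")
print(shlex.join(cmd))  # could be passed to subprocess.run(cmd) on a machine with BLAST+
```

Building the arguments as a list rather than one string avoids shell-quoting problems when the list is handed to `subprocess.run`.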
Program text output
Next, I selected 5 random hits, kept only the proteins with the lowest E-values, and built a multiple alignment of them.
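The selection step can be sketched as "sample at random, then rank by E-value". This is an illustration only: the hit list is invented, and the number of hits kept (`keep=2`) and the fixed `seed` are hypothetical parameters, since the report does not say how many proteins remained.

```python
import random


def pick_best_of_random(hits, n_random=5, keep=2, seed=0):
    """Draw n_random hits at random, then keep the `keep` with the lowest E-values."""
    rng = random.Random(seed)  # seeded so the random choice is reproducible
    sample = rng.sample(hits, min(n_random, len(hits)))
    return sorted(sample, key=lambda hit: hit[1])[:keep]


# Hypothetical (accession, E-value) pairs standing in for the BLAST hit list.
hits = [("HIT_A", 1e-30), ("HIT_B", 2e-12), ("HIT_C", 0.01),
        ("HIT_D", 3e-5), ("HIT_E", 0.04), ("HIT_F", 5e-20)]
chosen = pick_best_of_random(hits)
print(chosen)
```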
Multiple alignment
I think the selected proteins are homologous. This is supported by their low E-values and the fairly large number of matching positions in the alignment.
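One way to put a number on "a fairly large number of matching positions" is to count the alignment columns in which every sequence carries the same residue. A minimal sketch, assuming the alignment is given as equal-length strings with `-` for gaps; the toy sequences are invented and are not the alignment from this search.

```python
def conserved_columns(aligned):
    """Count columns where all sequences share one residue (columns with gaps are skipped)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    conserved = 0
    for column in zip(*aligned):
        if "-" not in column and len(set(column)) == 1:
            conserved += 1
    return conserved, conserved / len(aligned[0])


# Toy alignment for illustration only.
msa = ["MKV-LAG", "MKVALSG", "MRV-LAG"]
count, fraction = conserved_columns(msa)
print(count, round(fraction, 2))  # 4 0.57
```

Counting only fully identical columns is a strict measure; alignment viewers often also report similar (conservative) substitutions, which this sketch ignores.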
Selected polyprotein:
ID: POL2_BBWV2
AC: P03599
Organism: Broad bean wilt virus 2
Program text output
Selected protein: Large capsid protein
Coordinates: 467-868
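UniProt reports feature positions as 1-based and inclusive, so cutting the 467-868 region out of the polyprotein in a 0-based language needs an off-by-one adjustment. A minimal sketch; the helper name `extract_region` is hypothetical, and the real polyprotein sequence would have to be loaded separately.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end given 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    return sequence[start - 1:end]  # shift start to 0-based; Python's end index is exclusive


print(extract_region("ABCDEFGHIJ", 3, 5))  # CDE
print(868 - 467 + 1)  # length of the capsid region: 402
```

With the actual POL2_BBWV2 sequence, `extract_region(seq, 467, 868)` would return the 402-residue capsid segment used as the query.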
Alignment
Since the program returned only 7 hits, I took all of them. I then removed proteins P13561.1 and P03599.1 because their E-values were relatively high and their percent identity to the original sequence was low.
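The removal rule can be expressed as a filter on E-value and percent identity. This is a sketch only: the cut-offs `max_evalue=1e-3` and `min_identity=30.0` are hypothetical (the report gives no numeric thresholds), and the hits are placeholders, not the real values for P13561.1 or P03599.1.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    evalue: float
    identity: float  # percent identity to the query


def filter_hits(hits, max_evalue=1e-3, min_identity=30.0):
    """Keep hits that are both significant and reasonably similar to the query."""
    return [h for h in hits if h.evalue <= max_evalue and h.identity >= min_identity]


# Placeholder hits for illustration.
hits = [Hit("HIT_A", 1e-40, 62.0), Hit("HIT_B", 0.02, 24.5), Hit("HIT_C", 5e-8, 35.1)]
kept = filter_hits(hits)
print([h.accession for h in kept])  # ['HIT_A', 'HIT_C']
```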
After repeating the search restricted to viruses, the number of hits did not change, probably because capsid proteins occur only in viruses.
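According to the Karlin-Altschul theorem, $E = K m n e^{-\lambda S}$, where $m$ is the query length, $n$ is the size of the database, $S$ is the alignment score, and $K$ and $\lambda$ are statistical parameters of the scoring system. In our case every quantity except the database size stays the same, so the E-value is proportional to $n$, and we can estimate the fraction of viral proteins in Swiss-Prot from the ratio of the E-values after and before applying the filter: $E_{\text{after}} / E_{\text{before}} = n_{\text{viral}} / n_{\text{total}}$. This gives a viral share in Swiss-Prot of about 6%.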
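The estimate rests on the fact that $K$, $\lambda$, $S$ and $m$ cancel in the ratio, leaving only the database sizes. A minimal numerical check, assuming illustrative values: $K = 0.041$ and $\lambda = 0.267$ are typical for BLOSUM62 with gap costs 11/1, while the score, database sizes and resulting E-values are invented and are not the numbers from this search. Strictly speaking, $n$ counts residues, so the ratio measures the viral share of the database length.

```python
import math


def karlin_altschul_evalue(S, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * S)


def database_fraction(e_after, e_before):
    """With S, m, K and lambda fixed, E is proportional to n, so E_after / E_before = n_after / n_before."""
    return e_after / e_before


# Illustrative parameters: only the database size differs between the two runs.
K, lam, S, m = 0.041, 0.267, 50.0, 402
n_all = 1_000_000   # hypothetical size of the full database
n_viral = 60_000    # hypothetical size of the viral subset
e_before = karlin_altschul_evalue(S, m, n_all, K, lam)
e_after = karlin_altschul_evalue(S, m, n_viral, K, lam)
print(round(database_fraction(e_after, e_before), 3))  # 0.06
```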