For my protein from pr7, the only homologous proteins found have an E-value of 0. The search used standard parameters except:
So I ran BLAST for another archaeon that has more reviewed proteins in UniProt: I chose Sulfolobus acidocaldarius (strain ATCC 33909 / DSM 639 / JCM 8929 / NBRC 15157 / NCIMB 11770) and the protein DNA double-strand break repair helicase HerA (F2Z5Z6, HERA_SULAC). I ran BLAST with similar parameters; the results can be seen via the link. A multiple alignment with the homologous proteins is in the file.
I think these proteins are homologous because their percent identities are above 30% and the alignment contains a few conserved sites.
The results of the BLAST can be found via the link. The multiple alignment is at the link.
The BLAST results can be found at the link. This search returned 13 proteins, more than the previous one (8 proteins). For further investigation I took the second hit (from Drosophila, accession: O36966).
Task | E Value | Max Score | Total Score | Query Coverage | Percent Identity |
---|---|---|---|---|---|
2 | 1e-07 | 55.1 | 55.1 | 71% | 27.78 |
3 | 5e-09 | 55.1 | 55.1 | 71% | 27.78 |
As expected, the only difference is in the E-value: it is 20 times larger in the second task. The parameters that do not depend on database size are unchanged. The difference in E-value is explained by the difference in database size: the database searched in the second task is larger than the one in the third. With fewer sequences to search, fewer chance hits with the same score are expected, so the E-value decreases.
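The proportionality between E-value and database size can be sketched with the Karlin-Altschul formula that underlies BLAST statistics. The symbols here are not from the text above: $m$ is the query length, $n$ the total length of the database, $S$ the raw alignment score, and $K$, $\lambda$ are constants of the scoring system. For the same query and the same alignment, only $n$ changes between the two tasks:

$$E = K\,m\,n\,e^{-\lambda S}$$

$$\frac{E_2}{E_3} = \frac{n_2}{n_3} = \frac{1 \times 10^{-7}}{5 \times 10^{-9}} = 20$$

Strictly, $n$ is measured in residues (and BLAST applies length corrections), so using the number of proteins instead assumes that the average protein length is similar in both databases.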
Swiss-Prot contains 569,213 proteins (as of 24.04.2023). Dividing this number by 20 gives the estimated number of viral proteins: 28,460. Swiss-Prot actually contains 18,150 viral proteins, which is 10,310 fewer than the estimate. Given the roughness of the estimate and the several roundings involved (in the number of proteins and the presumed rounding of the E-values reported by BLAST), the results are approximately the same.
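The arithmetic of this estimate can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes, as the text implies, that the second task searched all of Swiss-Prot and the third searched only its viral subset.

```python
# Values copied from the table and the paragraph above.
e_value_task2 = 1e-07      # E-value in task 2 (assumed: whole Swiss-Prot)
e_value_task3 = 5e-09      # E-value in task 3 (assumed: viral proteins only)
swissprot_proteins = 569_213
viral_proteins_actual = 18_150

# The E-value is taken to be proportional to database size,
# so the ratio of E-values estimates the ratio of database sizes.
ratio = e_value_task2 / e_value_task3
viral_proteins_estimate = swissprot_proteins / ratio
difference = viral_proteins_estimate - viral_proteins_actual

print(f"E-value ratio: {ratio:.0f}")
print(f"Estimated viral proteins: {viral_proteins_estimate:.0f}")
print(f"Actual viral proteins: {viral_proteins_actual}")
print(f"Overestimate: {difference:.0f}")
```

The estimate is larger than the real count by roughly half of the real count, which is consistent with the E-values being reported to only one significant digit.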