Working with alignments using BLAST

Local similarity map of two polyproteins

Alignment | Identities, % | Positives, % | Length | Gaps | Score | Score, bits
Range 1: 1684 to 2327 | 29 | 48 | 615, 644 | 59 | 638 | 250
Range 2: 1121 to 1383 | 37 | 51 | 277, 263 | 26 | 397 | 157
Range 3: 298 to 718 | 26 | 40 | 486, 421 | 81 | 307 | 122
Range 4: 811 to 917 | 23 | 40 | 121, 107 | 28 | 82 | 36.2
Range 5: 554 to 637 | 24 | 41 | 83, 84 | 14 | 46 | 22.3
Range 6: 1810 to 1837 | 36 | 53 | 24, 28 | 4 | 45 | 21.9
Range 7: 442 to 459 | 39 | 60 | 23, 18 | 5 | 41 | 20.4
Range 8: 1237 to 1262 | 31 | 48 | 29, 26 | 3 | 41 | 20.4
Range 9: 224 to 235 | 58 | 75 | 12, 12 | 0 | 40 | 20.0
Range 10: 894 to 929 | 26 | 55 | 38, 36 | 2 | 38 | 19.2


The two best scores are 250 bits (Range 1) and 157 bits (Range 2).
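The Score and Score, bits columns are linked by the standard Karlin-Altschul conversion S' = (λS − ln K)/ln 2. The sketch below assumes the blastp defaults (BLOSUM62, gap costs 11/1), for which NCBI reports λ ≈ 0.267 and K ≈ 0.041. The report does not give these parameters. With them, however, the formula reproduces the table: for example, 82 → 36.2 and 46 → 22.3.

```python
import math

# Assumed gapped Karlin-Altschul parameters for BLOSUM62 with gap costs 11/1
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: int, lam: float = LAMBDA, k: float = K) -> float:
    """Convert a raw alignment score S to a bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Raw scores of Ranges 1, 2, 4 and 5 from the table above
for raw in (638, 397, 82, 46):
    print(raw, round(bit_score(raw), 1))
```

For the three-digit entries (250 and 157), the table shows the integer part of these values.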
For the first result (Range 1): in P0330 the aligned region corresponds to Protein 3CD. In P03306 it covers RNA-directed RNA polymerase 3D-POL (1863 to 2332) and Picornain 3C (1650 to 1862; the alignment starts at 1684).
So this alignment probably contains almost all of one protein (3D-POL) and part of another (3C).
For the second result (Range 2): in both P0330 and P03306 the aligned region lies in Protein 2C (probably only part of the protein).
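The "one protein plus a piece of another" reading of Range 1 can be checked by intersecting the alignment with the feature coordinates quoted for P03306. This is a minimal sketch. It assumes all coordinates refer to the same sequence and are inclusive, and `overlap` is an illustrative helper name.

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of residues shared by two inclusive ranges (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


alignment = (1684, 2327)  # Range 1
domains = {
    "Picornain 3C": (1650, 1862),
    "RNA-directed RNA polymerase 3D-POL": (1863, 2332),
}
for name, (start, end) in domains.items():
    length = end - start + 1
    shared = overlap(*alignment, start, end)
    print(f"{name}: {shared}/{length} residues ({shared / length:.0%})")
```

This gives 179 of 213 residues of 3C (84%) and 465 of 470 residues of 3D-POL (99%).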

Comparing the alignment score with random alignments

ID1, ID2 | Score | Median (m) | Upper quartile (Q1) | Bits (B) | Probability (P)
TRPB_ECOLI, TRPB_BACSU | 1102.0 | 55.25 | 62.25 | 149.7 | 8.626E-46
DNAA_ECOLI, FADA_ECOLI | 55.5 | 45.0 | 50.0 | 2.3 | 0.203

So the first alignment is clearly not accidental (P = 8.626E-46), while the second (P = 0.203) may well have arisen by chance.
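The report does not write out the conversion behind the Bits and Probability columns. This minimal sketch assumes B = (S − m + 1)/(Q1 − m) and P = 2^(−B). The formula is a reconstruction, chosen because it reproduces every B value in the tables to the printed precision; the author does not state it. `to_bits` and `to_probability` are illustrative names.

```python
def to_bits(score: float, median: float, upper_quartile: float) -> float:
    """Bits B for a real alignment score S, given the median m and upper
    quartile Q1 of scores obtained for shuffled sequences.

    Assumed (reconstructed) formula: B = (S - m + 1) / (Q1 - m).
    """
    return (score - median + 1) / (upper_quartile - median)


def to_probability(bits: float) -> float:
    """P = 2 ** -B: every additional unit of B halves the probability."""
    return 2.0 ** -bits


# Standard parameters: TRPB_ECOLI/TRPB_BACSU and DNAA_ECOLI/FADA_ECOLI
for pair, s, m, q1 in [("TRPB", 1102.0, 55.25, 62.25), ("DNAA/FADA", 55.5, 45.0, 50.0)]:
    b = round(to_bits(s, m, q1), 1)
    print(pair, b, f"{to_probability(b):.3e}")
# prints:
# TRPB 149.7 8.626e-46
# DNAA/FADA 2.3 2.031e-01
```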

Increasing the gap extension penalty (= 4.0)

ID1, ID2 | Score | Median (m) | Upper quartile (Q1) | Bits (B) | Probability (P)
TRPB_ECOLI, TRPB_BACSU | 1100 | 36 | 38.00 | 532.5 | 4.03E-161
DNAA_ECOLI, FADA_ECOLI | 44 | 34 | 37 | 3.7 | 0.077

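Increasing the gap extension penalty to 4.0 lowers the estimated probability of an accidental alignment for both pairs. It falls from 8.626E-46 to 4.03E-161 for TRPB_ECOLI/TRPB_BACSU and from 0.203 to 0.077 for DNAA_ECOLI/FADA_ECOLI.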
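Under an affine gap model each additional gap position costs the extension penalty, so a larger extension penalty makes long gaps much more expensive. In the tables, the shuffled-sequence scores for TRPB fall sharply (median 55.25 → 36, Q1 62.25 → 38) while the real score barely changes (1102 → 1100). As a result Q1 − m shrinks from 7 to 2 and B grows.

The sketch assumes the convention open + (k − 1)·extend; some programs charge the extension for every gap position instead. It also uses a hypothetical open penalty of 10.0 and 0.5 as the lower extension value. Neither number is given in the report.

```python
def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for a gap of `length` residues under the assumed affine
    convention: open + (length - 1) * extend."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_extend


# Hypothetical open penalty 10.0; extension penalty 0.5 versus 4.0
for length in (1, 5, 20):
    print(length, affine_gap_cost(length, 10.0, 0.5), affine_gap_cost(length, 10.0, 4.0))
```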

Alignments using BLOSUM30

ID1, ID2 | Score | Median (m) | Upper quartile (Q1) | Bits (B) | Probability (P)
TRPB_ECOLI, TRPB_BACSU | 1498 | 297.75 | 308.75 | 109.2 | 1.34E-33
DNAA_ECOLI, FADA_ECOLI | 264 | 277.5 | 289.25 | -1.06 (incorrect) | 2.08 (incorrect)

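With BLOSUM30 the probability (P) for the TRPB pair is higher than with the standard parameters (1.34E-33 versus 8.626E-46), i.e. the alignment appears less significant.
For the DNAA_ECOLI/FADA_ECOLI pair the bit value is negative, which is meaningless (even after recounting the score). The real score (264) is below the median of the shuffled scores (277.5), and the resulting "probability" of 2.08 is greater than 1.
Thus BLOSUM30 does not always give meaningful results.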
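A guard for this case can be sketched as follows. It assumes the conversion B = (S − m + 1)/(Q1 − m), P = 2^(−B), which is a reconstruction matching the printed values, not a formula given in the report. The function refuses to report a probability when the real score does not exceed the median of the shuffled scores. `checked_significance` is an illustrative name.

```python
from typing import Optional, Tuple


def checked_significance(
    score: float, median: float, upper_quartile: float
) -> Optional[Tuple[float, float]]:
    """Return (bits, probability), or None when the estimate is meaningless.

    Assumed conversion: B = (S - m + 1) / (Q1 - m), P = 2 ** -B.
    """
    if upper_quartile <= median:
        raise ValueError("upper quartile must be above the median")
    if score <= median:
        # The real score is no better than a typical shuffled one: B would be
        # small or negative and P = 2 ** -B close to or above 1.
        return None
    bits = (score - median + 1) / (upper_quartile - median)
    return bits, 2.0 ** -bits


print(checked_significance(264, 277.5, 289.25))     # DNAA/FADA with BLOSUM30: None
print(checked_significance(1498, 297.75, 308.75))   # TRPB with BLOSUM30
```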

Checking the formula for conversion to bits

Let's check our results using standard parameters.
In the first case we look at TRPB_ECOLI and TRPB_BACSU, and in the second at DNAA_ECOLI and FADA_ECOLI.
We shuffle the TRPB_BACSU and FADA_ECOLI sequences 1000 times.
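The shuffling procedure can be sketched as below. The scoring function is a hypothetical placeholder that counts identical residues at equal positions; it stands in for the real local alignment score used in the report. `shuffled_score_stats` and `toy_identity_score` are illustrative names. Following the report's notation, the upper quartile (75th percentile) is called Q1.

```python
import random
import statistics
from typing import Callable, List, Tuple


def shuffled_score_stats(
    seq1: str,
    seq2: str,
    score_fn: Callable[[str, str], float],
    n_shuffles: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Score seq1 against n_shuffles random permutations of seq2 and return
    the median, upper quartile (Q1 in the report) and upper 1/8 of the scores."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    residues = list(seq2)
    scores: List[float] = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)          # a permutation keeps the residue composition
        scores.append(score_fn(seq1, "".join(residues)))
    # quantiles(n=8) returns 7 cut points at 1/8, 2/8, ..., 7/8
    cuts = statistics.quantiles(scores, n=8)
    return cuts[3], cuts[5], cuts[6]   # 4/8 (median), 6/8, 7/8


def toy_identity_score(a: str, b: str) -> float:
    """Hypothetical placeholder scorer: identical residues at equal positions.
    A real analysis would compute a local alignment score instead."""
    return float(sum(x == y for x, y in zip(a, b)))


# Arbitrary demo strings, not the real TRPB/FADA sequences
median, q1, upper_eighth = shuffled_score_stats(
    "MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRQ", toy_identity_score
)
print(median, q1, upper_eighth)
```

Shuffling, rather than generating unrelated random sequences, keeps the amino-acid composition fixed, so only the residue order is randomised.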

ID1, ID2 | Upper quartile (Q1) | Median (m) | Upper 1/8 | Bits (B) | Probability (P)
TRPB_ECOLI, TRPB_BACSU | 61.25 | 54.5 | 67.5 | 2.1 | 0.233
DNAA_ECOLI, FADA_ECOLI | 50.5 | 45.0 | 55.0 | 2 | 0.25

In both cases the bit value obtained for the upper 1/8 score is below 3 (2.1 and 2).
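The following derivation shows why a value near 2 is the expected outcome for the upper 1/8 score. It assumes an exponential upper tail of the shuffled-score distribution, which the report does not state.

```latex
% Assumption: above the median m, the scores S' of shuffled alignments have an
% exponential tail in which the probability halves every (Q_1 - m) units.
P(S' \ge x) = \frac{1}{2}\, 2^{-(x - m)/(Q_1 - m)},
\qquad P(S' \ge m) = \frac{1}{2}, \qquad P(S' \ge Q_1) = \frac{1}{4}.

% Setting the tail probability to 1/8 at the upper-1/8 point U_{1/8}:
\frac{1}{2}\, 2^{-(U_{1/8} - m)/(Q_1 - m)} = \frac{1}{8}
\quad\Longrightarrow\quad
\frac{U_{1/8} - m}{Q_1 - m} = 2.
```

The measured values give (67.5 − 54.5)/(61.25 − 54.5) ≈ 1.93 for TRPB_BACSU and (55.0 − 45.0)/(50.5 − 45.0) ≈ 1.82 for FADA_ECOLI. Both are close to 2.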

BLAST: search for homologs in the database

 
AC | ID | Organism | Identities, % | Positives, % | Length | Gaps | Score | Score, bits | Expect | Coverage, %
Q49492 | MYCF_MICGR | Micromonospora griseorubida | 42 | 58 | 281, 253 | 34 | 602 | 236 | 1.00E-76 | 100
Q9L9F2 | NOVP_STRNV | Streptomyces niveus (Streptomyces spheroides) | 51 | 72 | 207, 209 | 2 | 592 | 232 | 5.00E-75 | 73.7

© Belyaeva Julia, 2018