JalView

Alignment as a reflection of evolution

Task 1

The task is to build a multiple sequence alignment of six proteins from the HSP70 family. I used the EMBOSS program "seqret" to retrieve the sequences in FASTA format, merged them into one file, and aligned them in Jalview.

The proteins I used are listed below.
Table 1.
Entry | Entry name | Protein name | Organism | Supergroup
Q8TQR2 | DNAK_METAC | Chaperone protein DnaK (HSP70) | Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A) | Archaea
Q18GZ4 | DNAK_HALWD | Chaperone protein DnaK (HSP70) | Haloquadratum walsbyi (strain DSM 16790 / HBSQ001) | Archaea
Q13E60 | DNAK_RHOPS | Chaperone protein DnaK (HSP70) | Rhodopseudomonas palustris (strain BisB5) | Bacteria
Q73GL7 | DNAK_WOLPM | Chaperone protein DnaK (HSP70) | Wolbachia pipientis wMel | Bacteria
Q74ZJ0 | HSP7F_ASHGO | Heat shock protein homolog SSE1 | Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056) (Yeast) (Eremothecium gossypii) | Eukaryota
Q8SQR8 | HSP7F_ENCCU | Heat shock protein homolog SSE1 | Encephalitozoon cuniculi (strain GB-M1) (Microsporidian parasite) | Eukaryota
*HSP70 = Heat shock protein 70 (Heat shock 70 kDa protein)

The alignment was built in Jalview with the T-Coffee web service ("Tcoffee with Defaults") and colored with the ClustalX scheme at Identity Threshold = 100%. The annotation labels (identity 80%, plurality 100%, gaps) are: C, positions conserved at 80% or more; F, positions that are absolutely functionally conserved; G, positions with gaps. To see the whole alignment, click here.

    Residues treated as functionally conserved:
  1. L and I are aliphatic amino acid residues with the same number of atoms;
  2. T and S both have an alcohol group (-OH);
  3. K and R are positively charged amino acid residues.
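
The labeling rules can be sketched in a few lines of Python. This is only an illustration under assumptions, not the procedure Jalview uses: the function names are hypothetical, the only functional groups are the three pairs listed above, and a column that passes the 80% identity test is labeled C before the functional test is tried.

```python
from collections import Counter

# Residue groups taken from the list above; any other residue forms its own group.
FUNCTIONAL_GROUPS = [frozenset("LI"), frozenset("TS"), frozenset("KR")]


def group_of(residue):
    """Return the functional group a residue belongs to (hypothetical helper)."""
    for group in FUNCTIONAL_GROUPS:
        if residue in group:
            return group
    return frozenset(residue)


def label_column(column, identity_threshold=0.8):
    """Label one alignment column as 'G', 'C', 'F' or '.' (no label)."""
    if "-" in column:
        return "G"  # at least one sequence has a gap in this column
    most_common_count = Counter(column).most_common(1)[0][1]
    if most_common_count / len(column) >= identity_threshold:
        return "C"  # one residue reaches the identity threshold
    if len({group_of(residue) for residue in column}) == 1:
        return "F"  # every residue falls into the same functional group
    return "."


def label_alignment(sequences):
    """Label every column of a list of equally long aligned sequences."""
    return "".join(label_column(column) for column in zip(*sequences))


# Toy alignment, not the HSP70 data.
toy = ["MKLT-A", "MKIS-G", "MRLTAA", "MKLSAC", "MKITAD", "MRLSAE"]
print(label_alignment(toy))  # CFFFG.
```

In the toy example the first column is identical in all rows (C), the next three mix K/R, L/I and T/S (F), the fifth contains gaps (G), and the last has no label.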

Picture 1. Part of the alignment of 6 protein sequences from the HSP70 family, with labels.

The alignment statistics were obtained with the EMBOSS program "infoalign".

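Table 2. Statistics at a conservation threshold of 100%.
Name | SeqLen | AlignLen | GapLen | GapLen, % | Ident | Ident, % | Similar | Similar, %
DNAK_METAC_1-617 | 617 | 847 | 230 | 27.15 | 31 | 3.66 | 0 | 0
DNAK_HALWD_1-641 | 641 | 847 | 206 | 24.32 | 31 | 3.66 | 0 | 0
DNAK_RHOPS_1-633 | 633 | 847 | 214 | 25.27 | 31 | 3.66 | 0 | 0
DNAK_WOLPM_1-640 | 640 | 847 | 207 | 24.44 | 31 | 3.66 | 0 | 0
HSP7F_ASHGO_1-697 | 697 | 847 | 150 | 17.71 | 31 | 3.66 | 0 | 0
HSP7F_ENCCU_1-658 | 658 | 847 | 189 | 22.31 | 31 | 3.66 | 0 | 0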
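
The length and gap columns follow from simple arithmetic: GapLen = AlignLen - SeqLen, and each percentage is taken relative to AlignLen. Below is a minimal sketch of that arithmetic, assuming gaps are written as "-" or "."; the function name is hypothetical and this is not infoalign's own code, which may treat gaps differently in other cases.

```python
def gap_statistics(aligned_sequence, gap_chars="-."):
    """Compute length and gap statistics for one row of an alignment."""
    align_len = len(aligned_sequence)
    gap_len = sum(1 for ch in aligned_sequence if ch in gap_chars)
    return {
        "SeqLen": align_len - gap_len,
        "AlignLen": align_len,
        "GapLen": gap_len,
        "GapPercent": round(100 * gap_len / align_len, 2),
    }


# Synthetic row with the same proportions as DNAK_METAC (617 residues, 847 columns).
row = "A" * 617 + "-" * 230
print(gap_statistics(row))
# {'SeqLen': 617, 'AlignLen': 847, 'GapLen': 230, 'GapPercent': 27.15}
```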

Table 3. Statistics at a conservation threshold of 70%.
Name | SeqLen | AlignLen | GapLen | GapLen, % | Ident | Ident, % | Similar | Similar, %
DNAK_METAC_1-617 | 617 | 847 | 230 | 27.15 | 284 | 33.53 | 12 | 1.42
DNAK_HALWD_1-641 | 641 | 847 | 206 | 24.32 | 282 | 33.29 | 9 | 1.06
DNAK_RHOPS_1-633 | 633 | 847 | 214 | 25.27 | 294 | 34.71 | 4 | 0.47
DNAK_WOLPM_1-640 | 640 | 847 | 207 | 24.44 | 296 | 34.49 | 4 | 0.47
HSP7F_ASHGO_1-697 | 697 | 847 | 150 | 17.71 | 129 | 15.23 | 53 | 6.26
HSP7F_ENCCU_1-658 | 658 | 847 | 189 | 22.31 | 91 | 10.74 | 57 | 6.73

Table 4. Statistics at 100% functional conservation (calculated with a Python script using the BLOSUM62 matrix).
Name | SeqLen | AlignLen | GapLen | GapLen, % | Ident | Ident, % | Similar | Similar, %
DNAK_METAC_1-617 | 617 | 847 | 230 | 27.15 | 31 | 3.66 | 110 | 12.00
DNAK_HALWD_1-641 | 641 | 847 | 206 | 24.32 | 31 | 3.66 | 110 | 12.00
DNAK_RHOPS_1-633 | 633 | 847 | 214 | 25.27 | 31 | 3.66 | 110 | 12.00
DNAK_WOLPM_1-640 | 640 | 847 | 207 | 24.44 | 31 | 3.66 | 110 | 12.00
HSP7F_ASHGO_1-697 | 697 | 847 | 150 | 17.71 | 31 | 3.66 | 110 | 12.00
HSP7F_ENCCU_1-658 | 658 | 847 | 189 | 22.31 | 31 | 3.66 | 110 | 12.00
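
The Python script itself is not shown on this page, so the following is only a guess at how such a count could work. It assumes that a column is "functionally conserved" when every pair of residues in it has a positive BLOSUM62 score. To stay self-contained it carries a small excerpt of BLOSUM62 rather than the full matrix; identical pairs and pairs missing from the excerpt get placeholder scores of +1 and -1, which is acceptable only for this toy example.

```python
from itertools import combinations

# Excerpt of positive off-diagonal BLOSUM62 scores; the full matrix has many more.
BLOSUM62_EXCERPT = {
    frozenset("IL"): 2, frozenset("IV"): 3, frozenset("LV"): 1,
    frozenset("KR"): 2, frozenset("ST"): 1, frozenset("DE"): 2,
    frozenset("FY"): 3,
}


def score(a, b):
    """Return a substitution score from the excerpt (placeholders elsewhere)."""
    if a == b:
        return 1  # placeholder: every diagonal entry of BLOSUM62 is positive
    return BLOSUM62_EXCERPT.get(frozenset((a, b)), -1)  # placeholder for other pairs


def count_columns(sequences):
    """Count identical and functionally conserved (but not identical) columns."""
    identical = similar = 0
    for column in zip(*sequences):
        if "-" in column:
            continue  # columns with gaps are not counted
        if len(set(column)) == 1:
            identical += 1
        elif all(score(a, b) > 0 for a, b in combinations(column, 2)):
            similar += 1
    return identical, similar


# Toy alignment, not the HSP70 data.
toy = ["MKLSD-W", "MRITEAW", "MKVSDAG"]
print(count_columns(toy))  # (1, 4)
```

Whether the Similar column in Table 4 includes the identical positions is not stated; this sketch counts the two classes separately.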

Table 5. Average values.
AlignLen | GapLen | GapLen, % | Identity (identity 100%) | % | Identity (plurality 100%) | % | Identity (identity 70%) | %
847 | 199.33 | 23.53 | 31 | 3.66 | 246 | 29.04 | 31 | 3.66
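
The average gap length and its percentage can be rechecked from the GapLen column of Table 2. A minimal sketch; the function name is hypothetical.

```python
from statistics import mean


def average_gap(gap_lengths, align_len):
    """Return the mean gap length and its share of the alignment length, in %."""
    mean_gap = mean(gap_lengths)
    return round(mean_gap, 2), round(100 * mean_gap / align_len, 2)


# GapLen values of the six sequences and the common alignment length of 847.
print(average_gap([230, 206, 214, 207, 150, 189], 847))  # (199.33, 23.53)
```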

Task 2

I used the protein sequence of human insulin. With the EMBOSS program "msbar" I produced 7 generations of mutant insulin (7 artificial point mutations in every generation), then combined the mutant generations and the original sequence into one file (see "Bash script").
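
For readers without EMBOSS, the idea of successive mutant generations can be imitated in plain Python. This sketch is not msbar and does not reproduce its algorithm or options: it assumes that each point mutation is a random insertion, deletion or replacement of a single residue, and the starting string is a placeholder rather than the insulin sequence.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutate(sequence, rng):
    """Apply one random insertion, deletion or replacement of a single residue."""
    kinds = ["insertion", "deletion", "replacement"] if sequence else ["insertion"]
    kind = rng.choice(kinds)
    if kind == "insertion":
        pos = rng.randrange(len(sequence) + 1)
        return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos:]
    pos = rng.randrange(len(sequence))
    if kind == "deletion":
        return sequence[:pos] + sequence[pos + 1:]
    # A replacement may pick the same residue again, which leaves the site unchanged.
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]


def evolve(origin, generations=7, mutations_per_generation=7, seed=0):
    """Return [origin, p1, ..., pN]; each generation mutates the previous one."""
    rng = random.Random(seed)
    lineage = [origin]
    for _ in range(generations):
        current = lineage[-1]
        for _ in range(mutations_per_generation):
            current = point_mutate(current, rng)
        lineage.append(current)
    return lineage


def to_fasta(lineage):
    """Combine the original sequence and all generations into one FASTA string."""
    names = ["origin"] + [f"p{i}" for i in range(1, len(lineage))]
    return "".join(f">{name}\n{seq}\n" for name, seq in zip(names, lineage))


print(to_fasta(evolve("ACDEFGHIKLMNPQRSTVWY")))  # placeholder sequence
```

Fixing the seed makes the run repeatable, which is convenient when the mutations have to be listed by hand afterwards.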

Bash script (task 2)

The alignment was built in Jalview with the T-Coffee web service ("Tcoffee with Defaults") and colored with the ClustalX scheme at Identity Threshold = 100%. To see the whole alignment, click on the picture or here.

Picture 2. The alignment made by program.

Table 6. The list of artificial mutations in insulin.
Position | Type of point mutation | Generation
2 | insertion of S | p3 -> p4
3 | deletion of A | p4 -> p5
9 | insertion of N | p1 -> p2
10 | insertion of L | p1 -> p2
11 | insertion of L | p1 -> p2
12 | insertion of L | p6 -> p7
21 | insertion of N | origin -> p1
29 | deletion of A | p2 -> p3
29 | insertion of A | p5 -> p6
30 | insertion of Y | p2 -> p3
31 | insertion of A | p2 -> p3
36 | insertion of V | p5 -> p6
75 | replacement R to S | p3 -> p4
78 | replacement E to R | origin -> p1
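
A list like this can be drafted automatically by comparing two consecutive generations row by row in the alignment. The sketch below assumes both rows come from the same alignment (equal length, "-" for gaps) and reports positions as alignment columns; the function name and the toy rows are hypothetical.

```python
def describe_mutations(parent, child, label):
    """List the point mutations between two aligned rows of equal length."""
    if len(parent) != len(child):
        raise ValueError("rows must come from the same alignment")
    events = []
    for position, (a, b) in enumerate(zip(parent, child), start=1):
        if a == b:
            continue  # unchanged column (or a gap in both rows)
        if a == "-":
            events.append((position, f"insertion of {b}", label))
        elif b == "-":
            events.append((position, f"deletion of {a}", label))
        else:
            events.append((position, f"replacement {a} to {b}", label))
    return events


# Toy rows, not the insulin alignment.
for event in describe_mutations("MA-LWE", "MASLWR", "origin -> p1"):
    print(event)
# (3, 'insertion of S', 'origin -> p1')
# (6, 'replacement E to R', 'origin -> p1')
```

The result is only as good as the alignment: a misplaced gap shows up as a wrong mutation, so the output still has to be checked against the known history.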

Picture 3. The alignment made by the program, with my manual changes.

I had to change some positions in the alignment by hand (to see the whole alignment, click here):

Task 3

I used the nucleotide sequence of the human insulin gene. With the EMBOSS program "msbar" I produced 7 generations of the mutant insulin gene. Then, with the EMBOSS program "transeq", I translated them and built an alignment of the corresponding mutant proteins (you can see the script). The alignment was built in Jalview with the T-Coffee web service ("Tcoffee with Defaults") and colored with the ClustalX scheme at Identity Threshold = 60%.
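
The translation step can be illustrated without EMBOSS. This is a minimal sketch, not transeq itself: it assumes the standard genetic code and a single forward reading frame, writes stop codons as "*" and unrecognized codons as "X", and drops an incomplete trailing codon. The function name and the test string are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=1):
    """Translate a nucleotide string in forward frame 1, 2 or 3."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCCTGTGGATGCGC"))  # MALWMR
```

A single inserted or deleted nucleotide shifts the reading frame, so gene-level point mutations can change every residue downstream. This is why the protein alignment in this task is much less conserved than the one in Task 2.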


Term II
© Potanina Darya, 2017