JalView

Alignment as a reflection of evolution

Task 1

We have to make a sequence alignment of 6 sequences of proteins from family HSP70. I used the program "seqret" (EMBOSS) to load these sequences of proteins in format FASTA. Then I merged into one file and made an alignment by Jalview.

Here You can see some information about proteins I used:
Table 1.

Entry Entry name Protein name Organism Supergroup

Q8TQR2 DNAK_METAC Chaperone protein DnaK (HSP70) Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A) Archaea

Q18GZ4 DNAK_HALWD Chaperone protein DnaK (HSP70) Haloquadratum walsbyi (strain DSM 16790 / HBSQ001) Archaea

Q13E60 DNAK_RHOPS Chaperone protein DnaK (HSP70) Rhodopseudomonas palustris (strain BisB5) Bacteria

Q73GL7 DNAK_WOLPM Chaperone protein DnaK (HSP70) Wolbachia pipientis wMel Bacteria

Q74ZJ0 HSP7F_ASHGO Heat shock protein homolog SSE1 Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056) (Yeast) (Eremothecium gossypii) Eukaryota

Q8SQR8 HSP7F_ENCCU Heat shock protein homolog SSE1 Encephalitozoon cuniculi (strain GB-M1) (Microsporidian parasite) Eukaryota

*HSP70 = Heat shock protein 70 (Heat shock 70 kDa protein)

The alignment was made by Jalview by program Tcoffee with Defaults, colored by ClustalX with Identity Threshold = 100%. There are labels (identity 80%, plurality 100%, gaps): C - positions which are conserved 80% or more, F - absolutely functionally conserved, G - positions with gaps. To see the whole alignment press here.

About functionally conserved:

L and I are aiphatic amino acid residues with tne same numbers of atoms;
T and S have an alcohol group (-OH);
K and R are positively charged amino acid residues

Picture 1. The part of alignment of 6 sequences of proteins from family HSP70 wis labels.

The datas about alignment was taken by program "infoalign" (EMBOSS).

Table 2. Data when the conserved 100%.

Name SeqLen AlignLen GapLen % of GapLen Ident % of Ident Similar % of Similar

DNAK_METAC_1-617 617 847 230 27,15 31 3,66 0 0

DNAK_HALWD_1-641 641 847 206 24,32 31 3,66 0 0

DNAK_RHOPS_1-633 633 847 214 25,27 31 3,66 0 0

DNAK_WOLPM_1-640 640 847 207 24,44 31 3,66 0 0

HSP7F_ASHGO_1-697 697 847 150 17,71 31 3,66 0 0

HSP7F_ENCCU_1-658 658 847 189 22,31 31 3,66 0 0

Table 3. Data when the conserved 70%.

Name SeqLen AlignLen GapLen % of GapLen Ident % of Ident Similar % of Similar

DNAK_METAC_1-617 617 847 230 27,15 284 33,53 12 1,42

DNAK_HALWD_1-641 641 847 206 24,32 282 33,29 9 1,06

DNAK_RHOPS_1-633 633 847 214 25,27 294 34,71 4 0,47

DNAK_WOLPM_1-640 640 847 207 24,44 296 34,49 4 0,47

HSP7F_ASHGO_1-697 697 847 150 17,71 129 15,23 53 6,26

HSP7F_ENCCU_1-658 658 847 189 22,31 91 10,74 57 6,73

Table 4. Data when the functionally conserved 100%. (calculate by python script with matrix BLOSUM62)

Name SeqLen AlignLen GapLen % of GapLen Ident % of Ident Similar % of Similar

DNAK_METAC_1-617 617 847 230 27,15 31 3,66 110 12,00

DNAK_HALWD_1-641 641 847 206 24,32 31 3,66 110 12,00

DNAK_RHOPS_1-633 633 847 214 25,27 31 3,66 110 12,00

DNAK_WOLPM_1-640 640 847 207 24,44 31 3,66 110 12,00

HSP7F_ASHGO_1-697 697 847 150 17,71 31 3,66 110 12,00

HSP7F_ENCCU_1-658 658 847 189 22,31 31 3,66 110 12,00

Table 5. Average values.

AlignLen GapLen % Identity

identity 100% % plurality 100% % identity 70% %

847 199,33 23,53 31 3,66 246 29,04 31 3,66

AlignLen	GapLen	%	Identity
identity 100%	%	plurality 100%	%	identity 70%	%
847	199,33	23,53	31	3,66	246	29,04	31	3,66

Task 2

I used a sequence of protein called Human Insulin. With help of "msbar" (EMBOSS) I have done 7 generations of mutant insulin (7 artificial point mutations in every generation), then I combined these mutant generations and the origin one in the file (see "Bash script").

Bash script (task 2)

The alignment was made by Jalview by program Tcoffee with Defaults, colored by ClustalX with Identity Threshold = 100%. To see the whole alignment press on picture or here.

Picture 2. The alignment made by program.

Table 5. The list of artificial mutations in insulin.

Position Type of point mutation Generation

2 insert of S p3 -> p4

3 deletion of A p4 -> p5

9 insert of N p1 -> p2

10 insert of L p1 -> p2

11 insert of L p1 -> p2

12 insert of L p6 -> p7

21 insert of N origin -> p1

29 deletion of A p2 -> p3

29 insert of A p5 -> p6

30 insert of Y p2 -> p3

31 insert of A p2 -> p3

36 insert of V p5 -> p6

75 replacement R to S p3 -> p4

78 replacement E to R origin -> p1

Picture 3. The alignment made by program with some changes which I did.

I had to change some items in the alignment (To see the whole alignment press here):

There wasn't any amino acids before A in the origin sequence (insgead of M at the start of sequence), so I decided that there were an insert of S (position 2, p3 -> p4) and a deletion of A (position 3, p4 -> p5). Also, I fixed a computer alignment and changed the insert of S (position 2, p3 -> p4) and replacement A to S (position 3, p4 -> p5).
According the origin sequence the are insert of H (position 69, p2 -> p3) and conserved K (position 70). So, I fixed a computer alignment again and changed the replacement K to H (position 69, p2 -> p3) and insert of K (position 70, p2 -> p3).

Task 3

I used the nucleotide sequence of the gene of the protein called Human Insulin. With help of "msbar" (EMBOSS) I have done 7 generations of mutant insulin gene. Than with help "transeq" (EMBOSS) I made the transcriptions and constructed alignment of the respective mutant proteins (you can see the script). The alignment was made by Jalview by program Tcoffee with Defaults, colored by ClustalX with Identity Threshold = 60%.

Term II

Entry	Entry name	Protein name	Organism	Supergroup
Q8TQR2	DNAK_METAC	Chaperone protein DnaK (HSP70)	Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)	Archaea
Q18GZ4	DNAK_HALWD	Chaperone protein DnaK (HSP70)	Haloquadratum walsbyi (strain DSM 16790 / HBSQ001)	Archaea
Q13E60	DNAK_RHOPS	Chaperone protein DnaK (HSP70)	Rhodopseudomonas palustris (strain BisB5)	Bacteria
Q73GL7	DNAK_WOLPM	Chaperone protein DnaK (HSP70)	Wolbachia pipientis wMel	Bacteria
Q74ZJ0	HSP7F_ASHGO	Heat shock protein homolog SSE1	Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056) (Yeast) (Eremothecium gossypii)	Eukaryota
Q8SQR8	HSP7F_ENCCU	Heat shock protein homolog SSE1	Encephalitozoon cuniculi (strain GB-M1) (Microsporidian parasite)	Eukaryota

Name	SeqLen	AlignLen	GapLen	% of GapLen	Ident	% of Ident	Similar	% of Similar
DNAK_METAC_1-617	617	847	230	27,15	31	3,66	0	0
DNAK_HALWD_1-641	641	847	206	24,32	31	3,66	0	0
DNAK_RHOPS_1-633	633	847	214	25,27	31	3,66	0	0
DNAK_WOLPM_1-640	640	847	207	24,44	31	3,66	0	0
HSP7F_ASHGO_1-697	697	847	150	17,71	31	3,66	0	0
HSP7F_ENCCU_1-658	658	847	189	22,31	31	3,66	0	0

Position	Type of point mutation	Generation
2	insert of S	p3 -> p4
3	deletion of A	p4 -> p5
9	insert of N	p1 -> p2
10	insert of L	p1 -> p2
11	insert of L	p1 -> p2
12	insert of L	p6 -> p7
21	insert of N	origin -> p1
29	deletion of A	p2 -> p3
29	insert of A	p5 -> p6
30	insert of Y	p2 -> p3
31	insert of A	p2 -> p3
36	insert of V	p5 -> p6
75	replacement R to S	p3 -> p4
78	replacement E to R	origin -> p1