Jalview. Multiple sequence alignment

Task 1

Used program: Jalview
Steps: Add sequence - UniProt - 6 IDs - Colour - ClustalX - Above Identity Threshold (100%) - Web service - TcoffeeWS


Table 1. Main information about chosen proteins

Entry name | Length (aa) | Protein name | Organism | Domain
DNAK_THEVO | 613 | Chaperone protein DnaK | Thermoplasma volcanium (strain ATCC 51530 / DSM 4299 / JCM 9571 / NBRC 15438 / GSS1) | Archaea
DNAK_LACAC | 614 | Chaperone protein DnaK | Lactobacillus acidophilus (strain ATCC 700396 / NCK56 / N2 / NCFM) | Bacteria
DNAK_SALTY | 638 | Chaperone protein DnaK | Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720) | Bacteria
BIP1_ORYSJ | 665 | Heat shock 70 kDa protein BIP1 | Oryza sativa subsp. japonica (Rice) | Eukaryota
DNAK_HALWD | 641 | Chaperone protein DnaK | Haloquadratum walsbyi (strain DSM 16790 / HBSQ001) | Archaea
HSP74_DROME | 641 | Major heat shock 70 kDa protein Bbb | Drosophila melanogaster (Fruit fly) | Eukaryota

Used program: EMBOSS pack
Used command: infoalign


Table 2. Results of the EMBOSS infoalign command

Name | Aligned length | Gap length | % | Identity 100.0% | % | Identity 70.0% | % | Similarity 100.0% | % | Similarity 70.0% | %
DNAK_THEVO_1-613 | 719 | 106 | 14.7 | 166 | 23.1 | 279 | 38.8 | 283 | 39.4 | 55 | 7.7
DNAK_LACAC_1-614 | 719 | 105 | 14.6 | 166 | 23.1 | 309 | 43.0 | 283 | 39.4 | 54 | 7.5
DNAK_SALTY_1-638 | 719 | 81 | 11.3 | 166 | 23.1 | 304 | 42.3 | 283 | 39.4 | 64 | 8.9
BIP1_ORYSJ_1-665 | 719 | 54 | 7.5 | 166 | 23.1 | 292 | 40.6 | 283 | 39.4 | 59 | 8.2
DNAK_HALWD_1-641 | 719 | 78 | 10.9 | 166 | 23.1 | 292 | 40.6 | 283 | 39.4 | 62 | 8.6
HSP74_DROME_1-641 | 719 | 78 | 10.9 | 166 | 23.1 | 284 | 39.5 | 283 | 39.4 | 64 | 8.9


Name - sequence name followed by '_' and the range of residues in the alignment (e.g. 1-613)
Aligned length - length of the alignment (number of columns)
Gap length - number of gap characters in the aligned sequence (aligned length minus sequence length, e.g. 719 - 613 = 106)
% - the preceding count as a percentage of the aligned length (e.g. 106 / 719 = 14.7%)
Identity - the required proportion of identical residues at a position for it to contribute to the consensus (at 100%, only fully identical columns contribute; at 70.0%, columns in which at least 70% of the residues are identical contribute).
Similarity - the cut-off for the percentage of positive-scoring matches below which there is no consensus (at 100.0%, the cut-off equals the total weight of all sequences in the alignment; at 70.0%, it is 70% of that weight).
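The Gap length and % columns of Table 2 are simple arithmetic on each aligned row, and the Identity threshold works like a per-column vote. A minimal Python sketch of both ideas follows; it is not the infoalign implementation (whose consensus and weighting rules may differ), and the `x` placeholder for columns without a consensus is an arbitrary choice for this illustration.

```python
# Minimal sketch (not the EMBOSS source): gap statistics for one aligned row and
# a simple identity-threshold consensus. The "x" placeholder for columns without
# a consensus is an arbitrary choice made for this illustration.
from collections import Counter

GAP = "-"


def gap_stats(aligned_row: str) -> tuple[int, float]:
    """Return (gap length, gap length as a % of the aligned length)."""
    gaps = aligned_row.count(GAP)
    return gaps, round(100 * gaps / len(aligned_row), 1)


def identity_consensus(rows: list[str], threshold: float) -> str:
    """Report a residue only if at least `threshold` (0..1) of the sequences share it."""
    consensus = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != GAP)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(rows) >= threshold:
                consensus.append(residue)
                continue
        consensus.append("x")
    return "".join(consensus)


def count_identities(row: str, consensus: str) -> int:
    """Number of columns where the row carries the consensus residue."""
    return sum(1 for a, b in zip(row, consensus) if a != GAP and a == b)


rows = ["MKV-LA", "MKIDLA", "MRV-LG"]
print(gap_stats(rows[0]))                   # (1, 16.7)
print(identity_consensus(rows, 1.0))        # MxxxLx
print(identity_consensus(rows, 0.6))        # MKVxLA
print(count_identities(rows[0], "MKVxLA"))  # 5
```

Applied to a 613-residue sequence in a 719-column alignment, `gap_stats` gives (106, 14.7), the values in the first row of Table 2.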


 
[Figure 1: Jalview alignment (719 columns) of UNIPROT|Q97BG8|DNAK_THEVO/1-613, UNIPROT|Q84BU4|Q5FJP4|DNAK_LACAC/1-614, UNIPROT|Q56073|DNAK_SALTY/1-638, UNIPROT|Q6Z7B0|O24182|BIP1_ORYSJ/1-665, UNIPROT|Q18GZ4|DNAK_HALWD/1-641 and UNIPROT|Q9VG58|HSP74_DROME/1-641, with Conservation, Quality, Consensus and Types of conservation annotation rows.]

Figure 1. Alignment with Types of conservation row



F marks columns that contain only conservative substitutions, i.e. replacements of an amino acid by a similar one (similar in size, charge and other biochemical properties [1]).

C marks columns in which at least 80% of the residues are the same amino acid. A column that is not completely conserved can be marked F as well, if all of its amino acids belong to the same group [2].
G marks positions with gaps.
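The C / F / G rules above can be expressed as a small column classifier. In this sketch the amino-acid groups are an assumption (a common size/charge grouping, not necessarily the one in [2]), and a column containing any gap is labelled only G, which is also an assumption.

```python
# Sketch of the C / F / G column labels described above. The residue groups are
# an ASSUMPTION (a common size/charge grouping), not necessarily the groups of [2].
from collections import Counter

GAP = "-"
GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"),
          set("STNQ"), set("AG"), set("C"), set("P")]  # hypothetical grouping


def column_labels(column: str, c_cutoff: float = 0.8) -> set[str]:
    """Return the labels ('G', 'C', 'F') that apply to one alignment column."""
    if GAP in column:
        return {"G"}             # assumption: gapped columns get only G
    labels = set()
    top_count = Counter(column).most_common(1)[0][1]
    if top_count / len(column) >= c_cutoff:
        labels.add("C")          # at least 80% identical residues
    residues = set(column)
    if len(residues) > 1 and any(residues <= group for group in GROUPS):
        labels.add("F")          # only substitutions within one group
    return labels


print(column_labels("IIIIIV"))  # C and F: 5/6 identical, I and V in one group
print(column_labels("ILVMIL"))  # F only
print(column_labels("D-DDDD"))  # G
```

A column is passed as a string of its residues read top to bottom, so the same function can be mapped over `zip(*rows)` of a whole alignment.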

Task 2

For this task I simulated the evolution of a 100-aa fragment of a protein sequence (taken from one of the Task 1 proteins). According to the instructions, 7 new point mutations occurred every 100 years, and no block mutations were introduced.

Used program: EMBOSS pack
Used commands: msbar, descseq
Bash script (link)
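In the actual workflow the mutations were produced with msbar. The Python sketch below only mimics the idea of the simulation (a fixed number of random point mutations per 100-year step, no block mutations); the equal probabilities of substitution, insertion and deletion and the placeholder start sequence are assumptions, not msbar's behaviour.

```python
# Minimal sketch of the simulation idea, NOT msbar itself. Assumptions: each
# point mutation is a substitution, insertion or deletion of one residue, chosen
# with equal probability; positions are uniformly random.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutation(seq: str, rng: random.Random) -> str:
    """Apply one random single-residue substitution, insertion or deletion."""
    kind = rng.choice(("substitution", "insertion", "deletion"))
    pos = rng.randrange(len(seq))
    if kind == "substitution":
        new = rng.choice(AMINO_ACIDS.replace(seq[pos], ""))
        return seq[:pos] + new + seq[pos + 1:]
    if kind == "insertion":
        return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos:]
    return seq[:pos] + seq[pos + 1:]  # deletion


def evolve(seq: str, steps: int, mutations_per_step: int = 7, seed: int = 0) -> list[str]:
    """Return [P1, P2, ...]: the start sequence plus one descendant per
    100-year step, each carrying `mutations_per_step` new point mutations."""
    rng = random.Random(seed)
    history = [seq]
    for _ in range(steps):
        for _ in range(mutations_per_step):
            seq = point_mutation(seq, rng)
        history.append(seq)
    return history


start = "ACDEFGHIKLMNPQRSTVWY" * 5  # placeholder 100-aa fragment
for i, s in enumerate(evolve(start, steps=6), start=1):
    print(f">P{i} length={len(s)}")
```

With `steps=6` the function returns seven sequences, matching P1-P7; fixing `seed` makes a run reproducible.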

Used program: Jalview
Steps: Add sequence - UniProt - 6 IDs - Colour - ClustalX - Above Identity Threshold (100%) - Web service - TcoffeeWS


 
P1/1-104      MD-RRGCAFL---LGVLLAN-GSLFAFSV-AKEETI-KKLGGTVIGID-LGTRTYSCV--VGVY-KNGGHVEII-A---ND-Q--GNRITPSWVAFTDSERLIGEAAKNQAAVNP-ERTIFDV
P2/1-107      MD-RRGCAFL---LGVLLANQGSLFAFSV-AKEETI-KKLEG-VIGIDCLGTRTYSCV--VGVCYGNGGHVEII-A---ND-Q-GGNRITPSWVAFTDSERLIGEAAKNQAAVNP-ERTIFDV
P3/1-112      MDGRRGCAFL---LGVLLANQGSLFAFSVGAKEETI-KKLEG-VIGIDCLGTRTYSCV-VVGVCRGNGGHVEII-AA-NND-Q-GGNRITPSWVAFTDSEYLIGEAAKNQAAVNP-ERTIFDV
P4/1-115      MDGRRGCAFL---LGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCV-VVGVCRGNGGHVEII-RAQNND-Q-GGNRITPSWVAFTDSENLIGEAAKNQAAVNP-ERTIFDV
P5/1-118      MDDRRGCAFLL--LGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCVVVVGVCRGNGGHVEII-RAQNND-QEQGGNITPSWVAFTDSENLIGEAAKNQAAVNP-ERTIFDV
P6/1-116      D-DRRGCAFLL--LGVLSANQGSL-AFSVGAKEETIGKKLEGLVIGIDMLGTRTYSVV-VVGVCRGNGGHVEII-RAQNND-QDQGGNITPSWVAFTDSENLIGEAAKNQAAVNPTERTIFDV
P7/1-120      D-DRRGCASILLELGVASANQGSL-AFSVGAKEETIGKKLEGLVIGIDMLGTRTYSVV-VVGVCRGNGGHVEVIIRAQNNDNQDQGQNITPSWVAFTDSENLIGEAAKNQAAVNPTERTIFDV
Conservation  5--*****5+---***75**1***-****-******-***4*-*****1*******9*--***715******9*-4---**-*-0*67************5**************-*******
Consensus     MDDRRGCAFLLLELGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCVVVVGVCRGNGGHVEIIIRAQNNDNQD+GNRITPSWVAFTDSENLIGEAAKNQAAVNPTERTIFDV

Figure 2. TcoffeeWS auto alignment
 
P1/1-104      MD-RRGCA-FL--LGVLLAN-GSLFAFSV-AKEETI-KKLGGTVIGID-LGTRTYSCV--VGV-YKNGGHVE-II-A--ND-Q--G-NRITPSWVAFTDSER-LIGEAAKNQAAVNP-ERTIFDV
P2/1-107      MD-RRGCA-FL--LGVLLANQGSLFAFSV-AKEETI-KKLEG-VIGIDCLGTRTYSCV--VGVCYGNGGHVE-II-A--ND-Q--GGNRITPSWVAFTDSER-LIGEAAKNQAAVNP-ERTIFDV
P3/1-112      MDGRRGCA-FL--LGVLLANQGSLFAFSVGAKEETI-KKLEG-VIGIDCLGTRTYSCV-VVGVCRGNGGHVE-II-AANND-Q--GGNRITPSWVAFTDSEY-LIGEAAKNQAAVNP-ERTIFDV
P4/1-115      MDGRRGCA-FL--LGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCV-VVGVCRGNGGHVE-IIRAQNND-Q--GGNRITPSWVAFTDSE-NLIGEAAKNQAAVNP-ERTIFDV
P5/1-118      MDDRRGCA-FLL-LGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCVVVVGVCRGNGGHVE-IIRAQNND-QEQGGN-ITPSWVAFTDSE-NLIGEAAKNQAAVNP-ERTIFDV
P6/1-116      -DDRRGCA-FLL-LGVLSANQGSL-AFSVGAKEETIGKKLEGLVIGIDMLGTRTYSVV-VVGVCRGNGGHVE-IIRAQNND-QDQGGN-ITPSWVAFTDSE-NLIGEAAKNQAAVNPTERTIFDV
P7/1-120      -DDRRGCASILLELGVASANQGSL-AFSVGAKEETIGKKLEGLVIGIDMLGTRTYSVV-VVGVCRGNGGHVEVIIRAQNNDNQDQGQN-ITPSWVAFTDSE-NLIGEAAKNQAAVNPTERTIFDV
Conservation  -*-*****-8*--***75**1***-****-******-***4*-*****1*******9*--***265******-**-*--**-*--*0*-************--**************-*******
Consensus     MDDRRGCASFLLELGVLLANQGSLFAFSVGAKEETIGKKLEGLVIGIDMLGTRTYSCVVVVGVCRGNGGHVEVIIRAQNNDNQDQGGNRITPSWVAFTDSERNLIGEAAKNQAAVNPTERTIFDV

Figure 3. Modified alignment
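Since manual editing should only move residues across gaps, a quick check is that every edited row still yields the same ungapped sequence and the residue count stated in its name (e.g. P1/1-104). A minimal sketch using row P1 from Figures 2 and 3; the helper names are hypothetical.

```python
# Sanity check for manual edits: moving residues across gaps must not change the
# underlying sequence, and each row must keep the length given in its name.
GAP = "-"

# Row P1 copied from Figure 2 (T-Coffee) and Figure 3 (manually edited).
P1_TCOFFEE = "MD-RRGCAFL---LGVLLAN-GSLFAFSV-AKEETI-KKLGGTVIGID-LGTRTYSCV--VGVY-KNGGHVEII-A---ND-Q--GNRITPSWVAFTDSERLIGEAAKNQAAVNP-ERTIFDV"
P1_EDITED = "MD-RRGCA-FL--LGVLLAN-GSLFAFSV-AKEETI-KKLGGTVIGID-LGTRTYSCV--VGV-YKNGGHVE-II-A--ND-Q--G-NRITPSWVAFTDSER-LIGEAAKNQAAVNP-ERTIFDV"


def ungap(row: str) -> str:
    """Remove gap characters from an aligned row."""
    return row.replace(GAP, "")


def check_alignment(rows: dict[str, str]) -> None:
    """Raise if rows differ in length or a 'name/start-end' range does not match."""
    if len({len(row) for row in rows.values()}) != 1:
        raise ValueError("rows differ in aligned length")
    for name, row in rows.items():
        start, end = map(int, name.split("/")[1].split("-"))
        if len(ungap(row)) != end - start + 1:
            raise ValueError(f"{name}: residue count does not match the name")


def check_edit(before: str, after: str) -> None:
    """Raise if an edit changed residues instead of only moving gaps."""
    if ungap(before) != ungap(after):
        raise ValueError("the edit changed the residues, not only the gaps")


check_edit(P1_TCOFFEE, P1_EDITED)
check_alignment({"P1/1-104": P1_EDITED})
print(len(P1_TCOFFEE), len(P1_EDITED))  # 123 125
```

Running the same two checks over all seven rows catches accidental deletions or duplications introduced while dragging residues in the editor.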

Changes made to the automatic alignment (positions refer to Figure 2):

  1. p6-p7, position 1: D shifted to the right (to make a 100% D column; besides, methionine is biochemically dissimilar to aspartic acid)
  2. p1-p6, positions 9-10: F and L shifted to the right (to make a 100% L column and to align F with I instead of S: according to the EBLOSUM62 matrix, F->I (score 0) is more likely than F->S (score -2))
  3. p1, position 64: Y shifted to the right (p2 has a Y in the next column, so it is logical to move the p1 Y there too)
  4. p1-p6, positions 73-74: I shifted to the right (to make 100% I columns)
  5. p1-p3, position 76: A shifted to the right (to make a 100% A column); p4, positions 76-77 (an A->Q mutation is more likely than an A->R mutation: although both score -1 in EBLOSUM62, glutamine is closer to alanine in some biochemical properties)
  6. p2-p4, position 85: G shifted to the right (to make a 100% G column)
  7. p1-p4, position 87: N shifted to the right (to make a 100% N column)
  8. p5-p7, position 88: R deletion (=> columns 89 to 101 shift accordingly)
  9. p4-p7, position 101: Y deletion; position 102: N insertion (each sequence should carry 7 new mutations; without this change there would be fewer than 7, and the Y->N substitution also scores -2 in EBLOSUM62)
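The substitution-score arguments in the list above can be checked with a lookup table. The sketch below hard-codes only the BLOSUM62 pairs mentioned in this section (EBLOSUM62 is EMBOSS's copy of BLOSUM62); for any other pair, the full matrix file would have to be loaded instead.

```python
# Excerpt of BLOSUM62 (EMBOSS ships it as EBLOSUM62) with only the residue pairs
# discussed in the list above; the full 20x20 matrix is not reproduced here.
BLOSUM62_EXCERPT = {
    frozenset("FI"): 0,
    frozenset("FS"): -2,
    frozenset("LI"): 2,
    frozenset("AQ"): -1,
    frozenset("AR"): -1,
    frozenset("YN"): -2,
    frozenset("MD"): -3,
}


def score(a: str, b: str) -> int:
    """Substitution score for a pair of different residues (order-insensitive)."""
    return BLOSUM62_EXCERPT[frozenset((a, b))]


def most_likely(original: str, candidates: list[str]) -> list[str]:
    """Candidates with the highest score, i.e. the most likely substitutions."""
    best = max(score(original, c) for c in candidates)
    return [c for c in candidates if score(original, c) == best]


print(most_likely("F", ["I", "S"]))  # ['I']
print(most_likely("A", ["Q", "R"]))  # ['Q', 'R'] (tie)
print(score("M", "D"))               # -3
```

The A->Q / A->R tie is why item 5 falls back on biochemical properties rather than on the matrix score.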

First 10 mutations

  1. Position 1: Met deletion
  2. Position 3: Glu insertion, Glu -> Asp
  3. Position 9: Ser insertion
  4. Position 12: Leu insertion
  5. Position 13: Glu insertion
  6. Position 17: Leu -> Ala
  7. Position 18: Leu -> Ser
  8. Position 21: Gln insertion
  9. Position 25: Phe deletion
  10. Position 30: Glu insertion
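A list like the one above can be derived by comparing an earlier and a later aligned row column by column. A minimal sketch, assuming the first argument is the ancestral row and that positions are counted in alignment columns; the function name is hypothetical.

```python
# Sketch: list the differences between two aligned rows, column by column, in
# the style of the list above. Which row is the ancestor is an assumption of the
# caller; positions are 1-based alignment columns.
GAP = "-"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def list_mutations(ancestor: str, descendant: str) -> list[str]:
    """Describe each column where the two aligned rows differ."""
    events = []
    for pos, (a, d) in enumerate(zip(ancestor, descendant), start=1):
        if a == d:
            continue  # unchanged residue, or a gap in both rows
        if a == GAP:
            events.append(f"Position {pos}: {THREE_LETTER[d]} insertion")
        elif d == GAP:
            events.append(f"Position {pos}: {THREE_LETTER[a]} deletion")
        else:
            events.append(f"Position {pos}: {THREE_LETTER[a]} -> {THREE_LETTER[d]}")
    return events


for line in list_mutations("MD-RRL", "-DERKL"):
    print(line)
# Position 1: Met deletion
# Position 3: Glu insertion
# Position 5: Arg -> Lys
```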

References

  1. A. Lehninger. Principles of Biochemistry.
  2. Kodomo Thesaurus



© Sophia Veselova, 2017.