
Multiple alignment and domains

Last update on the 4th of May, 2017

Here I study multiple alignment algorithms and domain architectures. I compared different multiple alignment algorithms in Jalview, examined several domains in the Pfam database, and wrote and embedded JavaScript code that displays domain organisations as on the Pfam site.

List of downloads
File | Link
Jalview project with all alignments of this task | project.jvp

Comparing multiple alignment algorithms

project.jvp

To align homologous proteins, I chose HSP70[1] proteins with the following UniProt ACs: B5ZWQ2, A9A135, O24581, Q7V1H4, Q18GZ4, A0T0H7. I fetched the sequences from the UniProt database and aligned them with the T-Coffee and MAFFT algorithms. T-Coffee uses a progressive approach with a library of pairwise alignments and takes known motifs and structural data into account[2]; MAFFT uses the fast Fourier transform[3]. I then put both alignments in one window, aligned them relative to each other, and marked the mismatches and gaps. The result is shown in fig. 1.
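The fetching step can also be scripted instead of done through Jalview. The following is a minimal Python sketch, not the procedure used for this task. It assumes that the current UniProt REST service serves FASTA records at `https://rest.uniprot.org/uniprotkb/<AC>.fasta`; the helper names `uniprot_fasta_url`, `parse_fasta` and `fetch_sequences` are hypothetical and introduced only for illustration.

```python
from urllib.request import urlopen

# UniProt ACs of the HSP70 proteins listed above.
ACCESSIONS = ["B5ZWQ2", "A9A135", "O24581", "Q7V1H4", "Q18GZ4", "A0T0H7"]


def uniprot_fasta_url(accession):
    """Build the FASTA URL for one accession (assumed REST endpoint layout)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def fetch_sequences(accessions, timeout=30):
    """Download every accession and merge the parsed records (needs network)."""
    merged = {}
    for accession in accessions:
        with urlopen(uniprot_fasta_url(accession), timeout=timeout) as response:
            merged.update(parse_fasta(response.read().decode("utf-8")))
    return merged
```

Calling `fetch_sequences(ACCESSIONS)` would return the unaligned sequences, which could then be written to a FASTA file and passed to T-Coffee or MAFFT.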

Fig. 1. Multiple alignment of some HSP70 proteins. Upper: T-Coffee; lower: MAFFT. Added gaps are labelled with G, mismatches between the alignments with X. Coloured with the ClustalX scheme.

There are several mismatches. At alignment positions 81-82, the Asp of O24581 is placed by T-Coffee in column 81, together with Asp and Lys, and by MAFFT in column 82, together with Asp, Glu, Asn and Lys. There is also ambiguity in columns 226-227 and 229-230: the residues there vary in their properties, and MAFFT places them in a more compact block. The most significant differences are at the ends of the alignments: T-Coffee avoids terminal indels and MAFFT does not.

There are more differences, but all of them are minor shifts. Overall, both algorithms align the sequences in almost the same way and reveal the same putative blocks of homology. The differences are not very important, especially at the ends of the alignments: they arise from the different approaches of the two algorithms.
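The comparison above was made by eye in Jalview. As a rough illustration of how such a check could be automated, here is a Python sketch that treats each alignment as the set of residue pairs it places in the same column and reports how many pairs two alignments share. The function names and the toy sequences are hypothetical; the score is a simple shared-pairs ratio, not a measure taken from T-Coffee or MAFFT.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` maps a sequence name to its gapped row. Each pair is
    (name_a, residue_index_a, name_b, residue_index_b), with residue
    indices counted on the ungapped sequence, starting from 0.
    """
    pairs = set()
    for name_a, name_b in combinations(sorted(alignment), 2):
        i = j = 0  # positions in the ungapped sequences
        for ch_a, ch_b in zip(alignment[name_a], alignment[name_b]):
            if ch_a != gap and ch_b != gap:
                pairs.add((name_a, i, name_b, j))
            if ch_a != gap:
                i += 1
            if ch_b != gap:
                j += 1
    return pairs


def agreement(aln_1, aln_2):
    """Fraction of residue pairs found in both alignments (1.0 = identical)."""
    pairs_1, pairs_2 = aligned_pairs(aln_1), aligned_pairs(aln_2)
    union = pairs_1 | pairs_2
    return len(pairs_1 & pairs_2) / len(union) if union else 1.0


def differing_pairs(aln_1, aln_2):
    """Residue pairs aligned by exactly one of the two alignments."""
    return aligned_pairs(aln_1) ^ aligned_pairs(aln_2)


# Toy example: the same two sequences aligned in two different ways.
toy_a = {"s1": "ACD-E", "s2": "AC-DE"}
toy_b = {"s1": "ACDE", "s2": "ACDE"}
print(agreement(toy_a, toy_b))  # 0.75
```

Minor shifts such as the one in columns 81-82 would show up as a few entries in `differing_pairs`, while differences in terminal indels would change many pairs at the ends of the sequences.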

Domains

A domain is a conserved motif in secondary/tertiary protein structure that is responsible for some structural, catalytic or binding function. The Pfam database provides information about known protein domains (i.e. protein families) and makes it possible to examine the domain organisation of almost any protein.

The MerA[4] protein consists of two domains[5] (fig. 2): pyridine nucleotide-disulphide oxidoreductase (Pyr_redox_2, PF07992) and pyridine nucleotide-disulphide oxidoreductase, dimerisation domain (Pyr_redox_dim, PF02852). The former[6] is a small NADH-binding domain within a larger FAD-binding domain; the latter[7] mediates dimerisation of the subunits.

Fig. 2. Domain organisation of A0A126V644. Interactive scheme from the Pfam site.

I chose the Pyr_redox_2 domain and studied 3 of the 905 domain organisations found in nature that contain it. The information is summarised in table 1.

Table 1. Several domain organisations with Pyr_redox_2 domain.
Organisation (domains) | Number of sequences | Sample protein (UniProt ID)
Fer4_20, Pyr_redox_2 | 3818 | 5XT03_9GAMM
Molybdop_Fe4S4, Molybdopterin, Molydop_binding, Pyr_redox_2, Fer2_BFD | 58 | W6TQ69_9SPHI
Pyr_redox_2, AIF_C x 2 | 65 | B4D4D7_9BACT
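
A domain organisation can be represented simply as an ordered tuple of domain names. The Python sketch below stores the three rows of table 1 in that form and counts which domains accompany Pyr_redox_2, weighted by the number of sequences. It assumes that the domains are listed in N-to-C order and that "AIF_C x 2" means two consecutive AIF_C domains; the names `ARCHITECTURES`, `architectures_with` and `partner_domains` are hypothetical.

```python
from collections import Counter

# Rows of table 1: (domains, number of sequences, sample protein).
# Assumption: domains are in N-to-C order and "AIF_C x 2" is two copies.
ARCHITECTURES = [
    (("Fer4_20", "Pyr_redox_2"), 3818, "5XT03_9GAMM"),
    (("Molybdop_Fe4S4", "Molybdopterin", "Molydop_binding",
      "Pyr_redox_2", "Fer2_BFD"), 58, "W6TQ69_9SPHI"),
    (("Pyr_redox_2", "AIF_C", "AIF_C"), 65, "B4D4D7_9BACT"),
]


def architectures_with(domain, architectures=ARCHITECTURES):
    """Return the rows whose organisation contains `domain`."""
    return [row for row in architectures if domain in row[0]]


def partner_domains(domain, architectures=ARCHITECTURES):
    """Count sequences in which each other domain co-occurs with `domain`."""
    partners = Counter()
    for domains, n_sequences, _sample in architectures_with(domain, architectures):
        # A repeated domain is counted once per organisation.
        for other in set(domains) - {domain}:
            partners[other] += n_sequences
    return partners


# Most frequent partner of Pyr_redox_2 among the three organisations.
print(partner_domains("Pyr_redox_2").most_common(1))  # [('Fer4_20', 3818)]
```

With all 905 organisations loaded in the same form, the same functions would give an overview of the most common partners of the domain.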

An overview of the domain functions, based on information from the Pfam site, is given in table 2.

Table 2. Overview of several domains from table 1.
Domain | Pfam ID | Name | Description
Fer4_20 | PF14691 | Dihydroprymidine dehydrogenase domain II, 4Fe-4S cluster
  1. Binds FAD;
  2. Catalyses the first and rate-limiting step of pyrimidine degradation.
Molybdop_Fe4S4 | PF04879 | Molybdopterin oxidoreductase Fe4S4 domain
  1. Is found in a number of reductase/dehydrogenase families;
  2. Contains molybdenum (Mo) or tungsten (W).
Molybdopterin | PF00384 | Molybdopterin
  1. Is found in a number of molybdopterin-containing oxidoreductases, tungsten formylmethanofuran dehydrogenase subunit d (FwdD) and molybdenum formylmethanofuran dehydrogenase subunit (FmdD).
Molydop_binding | PF01568 | Molydopterin dinucleotide binding domain
  1. Is found in a number of molybdopterin-containing oxidoreductases, tungsten formylmethanofuran dehydrogenase subunit d (FwdD) and molybdenum formylmethanofuran dehydrogenase subunit (FmdD).
Fer2_BFD | PF04324 | BFD-like [2Fe-2S] binding domain
  1. Coordinates two Fe ions with two conserved cysteine residues;
  2. May be a general redox and/or regulatory component involved in the iron storage or mobilisation functions of bacterioferritin in bacteria;
  3. Is found in several reductases/oxidases that act on N-O compounds.
AIF_C | PF14721 | Apoptosis-inducing factor, mitochondrion-associated, C-term
  1. Is a dimerisation domain of the mitochondrial apoptosis-inducing factor 1 (AIF);
  2. Appears at the C-terminus of FAD-dependent pyridine nucleotide-disulfide oxidoreductases;
  3. On reduction with NADH, AIF undergoes dimerisation and forms tight, long-lived FADH2-NAD charge-transfer complexes proposed to be functionally important;
  4. AIF is a bifunctional mitochondrial flavoprotein critical for energy metabolism and induction of caspase-independent apoptosis.

References

  1. Wikipedia article about the HSP70 family;
  2. Wikipedia article about T-Coffee;
  3. Wikipedia article about MAFFT;
  4. MerA article on this site;
  5. MerA article on the Pfam site;
  6. Pfam page of Pyr_redox_2;
  7. Pfam page of Pyr_redox_dim.