
Multiple alignment and domains

Last updated on 4 May 2017

Here I study multiple alignment algorithms and domain architectures. I compared different multiple alignment algorithms in Jalview, examined several domains in the Pfam database, and wrote and embedded JavaScript code that displays domain organisations in the same way as the Pfam site.

List of downloads
File	Link
Jalview project with all alignments of this task	project.jvp

Comparing multiple alignment algorithms

project.jvp

To align homologous proteins, I chose HSP70^[1] proteins with the following UniProt ACs: B5ZWQ2, A9A135, O24581, Q7V1H4, Q18GZ4, A0T0H7. I fetched the sequences from the UniProt database and aligned them with the T-Coffee and Mafft algorithms. T-Coffee uses a progressive approach with a library of pairwise alignments and can take known motifs and structural data into account^[2]; Mafft uses the fast Fourier transform^[3]. I then put both alignments in one window, aligned them relative to each other, and marked the mismatches and gaps. The result is shown in fig. 1.
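
The fetching step can also be scripted outside Jalview. The following is a minimal sketch, not the procedure used for this task: it assumes the public UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, and the helper names (`fasta_url`, `fetch_fasta`, `write_combined_fasta`) are made up for illustration.

```python
from urllib.request import urlopen

# Accessions taken from the paragraph above.
ACCESSIONS = ["B5ZWQ2", "A9A135", "O24581", "Q7V1H4", "Q18GZ4", "A0T0H7"]

# Assumption: UniProt REST endpoint for downloading a single entry as FASTA.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def fetch_fasta(accession, timeout=30.0):
    """Download one entry as FASTA text (needs network access)."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def write_combined_fasta(path, accessions=ACCESSIONS):
    """Fetch every accession and write one multi-FASTA file."""
    with open(path, "w", encoding="utf-8") as handle:
        for accession in accessions:
            record = fetch_fasta(accession)
            # Make sure consecutive records are separated by a newline.
            handle.write(record if record.endswith("\n") else record + "\n")
```

Calling `write_combined_fasta("hsp70.fasta")` would produce a file that can be opened in Jalview or passed to a command-line aligner; the file name is arbitrary.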

There are several mismatches. At alignment positions 81-82, T-Coffee places the Asp of O24581 in column 81, together with Asp and Lys, whereas Mafft places it in column 82, together with Asp, Glu, Asn and Lys. Columns 226-227 and 229-230 are also ambiguous: the residues there vary in their properties, and Mafft packs them into a more compact block. The most noticeable differences appear at the end of the alignments: T-Coffee avoids terminal indels, whereas Mafft does not.
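
The difference in terminal indels can be quantified by counting leading and trailing gaps in each row of an alignment. This is a small sketch that assumes `-` is the gap character; the two rows are invented toy data, not taken from the HSP70 alignments.

```python
def terminal_gaps(row, gap="-"):
    """Return (leading, trailing) gap counts for one gapped alignment row."""
    leading = len(row) - len(row.lstrip(gap))
    trailing = len(row) - len(row.rstrip(gap))
    return leading, trailing


# Hypothetical toy rows, used only to illustrate the function.
toy_rows = {"seq_a": "MKV--", "seq_b": "-MKVL"}
toy_counts = {name: terminal_gaps(row) for name, row in toy_rows.items()}
```

Running the same function over the rows exported from each alignment would show which algorithm leaves more gaps at the sequence ends.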

There are more differences, but all of them are minor shifts. Overall, both algorithms align the sequences in almost the same way and reveal the same putative blocks of homology. The differences are not very important, especially those at the ends of the alignments: they stem from the different approaches of the two algorithms.
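
One way to put a number on "almost the same way" is to compare which residue pairs each alignment places in the same column. The sketch below is an illustration under stated assumptions rather than what Jalview does: it assumes `-` as the gap character and the same sequence names in both alignments, the score is a simple shared-over-total ratio chosen for this example, and the toy alignments are invented.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped row of equal length.
    Each pair is (name_a, index_a, name_b, index_b), where the indices
    are 0-based positions in the ungapped sequences.
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    n_cols = lengths.pop()
    position = {name: 0 for name in names}  # next ungapped index per sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, position[name]))
                position[name] += 1
        for (a, i), (b, j) in combinations(present, 2):
            pairs.add((a, i, b, j))
    return pairs


def pair_agreement(aln_1, aln_2):
    """Fraction of aligned residue pairs shared by both alignments."""
    pairs_1, pairs_2 = aligned_pairs(aln_1), aligned_pairs(aln_2)
    union = pairs_1 | pairs_2
    return len(pairs_1 & pairs_2) / len(union) if union else 1.0


# Hypothetical toy alignments of the same two sequences (MKDL, MKDL).
toy_a = {"seq1": "MKD-L", "seq2": "MK-DL"}
toy_b = {"seq1": "MKDL", "seq2": "MKDL"}
agreement = pair_agreement(toy_a, toy_b)
```

In the toy case the first alignment pairs three residues and the second pairs four, three of which are shared, so the agreement is 0.75. A value close to 1 for the T-Coffee and Mafft alignments would support the conclusion above.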

Domains

A domain is a conserved unit of secondary/tertiary protein structure that performs a structural, catalytic or binding function. The Pfam database provides information about known protein domains (i.e. protein families) and shows the domain organisation of almost any protein.

The MerA^[4] protein consists of two domains^[5] (fig. 2): pyridine nucleotide-disulphide oxidoreductase (Pyr_redox_2, PF07992) and pyridine nucleotide-disulphide oxidoreductase, dimerisation domain (Pyr_redox_dim, PF02852). The former^[6] is a small NADH binding domain within a larger FAD binding domain; the latter^[7] is responsible for dimerisation of the subunits.

Fig. 2. A0A126V644 domain organisation. Interactive scheme from Pfam site.

I chose the Pyr_redox_2 domain and studied 3 of the 905 domain organisations in which it occurs in nature. The results are summarised in table 1.

Table 1. Several domain organisations with Pyr_redox_2 domain.
Domains	Number of sequences	Sample protein (Uniprot ID)
Fer4_20, Pyr_redox_2	3818	5XT03_9GAMM
Molybdop_Fe4S4, Molybdopterin, Molydop_binding, Pyr_redox_2, Fer2_BFD	58	W6TQ69_9SPHI
Pyr_redox_2, AIF_C x 2	65	B4D4D7_9BACT
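
The rows of table 1 can be handled as ordered tuples of domain names. The sketch below only restates the table in code; the `Architecture` type and the helper names are invented for illustration and are not part of Pfam or of the JavaScript used on this page. It lists the domains that accompany Pyr_redox_2 and prints an architecture with repeated domains collapsed as in the table.

```python
from itertools import groupby
from typing import NamedTuple, Tuple


class Architecture(NamedTuple):
    domains: Tuple[str, ...]   # domains in N- to C-terminal order
    n_sequences: int           # number of sequences with this architecture
    sample: str                # sample protein as given in table 1


# Data copied from table 1; "AIF_C x 2" is expanded into two entries.
ARCHITECTURES = [
    Architecture(("Fer4_20", "Pyr_redox_2"), 3818, "5XT03_9GAMM"),
    Architecture(("Molybdop_Fe4S4", "Molybdopterin", "Molydop_binding",
                  "Pyr_redox_2", "Fer2_BFD"), 58, "W6TQ69_9SPHI"),
    Architecture(("Pyr_redox_2", "AIF_C", "AIF_C"), 65, "B4D4D7_9BACT"),
]


def partner_domains(architectures, domain):
    """Sorted names of domains that co-occur with `domain`."""
    partners = set()
    for arch in architectures:
        if domain in arch.domains:
            partners.update(d for d in arch.domains if d != domain)
    return sorted(partners)


def format_architecture(domains):
    """Collapse consecutive repeats, e.g. (X, Y, Y) -> 'X, Y x 2'."""
    parts = []
    for name, run in groupby(domains):
        count = len(list(run))
        parts.append(name if count == 1 else f"{name} x {count}")
    return ", ".join(parts)


partners = partner_domains(ARCHITECTURES, "Pyr_redox_2")
total_sequences = sum(arch.n_sequences for arch in ARCHITECTURES)
```

For the three rows above, `partners` holds six distinct domains and `total_sequences` is 3941, which covers only the three organisations studied here, not all 905.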

Table 2 summarises the functions of these domains, based on information from the Pfam site.

Table 2. Overview of several domains from table 1.
Domain	Pfam ID	Name	Description
Fer4_20	PF14691	Dihydroprymidine dehydrogenase domain II, 4Fe-4S cluster	Binds FAD; Catalyses the first and rate-limiting step of pyrimidine degradation.
Molybdop_Fe4S4	PF04879	Molybdopterin oxidoreductase Fe4S4 domain	Is found in a number of reductase/dehydrogenase families that contain molybdenum (Mo) or tungsten (W).
Molybdopterin	PF00384	Molybdopterin	Is found in a number of molybdopterin-containing oxidoreductases, tungsten formylmethanofuran dehydrogenase subunit d (FwdD) and molybdenum formylmethanofuran dehydrogenase subunit (FmdD).
Molydop_binding	PF01568	Molydopterin dinucleotide binding domain	Is found in a number of molybdopterin-containing oxidoreductases, tungsten formylmethanofuran dehydrogenase subunit d (FwdD) and molybdenum formylmethanofuran dehydrogenase subunit (FmdD).
Fer2_BFD	PF04324	BFD-like [2Fe-2S] binding domain	Coordination of two Fe ions with two conserved cysteine residues; may be a general redox and/or regulatory component involved in the iron storage or mobilisation functions of bacterioferritin in bacteria; is found in several reductases/oxidases dealing with N-O compounds.
AIF_C	PF14721	Apoptosis-inducing factor, mitochondrion-associated, C-term	A dimerisation domain of the mitochondrial apoptosis-inducing factor 1; appears at the C-terminus of FAD-dependent pyridine nucleotide-disulfide oxidoreductases; on reduction with NADH, AIF undergoes dimerisation and forms tight, long-lived FADH2-NAD charge-transfer complexes proposed to be functionally important; AIF is a bifunctional mitochondrial flavoprotein critical for energy metabolism and induction of caspase-independent apoptosis.

References

1. Wikipedia article about HSP70 family;
2. Wikipedia article about T-Coffee;
3. Wikipedia article about Mafft;
4. MerA article in this site;
5. MerA article in Pfam site;
6. Pfam page of Pyr_redox_2;
7. Pfam page of Pyr_redox_dim.