Multiple alignment and domains
Last update on the 4th of May, 2017
Here I study multiple alignment algorithms and domain architectures. I compared different multiple alignment algorithms in Jalview, examined several domains in the Pfam database, and wrote and inserted JavaScript code that displays domain organisations as on the Pfam site.
File | Link |
---|---|
Jalview project with all alignments of this task | project.jvp |
Comparing multiple alignment algorithms
To align homologous proteins I chose HSP70[1] proteins with the following Uniprot ACs: B5ZWQ2, A9A135, O24581, Q7V1H4, Q18GZ4, A0T0H7. I fetched the sequences from the Uniprot database and aligned them with the T-Coffee and Mafft algorithms. T-Coffee uses a progressive approach with a library of pairwise alignments and takes known motifs and structural data into account[2]; Mafft uses the fast Fourier transform[3]. Then I put both alignments in one window, aligned them relative to each other, and marked the mismatches and gaps. The result is shown in fig. 1.
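The fetching step can also be scripted outside Jalview. The following is a minimal sketch, not the procedure used for this task: it assumes the current UniProt REST address `https://rest.uniprot.org/uniprotkb/<AC>.fasta`, and the helper names `fasta_url` and `fetch_fasta` are hypothetical.

```python
from urllib.request import urlopen

# Accessions of the six HSP70 proteins listed above.
ACCESSIONS = ["B5ZWQ2", "A9A135", "O24581", "Q7V1H4", "Q18GZ4", "A0T0H7"]

# Assumption: base address of the UniProt REST service.
UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession, base=UNIPROT_BASE):
    """Build the address of the FASTA record for one accession."""
    return f"{base}/{accession}.fasta"


def fetch_fasta(accessions, timeout=30):
    """Download every record and return them as one multi-FASTA string."""
    records = []
    for accession in accessions:
        with urlopen(fasta_url(accession), timeout=timeout) as response:
            records.append(response.read().decode("utf-8"))
    return "".join(records)
```

Calling `fetch_fasta(ACCESSIONS)` needs network access; the returned text can be saved to a file and passed to any aligner.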
There are several mismatches. At alignment positions 81-82, the Asp of O24581 is placed by T-Coffee in column 81 together with Asp and Lys, and by Mafft in column 82 together with Asp, Glu, Asn and Lys. There is also ambiguity in columns 226-227 and 229-230: the residues there vary in their properties, and Mafft puts them into a more compact block. Significant differences are observed at the ends of the alignments: T-Coffee avoids terminal indels and Mafft does not.
There are more differences, but all of them are minor shifts. Overall, both algorithms align the sequences in almost the same way and present the same putative blocks of homology. The differences are not very important, especially at the ends of the alignments: they come from the algorithms' differing approaches.
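The agreement between two alignments of the same sequences can also be expressed as a number. The sketch below is an illustration under stated assumptions, not what Jalview does: each alignment is a dictionary mapping a sequence ID to its gapped row, gaps are written as `-`, and the names `aligned_pairs` and `agreement` as well as the toy rows are hypothetical. It counts the residue pairs that both alignments place in the same column.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    A pair is ((id_a, i), (id_b, j)): residue i of sequence a and residue j
    of sequence b (0-based, gaps not counted) stand in the same column.
    """
    ids = sorted(alignment)
    length = len(alignment[ids[0]])
    position = {seq_id: 0 for seq_id in ids}  # next residue index per sequence
    pairs = set()
    for column in range(length):
        present = []
        for seq_id in ids:
            if alignment[seq_id][column] != "-":
                present.append((seq_id, position[seq_id]))
                position[seq_id] += 1
        pairs.update(combinations(present, 2))
    return pairs


def agreement(aln_a, aln_b):
    """Share of aligned residue pairs common to both alignments (0..1)."""
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # nothing is aligned in either alignment
    return len(pairs_a & pairs_b) / len(union)


# Toy rows (not the HSP70 data): the two alignments differ in one column.
toy_a = {"s1": "ACD-E", "s2": "AC-KE"}
toy_b = {"s1": "ACDE", "s2": "ACKE"}
print(agreement(toy_a, toy_b))
```

Because residues are identified by their index in the ungapped sequence, the two alignments may have different lengths, which is the case when one algorithm introduces terminal indels and the other does not.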
Domains
A domain is a consistent motif in secondary/tertiary protein structure that serves some structural, catalytic or binding function. The Pfam database provides information about all known protein domains (i.e. protein families) and makes it possible to view the domain organisation of almost any protein.
The MerA[4] protein consists of two domains[5] (fig. 2): pyridine nucleotide-disulphide oxidoreductase (Pyr_redox_2, PF07992) and pyridine nucleotide-disulphide oxidoreductase, dimerisation domain (Pyr_redox_dim, PF02852). The former[6] is a small NADH binding domain within a larger FAD binding domain; the latter[7] provides dimerisation of the subunit.
I chose the Pyr_redox_2 domain and studied 3 of the 905 domain organisations found in nature. The information is gathered in table 1.
Domains | Number of sequences | Sample protein (Uniprot ID) |
---|---|---|
Fer4_20, Pyr_redox_2 | 3818 | 5XT03_9GAMM |
Molybdop_Fe4S4, Molybdopterin, Molydop_binding, Pyr_redox_2, Fer2_BFD | 58 | W6TQ69_9SPHI |
Pyr_redox_2, AIF_C x 2 | 65 | B4D4D7_9BACT |
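The organisations in table 1 are written as comma-separated domain names. As a small illustration, the sketch below expands such a string into an ordered list of domains. It assumes that `x 2` means two consecutive copies of the preceding domain; the function name `expand_architecture` is hypothetical.

```python
import re


def expand_architecture(text):
    """Turn 'Pyr_redox_2, AIF_C x 2' into an ordered list of domain names."""
    domains = []
    for item in text.split(","):
        item = item.strip()
        # Assumption: "NAME x N" stands for N consecutive copies of NAME.
        repeat = re.fullmatch(r"(\S+)\s+x\s+(\d+)", item)
        if repeat:
            domains.extend([repeat.group(1)] * int(repeat.group(2)))
        elif item:
            domains.append(item)
    return domains


# Rows of table 1: organisation string and number of sequences.
TABLE_1 = [
    ("Fer4_20, Pyr_redox_2", 3818),
    ("Molybdop_Fe4S4, Molybdopterin, Molydop_binding, Pyr_redox_2, Fer2_BFD", 58),
    ("Pyr_redox_2, AIF_C x 2", 65),
]

for organisation, count in TABLE_1:
    domains = expand_architecture(organisation)
    print(count, len(domains), domains.index("Pyr_redox_2"))
```

For each row the loop prints the number of sequences, the number of domains in the organisation, and the 0-based position of Pyr_redox_2 within it.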
An overview of the domain functions, with information from the Pfam site, is given in table 2.
Domain | Pfam ID | Name | Description |
---|---|---|---|
Fer4_20 | PF14691 | Dihydroprymidine dehydrogenase domain II, 4Fe-4S cluster | |
Molybdop_Fe4S4 | PF04879 | Molybdopterin oxidoreductase Fe4S4 domain | |
Molybdopterin | PF00384 | Molybdopterin | |
Molydop_binding | PF01568 | Molydopterin dinucleotide binding domain | |
Fer2_BFD | PF04324 | BFD-like [2Fe-2S] binding domain | |
AIF_C | PF14721 | Apoptosis-inducing factor, mitochondrion-associated, C-term | |
References
1. Wikipedia article about the HSP70 family;
2. Wikipedia article about T-Coffee;
3. Wikipedia article about Mafft;
4. MerA article on this site;
5. MerA article on the Pfam site;
6. Pfam page of Pyr_redox_2;
7. Pfam page of Pyr_redox_dim.