Domain evolution
Last update on the 12th of April, 2018
Here I reconstruct the evolution of the TUDOR domain within the kingdom Metazoa.
File | Link |
---|---|
Jalview project with all proteins containing TUDOR domain | full.jvp |
ODS table with information about proteins | main.ods |
Jalview project with alignment of chosen proteins | chosen.jvp |
Jalview project with trimmed alignment of chosen proteins | chosen_cut.jvp |
Python script for ITOL alignment annotation | alignment2itol.py |
Python script for ITOL colour range annotation | ranges.py |
Tree in Newick format | tree_ML_bootstrap.nwk |
Original ML tree (27Mb) | tree_boot_ranges_align.png |
Consensus tree (30Mb) | tree_consensus_ranges_align.png |
TUDOR domain
The TUDOR domain (PFAM AC: PF00567, ID: TUDOR) was first discovered in the Drosophila melanogaster Tudor protein. Tudor domain proteins function as molecular adaptors, binding methylated arginine or lysine residues on their substrates to promote physical interactions and the assembly of macromolecular complexes[1]. The domain occurs in 6269 sequences spanning 166 different domain architectures. Table 1 shows the two domain architectures chosen for subsequent analysis.
Architecture | Amount of proteins | Ecdysozoa representatives | Chordata representatives |
---|---|---|---|
TUDOR_x1 | 842 | 526 | 159 |
TUDOR_x5 | 94 | 31 | 59 |
The evolutionary analysis was carried out within the taxon Metazoa (animals) in two subtaxa: Ecdysozoa (protostome animals possessing a three-layered cuticle)
and Chordata (deuterostome animals possessing a notochord). The distribution of the two domain architectures between these subtaxa is also shown in
table 1. All proteins with either of the two domain architectures were collected in the Jalview project full.jvp,
with the 3D structure of the RNF17_HUMAN protein appended.
Then, 30 proteins were chosen for each architecture and each subtaxon. Information about them is compiled in the main.ods file
(note: there TUDOR_x1 is named arch_1 and TUDOR_x5 is named arch_4). The alignment of these proteins was put into the chosen.jvp project,
with sequence names prefixed by codes NW (for example, 5E in 5E_Q5TN74_ANOGA), where N = {1, 5} is the number of TUDOR domains and W = {C, E} is the first letter of the corresponding subtaxon.
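The naming scheme can be unpacked programmatically. The sketch below is only an illustration, not part of the original scripts: it assumes every label has the shape code_accession_species, as in 5E_Q5TN74_ANOGA, and the names `LeafLabel` and `parse_label` are hypothetical.

```python
from typing import NamedTuple


class LeafLabel(NamedTuple):
    """Fields of a sequence label such as 5E_Q5TN74_ANOGA (hypothetical helper type)."""
    domains: int    # number of TUDOR domains: 1 or 5
    subtaxon: str   # "C" for Chordata, "E" for Ecdysozoa
    accession: str  # protein accession, e.g. Q5TN74
    species: str    # species mnemonic, e.g. ANOGA


def parse_label(label: str) -> LeafLabel:
    """Split a label into its group code, accession and species mnemonic."""
    # Assumption: exactly this layout; anything after the second underscore
    # is treated as part of the species field.
    code, accession, species = label.split("_", 2)
    if len(code) != 2 or code[0] not in "15" or code[1] not in "CE":
        raise ValueError(f"unexpected group code in label: {label!r}")
    return LeafLabel(int(code[0]), code[1], accession, species)
```

Such a helper makes it easy to group sequences by code (1C, 1E, 5C, 5E) or by species when inspecting the tree.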
Domain phylogeny
A quick look at the alignment (suppl. fig. 1) shows that the 1C group is distinct from the others: most domains of the 1C group lack the N-terminal part.
The multiple alignment was cleared of columns with many gaps that did not correspond to annotated secondary structure elements and saved in the chosen_cut.jvp
project. The alignment was then exported in FASTA format and the tree was built with the Maximum Likelihood algorithm in MEGA 7. Bootstrap analysis was also conducted.
The tree in Newick format is provided in the tree_ML_bootstrap.nwk file.
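The trimming described here also took secondary structure annotation into account and was apparently done in Jalview. As a rough approximation of the gap-based part only, the following sketch drops alignment columns whose gap fraction exceeds a threshold. The function names and the 0.5 threshold are assumptions made for illustration, not values used in this work.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text into a {name: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_gappy_columns(records: dict[str, str],
                       max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Keep only columns whose share of '-' is at most max_gap_fraction."""
    seqs = list(records.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned to the same length")
    # Indices of columns that pass the (assumed) gap threshold.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in records.items()}
```

A purely gap-based filter would also remove gappy columns inside annotated secondary structure elements, which the manual procedure kept, so the result will not match chosen_cut.jvp exactly.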
The original ML tree and the consensus tree were then uploaded to iTOL for further visualization. Two improvements were made: labels were coloured by group affiliation, and alignment rows with ClustalX colours were added. Annotation files were generated from the FASTA alignment with Python scripts (ranges.py for colour ranges and alignment2itol.py for alignment adaptation). Images were then exported in PNG format: the original tree (tree_boot_ranges_align.png), with bootstrap scores ranging from 0 to 1, and the bootstrap consensus tree (tree_consensus_ranges_align.png). The images are large, so they are provided as separate files.
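To give an idea of what a colour range annotation looks like, here is a minimal sketch that is not the actual ranges.py. It assumes the iTOL TREE_COLORS annotation layout (a header, a SEPARATOR line, then DATA rows of leaf ID, type, colour and label) and leaf names that start with the group code. The colour values and the function name are hypothetical.

```python
# Hypothetical colours for the four groups; the real figures may use others.
GROUP_COLOURS = {
    "1C": "#1b9e77",
    "1E": "#7570b3",
    "5C": "#e7298a",
    "5E": "#d95f02",
}


def itol_colour_ranges(labels, colours=GROUP_COLOURS) -> str:
    """Build the text of an iTOL colour range annotation for the given leaves."""
    lines = ["TREE_COLORS", "SEPARATOR TAB", "DATA"]
    for label in labels:
        # Assumption: the group code is everything before the first underscore.
        group = label.split("_", 1)[0]
        lines.append("\t".join([label, "range", colours[group], group]))
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file and added to the uploaded tree as an annotation dataset; the leaf IDs have to match the names in the Newick file exactly.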
Original tree
First, 5-domain sequences are tightly clustered together; only these branches were widely supported in the bootstrap consensus. However, 5W proteins from the same species are rarely found (5E_Q5TN74_ANOGA, for instance). 5C-proteins are more conserved than 5E-proteins in the same clades, judging by the alignment. 1-domain proteins are scattered across the tree, with two clades of 1C-proteins. Two Ecdysozoans, ONCVO and 9BILA, possess many 1-domain proteins in various branches; the others do not show 1-domain proteins from the same species in the same clades. 1W and 5W proteins from the same species were rarely found in the same clades (Chordates APTFO and COLST).
Bootstrap tree
The same observations were made, with strong evidence of 5E and 5C clustering together. However, the tree is less informative in terms of integration, as proteins from different taxa are shuffled.
Conclusions
To explain the observations, several propositions were made:
- A very old common ancestor of Chordata and Ecdysozoa possessed only 1-domain proteins (parsimony);
- The last common ancestor of Chordata and Ecdysozoa (probably a Bilaterian) already possessed both 1- and 5-domain proteins, as 5E and 5C are tightly clustered together;
- Each domain in 5-domain proteins radiated separately from the others;
- Chordata is a tighter taxon than Ecdysozoa, as their domains are more conserved and there are several 1C clades;
- 1- and 5-domain proteins from the same species rarely clustered together because few species are represented in both the 1W and 5W groups.
The investigation of smaller events is difficult due to the sample size.
References
- Jun Wei Pek, Amit Anand, Toshie Kai. Tudor domain proteins in development, Development 2012 139: 2255-2266; doi: 10.1242/dev.073304.