Paralogs and orthologs
Last updated on 8 March 2018
All homologs of the E. coli ATP-dependent Clp protease ATP-binding subunit ClpX (CLPX_ECOLI), a member of the AAA ATPase family, were identified in eight bacteria, and their relationships were established with tree-building software.
File | Link |
---|---|
UniProt report | report.ods |
Tree in Newick format | tree.tre |
Processing the data
The FASTA file with the CLPX_ECOLI sequence was obtained. The proteomes of the eight bacteria were then concatenated and converted into a BLAST+ database:
makeblastdb -in proteomes.fasta -dbtype prot
The query was then run against this database:
blastp -query clpx.fasta -db ../proteomes/proteomes.fasta -evalue 0.001 -outfmt 7
The search returned 30 unique proteins. Their sequences were retrieved with seqret, and a multiple alignment was performed with muscle.
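The number of unique hits reported above can be recovered from the tabular BLAST output. The following is a minimal sketch, assuming the blastp output was redirected to a file; the file name hits.tsv and the function name unique_hits are illustrative and do not come from the original workflow. With -outfmt 7, comment lines start with # and, in the default column layout, the second tab-separated field is the subject identifier.

```shell
# Hypothetical helper: print the unique subject IDs found in BLAST+ tabular
# output with comment lines (-outfmt 7). With the default column layout the
# second tab-separated field is the subject (hit) identifier.
unique_hits() {
    awk -F'\t' '!/^#/ && NF > 1 { print $2 }' "$1" | sort -u
}

# Assumed usage (hits.tsv is an illustrative file name):
#   blastp -query clpx.fasta -db ../proteomes/proteomes.fasta \
#          -evalue 0.001 -outfmt 7 > hits.tsv
#   unique_hits hits.tsv | wc -l
```

A protein usually appears in several rows, one per local alignment, so the identifiers are deduplicated before counting.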
The tree was built with the PHYLIP package. fprotdist was used to calculate the distance matrix, and the tree was then built with the UPGMA method:
fneighbor -data aligned.fprotdist -treetype u
The tree was visualized and edited in MEGA 7 (fig. 1).
Evolutionary relationships
Orthologs and paralogs are shown in fig. 2. In evolutionary terms, orthologs are homologous proteins that diverged through speciation, whereas paralogs arose through gene duplication within the same genome.
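As a rough illustration of the distinction, homologs that come from the same organism can be flagged as paralog candidates. The sketch below assumes the protein identifiers have the UniProt entry-name form NAME_ORGANISM (as in CLPX_ECOLI) and are stored one per line; the function name paralog_candidates is hypothetical. This is only a heuristic: a proper assignment relies on the tree, as in fig. 2.

```shell
# Hypothetical helper: group protein IDs of the form NAME_ORGANISM (UniProt
# entry-name style, e.g. CLPX_ECOLI) by organism code and print every
# organism that contributes more than one homolog. Such proteins are
# paralog candidates; this is a heuristic, not a tree reconciliation.
paralog_candidates() {
    awk -F'_' 'NF == 2 {
             ids[$2] = (ids[$2] == "" ? $0 : ids[$2] " " $0)
             n[$2]++
         }
         END { for (org in n) if (n[org] > 1) print org ": " ids[org] }' "$1" | sort
}
```

Each output line lists one organism code followed by its homologs; organisms with a single homolog are omitted.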
Proteins under scrutiny
Protein accession codes were submitted to UniProt and the corresponding records were obtained. Several informative fields were selected and the report was downloaded as a spreadsheet. The rows were then sorted to follow the tree topology, which makes it easier to match annotation terms with subtrees.
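Sorting the rows by tree topology can also be scripted. The sketch below is one possible approach, not the procedure used here: it assumes the report was exported as a tab-separated file with a header line, that its first column holds the same identifiers as the tree leaves, and that leaf labels are unquoted and contain no spaces. The function names newick_leaves and sort_by_tree are hypothetical.

```shell
# Hypothetical helpers; file names and column layout are assumptions.

# Print leaf labels of a Newick tree in the order they appear in the file.
# Leaf labels follow "(" or ","; labels after ")" belong to internal nodes.
# Assumes unquoted labels without spaces.
newick_leaves() {
    tr -d ' \t\r\n' < "$1" | grep -oE '[(,][^(),:;]+' | tr -d '(,'
}

# Reorder a tab-separated report ($2) so its rows follow the leaf order of
# the tree ($1). Assumes the first line is a header and column 1 holds the
# same identifiers as the tree leaves; rows absent from the tree are dropped.
sort_by_tree() {
    newick_leaves "$1" | awk -F'\t' '
        NR == FNR {
            if (FNR == 1) { print } else { row[$1] = $0 }
            next
        }
        ($1 in row) { print row[$1] }' "$2" -
}

# Assumed usage (report.tsv is an illustrative export of report.ods):
#   sort_by_tree tree.tre report.tsv > report_sorted.tsv
```

The leaf order in the file is only one of many orderings consistent with the topology, but it keeps the members of each subtree on adjacent rows.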
Large subtrees consistently match protein names and protein families (fig. 3), which supports the assignment of orthologs and paralogs in fig. 2. All proteins are capable of binding ATP but, according to GO terms, perform various functions in the cell, such as protein folding and DNA recombination. For HslUV proteases and ATP-dependent zinc metalloproteases the cellular location is annotated (cytoplasm and membrane, respectively). For 27 of the 30 proteins, existence is inferred from homology; the remaining three are only predicted.
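Counts such as the split of evidence levels can be tallied directly from a tab-separated export of the report. The following is a minimal sketch; the function name tally_column is hypothetical, and the position of the relevant column depends on which fields were selected in UniProt, so the column number must be supplied by the user.

```shell
# Hypothetical helper: count how often each value occurs in one column of a
# tab-separated file with a header line. $1 = file, $2 = 1-based column number.
# Output: "count<TAB>value", most frequent value first.
tally_column() {
    awk -F'\t' -v col="$2" 'NR > 1 { n[$col]++ }
         END { for (v in n) printf "%d\t%s\n", n[v], v }' "$1" | sort -rn
}

# Assumed usage (report.tsv and the column number are illustrative):
#   tally_column report.tsv 5
```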
Molecular phylogeny can reveal the incompleteness of the UniProt database. The magnesium chelatase branch is tight, yet the UniProt records for its three proteins differ: each lacks several fields that are present in the others. Combining the three records gives a description comprising protein name and family, ATP- and DNA-binding capabilities, and participation in DNA replication initiation. Within the other clades, GO terms and keywords vary only slightly.