
Paralogs and orthologs

Last update on the 8th of March, 2018

In eight bacteria, all homologs of the E. coli ATP-dependent Clp protease ATP-binding subunit ClpX (CLPX_ECOLI), a member of the AAA ATPase family, were found, and their relationships were established with tree-building software.

Table of downloads
File Link
UniProt report report.ods
Tree in Newick format tree.tre

Processing the data

tree.tre

The FASTA file with the CLPX_ECOLI sequence was obtained. The proteomes of the eight bacteria were then concatenated and converted into a BLAST+ protein database with makeblastdb -in proteomes.fasta -dbtype prot. The query was run with blastp -query clpx.fasta -db ../proteomes/proteomes.fasta -evalue 0.001 -outfmt 7, which found 30 unique proteins. Their sequences were retrieved with seqret, and a multiple sequence alignment was performed with MUSCLE.
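
Because -outfmt 7 prints one line per high-scoring segment pair, a single subject protein can appear on several lines, so hits have to be deduplicated before they are counted. Below is a minimal Python sketch of that step, assuming the default -outfmt 7 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name unique_hits and the sample IDs are hypothetical and not part of the original workflow.

```python
def unique_hits(outfmt7_text, max_evalue=1e-3):
    """Return the distinct subject IDs in BLAST+ tabular output with comment lines (-outfmt 7)."""
    hits = set()
    for line in outfmt7_text.splitlines():
        # Skip blank lines and comment lines such as "# Fields: ..." or "# 4 hits found"
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        #                  qstart qend sstart send evalue bitscore
        subject, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue:
            hits.add(subject)  # a set keeps one entry per protein, however many HSPs it has
    return hits


# Hypothetical output: protA has two HSPs, protC is above the cutoff
sample = (
    "# BLASTP\n"
    "# 4 hits found\n"
    "query\tprotA\t100.0\t424\t0\t0\t1\t424\t1\t424\t0.0\t850\n"
    "query\tprotA\t30.0\t50\t33\t2\t10\t60\t200\t250\t5e-04\t40.0\n"
    "query\tprotB\t40.0\t300\t170\t5\t1\t300\t1\t300\t1e-50\t200\n"
    "query\tprotC\t25.0\t40\t30\t1\t1\t40\t1\t40\t0.5\t20.0\n"
)
print(sorted(unique_hits(sample)))  # ['protA', 'protB']
```

The e-value check only repeats the cutoff already given to blastp; it is kept so the same function could be reused with a stricter threshold.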

The tree was built with the PHYLIP package. fprotdist was used to calculate the distance matrix, and fneighbor -data aligned.fprotdist -treetype u built the tree with the UPGMA method. Visualization and editing were performed in MEGA 7 (fig. 1).
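
UPGMA repeatedly merges the two closest clusters, places the new node at half their distance, and replaces their rows in the matrix with a size-weighted average. The following is a minimal Python sketch of that algorithm, not the fneighbor implementation; it assumes a symmetric distance matrix (such as the one fprotdist writes) already loaded into a list of lists, and the function name upgma is hypothetical.

```python
def upgma(names, dist):
    """Build a rooted UPGMA tree from a symmetric distance matrix; return a Newick string."""
    # Each cluster id maps to (newick subtree, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of clusters
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = dab / 2  # the new node sits halfway between the two clusters
        newick = f"({na}:{h - ha:.4g},{nb}:{h - hb:.4g})"
        # Distance from the merged cluster = size-weighted average of the old distances
        for k in clusters:
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (sa * dak + sb * dbk) / (sa + sb)
        del d[(a, b)]
        clusters[next_id] = (newick, sa + sb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(names, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

The resulting tree is ultrametric (every leaf is equally far from the root), which reflects the molecular-clock assumption behind UPGMA; neighbor-joining does not make that assumption.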

Fig. 1. The tree of CLPX_ECOLI homologs in the eight selected bacteria.

Evolutionary relationships

Orthologs and paralogs are shown in fig. 2. In evolutionary terms, orthologs are homologs that diverged through speciation, whereas paralogs arose by gene duplication within the same genome.
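
One standard way to formalize this distinction on a gene tree is the species-overlap rule: if the two subtrees below the last common ancestor of two genes share at least one species, that node is a duplication and the genes are paralogs; otherwise it is a speciation node and they are orthologs. Below is a minimal Python sketch, assuming a binary tree written as nested lists with (gene, species) tuples as leaves. The rule is introduced here for illustration and is not necessarily how fig. 2 was derived; all gene, species, and function names are hypothetical.

```python
def leaves(node):
    """Return all (gene, species) leaf tuples below a node."""
    if isinstance(node, tuple):  # a leaf
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def relation(tree, gene1, gene2):
    """Classify two distinct genes in the tree as 'orthologs' or 'paralogs' (species-overlap rule)."""
    if gene1 == gene2:
        raise ValueError("genes must be distinct")
    node = tree
    # Descend while one child still contains both genes; stop at their last common ancestor
    while True:
        for child in node:
            names = {gene for gene, _ in leaves(child)}
            if gene1 in names and gene2 in names:
                node = child
                break
        else:
            break
    left, right = node  # binary tree assumed
    species_left = {sp for _, sp in leaves(left)}
    species_right = {sp for _, sp in leaves(right)}
    # A species present on both sides means the ancestor was a duplication event
    return "paralogs" if species_left & species_right else "orthologs"


# Hypothetical gene tree: one duplication (X vs Y), then speciation into sp1 and sp2
tree = [[("X1", "sp1"), ("X2", "sp2")], [("Y1", "sp1"), ("Y2", "sp2")]]
print(relation(tree, "X1", "X2"))  # orthologs
print(relation(tree, "X1", "Y1"))  # paralogs
```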

Fig. 2. Relationships between proteins.
The two orthology groups are enclosed in brackets. Pairs of paralogs are coloured in shades of green.

Proteins under scrutiny

report.ods

The protein accession codes were submitted to UniProt and the corresponding records were retrieved. Several relevant fields were selected, and the report was downloaded as a spreadsheet. The rows were then sorted to follow the tree topology, which made it easier to match annotation terms with subtrees.
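
The re-ordering step can be automated by reading the leaf order directly from the Newick file. Below is a minimal Python sketch, assuming unquoted leaf labels in tree.tre that match a UniProt "Entry name" column, and report rows already loaded as dicts; the column name and the function names are assumptions.

```python
import re


def leaf_order(newick):
    """Return leaf labels in the left-to-right order they appear in a Newick string."""
    # A leaf label follows '(' or ','; internal-node labels follow ')' and are skipped
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def sort_rows_by_tree(rows, newick, key="Entry name"):
    """Sort report rows (dicts) so they follow the tree's leaf order."""
    rank = {name: i for i, name in enumerate(leaf_order(newick))}
    return sorted(rows, key=lambda row: rank[row[key]])


tree = "((A:0.1,B:0.2)90:0.05,(C:0.3,D:0.1):0.2);"
rows = [{"Entry name": n} for n in ["C", "A", "D", "B"]]
print([r["Entry name"] for r in sort_rows_by_tree(rows, tree)])  # ['A', 'B', 'C', 'D']
```

Quoted labels or labels containing spaces would need a real Newick parser; the regular expression only covers the simple unquoted case.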

Fig. 3. Protein names (left) and corresponding families (right) in the tree.

The large subtrees match protein names and protein families perfectly (fig. 3), which supports the assignment of orthologs and paralogs in fig. 2. According to GO terms, all proteins bind ATP but perform various cellular functions, such as protein folding and DNA recombination. For HslUV proteases and ATP-dependent zinc metalloproteases the cellular location is specified (cytoplasm and membrane, respectively). 27 of the 30 proteins were inferred from homology, and the remaining three were only predicted.

Molecular phylogeny can reveal the incompleteness of the UniProt database. The magnesium chelatase branch is tight, yet the UniProt records for its three proteins differ: each lacks some fields that the others contain. Combined, they give a description consisting of protein name and family, ATP- and DNA-binding capabilities, and participation in DNA replication initiation. Within the other clades, GO terms and keywords vary only slightly.
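
Comparing records within a tight clade can be done field by field: list the fields each record leaves empty that another record in the clade fills, then merge the non-empty values into one description. Below is a minimal Python sketch, assuming each UniProt record is already loaded as a dict mapping column name to cell text; the accessions, field values, and function names are hypothetical.

```python
def filled(record):
    """Return the fields of a record that have a non-empty value."""
    # Treat empty strings and None as missing (blank cells in the exported table)
    return {field for field, value in record.items() if value}


def missing_fields(records):
    """For each accession, list the fields it lacks but another record in the group has."""
    union = set().union(*(filled(r) for r in records.values()))
    return {acc: sorted(union - filled(r)) for acc, r in records.items()}


def assemble(records):
    """Merge a clade's records: for each field, keep every distinct non-empty value."""
    merged = {}
    for record in records.values():
        for field, value in record.items():
            if value:
                merged.setdefault(field, set()).add(value)
    return merged


# Hypothetical clade of two records with one gap
records = {
    "P1": {"Protein names": "Mg-chelatase subunit", "Keywords": ""},
    "P2": {"Protein names": "Mg-chelatase subunit", "Keywords": "ATP-binding"},
}
print(missing_fields(records))  # {'P1': ['Keywords'], 'P2': []}
print(assemble(records))
```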