Orthologs and paralogs

Last updated: 15-03-2018.

Process of data preparation

The bacteria for this task were taken from the previous task.

The following steps were taken:

A file containing the proteomes of all the bacteria was created:

cat /P/y16/term4/Proteomes/BRADU.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/BURCA.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/RALSO.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/NEIMA.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/SALTY.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/PASMU.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/PROMH.fasta >> united_genomes.fasta
cat /P/y16/term4/Proteomes/ECOLI.fasta >> united_genomes.fasta
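
The eight cat commands above can also be written as a loop. The following is a minimal sketch, not the procedure actually used: the function name unite_proteomes is hypothetical, and the output name united_genomes.fasta is assumed because that is the file the later makeblastdb step reads.

```shell
#!/bin/sh
# Concatenate per-organism proteome files into one FASTA file.
# Usage: unite_proteomes DIR OUT MNEMONIC...
unite_proteomes() {
    dir=$1
    out=$2
    shift 2
    : > "$out"    # start from an empty file so reruns do not duplicate sequences
    for org in "$@"; do
        cat "$dir/$org.fasta" >> "$out" || return 1
    done
}

# Example call (paths as in the commands above):
# unite_proteomes /P/y16/term4/Proteomes united_genomes.fasta \
#     BRADU BURCA RALSO NEIMA SALTY PASMU PROMH ECOLI
```

Truncating the output first is a design choice: with plain >> appends, running the commands twice would store every protein twice in the database.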

A BLAST database was created:

makeblastdb -in united_genomes.fasta -dbtype prot

A BLAST search was performed with CLPX_ECOLI as the query:

blastp -query CLPX_ECOLI.fasta -db united_genomes.fasta -evalue 0.001 -outfmt 7

A list file for seqret, naming each sequence to extract, was created:

echo united_genomes.fasta:Q8ZRC0 >> file.fasta 
echo united_genomes.fasta:B4EU54 >> file.fasta 
echo united_genomes.fasta:Q1BH84 >> file.fasta 
echo united_genomes.fasta:Q8XYP6 >> file.fasta 
echo united_genomes.fasta:P57981 >> file.fasta 
echo united_genomes.fasta:Q89KG2 >> file.fasta 
echo united_genomes.fasta:Q9JTX8 >> file.fasta 
echo united_genomes.fasta:Q8Y3D8 >> file.fasta 
echo united_genomes.fasta:B4F171 >> file.fasta 
echo united_genomes.fasta:P0A6H5 >> file.fasta 
echo united_genomes.fasta:P57968 >> file.fasta 
echo united_genomes.fasta:Q89WN2 >> file.fasta 
echo united_genomes.fasta:O30911 >> file.fasta 
echo united_genomes.fasta:Q1BSM8 >> file.fasta 
echo united_genomes.fasta:B4F2B3 >> file.fasta 
echo united_genomes.fasta:P0AAI3 >> file.fasta 
echo united_genomes.fasta:P63343 >> file.fasta 
echo united_genomes.fasta:Q89U80 >> file.fasta 
echo united_genomes.fasta:A0A0H2XMS5 >> file.fasta
echo united_genomes.fasta:H7C810 >> file.fasta 
echo united_genomes.fasta:Q8XZ78 >> file.fasta 
echo united_genomes.fasta:Q9CNJ2 >> file.fasta 
echo united_genomes.fasta:Q9JUB0 >> file.fasta 
echo united_genomes.fasta:Q8XY02 >> file.fasta 
echo united_genomes.fasta:P57015 >> file.fasta 
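
The list file can also be derived from the BLAST output instead of being typed by hand. The sketch below rests on several assumptions that the text does not confirm: the blastp output was saved to a file (the name hits.txt is hypothetical, for example via the -out option of blastp), subject IDs have the UniProt form sp|ACCESSION|NAME, and the function name hits_to_list is made up for illustration.

```shell
#!/bin/sh
# Turn BLAST tabular output (-outfmt 7) into a list file for seqret.
# Usage: hits_to_list HITS_FILE DB_FILE > LIST_FILE
hits_to_list() {
    awk -v db="$2" '
        /^#/ || NF == 0 { next }             # skip comment and blank lines
        {
            n = split($2, part, "|")         # column 2 is the subject ID
            acc = (n >= 2) ? part[2] : $2    # no "|" found: use the whole ID
            if (!seen[acc]++) print db ":" acc   # one line per accession
        }
    ' "$1"
}

# Example call:
# hits_to_list hits.txt united_genomes.fasta > file.fasta
```

Each accession is printed only once, so a protein with several reported alignments does not end up twice in the list.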

A FASTA file containing the sequences was generated:

seqret -seq @file.fasta -out seqs.fasta

The sequences were aligned with Muscle:

muscle -in seqs.fasta -out align.fasta

MEGA

A tree of the homologous proteins found was built with the UPGMA method in MEGA7. The tree is shown in Fig. 1, which also marks some examples of orthologs and paralogs.

Figure 1. Tree of the homologous proteins found. Paralogs are marked in red, orthologs in green.
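
When looking for paralogs in the tree, it can help to know which organisms contribute more than one homolog, since two homologous proteins from the same proteome are the usual candidates. The following sketch tallies BLAST hits per organism. It assumes the blastp output was saved to a file (hits.txt is a hypothetical name) and that subject IDs end in a UniProt-style organism mnemonic such as _ECOLI; the function name hits_per_organism is also made up.

```shell
#!/bin/sh
# Count distinct BLAST hits per organism mnemonic (text after the last "_").
# Usage: hits_per_organism HITS_FILE
hits_per_organism() {
    awk '
        /^#/ || NF == 0 { next }     # skip comment and blank lines
        !seen[$2]++ {                # count each subject sequence once
            org = $2
            sub(/.*_/, "", org)      # "sp|P0A6H5|CLPX_ECOLI" -> "ECOLI"
            count[org]++
        }
        END { for (org in count) print org, count[org] }
    ' "$1" | sort
}

# Example call:
# hits_per_organism hits.txt
```

The counts only point at candidates; whether two proteins are orthologs or paralogs still has to be read from the tree itself.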

© Simon Galkin, 2016