
Nucleotide BLAST

Taxonomy and function of a Sanger sequencing read

In this task the sequence from the 6th practical was used. It is not a consensus sequence generated by the consambig program: in that practical the forward and reverse reads were each corrected against the other, so generating a consensus would have added nothing (the sequence used here is more trustworthy than a consensus sequence would be).
BLASTN showed that the sequence is part of the mitochondrial CDS encoding cytochrome oxidase subunit 1 (COI):



The query is 99% identical to the sequence from Polycirrus medusa but only 89% identical to that from Polycirrus phosphoreus, which sets the taxonomy level to species. Cytochrome oxidase is a conserved protein whose function is fundamental to the metabolism of all aerobic organisms, so it is not surprising that the identities of the query with sequences from other genera, and even other phyla, are as high as 80%.
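The identity figures above can be read as the share of alignment columns in which the two sequences carry the same base. A minimal sketch of that calculation follows; the function name and the two toy sequences are made up for illustration and are not the actual COI alignment.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns with identical bases (gap columns count as mismatches)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy data (hypothetical, not taken from the BLAST output): one mismatch in ten columns.
toy_query = "ATGGCACTAT"
toy_subject = "ATGGCTCTAT"
toy_identity = percent_identity(toy_query, toy_subject)
```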

Comparison of MegaBLAST with BLASTN with different search parameters

Three BLAST searches were run with the same query sequence as in the first task. The first used MegaBLAST; it differed markedly from the other two in that it missed many homologous sequences, e.g. that of Polycirrus carolinensis. The other two used BLASTN: one with the default algorithm parameters, the other with a shorter word size (7 instead of 11) and more permissive match/mismatch scores (1,-1 instead of 1,-3). As might be expected, the second, more sensitive search yielded more results; for example, it found the sequence from the mitochondrion of Pista cristata. Screenshots of the BLAST search results are shown below.
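One way to see why the 1,-1 scores are more sensitive than 1,-3 is to ask at what identity an ungapped alignment stops scoring above zero. The sketch below is plain arithmetic on the two score pairs quoted above, not BLAST's actual statistics; the function names are illustrative.

```python
from fractions import Fraction


def ungapped_score(length: int, identities: int, reward: int, penalty: int) -> int:
    """Raw score of an ungapped alignment; `penalty` is negative, as in 1,-3."""
    return identities * reward + (length - identities) * penalty


def break_even_identity(reward: int, penalty: int) -> Fraction:
    """Identity fraction at which an ungapped alignment scores exactly zero."""
    p = abs(penalty)
    return Fraction(p, reward + p)


default_threshold = break_even_identity(1, -3)    # 3/4, i.e. 75% identity
sensitive_threshold = break_even_identity(1, -1)  # 1/2, i.e. 50% identity

# A 100-column alignment with 70 identities under each scheme.
score_default = ungapped_score(100, 70, 1, -3)    # 70 - 90 = -20
score_sensitive = ungapped_score(100, 70, 1, -1)  # 70 - 30 = 40
```

Under 1,-3 an alignment needs more than 75% identity to keep a positive score, while under 1,-1 anything above 50% does, so more divergent sequences can still be reported. The shorter word size acts separately: a seed needs only 7 consecutive matching bases instead of 11.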
1) MegaBLAST:



2) BLASTN with default parameters, compared to MegaBLAST:



3) BLASTN with default parameters:



4) BLASTN with the more sensitive parameters, compared to BLASTN with default parameters:



The same procedure was repeated with a serine tRNA coding gene named "trnS(gcu)" from Rhodomonas salina's mitochondrial genome.
1) MegaBLAST (3 results):



2) BLASTN with default parameters (19 results):



3) BLASTN with the more sensitive parameters (46 results):



Here, again, the number of search results increased from 1) to 3). Interestingly, BLASTN with default parameters found a sequence from zebrafish (second picture, bottom) that the more sensitive BLASTN did not report.

Protein homologs in Amoeboaphelidium protococcarum's genome

The program used was tblastn with default parameters.
1) Result for HSP71_YEAST: there is clearly a homolog in the genome of Amoeboaphelidium protococcarum. Presumably, it has a similar function.
Location of the homolog: scaffold-199 (coordinates within the scaffold: 1109256-1107430).
Parameters of the alignment: 79.89% identity, e-value 0.0, bit score 920. The alignment covers more than 80% of the query sequence, so the hit is a whole homologous protein rather than a single homologous domain.
2) Result for EIF3G_SCHPO: there is a homolog. Bearing in mind that eIF3G is just a huge scaffold molecule, the homolog most likely has the same function, even though the e-value of 2e-21 is far less extreme than in the previous case (it is still quite low).
Location of the homolog: scaffold-20
Parameters of the alignment: 37.98% identity, e-value 2e-21, bit score 95.5. The alignment covers the whole query sequence, so the hit is a whole homologous protein rather than a single homologous domain.
3) Result for TERT_SCHPO: there is a homolog, but the alignment covers only about half of the query sequence. The logical assumption is therefore a homologous domain.
Location of the homolog: scaffold-17
Parameters of the alignment: 25.05% identity, e-value 1e-23, bit score 108.
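The scaffold coordinates reported by tblastn are nucleotide positions, and a start larger than the end means the hit lies on the reverse strand. A small sketch of how the scaffold-199 coordinates from result 1) translate into strand and length follows; the function name is illustrative, and the codon count only approximates the number of aligned residues if the alignment contains gaps.

```python
def describe_subject_range(start: int, end: int) -> dict:
    """Strand and size of a tblastn hit given its nucleotide coordinates on the subject."""
    span_nt = abs(end - start) + 1  # coordinates are inclusive
    return {
        "strand": "plus" if start <= end else "minus",
        "span_nt": span_nt,
        "codons": span_nt // 3,
        "leftover_nt": span_nt % 3,
    }


# Coordinates of the HSP71_YEAST hit on scaffold-199, as quoted above.
hsp71_hit = describe_subject_range(1109256, 1107430)
```

For this hit the span is 1827 nucleotides on the minus strand, i.e. 609 codons with no remainder.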

Search for a gene within a contig from Amoeboaphelidium protococcarum's genome

For this task the contig "unplaced-5" was chosen because it is long. A BLASTN search returned many alpha-tubulin sequences from different species. From the search result (see below) it can be inferred that the contig contains a gene coding for alpha-tubulin at approximately positions 3200-4600.
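Approximate gene coordinates of this kind can be obtained by merging the contig intervals covered by the individual hits. The sketch below shows one way to do that; the hit intervals are hypothetical placeholders chosen to fall in the same region, not values read from the actual BLASTN output.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge overlapping (or nearly adjacent) intervals given as (start, end) pairs.

    Pairs may be given in either orientation; they are normalized to start <= end.
    """
    normalized = sorted((min(a, b), max(a, b)) for a, b in intervals)
    merged = []
    for start, end in normalized:
        if merged and start <= merged[-1][1] + max_gap:
            # Overlaps the previous region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# HYPOTHETICAL hit coordinates on the contig (for illustration only).
hypothetical_hits = [(3210, 3650), (3600, 4100), (4595, 4050), (3190, 3400)]
gene_regions = merge_intervals(hypothetical_hits)
```

With these placeholder hits the merged region is a single interval, 3190-4595, which is how a range such as "approximately 3200-4600" would be read off a set of overlapping hits.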





Dot matrix view of the alignment of two Chlamydia genomes



The genome of Chlamydia muridarum is plotted on the vertical axis and that of Chlamydia trachomatis on the horizontal axis. The beginning of the Chlamydia muridarum genome is clearly homologous to the end of the Chlamydia trachomatis genome, while the major part of the two genomes is more or less the same.
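A pattern like this is what a dot matrix shows when two similar circular sequences are written out from different starting points. That this explains the picture above is an assumption. The sketch below builds a word-based dot matrix for a toy sequence and a rotated copy of it; the sequences, word size and function name are made up for illustration.

```python
def dot_matrix(horizontal: str, vertical: str, word: int = 3):
    """Return (row, column) pairs where a word of `vertical` equals a word of `horizontal`."""
    # Index every word of the horizontal sequence by its start position.
    positions = {}
    for col in range(len(horizontal) - word + 1):
        positions.setdefault(horizontal[col:col + word], []).append(col)
    dots = set()
    for row in range(len(vertical) - word + 1):
        for col in positions.get(vertical[row:row + word], []):
            dots.add((row, col))
    return dots


# Toy "genome" and a copy whose start point is shifted by 5 bases (hypothetical data).
toy_horizontal = "ACGTTGCAGGCTAATCCGAT"
toy_vertical = toy_horizontal[-5:] + toy_horizontal[:-5]
toy_dots = dot_matrix(toy_horizontal, toy_vertical)

# Print the matrix: rows follow the vertical sequence, columns the horizontal one.
for row in range(len(toy_vertical) - 2):
    print("".join("*" if (row, col) in toy_dots else "." for col in range(len(toy_horizontal) - 2)))
```

In the printed matrix the first rows (the start of the vertical sequence) match the last columns (the end of the horizontal sequence), and the remainder forms one long diagonal shifted by five positions.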

© Stanislav Tikhonov, 2018