Nucleic Acid Sequence Databases

Assembly quality analysis

Species: Homo sapiens (Russian name: человек разумный)

There are 206 assemblies overall.
The GRCh38.p12 assembly was chosen for the analysis below.

Total sequence length	3,257,319,537
Number of contigs	1,535
Contig N50	56,413,054
Contig L50	19
Number of scaffolds	874
Scaffold N50	59,364,414
Scaffold L50	17
Number of annotated proteins	119,294
Publication containing a description of the project	3420 publications reference this project, but none of them specifically describes it. I therefore link to the NCBI webpage that describes the project: Link
Link to the sequence of one of the contigs in RefSeq	I reached this sequence by clicking the ID of a RefSeq sequence in the "Global Protein assembly" section of the assembly page, then following the ID of one of the joined pieces that make up that RefSeq sequence. I repeated this step two more times before the "CONTIG" field finally showed an ordinary sequence. Its page links to a different BioProject, but since I reached it by this route, I believe it matches the sequence requested in the task: Link
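
The N50 and L50 values in the table can be reproduced from a list of contig or scaffold lengths. The sketch below is only an illustration, assuming the common definition: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length, and L50 is the number of sequences in that set. The function name `n50_l50` is hypothetical, and the toy lengths are made up rather than taken from GRCh38.p12.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    Hypothetical helper written for illustration; not part of any library.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running sum to half of the
        # total defines N50; its rank in the sorted list is L50.
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (not real assembly data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

Read this way, a contig L50 of 19 would mean that the 19 longest contigs already cover at least half of the total contig length.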

Feature keys

Below is a list of feature keys, each with a link to a sequence annotation that contains it and the coordinates of the corresponding feature in that sequence.
1) Centromere
Link
Position: 1..305
Description: region of biological interest identified as a centromere and which has been experimentally characterized.
2) Exon
Link
Position: 101..1036
Description: region of genome that codes for portion of spliced mRNA, rRNA and tRNA; may contain 5'UTR, all CDSs and 3' UTR.
3) Mat_peptide
Link
Position: join(1598..1694,2244..2386)
Description: mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS).
4) Intron
Link
Position: 332..1589
Description: a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it.
5) Sig_peptide
Link
Position: join(277..331,1590..1597)
Description: signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane leader sequence.
6) Regulatory
Link
Position: 215..223
Description: any region of sequence that functions in the regulation of transcription, translation, replication or chromatin structure.
7) mRNA
Link
Position: join(248..331,1590..1694,2244..2386)
Description: messenger RNA; includes 5'untranslated region (5'UTR), coding sequences (CDS, exon) and 3'untranslated region (3'UTR).
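
The positions listed above use the feature-table location syntax, where `a..b` is a single span with 1-based inclusive coordinates and `join(...)` concatenates several spans. As a rough illustration of how such strings can be read, the sketch below splits a location into spans and sums their lengths. The helper names `parse_location` and `location_length` are hypothetical, and the parser deliberately covers only the two forms that appear in the list; operators such as `complement(...)` or partial-end markers are not handled.

```python
import re


def parse_location(location):
    """Parse 'a..b' or 'join(a..b,c..d,...)' into a list of (start, end).

    Hypothetical helper: other location operators are not supported.
    """
    text = location.strip()
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", part.strip())
        if match is None:
            raise ValueError(f"unsupported location: {location!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments


def location_length(location):
    """Total number of bases covered (coordinates are 1-based, inclusive)."""
    return sum(end - start + 1 for start, end in parse_location(location))


# Locations copied from the list above.
for key, loc in [
    ("intron", "332..1589"),
    ("sig_peptide", "join(277..331,1590..1597)"),
    ("mat_peptide", "join(1598..1694,2244..2386)"),
    ("mRNA", "join(248..331,1590..1694,2244..2386)"),
]:
    print(key, parse_location(loc), location_length(loc))
```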

Genome project

Name: The 100,000 Genomes Project
Aims: Better understanding of rare genetic diseases and cancer, paving the way for future therapeutic methods.
Launch year: 2012
Link to the webpage: Link
Organisation: Genomics England (in collaboration with NHS)
Country: United Kingdom
Total number of sequenced genomes (as planned): 100 000
Number of currently sequenced genomes: 87 231 (October 1, 2018)
Last publication (link): The latest publication is not indexed in PubMed. Here is the link

Mitochondrial Genes of a cryptomonad

The search was conducted on the ENA website.
Search query text:

 
tax_tree(3027) AND mol_type="genomic DNA" AND topology="CIRCULAR" AND organelle="mitochondrion" AND dataclass="STD"

There were 8 results in Release and 0 in Update.
The species chosen was Rhodomonas salina (Russian name: "Родомонас солевой").

AC: AF288090
The following table (link below) was obtained by parsing the ENA entry with a Python script.
Table of mitochondrial CDSes of Rhodomonas salina (.xlsx)
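
The script itself is not shown on this page, so the following is only a sketch of how CDS features might be pulled out of an entry in EMBL flat-file format using the Python standard library. It assumes that feature-table lines start with `FT`, that a feature key begins at the sixth character of the line, and that each qualifier fits on a single line; multi-line locations and wrapped qualifier values are not handled. The names `parse_cds` and `to_csv` and the `SAMPLE` text are hypothetical (the sample is made up, not copied from AF288090), and the output is CSV rather than .xlsx to avoid third-party packages.

```python
import csv
import io

# Made-up fragment in EMBL flat-file layout; NOT copied from AF288090.
SAMPLE = """\
FT   source          1..3000
FT                   /organelle="mitochondrion"
FT   CDS             10..300
FT                   /gene="geneA"
FT                   /product="hypothetical protein A"
FT   CDS             complement(400..900)
FT                   /gene="geneB"
FT                   /product="hypothetical protein B"
"""


def parse_cds(text):
    """Collect location, /gene and /product for every CDS feature."""
    records, current = [], None
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        if len(line) > 5 and line[5] != " ":
            # A feature key in this column starts a new feature.
            parts = line[5:].split(None, 1)
            key = parts[0]
            location = parts[1].strip() if len(parts) > 1 else ""
            current = None
            if key == "CDS":
                current = {"location": location, "gene": "", "product": ""}
                records.append(current)
        elif current is not None:
            rest = line[2:].strip()
            if rest.startswith("/"):
                name, _, value = rest[1:].partition("=")
                if name in ("gene", "product"):
                    current[name] = value.strip('"')
    return records


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["location", "gene", "product"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(parse_cds(SAMPLE)))
```

For a real entry, a dedicated parser such as the one in Biopython would likely be more robust than this line-based approach.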