Mini review of Methanothermococcus okinawensis genome
I am terribly ashamed of this review; it really ought to be redone, and thoroughly.
Abstract
Methanothermococcus okinawensis is a thermophilic methane-producing archaeon. This review attempts to retrieve and analyze information from its genome, such as nucleotide composition and statistics on its proteome and ribosomal genes.
- Introduction
- Methods
- Results
- Standard data about Methanothermococcus okinawensis genome
- Statistical data about M. okinawensis proteome
- Distribution of proteins by length
- Number of genes encoded on direct and complementary DNA strands
- Number of ribosomal, hypothetical and transport proteins
- Statistical data about ribosomal genes
- Cumulative GC skew
- Ken Takai et al. (2002) Methanothermococcus okinawensis sp. nov., a thermophilic, methane-producing archaeon isolated from a Western Pacific deep-sea hydrothermal vent system doi: 10.1099/ijs.0.02106-0
- Ruth-Sophie Taubner et al. (2018) Biological methane production under putative Enceladus-like conditions doi: 10.1038/s41467-018-02876-y
- Lisa-Maria Mauerhofer et al. (2021) Hyperthermophilic methanogenic archaea act as high-pressure CH4 cell factories doi: 10.1038/s42003-021-01828-5
- Akira Sassa et al. (2016) Mutagenic consequences of cytosine alterations site-specifically embedded in the human genome doi: 10.1186/s41021-016-0045-9
- Cecilia Guerrier-Takada et al. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme doi: 10.1016/0092-8674(83)90117-4
- Kiyoshi Nagai (2003) Structure, function and evolution of the signal recognition particle doi: 10.1093/emboj/cdg337
Methanothermococcus okinawensis is a thermophilic, methane-producing archaeon first isolated in 2000 from a deep-sea hydrothermal vent chimney at the Iheya Ridge in the Okinawa Trough, Japan. It proved to be a strictly anaerobic, thermophilic autotroph that uses hydrogen as a source of electrons, carbon dioxide or formate as a source of carbon and as an electron acceptor, and ammonium as a nitrogen source.[1]
Organisms with such unique biochemical pathways are important for fundamental research, including research on extraterrestrial life,[2] and can be applied in biotechnology.[3]
Python: a Google Colab notebook with the Python code used here.
Excel: a table with a histogram showing the distribution of proteins by length.
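Methanothermococcus okinawensis has two DNA sequences: a chromosome and a plasmid, pMETOK01. Nucleotide frequencies were counted and are shown in Table 1. They show that Chargaff's rules (A ≈ T, G ≈ C) hold approximately, and hold more closely for the longer sequence: on the chromosome the paired frequencies differ only in the third or fourth decimal place, whereas on the plasmid A and T differ by about 0.06.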
Sequence name | Sequence length, bp | Frequency of A | Frequency of T | Frequency of G | Frequency of C | GC content
---|---|---|---|---|---|---
Methanothermococcus okinawensis IH1, complete sequence | 1662525 | 0.35321 | 0.35380 | 0.14595 | 0.14703 | 0.29299
Methanothermococcus okinawensis IH1 plasmid pMETOK01, complete sequence | 14930 | 0.39699 | 0.33463 | 0.14856 | 0.11983 | 0.26839
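The kind of counting behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebook: the function name `nucleotide_stats` is hypothetical, the input is a toy string rather than the real genome, and ambiguous symbols such as N are assumed to be excluded from the total.

```python
from collections import Counter


def nucleotide_stats(seq):
    """Return frequencies of A, T, G, C and the GC content of a DNA string."""
    counts = Counter(seq.upper())
    # Assumption: only unambiguous bases count towards the total.
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    freqs = {base: counts[base] / total for base in "ATGC"}
    freqs["GC"] = freqs["G"] + freqs["C"]
    return freqs


# Toy sequence for illustration only, not M. okinawensis data.
example = nucleotide_stats("ATGCATTTAA")
print({name: round(value, 3) for name, value in example.items()})
```

For a real genome the string would come from the FASTA record of each sequence, read with whatever parser is at hand.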
Nucleotide frequencies and GC content were also counted separately for CDS and non-CDS regions (Table 2). On the chromosome, GC content is much higher in CDS than outside them, but it is still below 0.5, the value expected if the four nucleotides occurred randomly with equal probability 0.25. On the plasmid the two values are close (0.27753 in CDS and 0.29310 outside them). One reason for the low GC content is that mutations from one nucleotide to another are not equally probable. A frequent mutation is the deamination of methylcytosine, which produces thymine. Cytosine in CpG dinucleotides can be modified, for example by methylation, and the resulting mutation is difficult for DNA repair systems to correct.[4] As a result, cytosine and guanine are less frequent than adenine and thymine, and GC content is below 0.5. In CDS, however, stabilizing selection acts more strongly than in non-CDS regions, which would explain the higher GC content in chromosomal CDS.
Sequence name | CDS/not CDS | Frequency of A | Frequency of T | Frequency of G | Frequency of C | GC content
---|---|---|---|---|---|---
Methanothermococcus okinawensis IH1, complete sequence | CDS | 0.34268 | 0.34416 | 0.15585 | 0.15731 | 0.31316
Methanothermococcus okinawensis IH1, complete sequence | Not CDS | 0.40748 | 0.40349 | 0.09494 | 0.09409 | 0.18903
Methanothermococcus okinawensis IH1 plasmid pMETOK01, complete sequence | CDS | 0.38823 | 0.33424 | 0.15203 | 0.12550 | 0.27753
Methanothermococcus okinawensis IH1 plasmid pMETOK01, complete sequence | Not CDS | 0.35295 | 0.35395 | 0.14591 | 0.14719 | 0.29310
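One way to separate coding from non-coding bases before counting is to mark every position covered by a CDS and then split the sequence by that mask. The sketch below is an assumption about how such a split could be done, not the author's method: `split_by_cds` and `gc_content` are hypothetical helpers, intervals are taken as 0-based and end-exclusive, strand is ignored, and the data are a toy example.

```python
def split_by_cds(seq, cds_intervals):
    """Split a sequence into coding and non-coding bases.

    cds_intervals: iterable of (start, end) pairs, 0-based, end-exclusive.
    Positions covered by overlapping CDS are counted once.
    """
    in_cds = [False] * len(seq)
    for start, end in cds_intervals:
        for i in range(max(start, 0), min(end, len(seq))):
            in_cds[i] = True
    coding = "".join(b for b, flag in zip(seq, in_cds) if flag)
    noncoding = "".join(b for b, flag in zip(seq, in_cds) if not flag)
    return coding, noncoding


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (NaN for an empty input)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


# Toy example: one hypothetical CDS covering positions 4..9.
toy_seq = "AATTGGCCGGAATT"
coding, noncoding = split_by_cds(toy_seq, [(4, 10)])
print(coding, gc_content(coding))
print(noncoding, gc_content(noncoding))
```

Because strand is ignored here, the sketch is adequate for GC content but would need reverse-complementing of minus-strand CDS to give strand-aware A and T frequencies.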
As can be seen from Fig. 2, most proteins are between 50 and 450 amino acid residues long.
The null hypothesis (H0) is that genes are distributed between the direct and complementary strands at random with equal probabilities. For the chromosome counts, chi-square equals 0.1402 with a p-value of 0.7081, which is much greater than 0.05, so the null hypothesis cannot be rejected.
Sequence name | + | - |
---|---|---|
Methanothermococcus okinawensis IH1, complete sequence | 795 | 810 |
Methanothermococcus okinawensis IH1 plasmid pMETOK01, complete sequence | 4 | 6 |
Genes are approximately evenly distributed between the + and - DNA strands.
Ribosomal, hypothetical and transport proteins were counted separately for the Methanothermococcus okinawensis chromosome and plasmid.
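The chi-square test for an equal split between strands can be reproduced with the standard library alone. The sketch below assumes a goodness-of-fit test with one degree of freedom, for which the p-value equals erfc(sqrt(chi2 / 2)); the function name `chi_square_equal_split` is hypothetical, and the counts are the chromosome values from the table.

```python
import math


def chi_square_equal_split(plus, minus):
    """Chi-square goodness-of-fit test against a 50/50 split (1 d.o.f.)."""
    expected = (plus + minus) / 2
    chi2 = ((plus - expected) ** 2 + (minus - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Chromosome gene counts on the + and - strands, taken from the table.
chi2, p = chi_square_equal_split(795, 810)
print(f"chi2 = {chi2:.4f}, p = {p:.4f}")
```

With the chromosome counts this should give values close to the reported 0.1402 and 0.7081. The plasmid counts (4 and 6) are too small for the chi-square approximation to be reliable.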
Type of proteins | Sequence | Number of proteins | Proportion of proteins in total number of proteins
---|---|---|---
Ribosomal proteins | M. okinawensis IH1 chromosome | 63 | 0.0383
Ribosomal proteins | IH1 plasmid pMETOK01 | 0 | 0.0
Hypothetical proteins | M. okinawensis IH1 chromosome | 276 | 0.1680
Hypothetical proteins | IH1 plasmid pMETOK01 | 6 | 0.5455
Transport proteins | M. okinawensis IH1 chromosome | 67 | 0.0408
Transport proteins | IH1 plasmid pMETOK01 | 0 | 0.0
The number of RNA coding genes is much lower than the number of protein coding genes. Apart from the rRNA and tRNA genes, there are two other non-coding RNA (ncRNA) genes: ribonuclease P (RNase P) RNA, a ribozyme that cleaves tRNA precursor molecules,[5] and SRP RNA, the RNA component of the signal recognition particle, which recognizes the signal peptide of membrane or secretory proteins and then associates with the SRP receptor anchored to the membrane (the endoplasmic reticulum membrane in eukaryotes, the cytoplasmic membrane in archaea).[6]
Type of gene product | Number of genes
---|---
Protein coding genes | 1615
All RNA coding genes | 47
rRNA coding genes | 7
tRNA coding genes | 38
Cumulative GC skew for the chromosome and for the plasmid is shown in Fig. 3 and Fig. 4, respectively. The minimum of the cumulative GC skew for the plasmid coincides with the minimum of GC content shown in Fig. 5. Peaks in that graph probably correspond to genes, while the lowest GC content probably marks the origin of replication.
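For reference, cumulative GC skew is usually obtained by computing (G - C) / (G + C) in consecutive windows and summing the values along the sequence. The sketch below follows that common definition; the window size, the function name `cumulative_gc_skew` and the toy sequence are assumptions and may differ from what was used for Fig. 3 and Fig. 4.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of (G - C) / (G + C) over non-overlapping windows."""
    seq = seq.upper()
    curve = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without G or C contributes nothing to the sum.
        running += (g - c) / (g + c) if g + c else 0.0
        curve.append(running)
    return curve


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGC" * 3 + "CCCG" * 3
curve = cumulative_gc_skew(toy, window=4)
print(curve)
print("window with maximum:", curve.index(max(curve)))
```

In the toy example the curve rises through the G-rich half and falls through the C-rich half, so its extreme point marks the switch between the two; on a real replicon such extremes are commonly read as candidate positions of the replication origin and terminus.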