
Genomic and proteomic overview of Pyrococcus abyssi GE5

Bogdanova Agatha Vladimirovna

Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University


ABSTRACT. Pyrococcus abyssi GE5 is an anaerobic hyperthermophilic archaeon isolated from deep-sea hydrothermal vents and serves as a valuable model for studying molecular mechanisms of replication and adaptation under extreme conditions.

Keywords: Pyrococcus abyssi GE5, hyperthermophilic archaeon, genome analysis, proteome analysis, codon usage bias.

1. INTRODUCTION

Pyrococcus abyssi GE5 is a hyperthermophilic, anaerobic archaeon in the family Thermococcaceae (genus Pyrococcus), domain Archaea. It was first isolated from a deep-sea hydrothermal vent in the North Fiji Basin, at a depth of approximately 2000 meters [1]. The organism thrives under extreme conditions, with an optimal growth temperature around 95 °C and an optimal pH near 6.5. The proteome of Pyrococcus abyssi has been extensively studied as a model for understanding protein stability under extreme temperature and pressure conditions. For instance, many enzymes derived from this archaeon (e.g., DNA polymerases) exhibit remarkable thermostability, making them valuable tools in biotechnology and molecular biology [2].

Taxonomic classification (NCBI Taxonomy) [3]

  • Domain: Archaea
  • Phylum: Euryarchaeota (or Methanobacteriota, in newer classifications)
  • Class: Thermococci
  • Order: Thermococcales
  • Family: Thermococcaceae
  • Genus: Pyrococcus
  • Species: Pyrococcus abyssi
  • Strain: GE5 (also known as strain Orsay)

The complete genome of Pyrococcus abyssi GE5 has been sequenced and annotated. It comprises approximately 1.76 Mb and contains 1919 annotated CDS (16 of them pseudogenes), distributed between a single circular chromosome and a small plasmid. This gene repertoire displays hallmark features of hyperthermophilic archaea, including a relatively high GC content, numerous genes encoding thermostable enzymes, and robust DNA repair systems [1,4].

This mini-review provides a concise overview of the molecular characteristics of Pyrococcus abyssi GE5, focusing on genome organization (gene content per replicon, inter-CDS distances, and gene orientation), the distribution of encoded protein lengths, and start and stop codon usage.

2. MATERIALS AND METHODS

The Pyrococcus abyssi GE5 genome assembly (GCF_000195935.2_ASM19593v2) and the associated sequence and annotation files (CDS FASTA and feature table) were downloaded from NCBI Genomes/Genes. CDS and protein lengths were derived from the CDS FASTA via sequence-statistics output and summarized in Google Sheets. Start codons were defined as the first three nucleotides of each CDS; they were collected for all CDS and, separately, for pseudogene CDS (records annotated as pseudo/pseudogene), and their frequencies were calculated in Google Sheets.
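The start-codon tally described above can be sketched in Python. This is a minimal illustration, not the author's workflow (which used Google Sheets); it assumes the CDS FASTA follows NCBI's `*_cds_from_genomic.fna` convention, in which pseudogene records carry a `[pseudo=true]` tag in the header. The helper names `read_fasta`, `start_codon_counts`, and `percentages` are hypothetical, and the toy records are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def start_codon_counts(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally the first three nucleotides of each CDS for all, pseudo and normal records.

    Assumption: pseudogene records are marked with "[pseudo=true]" in the header.
    """
    counts = {"all": Counter(), "pseudo": Counter(), "normal": Counter()}
    for header, seq in records:
        codon = seq[:3]
        group = "pseudo" if "[pseudo=true]" in header else "normal"
        counts["all"][codon] += 1
        counts[group][codon] += 1
    return counts


def percentages(counter: Counter) -> Dict[str, float]:
    """Convert counts to percentages rounded to two decimals."""
    total = sum(counter.values())
    return {codon: round(100 * n / total, 2) for codon, n in counter.items()}


# Toy input (invented records), including a sequence split over two lines
toy_fasta = """>lcl|cds_1 [locus_tag=A1]
ATGAAATAA
>lcl|cds_2 [locus_tag=A2]
gtgaaa
tga
>lcl|cds_3 [locus_tag=A3] [pseudo=true]
TTGAAATAG""".splitlines()
counts = start_codon_counts(read_fasta(toy_fasta))
print(dict(counts["normal"]))  # {'ATG': 1, 'GTG': 1}
```

For a real download, `with open(path) as fh: counts = start_codon_counts(read_fasta(fh))` would apply the same logic, where `path` is the local copy of the CDS FASTA (a gzip-compressed file would need `gzip.open(path, "rt")`).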

Intergenic distances and the strand orientation of neighboring CDS were computed from annotation coordinates by Bash-based parsing of CDS features; analyses were performed both on the full CDS set and on a filtered “strict” set that excludes RNA features and pseudogenes. Terminal stop codons (TAA/TAG/TGA) were identified as the last codon of each CDS sequence, and their counts were aggregated and visualized in Google Sheets.
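One way to compute the gap and strand-adjacency statistics from annotation coordinates is sketched below in Python (the original analysis used Bash). It assumes the feature table has already been reduced to hypothetical `Feature` records with 1-based inclusive coordinates and a `kind` label (`"CDS"`, `"pseudo"`, or an RNA type). Gaps are measured between consecutive coding features on the same strand, a negative gap denotes overlapping genes, and the wrap-around pair across the origin of the circular chromosome is ignored. These are illustrative choices, not necessarily the author's exact definitions.

```python
from collections import Counter
from typing import List, NamedTuple

CODING = {"CDS", "pseudo"}  # kinds treated as coding in the simple analysis


class Feature(NamedTuple):
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    kind: str    # "CDS", "pseudo", "tRNA", "rRNA", "ncRNA", ...


def inter_cds_gaps(features: List[Feature], strand: str = "+") -> List[int]:
    """Gaps (nt) between consecutive coding features on one strand; < 0 means overlap."""
    same = sorted(
        (f for f in features if f.kind in CODING and f.strand == strand),
        key=lambda f: f.start,
    )
    return [b.start - a.end - 1 for a, b in zip(same, same[1:])]


def strand_pairs(features: List[Feature], strict: bool = False) -> Counter:
    """Count (current, next) strand combinations of neighboring CDS in genome order.

    strict=False: RNA genes are dropped and every coding feature is paired with the next.
    strict=True: RNA genes and pseudogenes stay in the ordering, so a pair is counted
    only when two intact CDS are directly adjacent.
    """
    ordered = sorted(features, key=lambda f: f.start)
    if not strict:
        ordered = [f for f in ordered if f.kind in CODING]
    return Counter(
        (a.strand, b.strand)
        for a, b in zip(ordered, ordered[1:])
        if not strict or (a.kind == "CDS" and b.kind == "CDS")
    )


# Invented toy annotation, not P. abyssi data
toy = [
    Feature(1, 300, "+", "CDS"),
    Feature(310, 600, "+", "CDS"),
    Feature(650, 700, "-", "tRNA"),
    Feature(720, 900, "-", "CDS"),
    Feature(880, 1200, "-", "pseudo"),
    Feature(1300, 1500, "+", "CDS"),
]
print(inter_cds_gaps(toy, "+"))  # [9, 699]
print(inter_cds_gaps(toy, "-"))  # [-21]
```

In the toy data the strict count keeps only the first `+ +` pair, because the tRNA and the pseudogene break adjacency for every other neighboring pair.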

All tables supporting the results and analyses discussed in this mini-review are accessible in the Supplementary materials section.

3. RESULTS

3.1 Distribution of protein lengths encoded in the genome of Pyrococcus abyssi GE5

Figure 1. Histogram of protein lengths in Pyrococcus abyssi GE5.

The histogram of protein lengths in Pyrococcus abyssi GE5 is unimodal: most proteins fall within ~100–450 amino acids, with a clear maximum at ~150–300 amino acids. This predominance is consistent with typical prokaryotic and archaeal proteomes [5–7]. Proteins shorter than 100 amino acids are a minority; they may include bona fide small proteins and peptides, a class that has historically been under-detected [8–10].
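The conversion from CDS length to protein length and the binning behind a histogram such as Figure 1 can be sketched as follows. It assumes each CDS ends with a stop codon that is not translated; the 50-aa default bin width is an illustrative choice, not necessarily the one used for Figure 1, and the function names and toy lengths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def protein_length(cds_nt: int, has_stop: bool = True) -> int:
    """Protein length (aa) from CDS length (nt); the terminal stop codon is not translated."""
    codons = cds_nt // 3
    return codons - 1 if has_stop else codons


def length_histogram(lengths: Iterable[int], bin_width: int = 50) -> Dict[int, int]:
    """Map each bin's lower edge to the number of proteins in [edge, edge + bin_width)."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


def share_shorter_than(lengths: Iterable[int], cutoff: int = 100) -> float:
    """Fraction of proteins shorter than `cutoff` amino acids."""
    lengths = list(lengths)
    return sum(n < cutoff for n in lengths) / len(lengths)


# Invented CDS lengths (nt), each including a stop codon
aa = [protein_length(nt) for nt in (243, 603, 903, 1353)]
print(aa)                      # [80, 200, 300, 450]
print(share_shorter_than(aa))  # 0.25
```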

3.2 Number of protein-coding and RNA genes per replicon in Pyrococcus abyssi GE5

Table 1. Distribution of protein-coding and RNA genes between the chromosome and the plasmid.

Replicon     CDS    Gene   ncRNA  rRNA  tRNA  Total
Chromosome   1917   1988   21     4     46    3976
Plasmid      2      2      0      0     0     4
Total        1919   1990   21     4     46    3980

All rRNA and tRNA genes are located on the chromosome, indicating that the core translational machinery is entirely chromosome-encoded, whereas the plasmid, with only two CDS, is a small accessory replicon. This interpretation agrees with the general observation that plasmids typically encode accessory functions and participate in horizontal gene transfer [14–15]. External annotations designate the plasmid as pGT5 (accession U49503) [13], which is reported to replicate via a rolling-circle mechanism [16].

3.3 Analysis of inter-CDS distances in Pyrococcus abyssi GE5

Figure 2. Overall distribution of inter-CDS gaps (plus strand).

The distribution of inter-CDS gaps on the plus strand is strongly right-skewed: many adjacent CDSs are separated by very short intergenic regions, whereas long gaps are uncommon. This pattern is consistent with the compact genome organization typical of prokaryotes [11,18].

Figure 3. Short inter-CDS gaps (≤500 nt).

Among gaps of at most 500 nt, the most populated bin is 0–50 nt. Such ultrashort distances are often associated with translational coupling, whereas longer intervals may harbor promoters or other regulatory elements [19–24].

3.4 Analysis of neighboring coding sequences (CDS) in Pyrococcus abyssi GE5

Table 2. Simple analysis (all consecutive CDS).

                 Next CDS: +   Next CDS: -
Current CDS: +   644           285
Current CDS: -   285           687

Total CDS analyzed = 1901.

Table 3. Strict analysis (excluding RNA and pseudogenes).

                 Next CDS: +   Next CDS: -
Current CDS: +   626           270
Current CDS: -   272           677

Total genomic features = 3976; number of CDS = 1917 (both values match the chromosome row of Table 1). A total of 1845 adjacent CDS pairs had no intervening RNA gene or pseudogene; 71 pairs were excluded because an RNA gene or pseudogene lay between them.

Both analyses show ~70% same-strand adjacency, consistent with runs of co-oriented genes (“directons”) that often correspond to operons or operon-like units [11].
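The ~70% figure follows directly from the two orientation tables: the same-strand share is the sum of the `+ +` and `- -` cells divided by the sum of all four cells. A minimal Python check using the values from Tables 2 and 3 (the function name is hypothetical):

```python
def same_strand_fraction(pp: int, pm: int, mp: int, mm: int) -> float:
    """Share of adjacent CDS pairs on the same strand, from a 2x2 orientation table.

    Arguments are the (current, next) cell counts: ++, +-, -+, --.
    """
    return (pp + mm) / (pp + pm + mp + mm)


simple = same_strand_fraction(644, 285, 285, 687)  # Table 2: 1331 / 1901
strict = same_strand_fraction(626, 270, 272, 677)  # Table 3: 1303 / 1845
print(f"simple: {simple:.1%}, strict: {strict:.1%}")  # simple: 70.0%, strict: 70.6%
```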

3.5 Start codon usage in Pyrococcus abyssi GE5

The dataset comprises 1919 CDS, of which 16 are annotated as pseudogenes, leaving 1903 intact (“normal”) protein-coding CDS.

Figure 4. Start codon usage in Pyrococcus abyssi GE5 (normal CDS).
Figure 5. Start codon usage in all, pseudo, and normal CDS (percent).
Figure 6. Start codon usage in all, pseudo, and normal CDS (counts).

Among normal CDS, ATG dominates (83.92%), with GTG (10.30%) and TTG (5.25%) as minor alternatives; the remaining ~0.5% begin with other triplets. The pseudogene profile is hard to interpret because of the very small sample size (N = 16). The overall ATG > GTG > TTG hierarchy matches the initiation pattern commonly observed in prokaryotes [26–28].

3.6 Stop codon usage in Pyrococcus abyssi GE5

Figure 7. Distribution of stop codons in Pyrococcus abyssi GE5.

TGA predominates (958 of 1911; 50.1%), followed by TAA (560; 29.3%) and TAG (393; 20.6%). This hierarchy is consistent with previously described stop-codon usage biases; a functional interpretation would require additional evidence [29–30].
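A minimal Python sketch of the stop-codon tally described in Materials and Methods, followed by a check of the reported percentages. It assumes CDS sequences are plain nucleotide strings whose last three bases are the stop codon; the helper names are hypothetical, and any terminal triplet other than TAA/TAG/TGA is pooled as `"other"`.

```python
from collections import Counter
from typing import Dict, Iterable

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_counts(cds_seqs: Iterable[str]) -> Counter:
    """Count the terminal codon of each CDS; non-canonical endings are tallied as 'other'."""
    counts = Counter()
    for seq in cds_seqs:
        codon = seq[-3:].upper()
        counts[codon if codon in STOP_CODONS else "other"] += 1
    return counts


def as_percent(counts: Counter) -> Dict[str, float]:
    """Convert counts to percentages rounded to one decimal."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Counts reported in Section 3.6
reported = Counter({"TGA": 958, "TAA": 560, "TAG": 393})
print(sum(reported.values()), as_percent(reported))
# 1911 {'TGA': 50.1, 'TAA': 29.3, 'TAG': 20.6}
```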

3.7 Tail characterization: the 20 longest coding sequences

Figure 8. Rank–size plot of the 20 longest CDS in Pyrococcus abyssi GE5.

The rank–size plot shows a steep drop from the longest CDS to the rest of the top 20, consistent with the heavy-tailed gene-length distribution typical of prokaryotic and archaeal genomes.
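The data behind a rank–size plot like Figure 8 can be sketched as follows: sort CDS lengths in descending order, keep the top n, and pair each with its rank (plotting rank against length then gives the figure). The ratio of the second-longest to the longest CDS is one simple, hypothetical way to quantify the initial drop; the function names and toy lengths below are illustrative, not P. abyssi data.

```python
from typing import Iterable, List, Tuple


def rank_size(lengths: Iterable[int], n: int = 20) -> List[Tuple[int, int]]:
    """(rank, length) pairs for the n longest CDS; rank 1 is the longest."""
    longest = sorted(lengths, reverse=True)[:n]
    return list(enumerate(longest, start=1))


def second_to_first_ratio(ranked: List[Tuple[int, int]]) -> float:
    """Length of rank 2 relative to rank 1; a low value indicates a steep initial drop."""
    return ranked[1][1] / ranked[0][1]


toy_lengths = [1200, 9000, 300, 4500, 2400]  # invented CDS lengths (nt)
ranked = rank_size(toy_lengths, n=3)
print(ranked)                         # [(1, 9000), (2, 4500), (3, 2400)]
print(second_to_first_ratio(ranked))  # 0.5
```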

4. SUPPLEMENTARY MATERIALS

  1. Pyrococcus abyssi GE5 genome features tables
  2. Start codons in Pyrococcus abyssi GE5
  3. Stop codons in Pyrococcus abyssi GE5
  4. Scripts for stop and start codons

5. REFERENCES

  1. Erauso G, et al. Pyrococcus abyssi sp. nov. Arch Microbiol. 1993;160(5):338–349.
  2. BacDive – The Bacterial Diversity Metadatabase: Pyrococcus abyssi GE5 (DSM 25543).
  3. NCBI Taxonomy Browser: Pyrococcus abyssi GE5.
  4. Cohen GN, et al. Integrated analysis of genome annotation and proteome data of Pyrococcus abyssi. Mol Microbiol. 2003.
  5. Milo R, Phillips R. BioNumbers Book: How big is the “average” protein?
  6. Zhang J. Protein-length distributions. Trends Genet. 1999.
  7. Brocchieri L, Karlin S. Protein length in proteomes. Nucleic Acids Res. 2005.
  8. Ardern Z. Small proteins. Nat Rev Microbiol. 2021.
  9. van der Feltz C, et al. Small proteins. FEMS Microbiol Rev. 2023.
  10. van Wolferen M, et al. Small proteins in Archaea. J Bacteriol. 2021.
  11. Koonin EV, Wolf YI. Genomics of bacteria and archaea. Nucleic Acids Res. 2008.
  12. UniProt Proteomes: Pyrococcus abyssi GE5 (UP000000810).
  13. KEGG genome entry for Pyrococcus abyssi GE5.
  14. Stevenson C, et al. Plasmid transfer. Trends Microbiol. 2022.
  15. Brockhurst MA, et al. Role of plasmids. Nat Rev Microbiol. 2019.
  16. Muskhelishvili G, Travers A. Rep75 from pGT5. Methods Enzymol. 2001.
  17. Macario AJL, Conway de Macario E. Protein-folding systems. Nat Rev Microbiol. 2002.
  18. Sharma A, et al. Genomic attributes of thermophiles. World J Microbiol Biotechnol. 2022.
  19. Blombach F, et al. Archaeal transcription. J Mol Biol. 2019.
  20. Huber M, et al. Translational coupling. Front Microbiol. 2023.
  21. Werner F, Grohmann D. Regulation in archaea. FEMS Microbiol Rev. 2011.
  22. Santangelo TJ, Reeve JN. Regulatory elements. J Bacteriol. 1999.
  23. Brown KM, Wade JT. Translational coupling. J Bacteriol. 2025.
  24. Stöckl R, et al. Intergenic terminators. Front Microbiol. 2023.
  25. Zhang W, et al. Internal termination. Nucleic Acids Res. 2023.
  26. NCBI Genetic Codes (translation tables).
  27. Kearse MG, Wilusz JE. Non-AUG start codons. Trends Biochem Sci. 2017.
  28. NHGRI Genetics Glossary: Pseudogene.
  29. Trexler M, et al. TAG–TGA paradox. Sci Rep. 2023.
  30. Ho AT, Hurst LD. Stop codon usage. Genome Biol Evol. 2022.