Overview of the Planococcus halocryophilus genome and proteome

Aleksandra S. Parfenova

Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia

KEY WORDS

Planococcus halocryophilus, Python, Google Sheets, genome, proteome.

1 INTRODUCTION

Planococcus halocryophilus is a species of non-spore-forming, Gram-positive, aerobic, heterotrophic bacteria. Isolated from high Arctic permafrost, P. halocryophilus most likely inhabits the subzero brine veins surrounding soil particles and ice crystals. This species is notable for its ability to grow at temperatures as low as −15 °C and to withstand NaCl concentrations of up to 19% (Mykytczuk et al., 2013).

P. halocryophilus classification:

Kingdom: Bacteria

Phylum: Firmicutes

Class: Bacilli

Order: Bacillales

Family: Planococcaceae

Genus: Planococcus

Previous genomic studies of eurypsychrophilic bacteria such as P. halocryophilus (bacteria that grow at low temperatures but also tolerate a wide temperature range) have focused on the mechanisms these organisms use to survive low temperature and high salinity. For example, Mykytczuk et al. (2013) found that a distinctive encrustation forms around individual cells at low temperatures (Figure 1). Raymond‐Bouchard et al. (2017) found that the protein composition of the cells differed between three conditions: −10 °C with 12% w/v NaCl, 23 °C, and 23 °C with 12% w/v NaCl. Some proteins were abundant both at −10 °C and at 23 °C with NaCl, suggesting a link between adaptation to low temperature and adaptation to high salinity. These studies have furthered our understanding of the genetic mechanisms that P. halocryophilus and other cryophilic bacteria use to withstand low temperature and high salinity. The purpose of this review is to describe the basic structure and composition of the P. halocryophilus genome and proteome.

Figure 1 (a) Cells grown at 25 °C. (b) Cells grown at −15 °C in 18% NaCl and 7% glycerol, encrusted in dense nodular material. (c) Dividing cells at −15 °C (Mykytczuk et al., 2013).

2 METHODS

This overview is based on the genome and CDS files available in the NCBI database. Genome length, nucleotide frequencies, and start-codon, stop-codon and sense-codon usage were calculated using Python scripts written in Google Colaboratory.
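The original scripts are not reproduced here. As a hedged illustration, the first step of such a script, reading the NCBI genome and CDS FASTA files into Python strings, might look like the minimal sketch below. The function names and the file names in the comments are hypothetical.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence} (header without '>')."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def read_fasta(path):
    """Read a FASTA file from disk and parse it."""
    return parse_fasta(Path(path).read_text())


# Hypothetical usage with the downloaded NCBI files:
# genome_records = read_fasta("genome.fna")
# cds_list = list(read_fasta("cds_from_genomic.fna").values())
```

Sequences are upper-cased on reading so that later counts do not depend on soft-masking in the source file.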

Proteome statistics were calculated using the Google Sheets functions COUNTIFS (protein length distribution, tRNA strand distribution), COUNTIF (RNA strand distribution), MAX (maximum protein length), MIN (minimum protein length), VLOOKUP (codon usage table, longest and shortest protein), AVERAGE (average protein length), MEDIAN (median protein length) and SUM (tRNA strand distribution). The protein length distribution histogram (Figure 3), the codon usage table (Table 4) and the tRNA gene table (Table 5) were made using Google Sheets built-in functions. The cumulative GC-skew plot was made using Webskew.

3 RESULTS AND DISCUSSION

3.1 GENOME LENGTH AND NUCLEOTIDE FREQUENCY

The total length of the P. halocryophilus genome is 3,424,893 base pairs. Only adenine, thymine, guanine and cytosine were found. The GC content and AT content are 40.05% and 59.95%, respectively. The frequency of guanine is only 0.17 percentage points higher than that of cytosine, and the frequency of thymine only about 0.06 percentage points higher than that of adenine (computed from the raw counts), so the P. halocryophilus genome is consistent with Chargaff's second parity rule (G ≈ C and A ≈ T within a single strand). Precise nucleotide frequencies are shown in Table 1.

Table 1. Nucleotide frequency
Nucleotide Total amount Frequency, %
A 1025615 29.95
T 1027570 30.00
G 688770 20.11
C 682938 19.94
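The lengths, frequencies and GC/AT content above follow directly from the raw base counts. A minimal sketch, assuming the genome has already been loaded as a single uppercase string; the helper names are hypothetical, and the demonstration uses the counts from Table 1.

```python
from collections import Counter


def count_nucleotides(seq):
    """Count each symbol in the sequence (case-insensitive)."""
    return Counter(seq.upper())


def composition(counts):
    """Return total length, per-base frequencies (%) and GC content (%)."""
    total = sum(counts.values())
    freq = {base: 100 * n / total for base, n in counts.items()}
    gc = freq.get("G", 0.0) + freq.get("C", 0.0)
    return total, freq, gc


# Counts from Table 1
table1 = {"A": 1025615, "T": 1027570, "G": 688770, "C": 682938}
total, freq, gc = composition(table1)
print(total, round(gc, 2), round(100 - gc, 2))  # 3424893 40.05 59.95
# Chargaff's second parity rule: G ~ C and A ~ T on a single strand
print(round(freq["G"] - freq["C"], 2), round(freq["T"] - freq["A"], 2))  # 0.17 0.06
```

Taking the T − A difference from the unrounded counts gives 0.06 percentage points, whereas subtracting the rounded values in Table 1 gives 0.05.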

3.2 VARIATION IN START- AND STOP-CODON USAGE

The annotated coding sequences of P. halocryophilus use 11 different start codons. The most frequent are ATG, TTG and GTG, in descending order.

99.08% of the P. halocryophilus coding sequences end with one of the typical stop codons TAA, TGA or TAG. TAA is the most common, accounting for 64.91% of all stop codons. However, 0.92% of the analysed sequences end with atypical nucleotide sequences such as A, AA, GA, AG, G and CAG. These are most likely explained either by a deletion of one or two nucleotides in the stop codon itself or by a frameshift mutation elsewhere in the coding sequence. Precise data on start- and stop-codon usage are shown in Table 2 and Table 3, respectively.

Table 2. Start-codon usage
Start-codon Amount Frequency, %
ATG 2669 81.65
TTG 306 9.361
GTG 245 7.495
ATT 22 0.673
ATA 12 0.367
other 15 0.459

Table 3. Stop-codon usage
Stop-codon Amount Frequency, %
TAA 2122 64.91
TGA 712 21.78
TAG 405 12.39
other 30 0.92
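Start- and stop-codon counts like those in Tables 2 and 3 can be tallied directly from the CDS sequences. The sketch below is an assumption about how this could be done, not the original script: it takes the first three bases as the start codon and, after splitting the CDS into triplets from its first base, the final chunk as the stop codon. A CDS whose length is not a multiple of three therefore ends in a one- or two-base fragment such as A or AA. All helper names are hypothetical.

```python
from collections import Counter

STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def last_codon(cds):
    """Final chunk when the CDS is split into triplets from its first base."""
    return cds[3 * ((len(cds) - 1) // 3):]


def start_stop_usage(cds_list):
    """Count start codons (first 3 nt) and terminal codons of each CDS."""
    starts = Counter(cds[:3] for cds in cds_list)
    stops = Counter(last_codon(cds) for cds in cds_list)
    return starts, stops


def percentages(counter):
    """Share of each key in %, rounded to 2 decimals, most common first."""
    total = sum(counter.values())
    return {key: round(100 * n / total, 2) for key, n in counter.most_common()}


def atypical_stop_percent(stops):
    """Percentage of CDSs that do not end in TAA, TAG or TGA."""
    total = sum(stops.values())
    atypical = sum(n for codon, n in stops.items() if codon not in STANDARD_STOPS)
    return 100 * atypical / total


# The Table 3 counts reproduce the reported shares:
print(percentages(Counter({"TAA": 2122, "TGA": 712, "TAG": 405, "other": 30})))
# {'TAA': 64.91, 'TGA': 21.78, 'TAG': 12.39, 'other': 0.92}
```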

3.3 CUMULATIVE GC SKEW, OriC AND TER LOCATION

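The cumulative GC-skew plot (Figure 2) was made using the Webskew application with a step size of 1000 bp and a window size of 20000 bp. GC skew, (G − C)/(G + C), is computed for each window and summed along the genome. The cumulative GC skew reaches its minimum at the replication origin and its maximum at the terminus (Grigoriev, 1998). Hence, the replication origin of P. halocryophilus most likely lies around position 2,532,000 and the terminus around position 817,000. These two positions are about 1.71 Mb apart in either direction around the 3,424,893 bp genome, roughly half its length each, as expected if the chromosome is circular and its two replichores are of similar size.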

Figure 2 Cumulative GC skew plot with stepsize: 1000 and windowsize: 20000.
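Webskew's exact implementation is not described here. The following is therefore only a sketch of the general cumulative-skew technique (Grigoriev, 1998) with the same parameters: GC skew (G − C)/(G + C) is computed in 20000 bp windows moved in 1000 bp steps, the values are summed cumulatively, and the window positions of the minimum and maximum are taken as the predicted origin and terminus. Function names are hypothetical. Unlike a tool designed for circular chromosomes, this version does not wrap windows around the end of the sequence.

```python
def cumulative_gc_skew(seq, window=20000, step=1000):
    """Return window start positions and the cumulative sum of (G - C) / (G + C)."""
    seq = seq.upper()
    positions, cumulative = [], []
    running = 0.0
    # Windows that would run past the end are skipped (no wrap-around).
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        running += (g - c) / (g + c) if g + c else 0.0
        positions.append(start)
        cumulative.append(running)
    return positions, cumulative


def predict_ori_ter(seq, window=20000, step=1000):
    """Origin = window start at the minimum, terminus = window start at the maximum."""
    positions, cumulative = cumulative_gc_skew(seq, window, step)
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return positions[i_min], positions[i_max]


# Toy check: a C-rich half followed by a G-rich half puts the minimum near the junction.
toy = "C" * 100 + "G" * 100
print(predict_ori_ter(toy, window=20, step=10))  # (80, 180)
```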

3.4 PROTEOME STATISTICS

The P. halocryophilus genome encodes a total of 3219 proteins. The average protein length is 299 amino acids, which is 6.56% shorter than the average length of bacterial proteins (about 320 aa; Tiessen et al., 2012). The median length is 266 aa. See Figure 3 for a histogram of the protein length distribution.

The shortest protein is the stressosome-associated protein Prli42. It is 31 amino acids long and is a membrane-bound miniprotein involved in the stressosome activation mechanism (Williams et al., 2019). The longest protein is a phage tail protein, 1921 amino acids long, which is probably encoded by a prophage.

Figure 3 Protein length distribution (bin size 40 aa)
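The Google Sheets calculations in this section (AVERAGE, MEDIAN, MIN, MAX, and COUNTIFS for the histogram) have direct Python equivalents. A minimal sketch, assuming the protein lengths are available as a list of integers; the helper names are hypothetical. The 40 aa bin size matches Figure 3, and the 320 aa reference for the average bacterial protein is the value implied by the 6.56% difference (299 is 6.56% below 320).

```python
import statistics
from collections import Counter


def length_summary(lengths):
    """Equivalent of the COUNT, AVERAGE, MEDIAN, MIN and MAX spreadsheet cells."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def length_histogram(lengths, bin_size=40):
    """Bin starting at k covers lengths in [k, k + bin_size); empty bins are kept."""
    bins = Counter(length // bin_size for length in lengths)
    return {i * bin_size: bins[i] for i in range(max(bins) + 1)}


def percent_shorter(value, reference):
    """How much shorter value is than reference, in %."""
    return 100 * (reference - value) / reference


print(round(percent_shorter(299, 320), 2))  # 6.56
```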

3.5 AMINO ACID CODON USAGE AND RNA GENE STRAND DISTRIBUTION

All 61 sense codons were found. For each amino acid encoded by two or more codons, the synonymous codons are used at clearly different frequencies (Table 4). For example, lysine is encoded by AAA 49643 times but by AAG only 11565 times. Hence, the P. halocryophilus genome shows codon bias.
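The codon counts in Table 4 and the bias described above can be reproduced by counting in-frame codons across all CDSs. The sketch below uses the standard genetic code (NCBI translation table 1) and, as one common way of quantifying bias, relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all its synonymous codons were used equally. RSCU is not part of the original analysis, and the helper names are hypothetical. Start codons are counted as ordinary sense codons (e.g. TTG as Leu); stop codons, incomplete trailing triplets and codons containing non-ACGT symbols are skipped.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count in-frame sense codons; skip stops, partial triplets and non-ACGT codons."""
    counts = Counter()
    for cds in cds_list:
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def rscu(counts):
    """Relative synonymous codon usage for every amino acid with nonzero counts."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms.setdefault(aa, []).append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count if all synonyms were used equally
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Lysine counts from Table 4: AAA 49643, AAG 11565
lys = rscu({"AAA": 49643, "AAG": 11565})
print(round(lys["AAA"], 2), round(lys["AAG"], 2))  # 1.62 0.38
```

RSCU values above 1 mark codons used more often than expected under equal synonymous usage, so for lysine AAA is clearly preferred over AAG.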

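Although all codons are present, the tRNA genes of P. halocryophilus do not map one-to-one onto the codons, which at first sight suggests a nontypical "translation table". Notably, arginine, cysteine, proline and serine have fewer tRNA genes than the standard number of codons (Table 5). This shortage of tRNA genes is probably explained by wobble base pairing, which allows a single tRNA to recognise several codons that differ only at the third position (Cox & Nelson, 2013). This explanation is especially plausible for arginine, proline and serine, each of which has a four-codon family (CGN, CCN and TCN, respectively) in which the third base is free to vary.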

Conversely, several amino acids have more tRNA genes than the standard number of codons. Another possibility, therefore, is that some codons are assigned to different amino acids than in the standard code. For some amino acids, however, the extra tRNA genes can probably be explained by how common these amino acids are in proteins (e.g. methionine, whose codon ATG is also the most frequent start codon).

Table 4. Codon usage, listing the standard codons of each amino acid (template by M. Khandokhin). *aa: amino acid; usage: number of occurrences of the codon in all CDSs
aa* 1st codon 2nd codon 3rd codon 4th codon 5th codon 6th codon
codon usage codon usage codon usage codon usage codon usage codon usage
Ala GCT 23113 GCA 28657 GCC 9205 GCG 14944 - - - -
Arg AGA 5745 AGG 1182 CGA 6568 CGT 13609 CGG 3482 CGC 8556
Asn AAT 24073 AAC 14960 - - - - - - - -
Asp GAT 35273 GAC 15488 - - - - - - - -
Cys TGC 2237 TGT 3541 - - - - - - - -
Gln CAA 28878 CAG 8347 - - - - - - - -
Glu GAA 60871 GAG 14215 - - - - - - - -
Gly GGA 21114 GGG 8299 GGT 21293 GGC 17369 - - - -
His CAT 13594 CAC 6017 - - - - - - - -
Ile ATA 9165 ATT 42696 ATC 21191 - - - - - -
Leu TTG 23913 CTT 16470 TTA 33060 CTC 5765 CTA 9842 CTG 7206
Lys AAG 11565 AAA 49643 - - - - - - - -
Met ATG 27854 - - - - - - - - - -
Phe TTC 13512 TTT 31278 - - - - - - - -
Pro CCT 9623 CCA 14404 CCC 2343 CCG 9020 - - - -
Ser AGC 7970 AGT 11210 TCA 13977 TCT 13041 TCC 4307 TCG 8532
Thr ACT 12695 ACA 21893 ACG 13011 ACC 7398 - - - -
Trp TGG 9872 - - - - - - - - - -
Tyr TAT 20921 TAC 10974 - - - - - - - -
Val GTT 25276 GTA 19068 GTG 13530 GTC 12381 - - - -
Table 5. tRNA gene distribution (rows: tRNA, total number of genes; tRNA neg, genes on the negative strand; tRNA pos, genes on the positive strand)
Ala Arg Asn Asp Cys Gln Glu Gly His Ile
tRNA 5 5 3 5 1 3 4 5 2 4
tRNA neg 0 0 0 0 0 0 0 0 0 0
tRNA pos 5 5 3 5 1 3 4 5 2 4
Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
tRNA 7 3 5 2 2 4 4 1 2 5
tRNA neg 0 0 1 0 0 1 0 0 0 0
tRNA pos 7 3 4 2 2 3 4 1 2 5
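The comparison between tRNA gene numbers and codon numbers discussed in this section can be checked mechanically. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) as the "standard number of codons" and using the totals from Table 5; the helper name is hypothetical.

```python
from collections import Counter

# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Total tRNA genes per amino acid, from Table 5
TRNA_GENES = {
    "Ala": 5, "Arg": 5, "Asn": 3, "Asp": 5, "Cys": 1, "Gln": 3, "Glu": 4,
    "Gly": 5, "His": 2, "Ile": 4, "Leu": 7, "Lys": 3, "Met": 5, "Phe": 2,
    "Pro": 2, "Ser": 4, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 5,
}


def compare_trna_to_codons(trna_genes):
    """Split amino acids into those with fewer / more tRNA genes than codons."""
    codons_per_aa = Counter(THREE_LETTER[a] for a in AMINO_ACIDS if a != "*")
    fewer = sorted(aa for aa, n in trna_genes.items() if n < codons_per_aa[aa])
    more = sorted(aa for aa, n in trna_genes.items() if n > codons_per_aa[aa])
    return fewer, more


fewer, more = compare_trna_to_codons(TRNA_GENES)
print(fewer)  # ['Arg', 'Cys', 'Pro', 'Ser']
print(more)
```

The amino acids with fewer tRNA genes than codons are exactly arginine, cysteine, proline and serine, and methionine is among those with more.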

ACKNOWLEDGEMENTS

I extend my gratitude to our amazing teachers for all the knowledge and skills they have given us this semester, and to my classmates for their encouragement and advice in writing this overview. Special thanks to Mikhail Khandokhin, who kindly provided a template for the codon usage table. May your Hirsch index increase exponentially.

REFERENCES

Google Drive folder with related material i.e. Python script, spreadsheets, genome and CDS files.

Mykytczuk, N., Foote, S., Omelon, C. et al. Bacterial growth at −15 °C; molecular insights from the permafrost bacterium Planococcus halocryophilus Or1. ISME J 7, 1211–1226 (2013).

Raymond‐Bouchard, I. et al. Mechanisms of subzero growth in the cryophile Planococcus halocryophilus determined through proteomic analysis. Environmental Microbiology 19, 4460–4479 (2017).

Tiessen, A., Pérez-Rodríguez, P. & Delaye-Arredondo, L.J. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 5, 85 (2012).

Williams, A.H., Redzej, A., Rolhion, N., Costa, T.R.D., Rifflet, A., Waksman, G. & Cossart, P. The cryo-electron microscopy supramolecular structure of the bacterial stressosome unveils its mechanism of activation. Nat Commun 10, 3005 (2019). doi:10.1038/s41467-019-10782-0

Grigoriev, A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res 26, 2286–2290 (1998).

Cox, M.M. & Nelson, D.L. Protein metabolism: wobble allows some tRNAs to recognize more than one codon. In Lehninger Principles of Biochemistry, 6th edn, 1108–1110 (W.H. Freeman, New York, 2013). ISBN 9780716771081