
Transmembrane proteins

Last update on the 16th of May, 2018

The task is dedicated to studying transmembrane proteins and related databases and services.

Table of downloads
File Link
Script counting the mean number of residues per transmembrane subunit aver_count.py
List of real and predicted alpha helices helices.csv
TMHMM output tmhmm.txt
Phobius output phobius.txt
Script computing the Jaccard index (JI) and overlap coefficient (OC) helices.py

OPM database

aver_count.py

Two proteins were chosen for the analysis: Gamma-secretase complex, structure 1 (an alpha-helical polytopic protein, PDB ID: 5A63) and Cytolysin and hemolysin HlyA Pore-forming toxin (a beta-barrel transmembrane protein, PDB ID: 3O44). Models of the proteins with the surrounding membrane and their properties are shown in Table 1.

Table 1. Properties of two transmembrane proteins.
Protein Gamma-secretase complex, structure 1 Cytolysin and hemolysin HlyA Pore-forming toxin
PDB ID 5A63 3O44
Type Alpha-helical polytopic Beta-barrel transmembrane
Hydrophobic thickness 29.8Å 24.6Å
Transmembrane subunits' coordinates (5A63) A chain: 669-690; B chain: 85-102, 171-186, 195-214, 221-240, 243-260, 383-399, 403-421, 436-458; C chain: 4-26, 32-56, 70-91, 118-139, 157-179, 187-207, 211-231; D chain: 58-79
Transmembrane subunits' coordinates (3O44) A-G chains: 291-297, 304-310
Mean number of residues per transmembrane subunit 20.7 7
Location Human ER membrane [3] Incorporates into the cell membrane of mammals [4]
Image

A Python script, aver_count.py, was written to compute the mean number of residues per transmembrane subunit.
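The averaging can be sketched as follows. This is a minimal illustration, not the aver_count.py script itself: the names SEGMENTS_5A63, SEGMENTS_3O44 and mean_segment_length are hypothetical, the coordinates are copied from Table 1, and it is assumed that both 3O44 segments occur in each of the chains A-G and that both ends of every range are counted.

```python
# Transmembrane segments of 5A63 from Table 1 (inclusive residue ranges).
SEGMENTS_5A63 = {
    "A": [(669, 690)],
    "B": [(85, 102), (171, 186), (195, 214), (221, 240),
          (243, 260), (383, 399), (403, 421), (436, 458)],
    "C": [(4, 26), (32, 56), (70, 91), (118, 139),
          (157, 179), (187, 207), (211, 231)],
    "D": [(58, 79)],
}

# 3O44: assumed to have the same two segments in each of the chains A-G.
SEGMENTS_3O44 = {chain: [(291, 297), (304, 310)] for chain in "ABCDEFG"}


def mean_segment_length(segments_by_chain):
    """Mean number of residues per segment; both range ends are counted."""
    lengths = [end - start + 1
               for segments in segments_by_chain.values()
               for start, end in segments]
    return sum(lengths) / len(lengths)


print(round(mean_segment_length(SEGMENTS_5A63), 1))  # 20.7
print(round(mean_segment_length(SEGMENTS_3O44), 1))  # 7.0
```

With inclusive counting the 17 segments of 5A63 contain 352 residues in total, which gives the 20.7 reported in Table 1.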

Prediction of transmembrane helices

helices.csv, helices.py, phobius.txt, tmhmm.txt

Two services were used to predict transmembrane helices in a given protein sequence: TMHMM and Phobius. Both compute, for each residue, the posterior probability of belonging to each of several regions. TMHMM distinguishes transmembrane, outside and inside regions; Phobius distinguishes transmembrane, cytoplasmic, non-cytoplasmic and signal peptide regions. Both services take a protein sequence as input and provide text and graphical output. The raw text output of both services for the 5A63 protein is available in phobius.txt and tmhmm.txt, the comparison of predicted and real data is provided in Tables 2 and 3, and the graphical output is shown in Fig. 1 and 2.

Fig. 1. Posterior probabilities for all 4 chains in TMHMM output.
Fig. 2. Posterior probabilities for all 4 chains in Phobius output.

The posterior probability graphs produced by the two services are very similar.

Table 2. Comparison of real (OPM) and predicted transmembrane helices in the 5A63 protein.
chain OPM TMHMM Phobius
A 669-690 670-692 670-690
B 85-102 82-100 82-100
B - 132-154 133-154
B 171-186 161-183 161-183
B 195-214 193-215 195-213
B 221-240 224-241 225-241
B 243-260 246-268 247-263
B - 281-298 -
B 383-399 - -
B 403-421 404-426 408-428
B 436-458 431-453 434-453
C 4-26 4-26 6-25
C 32-56 33-55 32-55
C 70-91 65-82 67-86
C 118-139 119-141 117-135
C 157-179 156-178 155-180
C 187-207 187-209 187-209
C 211-231 214-236 215-236
D - 19-41 18-38
D 58-79 56-78 58-81

Both services predicted all real transmembrane helices except one in chain B (383-399). Phobius predicted two additional helices and TMHMM three. The predicted ranges appear to agree better with each other than with the real data. To quantify this, a Python script, helices.py, was written that computes the pairwise Jaccard index (size of the intersection divided by size of the union) and overlap coefficient (size of the intersection divided by size of the smaller set) between all three data sets. The results are given in Table 3.

Table 3. Jaccard indices (JI) and overlap coefficients (OC) of transmembrane helix residues between the real data (OPM) and the TMHMM and Phobius predictions. Each cell gives JI / OC.
OPM TMHMM Phobius
OPM
TMHMM 0.64 / 0.85
Phobius 0.67 / 0.83 0.83 / 0.96
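The comparison can be reproduced from the ranges in Table 2 with a short sketch. It is an illustration under stated assumptions, not the helices.py script itself: the names HELICES, residue_set, jaccard_index, overlap_coefficient and unmatched are hypothetical, residues are identified by (chain, residue number) pairs, and a helix counts as matched if it shares at least one residue with a helix in the other data set.

```python
from itertools import combinations

# Helix ranges from Table 2 (inclusive residue numbers), keyed by chain.
HELICES = {
    "OPM": {
        "A": [(669, 690)],
        "B": [(85, 102), (171, 186), (195, 214), (221, 240),
              (243, 260), (383, 399), (403, 421), (436, 458)],
        "C": [(4, 26), (32, 56), (70, 91), (118, 139),
              (157, 179), (187, 207), (211, 231)],
        "D": [(58, 79)],
    },
    "TMHMM": {
        "A": [(670, 692)],
        "B": [(82, 100), (132, 154), (161, 183), (193, 215), (224, 241),
              (246, 268), (281, 298), (404, 426), (431, 453)],
        "C": [(4, 26), (33, 55), (65, 82), (119, 141),
              (156, 178), (187, 209), (214, 236)],
        "D": [(19, 41), (56, 78)],
    },
    "Phobius": {
        "A": [(670, 690)],
        "B": [(82, 100), (133, 154), (161, 183), (195, 213),
              (225, 241), (247, 263), (408, 428), (434, 453)],
        "C": [(6, 25), (32, 55), (67, 86), (117, 135),
              (155, 180), (187, 209), (215, 236)],
        "D": [(18, 38), (58, 81)],
    },
}


def residue_set(helices_by_chain):
    """Expand helix ranges into a set of (chain, residue number) pairs."""
    return {(chain, pos)
            for chain, helices in helices_by_chain.items()
            for start, end in helices
            for pos in range(start, end + 1)}


def jaccard_index(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def overlap_coefficient(a, b):
    """Size of the intersection divided by size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


def unmatched(query, reference):
    """Helices in `query` sharing no residue with any helix in `reference`."""
    ref = residue_set(reference)
    return [(chain, start, end)
            for chain, helices in query.items()
            for start, end in helices
            if not any((chain, pos) in ref for pos in range(start, end + 1))]


RESIDUES = {name: residue_set(h) for name, h in HELICES.items()}

for first, second in combinations(HELICES, 2):
    ji = jaccard_index(RESIDUES[first], RESIDUES[second])
    oc = overlap_coefficient(RESIDUES[first], RESIDUES[second])
    print(f"{first} vs {second}: JI = {ji:.2f}, OC = {oc:.2f}")

for tool in ("TMHMM", "Phobius"):
    print(tool, "missed:", unmatched(HELICES["OPM"], HELICES[tool]))
    print(tool, "extra:", unmatched(HELICES[tool], HELICES["OPM"]))
```

Working on residue sets rather than on helix pairs means the indices do not depend on how helices are paired up, so the unmatched predictions simply enlarge the union.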

The predictive tools are indeed more concordant with each other than with the real data. Both algorithms use an HMM [1, 2] to decide which region type each residue belongs to, but they differ in the layout of the model (Fig. 3). The Phobius developers claim that Phobius has a lower false discovery rate (FDR) for signal peptides than TMHMM [2]. However, the transmembrane predictions observed here do not show that Phobius handles TM regions better than TMHMM: its Jaccard index with OPM is only slightly higher (0.67 vs 0.64) and its overlap coefficient is slightly lower (0.83 vs 0.85).

Fig. 3. HMM layouts of TMHMM and Phobius.

Biological functions of observed proteins

The information below was collected from the listed databases.

TCDB

The Transporter Classification Database (TCDB) is a curated database that classifies membrane transport proteins using the Transporter Classification (TC) system. Like the well-known EC system, it labels proteins with codes, here of the form V.W.X.Y.Z. The components stand for:

  • V (a number): transporter class;
  • W (a letter): transporter subclass (energy source);
  • X (a number): transporter family (superfamily);
  • Y (a number): transporter subfamily;
  • Z: specific transporter with a particular range of substrates transported.

The protein 5A63 is encoded as 1.A.54.1.1, which stands for:

  • 1: Channels/pores;
  • 1.A: α-type channels;
  • 1.A.54: The presenilin ER Ca2+ leak channel (presenilin) family;
  • 1.A.54.1.1: Presenilin-1 Ca2+ leak channel (part of the γ-secretase complex).

The protein 3O44 is encoded as 1.C.14.1.1, which stands for:

  • 1: Channels/pores;
  • 1.C: Pore-forming toxins (proteins and peptides);
  • 1.C.14: The cytohemolysin (CHL) family;
  • 1.C.14.1.1: Cytohemolysin precursor, HlyA (Vibrio cholerae cytolysin, VCC), a beta-barrel pore-forming toxin (beta-PFT).
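The hierarchical reading of the TC codes above can be sketched as a small helper that returns the cumulative prefix for each level. The function name tc_levels and the level labels are hypothetical and only follow the V.W.X.Y.Z description given earlier; this is not part of any TCDB tool.

```python
def tc_levels(tc_code):
    """Return the cumulative prefixes of a TC code, one per hierarchy level."""
    parts = tc_code.split(".")
    if len(parts) != 5:
        raise ValueError(
            f"expected five dot-separated components, got {tc_code!r}")
    # Level labels follow the V.W.X.Y.Z description (illustrative names).
    names = ("class", "subclass", "family", "subfamily", "transporter")
    return {name: ".".join(parts[:i + 1]) for i, name in enumerate(names)}


for level, prefix in tc_levels("1.A.54.1.1").items():
    print(f"{level}: {prefix}")
```

For 1.A.54.1.1 this yields the prefixes 1, 1.A, 1.A.54, 1.A.54.1 and 1.A.54.1.1, each of which can be looked up in TCDB.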

KEGG

The 5A63 protein is listed in KEGG under accession 5663. It participates in several signalling pathways, as well as in the Alzheimer's disease (hsa05010) and human papillomavirus infection (hsa05165) pathways.

The 3O44 protein is listed in KEGG under accession VCA0219 and is included in the Vibrio cholerae infection pathway (vch05110).

CDD

No COGs were found for either protein.

GO

The 5A63 protein is associated with many GO terms, such as GO:0042987 (amyloid precursor protein catabolic process) and GO:0008624 (induction of apoptosis by extracellular signalling).

The 3O44 protein is associated with a limited number of GO terms, notably GO:0020002 (host cell plasma membrane) and GO:0019836 (hemolysis by symbiont of host erythrocytes).

Wikipedia

The 5A63 protein is gamma-secretase, a protease complex that cleaves single-pass transmembrane proteins at residues buried in the membrane [3]. Its best-known substrate is amyloid precursor protein, whose cleavage produces amyloid beta; the abnormally folded fibrillar form of amyloid beta is a primary component of the amyloid plaques found in the brains of Alzheimer's disease patients. Gamma-secretase also processes several other integral membrane proteins, such as Notch and E-cadherin.

The 3O44 protein is a beta pore-forming toxin (β-PFT) [4] of Vibrio cholerae. Once the pore is formed, regulation of transport into and out of the cell is disrupted, which leads to cell lysis. β-PFTs also induce host responses that benefit bacterial proliferation.

References

  1. Anders Krogh, Björn Larsson, Gunnar von Heijne, Erik L.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology, Volume 305, Issue 3, 2001, Pages 567-580, ISSN 0022-2836, doi.org/10.1006/jmbi.2000.4315.
  2. Lukas Käll, Anders Krogh, Erik L.L. Sonnhammer, Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. Nucleic Acids Research, Volume 35, Issue suppl_2, 1 July 2007, Pages W429–W432, doi.org/10.1093/nar/gkm256.
  3. Gamma secretase, Wikipedia article.
  4. β-PFT, Wikipedia article.