In [58]:
import prody as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import pandas

Before working in the Jupyter notebook, the PDB file was preprocessed in PyMOL (the file prot.pdb contains only the protein molecule).

Task 1. ProDy and B-factors, part 1

In [2]:
prot = pd.parsePDB('/home/julybel/study/3d/prak-03/protein-only.pdb')
@> 1510 atoms and 1 coordinate set(s) were parsed in 0.03s.
In [19]:
# Mean B-factor of every residue, paired with the residue itself
sp = [(res, np.mean(res.getBetas())) for res in prot.iterResidues()]
print('MIN:\t', min(sp, key=lambda x: x[1]))
print('MAX:\t', max(sp, key=lambda x: x[1]))
MIN:	 (<Residue: ALA 28 from Chain A from protein-only (5 atoms)>, 12.394000000000002)
MAX:	 (<Residue: ARG 41 from Chain A from protein-only (11 atoms)>, 46.408181818181816)
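
For readers without ProDy, the per-residue averaging above can be sketched with the standard library alone. This is only an illustration: the atoms are made up, `atom_line` is a hypothetical helper that writes records in the fixed-width PDB ATOM layout, and the slices assume the standard column positions (residue name 18-20, chain 22, residue number 23-26, B-factor 61-66).

```python
from collections import defaultdict
from statistics import mean


def atom_line(serial, name, resname, chain, resnum, x, y, z, bfactor):
    """Build one ATOM record in PDB fixed-width layout (hypothetical demo helper)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Made-up atoms, not taken from the real structure.
lines = [
    atom_line(1, "N", "ALA", "A", 28, 0.0, 0.0, 0.0, 12.0),
    atom_line(2, "CA", "ALA", "A", 28, 1.5, 0.0, 0.0, 13.0),
    atom_line(3, "N", "GLY", "A", 30, 3.0, 1.0, 0.0, 20.0),
    atom_line(4, "N", "ARG", "A", 41, 6.0, 2.0, 1.0, 46.0),
    atom_line(5, "CA", "ARG", "A", 41, 7.5, 2.0, 1.0, 47.0),
]

# Collect B-factors per residue, keyed by (chain, residue number, residue name).
by_residue = defaultdict(list)
for line in lines:
    if not line.startswith("ATOM"):
        continue
    key = (line[21], int(line[22:26]), line[17:20].strip())
    by_residue[key].append(float(line[60:66]))

# Average within each residue, then pick the extremes.
mean_b = {key: mean(values) for key, values in by_residue.items()}
lowest = min(mean_b.items(), key=lambda item: item[1])
highest = max(mean_b.items(), key=lambda item: item[1])
print("MIN:", lowest)
print("MAX:", highest)
```

With real data the `lines` list would come from reading the PDB file; ProDy's `iterResidues()` and `getBetas()` do the same grouping without manual column slicing.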

Coloring by B-factor in PyMOL:
> spectrum b, deepteal_white_firebrick, minimum=10, maximum=50

Task 2. ProDy and B-factors, part 2

In [23]:
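# Center of mass of the whole protein
prot_mcenter = pd.calcCenter(prot, weights=prot.getMasses())
betas, distances = [], []

for res in prot.iterResidues():
    # Mean B-factor over the atoms of this residue
    mean_beta = np.mean(res.getBetas())
    # Center of mass of the residue and its distance to the protein center
    mcenter = pd.calcCenter(res, weights=res.getMasses())
    dist = pd.calcDistance(prot_mcenter, mcenter)
    betas.append(mean_beta)
    distances.append(dist)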
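
As a sanity check on what the loop above measures, here is a minimal standard-library sketch of a mass-weighted center and the distance to it. It assumes `calcCenter` with `weights` computes the usual weighted mean of coordinates; the coordinates and masses are arbitrary illustrative values, and `center_of_mass` is a hypothetical helper. It may also be worth checking that `getMasses()` does not return `None` for a freshly parsed structure, since in that case the result would presumably be the unweighted geometric center.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean of 3D coordinates: sum(m_i * r_i) / sum(m_i)."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


# Two made-up atoms on the x axis; the heavier one sits at the origin.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [12.0, 4.0]

com = center_of_mass(coords, masses)
# Euclidean distance from the lighter atom to the center of mass
dist_to_com = math.dist(coords[1], com)
print(com, dist_to_com)
```

The center lies closer to the heavier atom, which is the difference between a center of mass and a plain coordinate average.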

2.1. Let us look at the correlation between the distances (residue center of mass to the center of mass of the whole protein) and the mean B-factors of the same residues

In [38]:
print('Pearson correlation (r, p-value):\n', stats.pearsonr(distances, betas), '\n')
print('Spearman correlation (rho, p-value):\n', stats.spearmanr(distances, betas), '\n')
Pearson correlation (r, p-value):
 (0.4877180421102397, 3.156693261315456e-13) 

Spearman correlation (rho, p-value):
 SpearmanrResult(correlation=0.5151144122396323, pvalue=8.199349558927063e-15) 
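
The two coefficients differ because Spearman's coefficient is Pearson's coefficient computed on ranks, so it measures monotonic rather than strictly linear association. The sketch below illustrates this with the standard library only; the data are synthetic, and `pearson`, `average_ranks` and `spearman` are hypothetical helpers, not the SciPy implementations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic monotonic but non-linear relationship
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(r, rho)
```

For a perfectly monotonic relationship the rank correlation reaches 1 while Pearson's coefficient stays below 1. A Spearman value slightly above the Pearson value, as in the output above, is consistent with a relationship that is monotonic but not quite linear.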

2.2. Let us plot the data

In [40]:
fig, ax = plt.subplots(figsize=(12, 8))
# Recent seaborn versions require x and y as keyword arguments
sns.regplot(x=distances, y=betas, color='forestgreen', ax=ax)
ax.set_title('Dependence of B-factor on distance between mass centers\nlinear fit\n', 
            fontdict={'fontsize' : 20})
ax.set_xlabel('Distance between mass centers (Å)', fontdict={'fontsize' : 16})
ax.set_ylabel('B-factor', fontdict={'fontsize' : 16})
Out[40]:
Text(0, 0.5, 'B-factor')
In [52]:
fig, ax = plt.subplots(figsize=(12, 8))
# Third-order polynomial fit, drawn without a confidence band
sns.regplot(x=distances, y=betas, color='forestgreen', 
           order=3, ci=None, ax=ax)
ax.set_title('Dependence of B-factor on distance between mass centers\nfit with a higher-order polynomial regression (order=3)\n', 
            fontdict={'fontsize' : 20})
ax.set_xlabel('Distance between mass centers (Å)', fontdict={'fontsize' : 16})
ax.set_ylabel('B-factor', fontdict={'fontsize' : 16})
Out[52]:
Text(0, 0.5, 'B-factor')
In [56]:
fig, ax = plt.subplots(figsize=(12, 8))
# Linear regression of y on log(x), plotted in the original x scale
sns.regplot(x=distances, y=betas, color='forestgreen', 
           x_estimator=np.mean, logx=True, ax=ax)
ax.set_title('Dependence of B-factor on distance between mass centers\nregression model fitted using log(x)\n', 
            fontdict={'fontsize' : 20})
ax.set_xlabel('Distance between mass centers (Å)', fontdict={'fontsize' : 16})
ax.set_ylabel('B-factor', fontdict={'fontsize' : 16})
Out[56]:
Text(0, 0.5, 'B-factor')
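
To make the last plot easier to interpret: with `logx=True`, seaborn fits a model of the form y ~ log(x) and draws it in the original x scale. The sketch below reproduces that idea with ordinary least squares in the standard library. The data are synthetic, generated from known coefficients, and `fit_line` is a hypothetical helper rather than seaborn's own code.

```python
import math


def fit_line(u, y):
    """Ordinary least squares for y = intercept + slope * u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    slope = (sum((a - mu) * (b - my) for a, b in zip(u, y))
             / sum((a - mu) ** 2 for a in u))
    return slope, my - slope * mu


# Synthetic data following y = 10 + 5 * ln(x) exactly
x = [1.0, math.e, math.e ** 2, math.e ** 3]
y = [10.0 + 5.0 * math.log(v) for v in x]

# Regress y on log(x) instead of x
slope, intercept = fit_line([math.log(v) for v in x], y)
print(slope, intercept)
```

Applied to the real `distances` and `betas`, the same transformation would give the coefficients of the curve shown in the third plot, assuming all distances are strictly positive.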