Докинг низкомолекулярных лигандов в структуру белка

Цель данного занятия - ознакомиться с возможностями докинга низкомолекулярного лиганда в структуру белка. Белок - предсказанный в прошлом практикуме с помощью гомологичного моделирования. Лиганд - измененный NAG, где метильный радикал СH3C(=O)NH группы заменен на:

  • OH
  • NH3+
  • H
  • Ph
  • COO- </ul>

In [2]:
import numpy as np
import copy

# Отображение структур
import IPython.display
import ipywidgets
from IPython.display import display,display_svg,SVG,Image

# Open Drug Discovery Toolkit
import oddt
import oddt.docking
import oddt.interactions

# Органика
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

import pmx # Модуль для манипулирования pdb 

Подготовим белок. Посмотрим на структуру предсказанного белка лизоцима жабы и выпишем его остатки

In [5]:
pdb=pmx.Model('new.B99990001.pdb')

for r in pdb.residues[135:]:
    print r #посмотрим остатки
<Molecule: id = 136 name = ASN chain_id =    natoms = 8>
<Molecule: id = 137 name = LEU chain_id =    natoms = 8>
<Molecule: id = 138 name = SER chain_id =    natoms = 6>
<Molecule: id = 139 name = GLN chain_id =    natoms = 9>
<Molecule: id = 140 name = TRP chain_id =    natoms = 14>
<Molecule: id = 141 name = THR chain_id =    natoms = 7>
<Molecule: id = 142 name = GLN chain_id =    natoms = 9>
<Molecule: id = 143 name = GLY chain_id =    natoms = 4>
<Molecule: id = 144 name = CYS chain_id =    natoms = 6>
<Molecule: id = 145 name = LYS chain_id =    natoms = 9>
<Molecule: id = 146 name = LEU chain_id =    natoms = 9>
<Molecule: id = 147 name = NAG chain_id =    natoms = 14>
<Molecule: id = 148 name = NAG chain_id =    natoms = 14>
<Molecule: id = 149 name = NDG chain_id =    natoms = 15>
In [6]:
# создание объектов белок и лиганда
newpdb = pdb.copy()
for r in newpdb.residues[-3:]:
    newpdb.remove_residue(r)
lig = pdb.copy()   
del lig.residues[:-3]
In [7]:
for r in newpdb.residues:
    print r #посмотрим остатки
print    
for r in lig.residues:
    print r #посмотрим остатки    
<Molecule: id = 1 name = SER chain_id =    natoms = 6>
<Molecule: id = 2 name = GLY chain_id =    natoms = 4>
<Molecule: id = 3 name = LYS chain_id =    natoms = 9>
<Molecule: id = 4 name = TYR chain_id =    natoms = 12>
<Molecule: id = 5 name = ILE chain_id =    natoms = 8>
<Molecule: id = 6 name = SER chain_id =    natoms = 6>
<Molecule: id = 7 name = TRP chain_id =    natoms = 14>
<Molecule: id = 8 name = GLU chain_id =    natoms = 9>
<Molecule: id = 9 name = ASP chain_id =    natoms = 8>
<Molecule: id = 10 name = SER chain_id =    natoms = 6>
<Molecule: id = 11 name = CYS chain_id =    natoms = 6>
<Molecule: id = 12 name = SER chain_id =    natoms = 6>
<Molecule: id = 13 name = TYR chain_id =    natoms = 12>
<Molecule: id = 14 name = LEU chain_id =    natoms = 8>
<Molecule: id = 15 name = GLN chain_id =    natoms = 9>
<Molecule: id = 16 name = LEU chain_id =    natoms = 8>
<Molecule: id = 17 name = GLN chain_id =    natoms = 9>
<Molecule: id = 18 name = LYS chain_id =    natoms = 9>
<Molecule: id = 19 name = TYR chain_id =    natoms = 12>
<Molecule: id = 20 name = GLU chain_id =    natoms = 9>
<Molecule: id = 21 name = ARG chain_id =    natoms = 11>
<Molecule: id = 22 name = CYS chain_id =    natoms = 6>
<Molecule: id = 23 name = GLU chain_id =    natoms = 9>
<Molecule: id = 24 name = LEU chain_id =    natoms = 8>
<Molecule: id = 25 name = ALA chain_id =    natoms = 5>
<Molecule: id = 26 name = LYS chain_id =    natoms = 9>
<Molecule: id = 27 name = ALA chain_id =    natoms = 5>
<Molecule: id = 28 name = LEU chain_id =    natoms = 8>
<Molecule: id = 29 name = LYS chain_id =    natoms = 9>
<Molecule: id = 30 name = LYS chain_id =    natoms = 9>
<Molecule: id = 31 name = GLY chain_id =    natoms = 4>
<Molecule: id = 32 name = GLY chain_id =    natoms = 4>
<Molecule: id = 33 name = LEU chain_id =    natoms = 8>
<Molecule: id = 34 name = ALA chain_id =    natoms = 5>
<Molecule: id = 35 name = ASP chain_id =    natoms = 8>
<Molecule: id = 36 name = PHE chain_id =    natoms = 11>
<Molecule: id = 37 name = LYS chain_id =    natoms = 9>
<Molecule: id = 38 name = GLY chain_id =    natoms = 4>
<Molecule: id = 39 name = TYR chain_id =    natoms = 12>
<Molecule: id = 40 name = SER chain_id =    natoms = 6>
<Molecule: id = 41 name = LEU chain_id =    natoms = 8>
<Molecule: id = 42 name = GLU chain_id =    natoms = 9>
<Molecule: id = 43 name = ASN chain_id =    natoms = 8>
<Molecule: id = 44 name = TRP chain_id =    natoms = 14>
<Molecule: id = 45 name = ILE chain_id =    natoms = 8>
<Molecule: id = 46 name = CYS chain_id =    natoms = 6>
<Molecule: id = 47 name = THR chain_id =    natoms = 7>
<Molecule: id = 48 name = ALA chain_id =    natoms = 5>
<Molecule: id = 49 name = PHE chain_id =    natoms = 11>
<Molecule: id = 50 name = HIS chain_id =    natoms = 10>
<Molecule: id = 51 name = GLU chain_id =    natoms = 9>
<Molecule: id = 52 name = SER chain_id =    natoms = 6>
<Molecule: id = 53 name = GLY chain_id =    natoms = 4>
<Molecule: id = 54 name = TYR chain_id =    natoms = 12>
<Molecule: id = 55 name = ASN chain_id =    natoms = 8>
<Molecule: id = 56 name = THR chain_id =    natoms = 7>
<Molecule: id = 57 name = ALA chain_id =    natoms = 5>
<Molecule: id = 58 name = SER chain_id =    natoms = 6>
<Molecule: id = 59 name = THR chain_id =    natoms = 7>
<Molecule: id = 60 name = ASN chain_id =    natoms = 8>
<Molecule: id = 61 name = TYR chain_id =    natoms = 12>
<Molecule: id = 62 name = ASN chain_id =    natoms = 8>
<Molecule: id = 63 name = PRO chain_id =    natoms = 7>
<Molecule: id = 64 name = PRO chain_id =    natoms = 7>
<Molecule: id = 65 name = ASP chain_id =    natoms = 8>
<Molecule: id = 66 name = LYS chain_id =    natoms = 9>
<Molecule: id = 67 name = SER chain_id =    natoms = 6>
<Molecule: id = 68 name = THR chain_id =    natoms = 7>
<Molecule: id = 69 name = ASP chain_id =    natoms = 8>
<Molecule: id = 70 name = TYR chain_id =    natoms = 12>
<Molecule: id = 71 name = GLY chain_id =    natoms = 4>
<Molecule: id = 72 name = ILE chain_id =    natoms = 8>
<Molecule: id = 73 name = PHE chain_id =    natoms = 11>
<Molecule: id = 74 name = GLN chain_id =    natoms = 9>
<Molecule: id = 75 name = ILE chain_id =    natoms = 8>
<Molecule: id = 76 name = ASN chain_id =    natoms = 8>
<Molecule: id = 77 name = SER chain_id =    natoms = 6>
<Molecule: id = 78 name = ARG chain_id =    natoms = 11>
<Molecule: id = 79 name = TRP chain_id =    natoms = 14>
<Molecule: id = 80 name = TRP chain_id =    natoms = 14>
<Molecule: id = 81 name = CYS chain_id =    natoms = 6>
<Molecule: id = 82 name = ASN chain_id =    natoms = 8>
<Molecule: id = 83 name = ASP chain_id =    natoms = 8>
<Molecule: id = 84 name = TYR chain_id =    natoms = 12>
<Molecule: id = 85 name = LYS chain_id =    natoms = 9>
<Molecule: id = 86 name = THR chain_id =    natoms = 7>
<Molecule: id = 87 name = PRO chain_id =    natoms = 7>
<Molecule: id = 88 name = ARG chain_id =    natoms = 11>
<Molecule: id = 89 name = SER chain_id =    natoms = 6>
<Molecule: id = 90 name = LYS chain_id =    natoms = 9>
<Molecule: id = 91 name = ASN chain_id =    natoms = 8>
<Molecule: id = 92 name = THR chain_id =    natoms = 7>
<Molecule: id = 93 name = CYS chain_id =    natoms = 6>
<Molecule: id = 94 name = ASN chain_id =    natoms = 8>
<Molecule: id = 95 name = ILE chain_id =    natoms = 8>
<Molecule: id = 96 name = ASP chain_id =    natoms = 8>
<Molecule: id = 97 name = CYS chain_id =    natoms = 6>
<Molecule: id = 98 name = LYS chain_id =    natoms = 9>
<Molecule: id = 99 name = VAL chain_id =    natoms = 7>
<Molecule: id = 100 name = LEU chain_id =    natoms = 8>
<Molecule: id = 101 name = LEU chain_id =    natoms = 8>
<Molecule: id = 102 name = GLY chain_id =    natoms = 4>
<Molecule: id = 103 name = ASP chain_id =    natoms = 8>
<Molecule: id = 104 name = ASP chain_id =    natoms = 8>
<Molecule: id = 105 name = ILE chain_id =    natoms = 8>
<Molecule: id = 106 name = SER chain_id =    natoms = 6>
<Molecule: id = 107 name = PRO chain_id =    natoms = 7>
<Molecule: id = 108 name = ALA chain_id =    natoms = 5>
<Molecule: id = 109 name = ILE chain_id =    natoms = 8>
<Molecule: id = 110 name = LYS chain_id =    natoms = 9>
<Molecule: id = 111 name = CYS chain_id =    natoms = 6>
<Molecule: id = 112 name = ALA chain_id =    natoms = 5>
<Molecule: id = 113 name = LYS chain_id =    natoms = 9>
<Molecule: id = 114 name = ARG chain_id =    natoms = 11>
<Molecule: id = 115 name = VAL chain_id =    natoms = 7>
<Molecule: id = 116 name = VAL chain_id =    natoms = 7>
<Molecule: id = 117 name = SER chain_id =    natoms = 6>
<Molecule: id = 118 name = ASP chain_id =    natoms = 8>
<Molecule: id = 119 name = PRO chain_id =    natoms = 7>
<Molecule: id = 120 name = ASN chain_id =    natoms = 8>
<Molecule: id = 121 name = GLY chain_id =    natoms = 4>
<Molecule: id = 122 name = MET chain_id =    natoms = 8>
<Molecule: id = 123 name = GLY chain_id =    natoms = 4>
<Molecule: id = 124 name = ALA chain_id =    natoms = 5>
<Molecule: id = 125 name = TRP chain_id =    natoms = 14>
<Molecule: id = 126 name = VAL chain_id =    natoms = 7>
<Molecule: id = 127 name = ALA chain_id =    natoms = 5>
<Molecule: id = 128 name = TRP chain_id =    natoms = 14>
<Molecule: id = 129 name = LYS chain_id =    natoms = 9>
<Molecule: id = 130 name = LYS chain_id =    natoms = 9>
<Molecule: id = 131 name = TYR chain_id =    natoms = 12>
<Molecule: id = 132 name = CYS chain_id =    natoms = 6>
<Molecule: id = 133 name = LYS chain_id =    natoms = 9>
<Molecule: id = 134 name = GLY chain_id =    natoms = 4>
<Molecule: id = 135 name = LYS chain_id =    natoms = 9>
<Molecule: id = 136 name = ASN chain_id =    natoms = 8>
<Molecule: id = 137 name = LEU chain_id =    natoms = 8>
<Molecule: id = 138 name = SER chain_id =    natoms = 6>
<Molecule: id = 139 name = GLN chain_id =    natoms = 9>
<Molecule: id = 140 name = TRP chain_id =    natoms = 14>
<Molecule: id = 141 name = THR chain_id =    natoms = 7>
<Molecule: id = 142 name = GLN chain_id =    natoms = 9>
<Molecule: id = 143 name = GLY chain_id =    natoms = 4>
<Molecule: id = 144 name = CYS chain_id =    natoms = 6>
<Molecule: id = 145 name = LYS chain_id =    natoms = 9>
<Molecule: id = 146 name = LEU chain_id =    natoms = 9>

<Molecule: id = 147 name = NAG chain_id =    natoms = 14>
<Molecule: id = 148 name = NAG chain_id =    natoms = 14>
<Molecule: id = 149 name = NDG chain_id =    natoms = 15>
In [8]:
help(pmx.Atom)
x_list = []
x_coord = []
y_coord = []
z_coord = []

for a in lig.atoms:
    x_list.append(a.x)# найдите геометрический центр лиганда

for coord in x_list:
    x_coord.append(coord[0])
    y_coord.append(coord[1])
    z_coord.append(coord[2])
    

center = (np.mean(x_coord),np.mean(y_coord),np.mean(z_coord))
print center
Help on class Atom in module pmx.atom:

class Atom
 |  class for storage of atom properties and methods
 |  
 |  Methods defined here:
 |  
 |  __init__(self, line=None, mol2line=None, **kwargs)
 |  
 |  __str__(self)
 |      prints the atom in PDB format
 |  
 |  __sub__(self, other)
 |      Overloading of the '-' operator for using
 |      atom1-atom2 instead of atom1.dist(atom2)
 |  
 |  a2nm(self)
 |  
 |  angle(self, other1, other2, degree=None)
 |      Calcluates the angle between 3 atoms
 |      Usage: atom1.angle(atom2,atom3)
 |      The degree flag causes the function to return the angle
 |      in degrees.
 |      (Note: atom1 must be between 2 and 3)
 |  
 |  copy(self)
 |      copy atom
 |  
 |  dihedral(self, other1, other2, other3, degree=None)
 |      Calculates the dihedral between four atoms.
 |      Usage: atom1.dihedral(atom2,atom3,atom4)
 |      The degree flag causes the function to return the dihedral
 |      in degrees.
 |  
 |  dist(self, other)
 |      returns the distance between two atoms
 |      Usage: dist=atom1.dist(atom2)
 |      This function is also called by typing
 |      d=atom1-atom2
 |  
 |  dist2(self, other)
 |      returns the squared distance between two atoms
 |      Usage: dist=atom1.dist2(atom2)
 |  
 |  get_order(self)
 |      get the order (number of bonds to mainchain)
 |  
 |  get_symbol(self)
 |      get element
 |  
 |  make_long_name(self)
 |      make extended name to determine element
 |      and order
 |  
 |  nm2a(self)
 |  
 |  readPDBString(self, line)
 |      PDB String to Atom
 |  
 |  read_mol2_line(self, line)
 |  
 |  set_chain_id(self, chain_id)
 |      change chain identifier
 |  
 |  set_resname(self, resname)
 |  
 |  translate(self, v)

(37.505809760132337, 31.231344086021497, 22.49991894127378)
In [14]:
newpdb.writePDB('protein_no_lig.pdb')

Подготовим белок для докинга. Пишет, что в PDB файле первая молекула - не белок, но с массой 15274. Но если посмотреть собственно в файл, то там координаты аминокислот. Не верим выдаче, но файл проверить стоит.

In [9]:
prot = oddt.toolkit.readfile('pdb','/tmp/prot.pdb').next()

prot.OBMol.AddPolarHydrogens()
prot.OBMol.AutomaticPartialCharge()


print 'is the first mol in 1lmp is protein?',prot.protein,':) and MW of this mol is:', prot.molwt 
is the first mol in 1lmp is protein? False :) and MW of this mol is: 15274.26022

Подготовим лиганд

In [3]:
#smiles = ['c1cccc(O)c1', 'c1c(O)ccc(O)c1','c1(O)cc(c2ccccc2)cc(O)c1']
smiles =["CC(=O)NC1C(C(C(OC1O)CO)O)O","OC(=O)NC1C(C(C(OC1O)CO)O)O","[NH3+]C(=O)NC1C(C(C(OC1O)CO)O)O","C(=O)NC1C(C(C(OC1O)CO)O)O","C1=CC=CC=C1C(=O)NC1C(C(C(OC1O)CO)O)O","[O-]C(=O)C(=O)NC1C(C(C(OC1O)CO)O)O"]
mols= []
images =[]

for s in smiles:
    m = oddt.toolkit.readstring('smi', s)
    if not m.OBMol.Has3D(): 
        m.make3D(forcefield='mmff94', steps=150)
        m.removeh()
        m.OBMol.AddPolarHydrogens()

    mols.append(m)
    ###with print m.OBMol.Has3D() was found that:
    ### deep copy needed to keep 3D , write svg make mols flat
    images.append((SVG(copy.deepcopy(m).write('svg'))))
    
display_svg(*images)
***** - Open Babel Depiction O H H H H H O O O CH 3 O N O *****
***** - Open Babel Depiction O H H H H H H O O O O O N O *****
***** - Open Babel Depiction H H H H H H H H O O O N O O N O *****
***** - Open Babel Depiction O H H H H H O O O O N O *****
***** - Open Babel Depiction H H H H H O O O O O N O *****
***** - Open Babel Depiction H H H H H O O O O O O N O HO *****

Докинг

In [10]:
#create docking object
dock_obj= oddt.docking.AutodockVina.autodock_vina(
    protein=prot,size=(20,20,20),center=center,
    executable='/usr/bin/vina',autocleanup=True, num_modes=20)

print dock_obj.tmp_dir  #это папка докинга
print " ".join(dock_obj.params) # Опишите выдачу
/tmp/autodock_vina_UCdlOb
--center_x 37.5058097601 --center_y 31.231344086 --center_z 22.4999189413 --size_x 20 --size_y 20 --size_z 20 --cpu 1 --exhaustiveness 8 --num_modes 20 --energy_range 3

Вначале идут координаты центра лиганда, затем размеры grid-box, 1 ядро, точность расчетов, количество положений лиганда в выдаче, максимальная разница энергии с наилучшей находкой.

In [11]:
res = dock_obj.dock(mols,prot)

Результаты докинга

Покажем заместители от лучшего по энергии к худшему:

In [35]:
#print '#\tformula\tenergy\trmsd\tprot_hbonds\tlig_hbonds\tprot_stack\tlig_stack\tprot_phob\tlig_phob'
res_energy = {}
res_dict = {}
print '#\tenergy\tformula\t\trmsd\thbonds\tstack\tphob'
for i,r in enumerate(res):    
    hbs = oddt.interactions.hbonds(prot,r) #это массив массивов взаимодействий. первый массив - белок, второй - лиганд
    stack= oddt.interactions.pi_stacking(prot,r)
    phob = oddt.interactions.hydrophobic_contacts(prot,r)
    res_energy[i] = r.data['vina_affinity']
    res_dict[i] = [r.formula, r.data['vina_rmsd_ub'],str(len(hbs[0])),str(len(stack[0])),str(len(phob[0]))]
#    print '\t'.join([str(i),r.formula, r.data['vina_affinity'],  r.data['vina_rmsd_ub'],str(len(hbs[0])),str(len(stack[0])),str(len(phob[0]))])
#    print '\t'.join([str(i),r.formula, r.data['vina_affinity'],  r.data['vina_rmsd_ub'],str(len(hbs[0])),str(len(hbs[1])),str(len(stack[0])),str(len(stack[1])),str(len(phob[0])),str(len(phob[1]))])

for k, v in sorted(res_energy.items(), key=lambda x:x[1],reverse=True):   
    print '\t'.join([str(k)]+[str(v)]+res_dict[k])
#	energy	formula		rmsd	hbonds	stack	phob
36	-4.9	C13H17NO6	0.000	7	0	6
37	-4.8	C13H17NO6	11.268	6	1	9
38	-4.8	C13H17NO6	10.869	5	0	6
39	-4.8	C13H17NO6	9.157	3	0	3
40	-4.6	C13H17NO6	11.867	7	0	6
45	-4.5	C8H13NO8	0.000	6	0	0
41	-4.4	C13H17NO6	10.657	7	0	1
42	-4.4	C13H17NO6	12.744	4	0	5
43	-4.3	C13H17NO6	11.786	3	0	4
0	-4.2	C8H15NO6	0.000	6	0	1
18	-4.2	C7H16N2O6	0.000	10	0	0
44	-4.2	C13H17NO6	10.127	1	0	2
46	-4.2	C8H13NO8	3.209	6	0	0
47	-4.2	C8H13NO8	11.035	8	0	0
1	-4.1	C8H15NO6	9.416	7	0	1
2	-4.1	C8H15NO6	4.855	10	0	0
9	-4.1	C7H13NO7	0.000	6	0	0
48	-4.1	C8H13NO8	5.614	7	0	0
49	-4.1	C8H13NO8	6.280	7	0	0
50	-4.1	C8H13NO8	13.758	9	0	0
3	-4.0	C8H15NO6	4.649	8	0	1
4	-4.0	C8H15NO6	2.941	10	0	2
5	-4.0	C8H15NO6	3.112	7	0	0
10	-4.0	C7H13NO7	9.923	4	0	0
19	-4.0	C7H16N2O6	2.434	10	0	0
27	-4.0	C7H13NO6	0.000	6	0	0
51	-4.0	C8H13NO8	10.785	7	0	0
6	-3.9	C8H15NO6	10.347	5	0	1
7	-3.9	C8H15NO6	11.323	4	0	0
11	-3.9	C7H13NO7	2.950	6	0	0
12	-3.9	C7H13NO7	4.845	6	0	0
13	-3.9	C7H13NO7	15.268	9	0	0
14	-3.9	C7H13NO7	4.957	10	0	0
15	-3.9	C7H13NO7	7.712	8	0	0
20	-3.9	C7H16N2O6	2.204	5	0	0
28	-3.9	C7H13NO6	4.456	7	0	0
29	-3.9	C7H13NO6	4.511	10	0	0
52	-3.9	C8H13NO8	9.966	3	0	0
53	-3.9	C8H13NO8	9.216	6	0	0
8	-3.8	C8H15NO6	9.475	5	0	3
16	-3.8	C7H13NO7	13.494	10	0	0
17	-3.8	C7H13NO7	2.449	7	0	0
21	-3.8	C7H16N2O6	4.759	8	0	0
22	-3.8	C7H16N2O6	4.609	8	0	0
23	-3.8	C7H16N2O6	4.815	4	0	0
24	-3.8	C7H16N2O6	4.919	8	0	0
30	-3.7	C7H13NO6	2.446	10	0	0
31	-3.7	C7H13NO6	7.700	6	0	0
32	-3.7	C7H13NO6	9.629	7	0	0
33	-3.7	C7H13NO6	8.890	5	0	0
25	-3.6	C7H16N2O6	8.398	6	0	0
26	-3.6	C7H16N2O6	14.087	4	0	0
34	-3.5	C7H13NO6	2.964	4	0	0
35	-3.5	C7H13NO6	8.417	7	0	0

Визуализация:

In [25]:
for i,r in enumerate(res):
    r.write(filename='r%s.pdb' % i, format='pdb')

Это все лиганды. Они находятся не в том же месте, что NAG (синий).

In [30]:
from IPython.display import Image
Image(filename='all.png')
Out[30]:

Посмотрим на самую лучшую модель - №36, на вторую, потому что у нее единственное стэкинг-взаимодействие из всех лигандов, и на самую худшую - №35. Забавно, что они идут друг за другом по номерам.

Это лиганд №36. Он покрашен зеленым. Располагается далеко от исходного положения NAG (синий). Через фенильное кольцо проходит альфа-спираль белка, чего в реальности быть не может. Так что такое положение лиганда точно нельзя назвать удачным.

In [26]:
Image(filename='r36.png')
Out[26]:

Лиганд №37 расположен удачнее.

In [28]:
Image(filename='r37.png')
Out[28]:

Стэкинг так себе, но есть.

In [29]:
Image(filename='r37_1.png')
Out[29]:

Это положение самого высокого по энергии лиганда №35. Ничего особенного.

In [31]:
Image(filename='r35.png')
Out[31]: