Макромолекулярный докинг

Суть задания заключается в предсказани свзывания фрагмента антитела (VHH domain) c панкреатической альфа-амилазой с использованием программы ZDOCK.

Для успешной работы zrank и zdock из pdb файлов надо удалить строчки MODEL и TER Установлено, что:

  • программе zrank очень нужна пустая строка до начала записей об атомах.

  • полезно убрать именно вторую строчку из копии zdock.out (zdock.out.cp) и использовать её для работы zrank.

Укажем путь к ZDOCK:

In [1]:
%%bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/preps/golovin/progs/lib
export PATH=${PATH}:/home/preps/golovin/progs/bin

Скопируем в рабочую директорию файлы amilase.pdb,camelid.pdb и uniCHARMM из:

http://kodomo.cmm.msu.ru/~golovin/zdock/

In [53]:
%%bash
wget http://kodomo.cmm.msu.ru/~golovin/zdock/uniCHARMM
wget http://kodomo.cmm.msu.ru/~golovin/zdock/amylase.pdb
wget http://kodomo.cmm.msu.ru/~golovin/zdock/camelid.pdb
--2019-05-20 09:55:27--  http://kodomo.cmm.msu.ru/~golovin/zdock/uniCHARMM
Resolving kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)... 93.180.63.127
Connecting to kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)|93.180.63.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17376 (17K)
Saving to: ‘uniCHARMM.1’

     0K .......... ......                                     100% 36.3M=0s

2019-05-20 09:55:27 (36.3 MB/s) - ‘uniCHARMM.1’ saved [17376/17376]

--2019-05-20 09:55:27--  http://kodomo.cmm.msu.ru/~golovin/zdock/amylase.pdb
Resolving kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)... 93.180.63.127
Connecting to kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)|93.180.63.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 387504 (378K) [chemical/x-pdb]
Saving to: ‘amylase.pdb.1’

     0K .......... .......... .......... .......... .......... 13% 40.1M 0s
    50K .......... .......... .......... .......... .......... 26% 34.4M 0s
   100K .......... .......... .......... .......... .......... 39% 33.3M 0s
   150K .......... .......... .......... .......... .......... 52% 37.8M 0s
   200K .......... .......... .......... .......... .......... 66% 35.0M 0s
   250K .......... .......... .......... .......... .......... 79% 36.7M 0s
   300K .......... .......... .......... .......... .......... 92% 66.1M 0s
   350K .......... .......... ........                        100% 34.9M=0.01s

2019-05-20 09:55:27 (38.3 MB/s) - ‘amylase.pdb.1’ saved [387504/387504]

--2019-05-20 09:55:27--  http://kodomo.cmm.msu.ru/~golovin/zdock/camelid.pdb
Resolving kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)... 93.180.63.127
Connecting to kodomo.cmm.msu.ru (kodomo.cmm.msu.ru)|93.180.63.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73224 (72K) [chemical/x-pdb]
Saving to: ‘camelid.pdb.1’

     0K .......... .......... .......... .......... .......... 69% 25.2M 0s
    50K .......... .......... .                               100% 31.8M=0.003s

2019-05-20 09:55:27 (26.9 MB/s) - ‘camelid.pdb.1’ saved [73224/73224]

Добавим водороды к обоим pdb-файлам, исполльзуя pdb2gmx из GROMACS:

In [59]:
! gmx pdb2gmx -f amylase.pdb -o amylase_h.pdb -p -ff gromos53a6 -ignh -water none
                     :-) GROMACS - gmx pdb2gmx, 2018.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar    Aldert van Buuren   Rudi van Drunen     Anton Feenstra  
  Gerrit Groenhof    Aleksei Iupinov   Christoph Junghans   Anca Hamuraru   
 Vincent Hindriksen Dimitrios Karkoulis    Peter Kasson        Jiri Kraus    
  Carsten Kutzner      Per Larsson      Justin A. Lemkul    Viveca Lindahl  
  Magnus Lundborg   Pieter Meulenhoff    Erik Marklund      Teemu Murtola   
    Szilard Pall       Sander Pronk      Roland Schulz     Alexey Shvetsov  
   Michael Shirts     Alfons Sijbers     Peter Tieleman    Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx pdb2gmx, version 2018.1
Executable:   /usr/bin/gmx
Data prefix:  /usr
Working dir:  /home/domain/eva.smorodina/Pracs/molecular modelling/docking/protein-protein
Command line:
  gmx pdb2gmx -f amylase.pdb -o amylase_h.pdb -p -ff gromos53a6 -ignh -water none


Using the Gromos53a6 force field in directory gromos53a6.ff

Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.r2b
Reading amylase.pdb...
Read 3898 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
There are 1 chains and 0 blocks of water and 495 residues with 3898 atoms

  chain  #res #atoms
  1 'A'   495   3898  

All occupancies are one
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/atomtypes.atp
Atomtype 57
Reading residue database... (gromos53a6)
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.rtp
Using default: not generating all possible dihedrals
Using default: excluding 3 bonded neighbors
Using default: generating 1,4 H--H interactions
Using default: removing proper dihedrals found on the same bond as a proper dihedral
Residue 108
Sorting it all out...
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.hdb
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.n.tdb
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.c.tdb
Processing chain 1 'A' (3898 atoms, 495 residues)
Analysing hydrogen-bonding network for automated assignment of histidine
 protonation. 761 donors and 727 acceptors were found.
There are 1112 hydrogen bonds
Will use HISE for residue 15
Will use HISE for residue 101
Will use HISD for residue 201
Will use HISD for residue 215
Will use HISE for residue 299
Will use HISE for residue 305
Will use HISE for residue 331
Will use HISE for residue 386
Will use HISE for residue 491
Identified residue TYR2 as a starting terminus.
Identified residue LEU496 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Special Atom Distance matrix:
                   HIS15   CYS28   CYS70   MET82   CYS86  HIS101  MET102
                  NE2108   SG226   SG568   SD667   SG699  NE2817   SD824
   CYS28   SG226   1.477
   CYS70   SG568   2.005   2.128
   MET82   SD667   1.264   0.664   1.475
   CYS86   SG699   1.513   0.203   1.981   0.534
  HIS101  NE2817   1.466   2.637   1.962   2.164   2.578
  MET102   SD824   1.986   2.772   1.528   2.188   2.655   0.870
  CYS103   SG831   2.123   3.077   1.577   2.469   2.979   1.037   0.737
  CYS115   SG895   2.157   2.309   0.203   1.653   2.156   1.985   1.471
  CYS119   SG923   2.402   3.163   1.363   2.518   3.043   1.469   0.923
  CYS141  SG1096   2.888   4.102   3.204   3.640   4.040   1.497   1.749
  CYS160  SG1242   2.772   3.997   3.205   3.558   3.941   1.420   1.753
  MET178  SD1384   2.656   2.994   1.850   2.453   2.834   1.782   1.043
  HIS201 NE21563   1.740   3.049   2.737   2.701   3.028   0.792   1.580
  MET202  SD1570   1.804   2.720   2.352   2.331   2.649   0.825   1.111
  HIS215 NE21676   2.642   2.416   2.317   2.122   2.258   2.423   2.034
  MET274  SD2144   2.653   3.240   4.641   3.546   3.408   3.660   4.424
  MET287  SD2255   1.849   2.658   3.493   2.682   2.706   2.091   2.739
  HIS299 NE22352   0.536   1.992   2.086   1.692   2.008   1.065   1.743
  HIS305 NE22402   1.473   2.809   3.217   2.711   2.898   2.036   2.850
  MET328  SD2572   1.412   2.027   3.367   2.252   2.160   2.491   3.157
  HIS331 NE22596   2.492   2.925   4.402   3.239   3.060   3.331   4.014
  MET339  SD2662   0.507   1.636   2.455   1.598   1.730   1.845   2.456
  CYS378  SG2970   1.529   1.356   2.560   1.545   1.497   2.917   3.271
  CYS384  SG3017   1.340   1.315   2.498   1.476   1.456   2.742   3.127
  HIS386 NE23036   1.257   0.955   2.517   1.270   1.126   2.699   3.069
  MET394  SD3115   2.231   2.212   4.016   2.685   2.399   3.533   4.122
  CYS450  SG3564   3.335   2.796   4.811   3.404   2.991   4.742   5.228
  CYS462  SG3645   3.478   2.975   4.970   3.576   3.172   4.888   5.389
  HIS491 NE23860   2.628   2.179   4.197   2.766   2.367   3.964   4.451
                  CYS103  CYS115  CYS119  CYS141  CYS160  MET178  HIS201
                   SG831   SG895   SG923  SG1096  SG1242  SD1384 NE21563
  CYS115   SG895   1.515
  CYS119   SG923   0.491   1.251
  CYS141  SG1096   1.755   3.157   2.151
  CYS160  SG1242   1.802   3.169   2.218   0.203
  MET178  SD1384   1.689   1.750   1.635   2.374   2.384
  HIS201 NE21563   1.711   2.771   2.183   1.277   1.117   2.360
  MET202  SD1570   1.644   2.362   1.991   1.560   1.442   1.536   0.963
  HIS215 NE21676   2.738   2.313   2.748   3.305   3.242   1.362   2.862
  MET274  SD2144   4.540   4.803   4.922   4.537   4.358   5.048   3.318
  MET287  SD2255   3.121   3.590   3.516   2.799   2.607   3.134   1.690
  HIS299 NE22352   1.799   2.205   2.156   2.412   2.297   2.554   1.272
  HIS305 NE22402   2.743   3.343   3.150   2.914   2.779   3.729   1.791
  MET328  SD2572   3.385   3.522   3.729   3.596   3.426   3.691   2.323
  HIS331 NE22596   4.314   4.545   4.689   4.169   3.976   4.438   2.974
  MET339  SD2662   2.537   2.616   2.840   3.171   3.041   3.151   1.957
  CYS378  SG2970   3.289   2.761   3.425   4.358   4.256   3.837   3.236
  CYS384  SG3017   3.151   2.697   3.310   4.175   4.070   3.712   3.041
  HIS386 NE23036   3.203   2.714   3.369   4.145   4.028   3.557   2.979
  MET394  SD3115   4.341   4.196   4.630   4.718   4.551   4.578   3.443
  CYS450  SG3564   5.444   5.005   5.662   6.024   5.865   5.594   4.755
  CYS462  SG3645   5.589   5.165   5.811   6.160   6.000   5.774   4.887
  HIS491 NE23860   4.730   4.381   4.970   5.214   5.051   4.783   3.955
                  MET202  HIS215  MET274  MET287  HIS299  HIS305  MET328
                  SD1570 NE21676  SD2144  SD2255 NE22352 NE22402  SD2572
  HIS215 NE21676   1.936
  MET274  SD2144   3.747   4.709
  MET287  SD2255   1.755   2.839   2.285
  HIS299 NE22352   1.539   2.768   2.768   1.792
  HIS305 NE22402   2.481   3.921   2.025   2.016   1.238
  MET328  SD2572   2.515   3.322   1.406   1.306   1.624   1.553
  HIS331 NE22596   3.161   3.913   1.193   1.475   2.632   2.349   1.118
  MET339  SD2662   2.180   3.077   2.197   1.837   0.796   1.173   1.088
  CYS378  SG2970   3.300   3.606   2.788   3.050   1.970   2.271   1.998
  CYS384  SG3017   3.122   3.499   2.664   2.862   1.779   2.088   1.827
  HIS386 NE23036   2.958   3.195   2.565   2.598   1.765   2.188   1.590
  MET394  SD3115   3.569   4.001   1.309   2.309   2.583   2.394   1.131
  CYS450  SG3564   4.794   4.856   2.361   3.623   3.769   3.623   2.482
  CYS462  SG3645   4.954   5.052   2.384   3.762   3.901   3.703   2.607
  HIS491 NE23860   3.938   4.036   1.953   2.750   3.053   3.044   1.699
                  HIS331  MET339  CYS378  CYS384  HIS386  MET394  CYS450
                 NE22596  SD2662  SG2970  SG3017 NE23036  SD3115  SG3564
  MET339  SD2662   2.198
  CYS378  SG2970   3.007   1.364
  CYS384  SG3017   2.860   1.162   0.202
  HIS386 NE23036   2.587   1.119   0.563   0.470
  MET394  SD3115   1.364   1.855   1.999   1.918   1.664
  CYS450  SG3564   2.558   2.995   2.514   2.545   2.340   1.371
  CYS462  SG3645   2.655   3.121   2.629   2.663   2.483   1.486   0.202
  HIS491 NE23860   1.824   2.332   2.180   2.149   1.829   0.710   0.917
                  CYS462
                  SG3645
  HIS491 NE23860   1.091
Linking CYS-28 SG-226 and CYS-86 SG-699...
Linking CYS-70 SG-568 and CYS-115 SG-895...
Linking CYS-141 SG-1096 and CYS-160 SG-1242...
Linking CYS-378 SG-2970 and CYS-384 SG-3017...
Linking CYS-450 SG-3564 and CYS-462 SG-3645...
Start terminus TYR-2: NH3+
End terminus LEU-496: COO-
Checking for duplicate atoms....
Generating any missing hydrogen atoms and/or adding termini.
Now there are 495 residues with 5087 atoms
Making bonds...
Number of bonds was 5207, now 5202
Generating angles, dihedrals and pairs...

WARNING: WARNING: Residue 1 named TYR of a molecule in the input file was mapped
to an entry in the topology database, but the atom H used in
an interaction of type angle in that entry is not found in the
input file. Perhaps your atom and/or residue naming needs to be
fixed.



WARNING: WARNING: Residue 495 named LEU of a molecule in the input file was mapped
to an entry in the topology database, but the atom O used in
an interaction of type angle in that entry is not found in the
input file. Perhaps your atom and/or residue naming needs to be
fixed.


Before cleaning: 8104 pairs
Before cleaning: 10145 dihedrals
Making cmap torsions...
There are 2623 dihedrals, 2788 impropers, 7653 angles
          8104 pairs,     5202 bonds and     0 virtual sites
Total mass 55217.920 a.m.u.
Total charge -6.000 e
Writing topology

Writing coordinate file...
		--------- PLEASE NOTE ------------
You have successfully generated a topology from: amylase.pdb.
The Gromos53a6 force field is used.
		--------- ETON ESAELP ------------

GROMACS reminds you: "We'll celebrate a woman for anything, as long as it's not her talent." (Colleen McCullough)

In [61]:
! gmx pdb2gmx -f camelid.pdb -o camelid_h.pdb -p -ff gromos53a6 -ignh -water none
                     :-) GROMACS - gmx pdb2gmx, 2018.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar    Aldert van Buuren   Rudi van Drunen     Anton Feenstra  
  Gerrit Groenhof    Aleksei Iupinov   Christoph Junghans   Anca Hamuraru   
 Vincent Hindriksen Dimitrios Karkoulis    Peter Kasson        Jiri Kraus    
  Carsten Kutzner      Per Larsson      Justin A. Lemkul    Viveca Lindahl  
  Magnus Lundborg   Pieter Meulenhoff    Erik Marklund      Teemu Murtola   
    Szilard Pall       Sander Pronk      Roland Schulz     Alexey Shvetsov  
   Michael Shirts     Alfons Sijbers     Peter Tieleman    Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx pdb2gmx, version 2018.1
Executable:   /usr/bin/gmx
Data prefix:  /usr
Working dir:  /home/domain/eva.smorodina/Pracs/molecular modelling/docking/protein-protein
Command line:
  gmx pdb2gmx -f camelid.pdb -o camelid_h.pdb -p -ff gromos53a6 -ignh -water none


Using the Gromos53a6 force field in directory gromos53a6.ff

Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.r2b
Reading camelid.pdb...
Read 903 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
There are 1 chains and 0 blocks of water and 123 residues with 903 atoms

  chain  #res #atoms
  1 'B'   123    903  

All occupancies are one
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/atomtypes.atp
Atomtype 57
Reading residue database... (gromos53a6)
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.rtp
Using default: not generating all possible dihedrals
Using default: excluding 3 bonded neighbors
Using default: generating 1,4 H--H interactions
Using default: removing proper dihedrals found on the same bond as a proper dihedral
Residue 108
Sorting it all out...
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.hdb
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.n.tdb
Opening force field file /usr/share/gromacs/top/gromos53a6.ff/aminoacids.c.tdb

Back Off! I just backed up topol.top to ./#topol.top.1#
Processing chain 1 'B' (903 atoms, 123 residues)
Identified residue GLN1 as a starting terminus.
Identified residue VAL111 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Special Atom Distance matrix:
                   CYS22   MET34   CYS45   MET82   CYS92
                   SG143   SD215   SG306   SD588   SG693
   MET34   SD215   0.667
   CYS45   SG306   1.465   1.705
   MET82   SD588   1.290   1.645   1.477
   CYS92   SG693   0.204   0.644   1.283   1.231
  CYS100   SG792   1.473   1.646   0.204   1.609   1.284
Linking CYS-22 SG-143 and CYS-92 SG-693...
Linking CYS-45 SG-306 and CYS-100 SG-792...
Start terminus GLN-1: NH3+
End terminus VAL-111: COO-
Checking for duplicate atoms....
Generating any missing hydrogen atoms and/or adding termini.
Now there are 123 residues with 1176 atoms
Making bonds...
Warning: Long Bond (201-203 = 0.338522 nm)
Number of bonds was 1202, now 1197
Generating angles, dihedrals and pairs...

WARNING: WARNING: Residue 1 named GLN of a molecule in the input file was mapped
to an entry in the topology database, but the atom H used in
an interaction of type angle in that entry is not found in the
input file. Perhaps your atom and/or residue naming needs to be
fixed.



WARNING: WARNING: Residue 123 named VAL of a molecule in the input file was mapped
to an entry in the topology database, but the atom O used in
an interaction of type angle in that entry is not found in the
input file. Perhaps your atom and/or residue naming needs to be
fixed.


Before cleaning: 1935 pairs
Before cleaning: 2304 dihedrals
Making cmap torsions...
There are  628 dihedrals,  603 impropers, 1748 angles
          1935 pairs,     1197 bonds and     0 virtual sites
Total mass 12878.351 a.m.u.
Total charge 2.000 e
Writing topology

Back Off! I just backed up posre.itp to ./#posre.itp.1#

Writing coordinate file...
		--------- PLEASE NOTE ------------
You have successfully generated a topology from: camelid.pdb.
The Gromos53a6 force field is used.
		--------- ETON ESAELP ------------

GROMACS reminds you: "We Look Pretty Sharp In These Clothes" (F. Zappa)

Параметры запуска:

  • gmx pdb2gmx: reads a .pdb (or .gro) file, reads some database files, adds hydrogens to the molecules and generates coordinates in GROMACS (GROMOS), or optionally .pdb, format and a topology in GROMACS format
  • -f: input
  • -o: output
  • -p: topology file
  • -ff: force field (GROMOS 53а6)
  • -ignh: ignore H atoms in the PDB file; especially useful for NMR structures; if H atoms are present, they must be in the named exactly how the force fields in GROMACS expect them to be
  • -water: water model to use: select, none, spc, spce, tip3p, tip4p, tip5p, tips3p

Утилита mark_sur для каждого атома определяет ASA (accesible surface area - поверхность, доступная для растворителя), которая обычно определяется "качением" молекулу воды радиуса 1.4 А по срезам структуры. Если атом имеет ASA больше 1 А, он отмечается утилитой mark_sur как поверхностный. Проведём препроцессинг файлов pdb:

Usage: mark_sur in_pdb_file out_pdb_file
Mark the surface residues of a PDB file

In [37]:
%%bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/preps/golovin/progs/lib
export PATH=${PATH}:/home/domain/data/prog/zdock/zdock3.0.2_linux_x64
mark_sur amylase_h.pdb amylase_m.pdb > amylase.txt
mark_sur camelid_h.pdb camelid_m.pdb > camelid.txt
In [19]:
# Формат файлов (последние 10 строк 1-го файла и перые 10 строк 2-го файла)

with open("amylase.txt") as rf1, open("camelid.txt") as rf2:
    f1 = rf1.readlines()
    f2 = rf2.readlines()
    for i in f1[-10:]: 
        print(i.strip())
        
    print("\n")
    for j in f2[:10]:
        print(j.strip())  
5048 GLU   H  unknown type.  rad set to 1.9, chg 0
5058 SER   H  unknown type.  rad set to 1.9, chg 0
5062 SER   HG unknown type.  rad set to 1.9, chg 0
5066 LYS   H  unknown type.  rad set to 1.9, chg 0
5073 LYS   HZ1unknown type.  rad set to 1.9, chg 0
5074 LYS   HZ2unknown type.  rad set to 1.9, chg 0
5075 LYS   HZ3unknown type.  rad set to 1.9, chg 0
5079 LEU   H  unknown type.  rad set to 1.9, chg 0
5086 LEU   O1 unknown type.  rad set to 1.9, chg 0
5087 LEU   O2 unknown type.  rad set to 1.9, chg 0


2 GLN   H1 unknown type.  rad set to 1.9, chg 0
3 GLN   H2 unknown type.  rad set to 1.9, chg 0
4 GLN   H3 unknown type.  rad set to 1.9, chg 0
11 GLN  1HE2unknown type.  rad set to 1.9, chg 0
12 GLN  2HE2unknown type.  rad set to 1.9, chg 0
16 VAL   H  unknown type.  rad set to 1.9, chg 0
24 GLN   H  unknown type.  rad set to 1.9, chg 0
31 GLN  1HE2unknown type.  rad set to 1.9, chg 0
32 GLN  2HE2unknown type.  rad set to 1.9, chg 0
36 LEU   H  unknown type.  rad set to 1.9, chg 0

Посмотрим, как работать с ZDOCK:

Usage: zdock [options] -R [receptor.pdb] -L [ligand.pdb]
Standard PDB format files must be processed by mark_sur to add atom radius, charge and atom type. If you know that some atoms are not in the binding site, you can block them by changing their atom type (column 55-56) to 19.
  Available options:    
   -o filename     : set the output filename (default to zdock.out)
   -N int          : set the number of output predictions, (default to 2000)
   -S int          : set the seed for the randomization of the starting PDBs (default to no randomization)
   -D              : set rotational sampling to be dense, used only if zdock output will be further processed by more detailed binding energy calculations (default to none)
   -F              : fix the receptor, preventing its rotation and switching with ligand during execution.
In [113]:
%%bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/preps/golovin/progs/lib
export PATH=${PATH}:/home/domain/data/prog/zdock/zdock3.0.2_linux_x64
zdock -N 10 -R camelid_m.pdb -L amylase_m.pdb
ZDOCK has successfully completed and zdock.out has been created.

In [58]:
! cat zdock.out
128	1.2	1
-2.617994	0.419640	2.974122
0.000000	0.000000	0.000000
amylase_m.pdb	78.264	78.684	4.322
camelid_m.pdb	27.065	-10.816	-16.094
1.832596	0.640596	-2.797608	7	26	13	1325.582
2.617994	2.073035	2.027405	120	30	121	1166.422
-2.356194	1.135834	-1.431087	116	29	3	1152.836
1.570796	2.295480	1.008277	4	24	123	1125.438
-2.617994	0.866955	-0.931351	118	29	3	1124.531
1.570796	2.115563	0.685730	5	26	127	1102.391
0.000000	0.080730	-2.622036	109	19	5	1098.537
-2.356194	1.128190	-2.288622	8	24	9	1094.670
2.356194	2.108018	1.311430	124	29	126	1080.031
1.308997	0.619654	-2.216909	8	25	16	1075.369

Вырезаем строчки, содержащие MODEL и TER, в файлах camelid_m.pdb и amylase_m.pdb и сохраняем в camelid_m.pdb и amylase_m.pdb:

In [118]:
with open("amylase_m.pdb") as wf1, open("camelid_m.pdb") as wf2:
    w1 = wf1.readlines()
    w2 = wf2.readlines()
    
with open("amylase_new.pdb", "w+") as wwf1, open("camelid_new.pdb", "w+") as wwf2:
    for l1 in w1:
        if l1.find("MODEL") == -1 and l1.find("TER") == -1:
            wwf1.write(l1)
    for l2 in w2:
        if l2.find("MODEL") == -1 and l1.find("TER") == -1:
            wwf2.write(l1)

Создаем файл-копию zdock.out.cp с заменой camelid_m.pdb и amylase_m.pdb на сamelid_new.pdb и amylase_new.pdb и удаленной второй строчкой:

In [119]:
with open("zdock.out") as rf, open("zdock.out.cp", "w+") as wf:
    new_z = []
    for z  in rf:
        z.replace("amylase_m.pdb", "amylase_new.pdb")
        z.replace("camelid_m.pdb", "camelid_new.pdb")
        new_z.append(z)
    z = new_z[:1] + new_z[2:]
    
    for n in z:
        wf.write(n)
In [120]:
! cat zdock.out.cp
128	1.2	1
0.000000	0.000000	0.000000
amylase_m.pdb	78.264	78.684	4.322
camelid_m.pdb	27.065	-10.816	-16.094
1.832596	0.640596	-2.797608	7	26	13	1325.582
2.617994	2.073035	2.027405	120	30	121	1166.453
-2.356194	1.135834	-1.431087	116	29	3	1152.852
1.570796	2.295480	1.008277	4	24	123	1125.406
-2.617994	0.866955	-0.931351	118	29	3	1124.531
1.570796	2.115563	0.685730	5	26	127	1102.391
0.000000	0.080730	-2.622036	109	19	5	1098.541
-2.356194	1.128190	-2.288622	8	24	9	1094.666
2.356194	2.108018	1.311430	124	29	126	1080.031
1.308997	0.619654	-2.216909	8	25	16	1075.373

Проведем предварительный анализ результатов докинга с помощью утилиты zrank:

In [121]:
%%bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/preps/golovin/progs/lib
export PATH=${PATH}:/home/domain/data/prog/zdock/zrank_linux_64bit
zrank zdock.out.cp 1 10

sort -n -k2 zdock.out.cp.zr.out | head
8	-0.00450067
10	2.13888
1	39.7335
6	122.573
4	390.311
3	1763.79
9	1764.08
5	1807.34
7	2057.38
2	2256.71
In [124]:
%%bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/preps/golovin/progs/lib
export PATH=${PATH}:/home/domain/data/prog/zdock/zdock3.0.2_linux_x64
ln -s /home/domain/data/prog/zdock/zdock3.0.2_linux_x64/create_lig create_lig
create.pl zdock.out

Скачаем структуру 1KXT из банка PDB:

In [189]:
%%bash
wget https://files.rcsb.org/view/1KXT.pdb
--2019-05-22 19:42:07--  https://files.rcsb.org/view/1KXT.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.244.12
Connecting to files.rcsb.org (files.rcsb.org)|128.6.244.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1KXT.pdb’

     0K .......... .......... .......... .......... ..........  212K
    50K .......... .......... .......... .......... ..........  423K
   100K .......... .......... .......... .......... .......... 82.3M
   150K .......... .......... .......... .......... ..........  425K
   200K .......... .......... .......... .......... .......... 57.1M
   250K .......... .......... .......... .......... .......... 78.6M
   300K .......... .......... .......... .......... ..........  431K
   350K .......... .......... .......... .......... .......... 76.2M
   400K .......... .......... .......... .......... .......... 69.4M
   450K .......... .......... .......... .......... .......... 56.7M
   500K .......... .......... .......... .......... .......... 65.2M
   550K .......... .......... .......... .......... .......... 76.4M
   600K .......... .......... .......... .......... ..........  438K
   650K .......... .......... .......... .......... .......... 50.2M
   700K .......... .......... .......... .......... .......... 81.1M
   750K .......... .......... .......... .......... .......... 49.6M
   800K .......... .......... .......... .......... .......... 55.2M
   850K .......... .......... .......... .......... .......... 63.6M
   900K .......... .......... .......... .......... .......... 67.1M
   950K .......... .......... .......... .......... .......... 75.5M
  1000K .......... .......... .......... .......... .......... 64.2M
  1050K .......... .......... .......... .......... .......... 79.7M
  1100K .......... .......... .......... .......... .......... 88.0M
  1150K .......... .......... .......... .......... .......... 89.7M
  1200K .......... .......... .......... .......... .......... 82.4M
  1250K .......... .......... .......... .......... ..........  455K
  1300K .......... .......... .......... .......... .......... 72.5M
  1350K .......... .......... .                                95.1M=0.8s

2019-05-22 19:42:08 (1.62 MB/s) - ‘1KXT.pdb’ saved [1404459]

Визуально сравним результаты с известной структурой 1kxt (модели расположены в порядке выдачи zrank):

In [187]:
from IPython.display import display,Image

a = Image(filename='vhh8.png')
b = Image(filename='vhh10.png')
c = Image(filename='vhh1.png')
d = Image(filename='vhh6.png')
e = Image(filename='vhh4.png')
f = Image(filename='vhh3.png')
g = Image(filename='vhh9.png')
h = Image(filename='vhh5.png')
i = Image(filename='vhh7.png')
j = Image(filename='vhh2.png')

display(a, b, c, d, e, f, g, h, i, j)

Посчитаем RMSD с помощью rms пакета GROMACS, чтобы найти модель наиболее близкую к результату РСА:

gmx rms -f complex.i.pdb -s 1KXT.pdb -o rmsd_i, i = 1, ..., 10 

RMSD считалось по группе "Protein", т. е. 1

Structure RMSD
1 4.7041268
2 4.3613758
3 4.7204866
4 4.3944025
5 4.6301455
6 4.4350629
7 4.4898782
8 4.6887312
9 4.3714395
10 4.8258491

Согласно данным RMSD наилучшей структурой является 2, однако по результатам zrank лидирует 8, а 2, наоборот, занимает последнее место в топ-10.

Ссылки