RNA secondary structure and DNA-protein contacts

Last update on the 2nd of October, 2017

In this task we predict the secondary structure of an RNA and compare the predictions with the real structure. We then describe DNA-protein contacts and identify the amino acid residue most valuable for DNA recognition.

List of downloads
File Link
Bash loop for inverted repeats search einv.sh
Bash loop for RNA folding images zuker_color.sh
Jmol script with 3 sets sets.spt
Jmol script with various representations of 1KSX main_script.spt
Jmol script for calculating DNA-protein contacts contact.spt

Prediction of secondary RNA structure

einv.sh, zuker_color.sh

RNA secondary structure is formed by base pairing. In contrast to DNA, non-canonical base pairs are widespread in RNA, and complementary RNA motifs can be arranged in various ways. It is therefore interesting to compare the predictive power of a general-purpose program and a specialized one. The RNA studied here is the Gln-tRNA from the PDB structure 1QRT.

First, the EMBOSS einverted program was tested. This program finds inverted repeats in a nucleotide sequence. The gap and mismatch penalties were each scanned over a range in a bash for loop (einv.sh file) with threshold = 0 to find the result closest to the real structure. That result (fig. 1) was obtained with the line einverted -seq 1QRT_B.fasta -match 3 -mismatch -1 -gap 1 -threshold 0. The program found all canonical pairs of the acceptor and anticodon stems, but none of the D- and T-stems. This follows from its algorithm, which searches for the single highest-scoring "alignment" and does not take tRNA folding features into account.

Fig. 1. The inverted-repeat search result for 1QRT RNA that is closest to the real structure.
SEQUENCE: Score 59: 25/30 ( 83%) matches, 11 gaps
       1 tggggtatc---g---ccaagc--ggtaaggcaccggattc 34      
         ||||||| |   |   | || |  || || |  ||||| ||
      72 accccat-gctcctaagcttggagcc-ttac--ggccttag 36
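
The parameter scan and the score reported in fig. 1 can be sketched in Python as follows. This is a minimal sketch and not the contents of einv.sh: the scanned ranges are hypothetical, the commands are only built and not executed, and the linear scoring scheme is an assumption that happens to reproduce the reported score (25 matches, 5 mismatches, 11 gaps).

```python
from itertools import product


def einverted_commands(fasta, gaps, mismatches, match=3, threshold=0):
    """Build one einverted argument list per (gap, mismatch) combination."""
    commands = []
    for gap, mismatch in product(gaps, mismatches):
        commands.append([
            "einverted", "-seq", fasta,
            "-match", str(match), "-mismatch", str(mismatch),
            "-gap", str(gap), "-threshold", str(threshold),
        ])
    return commands


def linear_score(n_matches, n_mismatches, n_gaps, match=3, mismatch=-1, gap=1):
    """Alignment score under an assumed linear scheme: gaps are subtracted."""
    return n_matches * match + n_mismatches * mismatch - n_gaps * gap


# Hypothetical ranges; the spans actually used in einv.sh are not stated here.
cmds = einverted_commands("1QRT_B.fasta", gaps=range(1, 4), mismatches=range(-3, 0))
print(len(cmds))             # 9 parameter combinations
print(" ".join(cmds[2]))     # the combination quoted in the text
print(linear_score(25, 5, 11))  # 59, as in fig. 1
```

Each argument list could be passed to subprocess.run once output options are added; that step is left out because it requires a local EMBOSS installation.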

The second program is RNAfold from the ViennaRNA package. It is based on the Zuker algorithm, which finds the minimum free energy of RNA folding. The --MEA and --pfScale parameters were scanned (zuker.sh file) to obtain structures whose pairs differ in probability. However, all pictures were the same (fig. 2). The colored variant was obtained with the Perl utility relplot.pl -p.

Fig. 2. 1QRT RNA secondary structure derived with the Zuker algorithm.
Hue from red to violet stands for increasing entropy (decreasing probability) of paired bases being paired and of unpaired bases being unpaired.
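
RNAfold reports its prediction in dot-bracket notation, so counting predicted pairs comes down to matching parentheses. The sketch below shows one way to do that with a stack; the structure string is a toy example and not the actual 1QRT prediction.

```python
def dot_bracket_pairs(structure):
    """Return sorted (i, j) base pairs (1-based) from a dot-bracket string."""
    stack = []
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % position)
            pairs.append((stack.pop(), position))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


# Toy structure for illustration only: two stacked helices.
toy_structure = "((..((...))..))"
toy_pairs = dot_bracket_pairs(toy_structure)
print(toy_pairs)       # [(1, 15), (2, 14), (5, 11), (6, 10)]
print(len(toy_pairs))  # 4
```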

It is clearly seen that the specialized program copes with the task better. It predicts more pairs, covering all four stems, but only canonical ones. The real structure, based on X-ray analysis, is shown in figure 3; it contains non-canonical pairs and a pseudoknot-like element.

Fig. 3. 1QRT secondary structure.
The image was made with the PseudoViewer web application[1]. The base-pairing data were obtained with the 3DNA find_pair program, which works from the PDB structure.

To compare the results of the two programs, base pairs were counted and are listed in table 1.

Table 1. Comparative analysis of program prediction power in terms of RNA folding.
Each cell gives the ranges of bases involved in pairing, in the usual 5'-3' notation.
RNA segment | find_pair | einverted | RNAfold
Acceptor stem | 5'-1-7-3', 5'-66-71-3' (7 pairs) | all 7 pairs | all 7 pairs
D-stem | 5'-10-12-3', 5'-23-25-3' (3 pairs) | 0 pairs | all 3 pairs
Anticodon stem | 5'-26-33-3', 5'-37-44-3' (8 pairs) | 5'-27-31-3', 5'-39-43-3', 5'-33-3', 5'-37-3' (6/8 pairs) | 5'-27-31-3', 5'-39-43-3' (5/8 pairs)
T-stem | 5'-49-53-3', 5'-61-65-3' (5 pairs) | 0 pairs | all 5 pairs
Total canonical pairs | 22 | 12 | 20
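
The anticodon stem row of table 1 can be reproduced with a short Python sketch that expands the listed ranges into base pairs and intersects the sets. It assumes that each pair of ranges forms a gapless antiparallel helix (the first base of the 5' range pairs with the last base of the 3' range), which is an assumption about the notation and not something the programs report.

```python
def stem_pairs(start5, end5, start3, end3):
    """Expand two ranges into the pairs of a gapless antiparallel stem."""
    strand5 = range(start5, end5 + 1)
    strand3 = range(end3, start3 - 1, -1)  # read the 3' strand backwards
    if len(strand5) != len(strand3):
        raise ValueError("strands of a gapless stem must have equal length")
    return set(zip(strand5, strand3))


# Anticodon stem ranges taken from table 1.
reference = stem_pairs(26, 33, 37, 44)                            # find_pair
einverted = stem_pairs(27, 31, 39, 43) | stem_pairs(33, 33, 37, 37)
rnafold = stem_pairs(27, 31, 39, 43)

print(len(reference))              # 8
print(len(reference & einverted))  # 6
print(len(reference & rnafold))    # 5
```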

As table 1 suggests, the specialized program is better suited to RNA folding prediction. However, neither program was able to predict non-canonical interactions or pseudoknots from the primary structure alone.

It should be noted that the PDB data lack the 1st and 17th bases, which is why the acceptor stem in figure 1 does not contain the first U-A pair.

DNA-protein contacts

main_script.spt, sets.spt, contact.spt

The object of study is the PDB structure 1KSX. It contains two intermediates of the replication initiation complex. The first intermediate was taken; it consists of one dsDNA molecule and a tetrameric protein.

First, Jmol scripts were prepared. The script sets.spt (see list of downloads) defines three sets of atoms: 1) oxygen in 2'-deoxyribose; 2) oxygen in phosphoric acid residues; 3) nitrogen in nucleotide bases. main_script.spt shows views of the DNA-protein model and of the three sets.

DNA-protein contacts are defined by close proximity of the corresponding atoms in the two structures. A polar contact involves nitrogen or oxygen atoms at a distance of 3.5 Å or less; a nonpolar contact involves sulfur or phosphorus atoms at 4.5 Å or less. To count the contacts, the script contact.spt was developed. The results are presented in table 2. As it turns out, contacts with the DNA backbone prevail over interactions with the grooves.

Table 2. Number of DNA-protein contacts in 1KSX model.
DNA structure Polar Nonpolar Total
2'-deoxyribose 2 78 80
Phosphoric acid residue 41 44 85
Major groove 0 6 6
Minor groove 0 0 0
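
The distance criterion used for the counts in table 2 can be illustrated with a minimal Python sketch. It is not a translation of contact.spt: the coordinates are toy values, and the rule that both atoms of a pair must belong to the given element set is an assumption about how the definition is applied.

```python
from math import dist


def count_contacts(dna_atoms, protein_atoms, elements, cutoff):
    """Count DNA-protein atom pairs of the given elements within cutoff (Å)."""
    count = 0
    for dna_element, dna_xyz in dna_atoms:
        if dna_element not in elements:
            continue
        for protein_element, protein_xyz in protein_atoms:
            if protein_element in elements and dist(dna_xyz, protein_xyz) <= cutoff:
                count += 1
    return count


# Toy atoms as (element, (x, y, z)); not taken from 1KSX.
dna_atoms = [("O", (0.0, 0.0, 0.0)), ("P", (1.0, 0.0, 0.0))]
protein_atoms = [("N", (3.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0)), ("S", (5.0, 0.0, 0.0))]

polar = count_contacts(dna_atoms, protein_atoms, {"N", "O"}, cutoff=3.5)
nonpolar = count_contacts(dna_atoms, protein_atoms, {"S", "P"}, cutoff=4.5)
print(polar, nonpolar)  # 1 1
```

Passing the element set as a parameter keeps the sketch independent of the exact atom selection, which is defined in the Jmol scripts and not shown here.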

To find the amino acid residue that interacts most and is most valuable for DNA recognition, the nucplot program was run. Its *.bond file lists the detected interactions: hydrogen bonds, covalent bonds and non-bonded contacts. As the protein consists of four similar subunits, only chain B was examined. The most interacting residues are Phe 182 (5 contacts), Asn 184 (2) and Thr 187 (3). The nucplot image in figure 4 shows how these contacts are arranged. All three residues recognise the same site in the DNA backbone.

Fig. 4. Nucplot image of DNA-protein interactions.
Phe 182, Asn 184 and Thr 187 are outlined in red. These three residues recognise the backbone of 6-8 nucleotides.

Most of the recognition contacts are with the phosphate residue, owing to polar interactions of Asn and Thr and a (putative) π-system-mediated interaction of Phe. Jmol visualization (fig. 5) confirms the results: the residues almost cover the DNA backbone.

Fig. 5. Jmol representation of primary DNA-protein contacts in 1KSX model.
A) Wireframe and ball model of interacting atoms.
B) Spacefill model of interacting atoms. Number legend is given.

The most valuable residue was not simple to choose. According to the *.bond file, all Phe residues together form 16 contacts and all Thr residues form 17 contacts; the others (Arg, Asn, Lys) form fewer. However, the Jmol select within() command showed that 4 Phe and 10 Thr residues interact with DNA (see fig. 6). The distance considered was 3.35 Å, the value nucplot uses for non-bonded contacts. Taken together, these data imply that Phe recognises only one specific DNA site (1 residue per subunit), whereas Thr not only does so but also takes part in other interactions, primarily with the DNA backbone on the major groove side. So, according to the given structure, Thr is the most useful residue in terms of DNA recognition.

Fig. 6. Jmol 1KSX view with highlighted Thr and Phe.
Thr (blue and red) and Phe (green) are shown in spacefill model. Blue Thr residues are those that putatively interact with the major groove.
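
The difference between counting contacts and counting interacting residues, which drives the comparison of Phe and Thr above, can be sketched in Python. This is an illustration of a distance-based selection similar in spirit to Jmol's within(), not a reproduction of it; all coordinates and the residue Thr 200 are hypothetical.

```python
from collections import Counter
from math import dist


def residues_near(protein_atoms, dna_coords, cutoff=3.35):
    """Return residues having at least one atom within cutoff (Å) of the DNA."""
    near = set()
    for chain, resname, resnum, xyz in protein_atoms:
        if any(dist(xyz, dna_xyz) <= cutoff for dna_xyz in dna_coords):
            near.add((chain, resname, resnum))
    return near


# Toy atoms as (chain, residue name, residue number, (x, y, z)).
protein_atoms = [
    ("B", "PHE", 182, (0.0, 0.0, 3.0)),
    ("B", "PHE", 182, (0.0, 1.0, 3.0)),   # second atom of the same residue
    ("B", "THR", 187, (3.0, 0.0, 0.0)),
    ("B", "THR", 200, (10.0, 0.0, 0.0)),  # hypothetical residue far from DNA
]
dna_coords = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

near = residues_near(protein_atoms, dna_coords)
per_type = Counter(resname for _, resname, _ in near)
print(sorted(per_type.items()))  # [('PHE', 1), ('THR', 1)]
```

A residue with several close atoms is counted once here, which is why a residue type can have many contacts but few interacting residues.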

References

  1. PseudoViewer — a web application program for visualizing RNA secondary structures with pseudoknots.