
RNA secondary structure and DNA-protein contacts

Last update on the 2nd of October, 2017

In this task we predict the secondary structure of an RNA and compare the predictions to the real structure. We then describe DNA-protein contacts and evaluate which amino acid residue is the most valuable for DNA recognition.

List of downloads
File	Link
Bash loop for inverted repeats search	einv.sh
Bash loop for RNA folding images	zuker_color.sh
Jmol script with 3 sets	sets.spt
Jmol script with various representations of 1KSX	main_script.spt
Jmol script for calculating DNA-protein contacts	contact.spt

Prediction of secondary RNA structure

einv.sh, zuker_color.sh

Secondary RNA structure relies on base interactions. In contrast to DNA, non-canonical base pairs are widespread in RNA, and complementary RNA motifs can be oriented in various ways. It is therefore interesting to compare the predictive power of a general-purpose program and a specialized one. The RNA studied is the Gln-tRNA from the 1QRT PDB structure.

First, the EMBOSS einverted program was tested. This program finds inverted repeats in a nucleotide sequence. Gap and mismatch penalties were each scanned over a range in a bash for loop (einv.sh file) with threshold = 0 to find the most appropriate result. That result (fig. 1) was obtained with the command einverted -seq 1QRT_B.fasta -match 3 -mismatch -1 -gap 1 -threshold 0. The program found all canonical pairs of the acceptor and anticodon stems, but none of the D- and T-stems. This follows from its algorithm, which searches for the highest-scoring "alignment" and is unaware of tRNA folding features.
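The penalty scan described above can be sketched in Python as a generator of command lines. This is a minimal sketch, not the contents of einv.sh: the scanned ranges are hypothetical, and the output file names passed to -outfile and -outseq are assumptions added so that einverted would not prompt for them.

```python
from itertools import product


def einverted_commands(fasta, gap_penalties, mismatch_penalties,
                       match=3, threshold=0):
    """Build one einverted argument list per (gap, mismatch) combination."""
    commands = []
    for gap, mismatch in product(gap_penalties, mismatch_penalties):
        # Hypothetical naming scheme for the per-run output files
        tag = f"gap{gap}_mis{abs(mismatch)}"
        commands.append([
            "einverted",
            "-seq", fasta,
            "-match", str(match),
            "-mismatch", str(mismatch),
            "-gap", str(gap),
            "-threshold", str(threshold),
            "-outfile", f"{tag}.inv",    # alignment report
            "-outseq", f"{tag}.fasta",   # sequences of the repeats
        ])
    return commands


# Hypothetical ranges, chosen only for illustration
commands = einverted_commands("1QRT_B.fasta", range(1, 4), range(-3, 0))
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run(cmd, check=True) on a machine where EMBOSS is installed; the sketch only prints the commands so that it runs anywhere.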

Fig. 1. The einverted result for the 1QRT RNA that is closest to the real structure.

SEQUENCE: Score 59: 25/30 ( 83%) matches, 11 gaps
       1 tggggtatc---g---ccaagc--ggtaaggcaccggattc 34      
         ||||||| |   |   | || |  || || |  ||||| ||
      72 accccat-gctcctaagcttggagcc-ttac--ggccttag 36

The second program is RNAfold from the ViennaRNA package. It uses the Zuker algorithm to find the RNA fold with the minimum free energy. The --MEA and --pfScale parameters were scanned (zuker.sh file) to obtain structures that differ in the probabilities of their pairs. However, all pictures were the same (fig. 2). The colored variant was obtained with the Perl utility relplot.pl -p.
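RNAfold reports its prediction in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. A minimal sketch of turning such a string into a list of base pairs is given below; the input is a toy hairpin, not the 1QRT prediction.

```python
def dot_bracket_pairs(structure):
    """Return 1-based (i, j) base pairs encoded in a dot-bracket string."""
    stack = []
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))
        # '.' marks an unpaired base and needs no action
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy hairpin used only for illustration
toy = "((((...))))"
pairs = dot_bracket_pairs(toy)
print(len(pairs), pairs)
```

The resulting pair list is a convenient form for counting pairs per stem.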

The specialized program clearly copes with the task better. It predicts pairs in all four stems, although only canonical ones. The real structure, based on X-ray analysis, is shown in figure 3 and contains non-canonical pairs and an arrangement resembling a pseudoknot.

To compare the results of the two programs, base pairs were counted and listed in table 1.

Table 1. Comparative analysis of the predictive power of the programs in terms of RNA folding.
Each cell gives the ranges of bases involved in pairing, in common notation.
RNA segment	find_pair	einverted	RNAfold
Acceptor stem	5'-1-7-3' 5'-66-71-3' 7 pairs	all 7 pairs	all 7 pairs
D-stem	5'-10-12-3' 5'-23-25-3' 3 pairs	0 pairs	all 3 pairs
Anticodon stem	5'-26-33-3' 5'-37-44-3' 8 pairs	5'-27-31-3' 5'-39-43-3' 5'-33-3' 5'-37-3' 6/8 pairs	5'-27-31-3' 5'-39-43-3' 5/8 pairs
T-stem	5'-49-53-3' 5'-61-65-3' 5 pairs	0 pairs	all 5 pairs
Total canonical pairs	22	12	20

As table 1 suggests, the specialized program is better at predicting RNA folding. However, neither program was able to predict non-canonical interactions or pseudoknots from the primary structure.
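The per-stem counts in table 1 can be reproduced by expanding each range into explicit pairs and intersecting the predicted set with the reference set. The sketch below does this for the anticodon stem, using the ranges from table 1. It assumes that a stem is contiguous and antiparallel with no bulges, so it does not apply to rows where the two ranges differ in length.

```python
def stem_pairs(start5, end5, start3, end3):
    """Expand a stem written as 5'-start5-end5-3' 5'-start3-end3-3' into pairs.

    The strands are antiparallel, so the first base of one range pairs
    with the last base of the other.
    """
    length = end5 - start5 + 1
    if length != end3 - start3 + 1:
        raise ValueError("both strands of a stem must have the same length")
    return {(start5 + k, end3 - k) for k in range(length)}


# Anticodon stem ranges copied from table 1
reference = stem_pairs(26, 33, 37, 44)                                 # find_pair
einverted = stem_pairs(27, 31, 39, 43) | stem_pairs(33, 33, 37, 37)
rnafold = stem_pairs(27, 31, 39, 43)

for name, predicted in (("einverted", einverted), ("RNAfold", rnafold)):
    recovered = len(predicted & reference)
    print(f"{name}: {recovered}/{len(reference)} pairs")
```

For these ranges the sketch prints 6/8 for einverted and 5/8 for RNAfold, in agreement with the anticodon stem row.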

Note that the PDB entry lacks bases 1 and 17, which is why the acceptor stem in figure 1 does not contain the first U-A pair.

DNA-protein contacts

main_script.spt, sets.spt, contact.spt

The object of study is the 1KSX PDB structure. It contains two intermediates of the replication initiation complex. The first intermediate was taken. It consists of one dsDNA and a tetrameric protein.

First, Jmol scripts were prepared. The script sets.spt (see list of downloads) defines three atom sets: 1) oxygen in 2'-deoxyribose; 2) oxygen in phosphoric acid residues; 3) nitrogen in nucleotide bases. The script main_script.spt shows views of the DNA-protein model and of the three sets.

DNA-protein contacts are defined as close proximity of corresponding atoms in the two structures. A polar contact involves nitrogen or oxygen atoms at a distance of 3.5 Å or less; a nonpolar contact involves sulfur or phosphorus atoms at 4.5 Å or less. To count the contacts, the script contact.spt was written. The results are presented in table 2. DNA backbone contacts clearly prevail over groove interactions.

Table 2. Number of DNA-protein contacts in 1KSX model.
DNA structure	Polar	Nonpolar	Total
2'-deoxyribose	2	78	80
Phosphoric acid residue	41	44	85
Major groove	0	6	6
Minor groove	0	0	0
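The distance criterion behind table 2 can be sketched in Python as follows. This is not a translation of contact.spt. It assumes one reading of the definition above, namely that both the DNA atom and the protein atom must belong to the listed elements, and it uses toy coordinates rather than atoms from 1KSX.

```python
from math import dist

POLAR = {"N", "O"}      # elements named in the text for polar contacts
NONPOLAR = {"S", "P"}   # elements named in the text for nonpolar contacts


def count_contacts(dna_atoms, protein_atoms, elements, cutoff):
    """Count DNA-protein atom pairs of the given elements within cutoff (in Å)."""
    count = 0
    for dna_element, dna_xyz in dna_atoms:
        if dna_element not in elements:
            continue
        for protein_element, protein_xyz in protein_atoms:
            if protein_element in elements and dist(dna_xyz, protein_xyz) <= cutoff:
                count += 1
    return count


# Toy atoms as (element, (x, y, z)); coordinates are invented for illustration
dna = [("O", (0.0, 0.0, 0.0)), ("P", (1.5, 0.0, 0.0))]
protein = [("N", (3.0, 0.0, 0.0)), ("S", (5.5, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0))]

polar = count_contacts(dna, protein, POLAR, 3.5)
nonpolar = count_contacts(dna, protein, NONPOLAR, 4.5)
print(polar, nonpolar)
```

Restricting dna_atoms to one of the predefined sets (sugar, phosphate or groove atoms) would give the count for the corresponding row of the table.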

To find the amino acid residue that interacts most with DNA and is the most valuable for its recognition, the nucplot program was run. The *.bond file lists the detected interactions: hydrogen bonds, covalent bonds and non-bonded contacts. As the protein consists of four similar subunits, only chain B was examined. The most interacting residues are Phe 182 (5 contacts), Asn 184 (2) and Thr 187 (3). The image in figure 4, produced by nucplot, shows how these contacts are arranged. All three residues recognise the same site in the DNA backbone.

Most recognition contacts are with the phosphate residue, through polar interactions of Asn and Thr and a (putative) π-system-mediated interaction of Phe. Jmol visualization (fig. 5) confirms the results: the residues almost cover the DNA backbone.

The most valuable residue was not simple to choose. The *.bond file reports that Phe residues form 16 contacts in total and Thr residues form 17, while the others (Arg, Asn, Lys) form fewer. However, the Jmol select within() command showed that 4 Phe residues interact with DNA, compared with 10 Thr residues (see fig. 6). The distance cutoff was 3.35 Å, the value nucplot uses for non-bonded contacts. Taken together, these data imply that Phe recognises only one specific DNA site (one residue per subunit), whereas Thr does the same and also takes part in other interactions, primarily with the DNA backbone on the major groove side. So, according to the given structure, Thr is the most useful residue for DNA recognition.
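The comparison above rests on two numbers per residue type: the total number of contacts and the number of distinct residues that make them. A minimal sketch of that tally is shown below. The record layout (residue name, chain, residue number) and the records themselves are hypothetical; the sketch does not parse the nucplot *.bond file.

```python
from collections import Counter


def summarize(contacts):
    """Return contact totals and distinct residue counts per residue type."""
    total = Counter(name for name, _chain, _number in contacts)
    # A set keeps each (name, chain, number) residue only once
    distinct = Counter(name for name, _chain, _number in set(contacts))
    return total, distinct


# Hypothetical contact records, one tuple per detected contact
contacts = [
    ("PHE", "B", 182), ("PHE", "B", 182), ("PHE", "B", 182),
    ("THR", "B", 187), ("THR", "B", 187), ("THR", "B", 190),
    ("ASN", "B", 184),
]

total, distinct = summarize(contacts)
for name in sorted(total):
    print(f"{name}: {total[name]} contacts from {distinct[name]} residue(s)")
```

A residue type with many contacts from few residues points to one recurring site, while contacts spread over many residues point to broader involvement.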
