RNA secondary structure and DNA-protein contacts
Last update on the 2nd of October, 2017

In this task we predict RNA secondary structure and compare the predictions with the experimentally determined structure. We then describe DNA-protein contacts and identify the amino acid residue most valuable for DNA recognition.
File | Link |
---|---|
Bash loop for inverted repeats search | einv.sh |
Bash loop for RNA folding images | zuker_color.sh |
Jmol script with 3 sets | sets.spt |
Jmol script with various representations of 1KSX | main_script.spt |
Jmol script for calculating DNA-protein contacts | contact.spt |
Prediction of secondary RNA structure
einv.sh, zuker_color.sh

RNA secondary structure relies on base interactions. In contrast to DNA, non-canonical base pairs are widespread in RNA, and complementary RNA motifs can be oriented in various ways. It is therefore interesting to compare the predictive power of a general-purpose program and a specialized one. The RNA studied is Gln-tRNA from the 1QRT PDB structure.
First, the EMBOSS einverted program was tested. This program finds inverted repeats in a nucleotide sequence. Gap and mismatch penalties were scanned over two ranges in a bash for loop (einv.sh file) with threshold = 0 to find the most appropriate result. The result (fig. 1) was obtained with the line

einverted -seq 1QRT_B.fasta -match 3 -mismatch -1 -gap 1 -threshold 0

The program found all canonical pairs of the acceptor and anticodon stems, but none of the D- and T-stems. This is due to its algorithm, which searches for the single highest-scoring "alignment" and does not take tRNA folding features into account.

    SEQUENCE: Score 59: 25/30 ( 83%) matches, 11 gaps
           1 tggggtatc---g---ccaagc--ggtaaggcaccggattc 34
             ||||||| |   |   | || |  || || |  ||||| ||
          72 accccat-gctcctaagcttggagcc-ttac--ggccttag 36

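The parameter scan described above can be sketched as follows. The original einv.sh is a bash loop and is not reproduced here; this Python version only illustrates the idea. The scanned ranges, the output file names and the function name `build_einverted_commands` are assumptions made for the example, not values taken from the script.

```python
from itertools import product


def build_einverted_commands(fasta, gap_penalties, mismatch_scores,
                             match=3, threshold=0):
    """Build one einverted command line per (gap, mismatch) combination.

    The commands are returned as argument lists and are not executed here.
    Output file names are hypothetical and chosen only for illustration.
    """
    commands = []
    for gap, mismatch in product(gap_penalties, mismatch_scores):
        tag = f"gap{gap}_mis{abs(mismatch)}"
        commands.append([
            "einverted",
            "-sequence", fasta,
            "-match", str(match),
            "-mismatch", str(mismatch),
            "-gap", str(gap),
            "-threshold", str(threshold),
            "-outfile", f"{tag}.inv",    # text report with the alignment
            "-outseq", f"{tag}.fasta",   # sequences of the repeat arms
        ])
    return commands


if __name__ == "__main__":
    # Assumed ranges: gap penalties 1..3, mismatch scores -3..-1.
    for cmd in build_einverted_commands("1QRT_B.fasta", range(1, 4), range(-3, 0)):
        print(" ".join(cmd))
```

Each printed line can be run in a shell, or passed to `subprocess.run` where EMBOSS is installed; the reports can then be compared by score and by the stems they cover.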
The second program is RNAfold from the ViennaRNA package. It uses the Zuker algorithm to find the RNA fold with the minimum free energy. The --MEA and --pfScale parameters were scanned (zuker.sh file) to obtain structures that differ in pair probabilities. However, all the resulting pictures were the same (fig. 2). The colored variant was produced with the Perl utility relplot.pl -p.
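RNAfold reports the predicted structure in dot-bracket notation, so counting base pairs for a comparison requires turning that string into a list of paired positions. A minimal sketch, assuming a pseudoknot-free structure written with `(`, `)` and `.` only; the function name and the toy input are illustrative and are not the actual RNAfold output for 1QRT.

```python
def dot_bracket_to_pairs(structure):
    """Convert a dot-bracket string into a sorted list of 1-based base pairs."""
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        # any other symbol (for example '.') marks an unpaired base
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Toy hairpin: a two-pair stem closing a two-base loop.
    print(dot_bracket_to_pairs("((..))"))  # [(1, 6), (2, 5)]
```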
It is clear that the specialized program copes with the task better: it predicts pairs in all four stems, although only canonical ones. The real structure based on X-ray analysis is shown in figure 3; it contains non-canonical pairs and a pseudoknot-like arrangement.
To compare the results of the two programs, base pairs were counted and are listed in table 1.
RNA segment | find_pair | einverted | RNAfold |
---|---|---|---|
Acceptor stem | 5'-1-7-3' and 5'-66-71-3', 7 pairs | all 7 pairs | all 7 pairs |
D-stem | 5'-10-12-3' and 5'-23-25-3', 3 pairs | 0 pairs | all 3 pairs |
Anticodon stem | 5'-26-33-3' and 5'-37-44-3', 8 pairs | 5'-27-31-3' and 5'-39-43-3', 5'-33-3' and 5'-37-3', 6/8 pairs | 5'-27-31-3' and 5'-39-43-3', 5/8 pairs |
T-stem | 5'-49-53-3' and 5'-61-65-3', 5 pairs | 0 pairs | all 5 pairs |
Total canonical pairs | 22 | 12 | 20 |
As table 1 suggests, the specialized program is better at predicting RNA folding. However, neither program was able to predict non-canonical interactions or pseudoknots from the primary structure alone.
Note that the PDB data lack the 1st and 17th bases, which is why the acceptor stem in figure 1 does not contain the first U-A pair.
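The per-stem fractions in table 1 can be checked by expanding each pair of strand ranges into explicit base pairs and intersecting the sets. A small sketch using the anticodon stem ranges from the table; the helper name `stem_pairs` is an assumption, and antiparallel pairing of the two strands is assumed.

```python
def stem_pairs(start5, end5, start3, end3):
    """Expand two antiparallel strand ranges into a set of (i, j) base pairs.

    The first base of the 5' strand pairs with the last base of the 3' strand.
    """
    if end5 - start5 != end3 - start3:
        raise ValueError("both strands must have the same length")
    return {(start5 + k, end3 - k) for k in range(end5 - start5 + 1)}


if __name__ == "__main__":
    # Anticodon stem ranges as listed in table 1.
    reference = stem_pairs(26, 33, 37, 44)               # find_pair
    rnafold = stem_pairs(27, 31, 39, 43)                 # RNAfold
    einverted = rnafold | stem_pairs(33, 33, 37, 37)     # einverted

    print(f"RNAfold: {len(reference & rnafold)}/{len(reference)}")      # RNAfold: 5/8
    print(f"einverted: {len(reference & einverted)}/{len(reference)}")  # einverted: 6/8
```

Working with sets of pairs rather than with counts makes it explicit which predicted pairs coincide with the reference and which do not.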
DNA-protein contacts
main_script.spt, sets.spt, contact.spt

The object of study is the 1KSX PDB structure. It contains two intermediates of the replication initiation complex. The first intermediate was chosen. It consists of one dsDNA and a tetrameric protein.
First, Jmol scripts were prepared. The script sets.spt (see the list of downloads) defines three atom sets: 1) oxygen atoms of 2'-deoxyribose; 2) oxygen atoms of phosphoric acid residues; 3) nitrogen atoms of nucleotide bases. The script main_script.spt shows views of the DNA-protein model and of the three sets.
DNA-protein contacts are defined as close proximity of the corresponding atoms in the two molecules. A polar contact involves nitrogen or oxygen atoms at a distance of 3.5 Å or less; a nonpolar contact involves sulfur or phosphorus atoms at 4.5 Å or less. To count the contacts, the script contact.spt was developed. The results are presented in table 2. Contacts with the DNA backbone clearly prevail over interactions with the grooves.
DNA structure | Polar | Nonpolar | Total |
---|---|---|---|
2'-deoxyribose | 2 | 78 | 80 |
Phosphoric acid residue | 41 | 44 | 85 |
Major groove | 0 | 6 | 6 |
Minor groove | 0 | 0 | 0 |
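The distance criteria behind table 2 can be illustrated with a short sketch. This is not the contact.spt script, which is written in Jmol; it is a plain Python restatement of the definition in the text. It assumes that both atoms of a pair must belong to the same element class and that atom pairs (not atoms) are counted. The element sets follow the text and are passed as parameters, and the coordinates are invented toy values.

```python
from math import dist  # Python 3.8+

POLAR = {"N", "O"}       # polar contact: cutoff 3.5 Å
NONPOLAR = {"S", "P"}    # nonpolar contact as defined in the text: cutoff 4.5 Å


def count_contacts(dna_atoms, protein_atoms, elements, cutoff):
    """Count DNA-protein atom pairs of the given elements within the cutoff.

    Each atom is a tuple (element, (x, y, z)) with coordinates in Å.
    """
    n_contacts = 0
    for dna_element, dna_xyz in dna_atoms:
        if dna_element not in elements:
            continue
        for protein_element, protein_xyz in protein_atoms:
            if protein_element in elements and dist(dna_xyz, protein_xyz) <= cutoff:
                n_contacts += 1
    return n_contacts


if __name__ == "__main__":
    # Hypothetical toy coordinates, not taken from 1KSX.
    dna = [("O", (0.0, 0.0, 0.0)), ("P", (10.0, 0.0, 0.0))]
    protein = [("N", (3.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0)), ("S", (10.0, 0.0, 4.0))]
    print("polar:", count_contacts(dna, protein, POLAR, 3.5))        # polar: 1
    print("nonpolar:", count_contacts(dna, protein, NONPOLAR, 4.5))  # nonpolar: 1
```

Restricting `dna_atoms` to one of the three sets (sugar oxygens, phosphate oxygens, base nitrogens) would give the per-row counts of a table like table 2.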
To find the residues that interact most with DNA and are most valuable for its recognition, the nucplot program was run. Its *.bond file lists the detected interactions: hydrogen bonds, covalent bonds and non-bonded contacts. As the protein consists of four similar subunits, only chain B was examined. The most interacting residues are Phe 182 (5 contacts), Asn 184 (2) and Thr 187 (3). The image produced by nucplot (figure 4) shows the nature of these contacts.
All three residues recognise the same site in the DNA backbone.
Most of the recognition contacts are with the phosphate residue, owing to polar interactions of Asn and Thr and a (putative) π-system-mediated interaction of Phe. Jmol visualization (fig. 5) supports these results: the residues almost cover the DNA backbone.
Choosing the most valuable residue was not simple. According to the *.bond file, all Phe residues together form 16 contacts and all Thr residues form 17, while the others (Arg, Asn, Lys) form fewer. However, the Jmol select within() command showed that 4 Phe residues and 10 Thr residues interact with DNA (see fig. 6). The distance cutoff was 3.35 Å, the value nucplot uses for non-bonded contacts.
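The comparison above rests on two different numbers per residue type: the total number of contacts and the number of distinct residues that make them. A minimal sketch of such a tally is given below. It does not parse the nucplot *.bond file; it assumes the contacts have already been extracted as (chain, residue name, residue number) records, and the records shown are invented for illustration.

```python
from collections import Counter, defaultdict


def summarize_contacts(contacts):
    """Return {residue name: (total contacts, number of distinct residues)}."""
    totals = Counter()
    residues = defaultdict(set)
    for chain, resname, resnum in contacts:
        totals[resname] += 1                   # every record is one contact
        residues[resname].add((chain, resnum)) # same residue is counted once
    return {name: (totals[name], len(residues[name])) for name in totals}


if __name__ == "__main__":
    # Hypothetical records: one Phe making two contacts, two Thr making one each.
    toy = [
        ("X", "PHE", 1), ("X", "PHE", 1),
        ("X", "THR", 2), ("X", "THR", 3),
    ]
    print(summarize_contacts(toy))  # {'PHE': (2, 1), 'THR': (2, 2)}
```

Two residue types with similar contact totals can thus differ considerably in how many separate residues, and therefore how many DNA sites, are involved.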
Taken together, these data imply that Phe recognises only one specific DNA site (one residue per subunit), whereas Thr not only does so but also takes part in other interactions, primarily with the DNA backbone on the major groove side. So, according to the given structure, Thr is the most useful residue for DNA recognition.