Motif search with MEME suite
Last update on the 29th of March, 2018
Putative transcription factor binding motifs in the upstream regions of E. coli genes related to purine biosynthesis were found and analyzed.
File | Link |
---|---|
Upstream sequences | upstream.fasta |
MEME report | meme.html |
TOMTOM report | tomtom.txt |
FIMO report, related python and R scripts, tables | fimo_out.zip |
Data description
upstream.fasta
The chosen organism is E. coli strain K12 (ECOLI in Uniprot, U00096.3 in EMBL). Seventeen reviewed proteins related to purine biosynthesis were found in Uniprot; only 10 of them were taken for further processing (see table 1).
Entry | Entry name | Protein names | Gene names | Gene coordinates | Upstream 100 coordinates |
---|---|---|---|---|---|
P0ADG7 | IMDH_ECOLI | Inosine-5'-monophosphate dehydrogenase | guaB | complement(2632604..2634070) | complement(2634071..2634172) |
P04079 | GUAA_ECOLI | GMP synthase [glutamine-hydrolyzing] | guaA | complement(2630958..2632535) | complement(2632536..2632637) |
P0AB89 | PUR8_ECOLI | Adenylosuccinate lyase | purB | complement(1190616..1191986) | complement(1191987..1192088) |
P0ACP7 | PURR_ECOLI | HTH-type transcriptional repressor PurR | purR | 1737844..1738869 | 1737742..1737843 |
P15254 | PUR4_ECOLI | Phosphoribosylformylglycinamidine synthase | purL | complement(2691656..2695543) | complement(2695544..2695645) |
P0AG16 | PUR1_ECOLI | Amidophosphoribosyltransferase | purF | complement(2428721..2430238) | complement(2430239..2430340) |
P08179 | PUR3_ECOLI | Phosphoribosylglycinamide formyltransferase | purN | 2622234..2622872 | 2622132..2622233 |
P0A7D4 | PURA_ECOLI | Adenylosuccinate synthetase | purA | 4404687..4405985 | 4404585..4404686 |
P33221 | PURT_ECOLI | Formate-dependent phosphoribosylglycinamide formyltransferase | purT | 1930881..1932059 | 1930779..1930880 |
P37051 | PURU_ECOLI | Formyltetrahydrofolate deformylase | purU | complement(1287782..1288624) | complement(1288625..1288726) |
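The upstream coordinates in table 1 follow a simple rule relative to the gene coordinates, which can be sketched in Python as below. The function name upstream_window and the window length of 102 positions are assumptions inferred from the numbers in the table, not something stated in the report; this is an illustration, not the procedure that was actually used.

```python
import re

# Assumption: window length inferred from the coordinates in table 1.
FLANK = 102


def upstream_window(location: str, flank: int = FLANK) -> str:
    """Return the upstream window for an EMBL-style gene location string."""
    on_complement = location.startswith("complement(")
    match = re.search(r"(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if on_complement:
        # The gene is read right to left, so upstream lies past its end coordinate.
        return f"complement({end + 1}..{end + flank})"
    # Forward strand: upstream lies immediately before the start coordinate.
    return f"{start - flank}..{start - 1}"


if __name__ == "__main__":
    print(upstream_window("complement(2632604..2634070)"))  # guaB
    print(upstream_window("1737844..1738869"))              # purR
```

Applying the same rule to every row is a quick way to check the table for transcription slips.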
First, the upstream-100 sequences were extracted from the EMBL file with descseq and put into the file genes.fasta.
Then MEME was run as follows:
ememe -dataset genes.fasta -outdir result -nmotifs 3 -revcomp Y
The 3 motifs found are presented in table 2.
Number | Logo | E-value | Occurrence |
---|---|---|---|
1 | | 1.7E-2 | 10/10 |
2 | | 9.7E+2 | 6/10 |
3 | | 1.5E+3 | 10/10 |
The only plausible motif is the first one, with an E-value of 0.017 and occurrences in all of the given sequences. Unlike the others, it is rich in information content (bits), and it is also the longest one. The second motif, with a relatively small E-value of 97, is present in only six sequences. The third motif has the highest E-value yet occurs in every sequence. This follows from the definition of the E-value: the expected number of motifs, found in a random set of sequences with the same properties, whose log likelihood ratio is not worse than that of the given motif. So the motif does occur in each sequence, but only once.
Each motif occurrence also has a P-value: the probability that a random string matches the position-specific scoring matrix with the same score or higher. The P-values of motif 1 differ significantly from those of the other motifs, both in the Kruskal-Wallis rank test (p-value = 0.039 against motif 2) and in Dunn's test for multiple comparisons (p-value = 0.00027 against motif 3). The distribution of P-values for all 3 motifs is presented in fig. 1.
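The definition of the occurrence P-value given above can be illustrated with a minimal sketch. The 3-column scoring matrix and the uniform background below are hypothetical toy values, not taken from meme.html, and the brute-force enumeration is used only because it makes the definition explicit; it is feasible only for very short motifs.

```python
from itertools import product

# Hypothetical 3-column PSSM with integer scores (toy values for illustration).
PSSM = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
# Assumption: uniform background letter frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def score(site, pssm=PSSM):
    """Sum the per-position scores of a site of the same width as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, site))


def p_value(observed, pssm=PSSM, background=BACKGROUND):
    """Probability that a random site scores at least `observed`."""
    total = 0.0
    for site in product("ACGT", repeat=len(pssm)):
        if score(site, pssm) >= observed:
            prob = 1.0
            for base in site:
                prob *= background[base]
            total += prob
    return total


if __name__ == "__main__":
    print(score("ACG"), p_value(score("ACG")))
```

In this toy matrix only the site ACG reaches the maximal score, so its P-value is the background probability of that single string.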
Comparison with real motif
The E. coli PurR DNA-binding transcriptional repressor was found in RegulonDB[1]. It is reported to regulate 8 out of the 10 genes under study. The remaining two are purT and purU. Among the motif 1 occurrences, purT has the second highest p-value and purU the highest; since both values are still low, this suggests that purT, and purU as well, may be regulated by the PurR repressor.
The reported motif looks quite similar to motif 1 (fig. 2). It is shorter (16 vs 21 nt) and differs somewhat in base scores at several positions.
Querying the motifs
tomtom.txt
TOMTOM[2] is a tool of the MEME suite used to compare motifs against a given database. The online version[3] takes the meme.txt file as input and outputs all matches found in the database together with their E-values and some other information. All 3 motifs were queried as a single file against the Swiss Regulon DB for E. coli; the text output can be found in the tomtom.txt file.
The three motifs yielded 10, 11 and 19 matching database motifs, respectively. The PurR_17_3 motif was found for motif 1 with the lowest E-value (2.3E-5). All other matches, including those for motifs 2 and 3, have high E-values. It should be mentioned that motif 3 yielded one match with E-value = 0.05, but the matched database motif is much longer (26 vs 10 nt). Furthermore, a 10-mer can occur frequently in the genome, so this is not a plausible finding. The high E-values of the other matches might be explained by the weakness of the queried motifs, the small size of the database (87 motifs) and the high specificity of the motifs.
Overall, the distributions of p-values are quite similar across the queried motifs (p-values in Dunn comparisons > 0.4), see fig. 3.
Genome-wide search of motifs
fimo_out.zip
To search for motifs 1-3 in the bacterial genome, FIMO was used instead of MAST because its output is easier to parse (the MAST output was poorly organized). Moreover, unlike MAST, FIMO searches for each motif individually and reports an individual p-value for every match. The genome in fasta format and the genomic annotation in gff format were taken from the NCBI entry NC_000913.3 for E. coli strain K12.
FIMO was run with
fimo meme.txt ecoli.fasta
and the html output is in fimo.html. To determine which motifs occur in upstream-100 sequences, the fimo.gff output was intersected with the upstream-100 annotation. The annotation was obtained with
bedtools flank -i sequence.gff3 -l 100 -s -g genome.gff -r 0 > flank.gff
and fimo.gff was corrected with a python script so that start positions are less than end positions. The intersection was done with
bedtools intersect -a fimo_corr.gff -b flank.gff -loj > inter.tab
and the result was then filtered with a python script and reorganized into a readable report.
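The coordinate correction mentioned above could look roughly like the following sketch. It is not the script shipped in fimo_out.zip; the function names fix_gff_line and fix_gff are hypothetical, and the only assumption about the input is the standard tab-separated GFF layout with start and end in columns 4 and 5.

```python
def fix_gff_line(line: str) -> str:
    """Swap GFF columns 4 and 5 when start > end; pass other lines through."""
    if line.startswith("#") or not line.strip():
        return line  # header, comment or blank line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # not a feature line
    start, end = int(fields[3]), int(fields[4])
    if start > end:
        fields[3], fields[4] = str(end), str(start)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


def fix_gff(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path in which every feature has start <= end."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_gff_line(line))


if __name__ == "__main__":
    # File names as used in the commands above.
    fix_gff("fimo.gff", "fimo_corr.gff")
```

The strand column is left untouched, so the orientation of each match is still available after the swap.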
The motifs found were marked as "purine" if the corresponding gene is regulated by purR (RegulonDB) or its GO process contains the word "purine"; otherwise they were marked as "not purine". These two groups differ in the p-values of the motifs found (Wilcoxon test, p-value = 1.057e-07), see fig. 4.
As can be seen, there is at least one strong motif in the "not purine" group, which leaves room for annotating the corresponding gene as involved in purine synthesis.
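The labelling rule used for the two groups can be written down as a short sketch. The function name label_gene and the toy regulon and GO data are hypothetical and serve only to show the rule; the real lists come from RegulonDB and the genome annotation.

```python
def label_gene(gene, purr_regulon, go_processes):
    """Return "purine" or "not purine" for one gene.

    purr_regulon: set of gene names regulated by purR.
    go_processes: dict mapping a gene name to a list of GO process names.
    """
    if gene in purr_regulon:
        return "purine"
    terms = go_processes.get(gene, [])
    if any("purine" in term.lower() for term in terms):
        return "purine"
    return "not purine"


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the report.
    regulon = {"purF", "purL"}
    go = {
        "geneX": ["purine nucleotide biosynthetic process"],
        "geneY": ["lactose catabolic process"],
    }
    for gene in ("purF", "geneX", "geneY"):
        print(gene, label_gene(gene, regulon, go))
```

The p-values of the two resulting groups can then be compared with a rank-sum test, as was done for fig. 4.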
References
- RegulonDB record for purR repressor;
- Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, "Quantifying similarity between motifs", Genome Biology, 8(2):R24, 2007;
- TOMTOM online tool.