********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= upstream.fasta
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
orf1ab_upstream          1.0000    277  S_protein_upstream       1.0000    100
NS3_protein_upstream     1.0000    100  NS4A_protein_upstream    1.0000    100
NS4B_protein_upstream    1.0000    100  NS5_protein_upstream     1.0000    100
envelope_protein_upstrea 1.0000    100  membrane_protein_upstrea 1.0000    100
nucleoprotein_upstream   1.0000    100  ORF8b_protein            1.0000    100
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme upstream.fasta -oc result -dna -mod zoops -nmotifs 3 -minsites 2 -maxsites 600 -minw 6 -maxw 50

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=            6    maxw=           50    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       10    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            1177    N=              10
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.259 C 0.223 G 0.199 T 0.319
Background letter frequencies (from dataset with add-one prior applied):
A 0.259 C 0.224 G 0.199 T 0.318
********************************************************************************


********************************************************************************
MOTIF 1 width = 10 sites = 10 llr = 103 E-value = 4.4e-010
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  11a8::aa:1
pos.-specific     C  :1:1a1::82
probability       G  :::1:9::::
matrix            T  98::::::27

         bits    2.3
                 2.1     *
                 1.9   * ****
                 1.6   * ****
Relative         1.4   * *****
Entropy          1.2 * *******
(14.9 bits)      0.9 * *******
                 0.7 **********
                 0.5 **********
                 0.2 **********
                 0.0 ----------

Multilevel           TTAACGAACT
consensus                    TC
sequence

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value            Site
-------------             ----- ---------            ----------
membrane_protein_upstrea     83  1.44e-06 ACGAGTGGGT TTAACGAACT CCTTCATA
NS5_protein_upstream         91  1.44e-06 ATCCAGGATT TTAACGAACT
NS4A_protein_upstream        89  1.44e-06 ACTCAGTTAA TTAACGAACT CT
NS3_protein_upstream         87  1.44e-06 TGTTCACTAA TTAACGAACT ATTA
S_protein_upstream           47  1.44e-06 GAGAGTCAAA TTAACGAACT CGTAATATCT
orf1ab_upstream              60  1.44e-06 AACTTTGATT TTAACGAACT TAAATAAAAG
nucleoprotein_upstream       77  1.52e-05 TTAATTGATT TTAACGAATC TCAATTTCAT
envelope_protein_upstrea     91  4.14e-05 GGACATATGG AAAACGAACT
ORF8b_protein                44  1.10e-04 TACACTGGGC TTACCCAACA CGGGAAAGTC
NS4B_protein_upstream        72  1.17e-04 AGGACGCAGC TCAGCGAATC GCTTGGTTGC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
membrane_protein_upstrea          1.4e-06  82_[1]_8
NS5_protein_upstream              1.4e-06  90_[1]
NS4A_protein_upstream             1.4e-06  88_[1]_2
NS3_protein_upstream              1.4e-06  86_[1]_4
S_protein_upstream                1.4e-06  46_[1]_44
orf1ab_upstream                   1.4e-06  59_[1]_208
nucleoprotein_upstream            1.5e-05  76_[1]_14
envelope_protein_upstrea          4.1e-05  90_[1]
ORF8b_protein                     0.00011  43_[1]_47
NS4B_protein_upstream             0.00012  71_[1]_19
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=10 seqs=10
membrane_protein_upstrea (   83) TTAACGAACT  1
NS5_protein_upstream     (   91) TTAACGAACT  1
NS4A_protein_upstream    (   89) TTAACGAACT  1
NS3_protein_upstream     (   87) TTAACGAACT  1
S_protein_upstream       (   47) TTAACGAACT  1
orf1ab_upstream          (   60) TTAACGAACT  1
nucleoprotein_upstream   (   77) TTAACGAATC  1
envelope_protein_upstrea (   91) AAAACGAACT  1
ORF8b_protein            (   44) TTACCCAACA  1
NS4B_protein_upstream    (   72) TCAGCGAATC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 10 n= 1087 bayes= 6.75087 E= 4.4e-010
  -137   -997   -997    150
  -137   -116   -997    133
   195   -997   -997   -997
   163   -116    -99   -997
  -997    216   -997   -997
  -997   -116    218   -997
   195   -997   -997   -997
   195   -997   -997   -997
  -997    184   -997    -67
  -137    -16   -997    114
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 10 nsites= 10 E= 4.4e-010
 0.100000  0.000000  0.000000  0.900000
 0.100000  0.100000  0.000000  0.800000
 1.000000  0.000000  0.000000  0.000000
 0.800000  0.100000  0.100000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.100000  0.900000  0.000000
 1.000000  0.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.800000  0.000000  0.200000
 0.100000  0.200000  0.000000  0.700000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 regular expression
--------------------------------------------------------------------------------
TTAACGAA[CT][TC]
--------------------------------------------------------------------------------

Time 0.67 secs.

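Note on reading the two Motif 1 matrices: the nonzero entries of the log-odds matrix appear to equal 100 * log2(p / background), rounded to an integer, where p is the matching entry of the letter-probability matrix and the background is the add-one-prior frequency listed in the command line summary. This relationship is inferred from the numbers in this report, not taken from the MEME documentation. The sketch below is a minimal check under that assumption; the helper name log_odds_x100 is hypothetical, and zero-probability entries (printed as -997 here) are not reproduced because they depend on pseudocounts that this report does not list.

```python
import math

# Background frequencies copied from "Background letter frequencies" in this report.
BACKGROUND = {"A": 0.259, "C": 0.224, "G": 0.199, "T": 0.318}


def log_odds_x100(prob, letter, background=BACKGROUND):
    """Hypothetical helper: round(100 * log2(prob / background[letter])).

    Returns None for a zero probability; the report prints a large negative
    score (-997 for Motif 1) in that case, which this sketch does not model.
    """
    if prob == 0.0:
        return None
    return round(100 * math.log2(prob / background[letter]))


# Rows 1 and 10 of the Motif 1 letter-probability matrix, in A, C, G, T order.
row1 = dict(zip("ACGT", (0.1, 0.0, 0.0, 0.9)))
row10 = dict(zip("ACGT", (0.1, 0.2, 0.0, 0.7)))

scores1 = {letter: log_odds_x100(p, letter) for letter, p in row1.items()}
scores10 = {letter: log_odds_x100(p, letter) for letter, p in row10.items()}

print(scores1)
print(scores10)
```

The nonzero values can be compared with rows 1 and 10 of the Motif 1 log-odds matrix above.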
********************************************************************************


********************************************************************************
MOTIF 2 width = 8 sites = 4 llr = 41 E-value = 1.3e+002
********************************************************************************
--------------------------------------------------------------------------------
	Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  ::3::a::
pos.-specific     C  a83a:::a
probability       G  ::::a:a:
matrix            T  :35:::::

         bits    2.3     * *
                 2.1 *  ** **
                 1.9 *  *****
                 1.6 *  *****
Relative         1.4 *  *****
Entropy          1.2 ** *****
(14.7 bits)      0.9 ** *****
                 0.7 ** *****
                 0.5 ********
                 0.2 ********
                 0.0 --------

Multilevel           CCTCGAGC
consensus             TA
sequence               C

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value            Site
-------------             ----- ---------            --------
NS3_protein_upstream         56  8.21e-06 AGGAATACGA CCTCGAGC CGCATAAGGT
envelope_protein_upstrea      3  1.40e-05         CG CCCCGAGC TCGCTTATCG
ORF8b_protein                 1  2.07e-05          . CCACGAGC TGCACCAAAT
nucleoprotein_upstream       36  3.23e-05 ACTTGCATTG CTTCGAGC TTAGGCTCTT
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
NS3_protein_upstream              8.2e-06  55_[2]_37
envelope_protein_upstrea          1.4e-05  2_[2]_90
ORF8b_protein                     2.1e-05  [2]_92
nucleoprotein_upstream            3.2e-05  35_[2]_57
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=8 seqs=4
NS3_protein_upstream     (   56) CCTCGAGC  1
envelope_protein_upstrea (    3) CCCCGAGC  1
ORF8b_protein            (    1) CCACGAGC  1
nucleoprotein_upstream   (   36) CTTCGAGC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 8 n= 1107 bayes= 8.10722 E= 1.3e+002
  -865    216   -865   -865
  -865    174   -865    -35
    -5     16   -865     65
  -865    216   -865   -865
  -865   -865    233   -865
   195   -865   -865   -865
  -865   -865    233   -865
  -865    216   -865   -865
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 8 nsites= 4 E= 1.3e+002
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.750000  0.000000  0.250000
 0.250000  0.250000  0.000000  0.500000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 regular expression
--------------------------------------------------------------------------------
C[CT][TAC]CGAGC
--------------------------------------------------------------------------------

Time 1.23 secs.

********************************************************************************


********************************************************************************
MOTIF 3 width = 24 sites = 4 llr = 81 E-value = 1.4e+002
********************************************************************************
--------------------------------------------------------------------------------
	Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  ::55:35::::38:a33:aa33::
pos.-specific     C  a53::35::3:33::533::53::
probability       G  :::::5:5a833:::::8:::::a
matrix            T  :535a::5::83:a:35:::35a:

         bits    2.3         *              *
                 2.1 *       *              *
                 1.9 *       *     *   **   *
                 1.6 *   *   *    **   **  **
Relative         1.4 *   *   **   **  ***  **
Entropy          1.2 *   * * **  ***  ***  **
(29.3 bits)      0.9 **  * ***** ***  ***  **
                 0.7 ** ******** ***  ***  **
                 0.5 *********** ************
                 0.2 *********** ************
                 0.0 ------------------------

Multilevel           CCAATGAGGGTAATACTGAACTTG
consensus             TCT ACT CGCC  AAC  AA
sequence               T  C     G   TC   TC
                                T

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value            Site
-------------             ----- ---------            ------------------------
NS4B_protein_upstream         8  1.36e-10    CAGCTAT CCTTTGCTGGTTATACTGAATCTG CTGTTAATTC
NS4A_protein_upstream        25  1.93e-10 TCAACTTCAA CTCATGATGGTCCTACCGAACATG TTACTAGTGT
nucleoprotein_upstream        7  5.17e-10     GTCCGC CTATTACGGCGGATATTGAACTTG CATTGCTTCG
NS5_protein_upstream         33  8.29e-10 ATACGGTCTT CCAATCAGGGTAATAAACAAATTG TTCATTCTTA
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
NS4B_protein_upstream             1.4e-10  7_[3]_69
NS4A_protein_upstream             1.9e-10  24_[3]_52
nucleoprotein_upstream            5.2e-10  6_[3]_70
NS5_protein_upstream              8.3e-10  32_[3]_44
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=24 seqs=4
NS4B_protein_upstream    (    8) CCTTTGCTGGTTATACTGAATCTG  1
NS4A_protein_upstream    (   25) CTCATGATGGTCCTACCGAACATG  1
nucleoprotein_upstream   (    7) CTATTACGGCGGATATTGAACTTG  1
NS5_protein_upstream     (   33) CCAATCAGGGTAATAAACAAATTG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 24 n= 947 bayes= 7.88111 E= 1.4e+002
  -865    216   -865   -865
  -865    116   -865     65
    95     16   -865    -35
    95   -865   -865     65
  -865   -865   -865    165
    -5     16    133   -865
    95    116   -865   -865
  -865   -865    133     65
  -865   -865    233   -865
  -865     16    191   -865
  -865   -865     33    123
    -5     16     33    -35
   153     16   -865   -865
  -865   -865   -865    165
   195   -865   -865   -865
    -5    116   -865    -35
    -5     16   -865     65
  -865     16    191   -865
   195   -865   -865   -865
   195   -865   -865   -865
    -5    116   -865    -35
    -5     16   -865     65
  -865   -865   -865    165
  -865   -865    233   -865
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 24 nsites= 4 E= 1.4e+002
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.500000  0.000000  0.500000
 0.500000  0.250000  0.000000  0.250000
 0.500000  0.000000  0.000000  0.500000
 0.000000  0.000000  0.000000  1.000000
 0.250000  0.250000  0.500000  0.000000
 0.500000  0.500000  0.000000  0.000000
 0.000000  0.000000  0.500000  0.500000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.250000  0.750000  0.000000
 0.000000  0.000000  0.250000  0.750000
 0.250000  0.250000  0.250000  0.250000
 0.750000  0.250000  0.000000  0.000000
 0.000000  0.000000  0.000000  1.000000
 1.000000  0.000000  0.000000  0.000000
 0.250000  0.500000  0.000000  0.250000
 0.250000  0.250000  0.000000  0.500000
 0.000000  0.250000  0.750000  0.000000
 1.000000  0.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.250000  0.500000  0.000000  0.250000
 0.250000  0.250000  0.000000  0.500000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  1.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 regular expression
--------------------------------------------------------------------------------
C[CT][ACT][AT]T[GAC][AC][GT]G[GC][TG][ACGTA][AC]TA[CAT][TAC][GC]AA[CAT][TAC]TG
--------------------------------------------------------------------------------

Time 1.74 secs.

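Note on the regular expressions: each motif's regular expression can be pasted unchanged into a general-purpose regex engine. The sketch below is a minimal illustration using Python's re module and assumes nothing about MEME beyond the strings printed in this report. It checks that the four Motif 3 sites listed above each match the Motif 3 expression in full; the variable names are hypothetical. The repeated A in [ACGTA] is kept exactly as printed and is harmless inside a character class.

```python
import re

# Motif 3 regular expression, copied verbatim from this report.
MOTIF3_REGEX = re.compile(
    r"C[CT][ACT][AT]T[GAC][AC][GT]G[GC][TG][ACGTA][AC]TA[CAT][TAC][GC]AA[CAT][TAC]TG"
)

# Motif 3 sites, copied from the "sites sorted by position p-value" table.
motif3_sites = {
    "NS4B_protein_upstream": "CCTTTGCTGGTTATACTGAATCTG",
    "NS4A_protein_upstream": "CTCATGATGGTCCTACCGAACATG",
    "nucleoprotein_upstream": "CTATTACGGCGGATATTGAACTTG",
    "NS5_protein_upstream": "CCAATCAGGGTAATAAACAAATTG",
}

# fullmatch requires the whole 24-letter site to fit the expression.
matches = {
    name: MOTIF3_REGEX.fullmatch(site) is not None
    for name, site in motif3_sites.items()
}

for name, ok in matches.items():
    print(name, ok)
```

To scan a longer sequence instead of a single site, MOTIF3_REGEX.finditer(sequence) yields match objects whose start() is a 0-based offset, so 1 must be added to compare with the Start column of this report.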
********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
orf1ab_upstream                  3.90e-03  59_[1(1.44e-06)]_208
S_protein_upstream               5.27e-03  46_[1(1.44e-06)]_44
NS3_protein_upstream             3.11e-06  55_[2(8.21e-06)]_23_[1(1.44e-06)]_4
NS4A_protein_upstream            6.16e-10  24_[3(1.93e-10)]_40_[1(1.44e-06)]_2
NS4B_protein_upstream            3.11e-08  7_[3(1.36e-10)]_69
NS5_protein_upstream             2.38e-09  32_[3(8.29e-10)]_34_[1(1.44e-06)]
envelope_protein_upstrea         5.69e-05  2_[2(1.40e-05)]_80_[1(4.14e-05)]
membrane_protein_upstrea         5.46e-04  82_[1(1.44e-06)]_8
nucleoprotein_upstream           7.66e-11  6_[3(5.17e-10)]_5_[2(3.23e-05)]_33_[1(1.52e-05)]_14
ORF8b_protein                    9.11e-05  [2(2.07e-05)]_92
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: kodomo

********************************************************************************
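
Note on the combined block diagrams: each diagram alternates spacer lengths with bracketed motif hits, so site start positions can be recovered by walking the diagram and adding up spacer lengths and motif widths. The sketch below is a minimal parser under the assumptions of this run only: a single strand (strands: +), so no +/- signs inside the brackets, and the motif widths 10, 8 and 24 taken from the MOTIF header lines. The function name diagram_to_sites is hypothetical.

```python
import re

# Motif widths copied from the "MOTIF n width = ..." header lines of this report.
WIDTHS = {1: 10, 2: 8, 3: 24}

# A bracketed hit such as "[3(5.17e-10)]": motif number, then its p-value.
HIT = re.compile(r"\[(\d+)\(([^)]+)\)\]")


def diagram_to_sites(diagram, widths=WIDTHS):
    """Return a list of (motif, start, p_value) with 1-based start positions."""
    position = 1
    sites = []
    for token in diagram.split("_"):
        hit = HIT.fullmatch(token)
        if hit:
            motif = int(hit.group(1))
            sites.append((motif, position, float(hit.group(2))))
            position += widths[motif]  # skip over the motif occurrence
        elif token.isdigit():
            position += int(token)  # plain spacer of that many letters
        else:
            raise ValueError(f"unexpected token in diagram: {token!r}")
    return sites


sites = diagram_to_sites("6_[3(5.17e-10)]_5_[2(3.23e-05)]_33_[1(1.52e-05)]_14")
for motif, start, p_value in sites:
    print(motif, start, p_value)
```

For nucleoprotein_upstream the recovered starts can be compared with the Start column of the per-motif site tables earlier in this report.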