
Signals, motifs, PWM

Preliminary work

According to the task, 9 proteins involved in purine biosynthesis were chosen with the following UniProt query:
keyword:"Purine biosynthesis [KW-0658]"
organism:"Yersinia pestis"

The search gives 45 reviewed entries, and 5 popular organisms are listed. Among them I chose Yersinia pestis bv. Antiqua (strain Antiqua), which has 9 reviewed proteins. Mnemonic: YERPA. EMBL: CP000308 (genomic DNA).

The reviewed proteins of Yersinia pestis bv. Antiqua are listed in the following table.

| Entry | Entry name | Protein names | Gene names | Coordinates | 100 nt upstream |
|---|---|---|---|---|---|
| Q1C5E7 | PUR4_YERPA | Phosphoribosylformylglycinamidine synthase, FGAM synthase, FGAMS, EC 6.3.5.3 | purL YPA_2360 | complement(2646172..2650035) | complement(2650036..2650136) |
| Q1C8V8 | PURT_YERPA | Formate-dependent phosphoribosylglycinamide formyltransferase | purT YPA_1147 | complement(1300219..1301400) | complement(1301401..1301501) |
| Q1C105 | PURA_YERPA | Adenylosuccinate synthetase | purA YPA_3906 | complement(4398153..4399451) | complement(4399452..4399552) |
| Q1C4U4 | FOLD_YERPA | Bifunctional protein FolD | folD YPA_2565 | 2858239..2859105 | 2858138..2858238 |
| Q1C5J6 | GUAA_YERPA | GMP synthase [glutamine-hydrolyzing], EC 6.3.5.2 | guaA YPA_2311 | complement(2587010..2588587) | complement(2588588..2588688) |
| Q1C1V9 | PUR9_YERPA | Bifunctional purine biosynthesis protein PurH | purH YPA_3601 | 4038210..4039799 | 4038109..4038209 |
| Q1C1V9 | PURR_YERPA | HTH-type transcriptional repressor PurR | purR YPA_1732 | 1942437..1943462 | 1942336..1942436 |
| Q1C5P2 | PUR5_YERPA | Phosphoribosylformylglycinamidine cyclo-ligase, EC 6.3.3.1 | purM YPA_2265 | 2536453..2537496 | 2536352..2536452 |
| Q1C5Q7 | PUR7_YERPA | Phosphoribosylaminoimidazole-succinocarboxamide synthase, EC 6.3.2.6 | purC YPA_2250 | complement(2523727..2524440) | complement(2524441..2524541) |
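The upstream coordinates in the table follow a simple rule: for a gene on the forward strand the window ends one position before the gene start, and for a gene on the complementary strand it begins one position after the gene end. A minimal Python sketch of that rule is given below; the function names are hypothetical and not part of any tool used here. Note that the ranges as listed are inclusive and span 101 positions, so the sketch assumes a default length of 101 in order to reproduce the table exactly.

```python
def upstream_window(start, end, complement, length=101):
    """Return (start, end, complement) of the window just upstream of a gene.

    Coordinates are 1-based and inclusive, as in the table above.
    length=101 is an assumption chosen to match the listed ranges.
    """
    if complement:
        # Gene on the complementary strand: upstream lies after the gene end.
        return end + 1, end + length, True
    # Gene on the forward strand: upstream lies before the gene start.
    return start - length, start - 1, False


def format_location(start, end, complement):
    """Format a range in the same notation as the table."""
    location = f"{start}..{end}"
    return f"complement({location})" if complement else location


# purL (complementary strand) and folD (forward strand) from the table
purl_up = format_location(*upstream_window(2646172, 2650035, True))
fold_up = format_location(*upstream_window(2858239, 2859105, False))
print(purl_up)
print(fold_up)
```

For these two genes the sketch gives the same ranges as the last column of the table.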

The upstream regions were cut from the genome FASTA file and saved in 100upstream.fasta. Syntax to cut a sequence: 'descseq CP000308.fasta:[xxxx:xxxx:r?] -name **** -description YPA_**** -outseq up*.fasta' (the optional r reverse-complements the range, which is needed for genes on the complementary strand); to join them: 'seqret up*.fasta 100upstream.fasta'.
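What the range with the optional r does can be illustrated in plain Python. This is only a sketch of the idea, not the EMBOSS implementation: the helper names and the toy sequence are made up for illustration, and a real run would read the genome from CP000308.fasta instead.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_region(genome, start, end, reverse=False):
    """Cut the 1-based inclusive range [start, end] from a sequence.

    With reverse=True the fragment is reverse-complemented, which is
    what is needed for a gene on the complementary strand.
    """
    fragment = genome[start - 1:end]
    return reverse_complement(fragment) if reverse else fragment


toy_genome = "ATGCCGTAAGCT"  # hypothetical 12-nt sequence
forward = cut_region(toy_genome, 2, 6)
backward = cut_region(toy_genome, 2, 6, reverse=True)
print(forward)
print(backward)
```

Reverse-complementing matters because the motif search should see every upstream region in the same orientation relative to its gene.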

To find motifs in the upstream regions I used 'ememe', installed on kodomo. Syntax: 'ememe -nmotifs 3 -revcomp 100upstream.fasta' (-nmotifs 3 = find 3 motifs, -revcomp = search both the given strand and its reverse complement). The output files are here; the main one is meme.html.

Analysis of motifs

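Notation: width is the motif width; sites is the number of occurrences in the training set (the number of sequences where the motif was found); I is the information content; llr is the log likelihood ratio; E-value estimates how many motifs at least this good would be expected by chance in random sequences of the same size, so smaller values mean more reliable findings.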

MOTIF 1
  • width = 10
  • sites = 9
  • I = 11.7 bits
  • llr = 76
  • E-value = 1.7e+002
MOTIF 2
  • width = 11
  • sites = 6
  • I = 15.4 bits
  • llr = 64
  • E-value = 4.2e+003
MOTIF 3
  • width = 11
  • sites = 2
  • I = 19.0 bits
  • llr = 29
  • E-value = 1.3e+004
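To show what the information content I measures, here is a minimal Python sketch that computes it for a set of aligned sites. It assumes a uniform background (0.25 per nucleotide) and no pseudocounts, so it will not reproduce the MEME numbers exactly; the aligned sites are a made-up toy example, not the sites found in this work.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Assumes a uniform background: I = log2(4) - H(column).
    """
    counts = Counter(column)
    total = len(column)
    info = math.log2(len(alphabet))
    for base in alphabet:
        p = counts[base] / total
        if p > 0:
            info += p * math.log2(p)  # subtracts the entropy term
    return info


def motif_information(sites):
    """Sum the information content over all columns of aligned sites."""
    return sum(column_information(col) for col in zip(*sites))


toy_sites = ["ACGT", "ACGA", "ACCT", "ACGT"]  # hypothetical aligned sites
total_bits = motif_information(toy_sites)
print(round(total_bits, 2))
```

A fully conserved DNA column contributes the maximum of 2 bits, and a column with all four nucleotides equally frequent contributes 0 bits.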

Summary:
All three motifs have E-values far above 0.001 (from 1.7e+002 to 1.3e+004) and low information content. Only motif 1 was found in all 9 sequences. I therefore think that none of these motifs is reliable.
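The same check can be written as a short Python sketch. The numbers are copied from the MEME output above and the threshold 0.001 is the one used in the summary; the average information per position is simply I divided by the width, to be compared with the maximum of 2 bits for a DNA column. The dictionary layout and the function name are hypothetical.

```python
motifs = [
    {"name": "MOTIF 1", "width": 10, "sites": 9, "bits": 11.7, "evalue": 1.7e2},
    {"name": "MOTIF 2", "width": 11, "sites": 6, "bits": 15.4, "evalue": 4.2e3},
    {"name": "MOTIF 3", "width": 11, "sites": 2, "bits": 19.0, "evalue": 1.3e4},
]

THRESHOLD = 0.001  # E-value cutoff used in the summary


def assess(motif, threshold=THRESHOLD):
    """Return bits per position and whether the E-value passes the cutoff."""
    return {
        "name": motif["name"],
        "bits_per_position": round(motif["bits"] / motif["width"], 2),
        "passes": motif["evalue"] <= threshold,
    }


results = [assess(m) for m in motifs]
for result in results:
    print(result)
```

None of the three motifs passes the cutoff, which agrees with the conclusion above.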



© Darya Potanina, 2017