Motif Alignment and Search Tool (MAST)

For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net.

If you use MAST in your research, please cite the following paper:
Timothy L. Bailey and Michael Gribskov, "Combining evidence using p-values: application to sequence homology searches", Bioinformatics, 14(1):48-54, 1998.


Inputs


Sequence Databases

The following sequence database was supplied to MAST.

Database        Sequence Count   Residue Count   Last Modified
PF01361.fasta   28               1690            Thu May 16 12:22:58 2013
Total           28               1690

Motifs

The following motifs were supplied to MAST from "meme.xml" last modified on Thu May 16 12:23:20 2013.

                                                Similarity
Motif   Width   Best possible match             1      2      3
1       29      GRSDEQKRQLIRRVTDAMSESYGAPPSSI   -      0.18   0.19
2       21      HVWIHEMPKNHWGIGGESFSD           0.18   -      0.30
3       8       PFINCYCF                        0.19   0.30   -

Search Results


Top Scoring Sequences

Each of the following 28 sequences has an E-value less than 10.
The motif matches shown have a position p-value less than 0.0001.

[Block diagram not reproduced. Legend: Motif 1, Motif 2, Motif 3. Position scale: 0 to 60.]
Sequence E-value
Q9R9T3_PSEPU/2-61 2.5e-35
4OT_PSEST/2-61 4.5e-34
4OT_PSEUF/2-61 1.8e-33
Y921_STRR6/2-60 1.1e-32
Y3814_BACHD/2-61 5.7e-32
4OT_COMTE/2-61 1.8e-31
Y1363_STAAM/2-61 6.2e-31
Y924_HELPY/2-64 1.5e-30
YWHB_BACSU/2-61 1.5e-30
Q9EVV8_STRTR/2-61 4e-30
Y2002_DICD3/2-62 4.1e-30
Q9A468_CAUCR/2-62 3e-26
Y574_LACLA/2-61 3.9e-25
O85975_SPHAR/2-61 1e-24
Y270_CAMJE/2-62 6.7e-22
Q9EV85_PSEPV/2-61 1.1e-21
O27643_METTH/2-60 4e-21
Q9L0G0_STRCO/2-63 4e-20
Q98JE6_RHILO/2-63 1e-18
Q9EV84_PSEPV/2-61 4.2e-18
Q98EB0_RHILO/2-65 2.4e-17
O29588_ARCFU/2-60 5.8e-17
Q984V7_RHILO/2-61 6.4e-15
Q9L029_STRCO/2-61 6.7e-14
YRDN_BACSU/2-61 1.2e-13
PPTA_ECOLI/2-58 1.9e-13
Y1725_XYLFA/2-61 4.7e-13
Q9K3M3_STRCO/2-63 1.1e-10
MAST version
4.9.0 (Release date: Wed Oct 3 11:07:26 EST 2012)
Command line summary

Background letter frequencies (from non-redundant database):
A: 0.073   C: 0.018   D: 0.052   E: 0.062   F: 0.040   G: 0.069   H: 0.022   I: 0.056   K: 0.058   L: 0.092   M: 0.023   N: 0.046   P: 0.051   Q: 0.041   R: 0.052   S: 0.074   T: 0.059   V: 0.064   W: 0.013   Y: 0.033

Result calculation took 0.037 seconds

Explanation of MAST Results


The MAST results consist of the sections described below.

Inputs

MAST received the following inputs.

Sequence Databases

This table summarises the sequence databases specified to MAST.

Database
The name of the database file.
Sequence Count
The number of sequences in the database.
Residue Count
The number of residues in the database.
Motifs

Summary of the motifs specified to MAST.

Name
The name of the motif. If the motif has been removed, or its removal is recommended to avoid highly similar motifs, it is displayed in red text.
Width
The width of the motif. No gaps are allowed in motifs supplied to MAST as it only works for motifs of a fixed width.
Best possible match
The sequence that would achieve the best possible match score and, for nucleotide motifs, its reverse complement.
Similarity
MAST computes the correlation between each pair of motifs. The correlation between two motifs is the maximum, over all alignments of the two motifs, of the sum of Pearson's correlation coefficients for aligned columns, divided by the width of the shorter motif. Motifs with correlations below 0.60 have little effect on the accuracy of the combined scores. Pairs of motifs with higher correlations should be removed from the query. Correlations above the supplied threshold are shown in red text.
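
The similarity measure described above can be sketched in Python as follows. This is an illustrative sketch, not MAST's own code: the function names, the representation of a motif as a list of columns (each a list of per-letter values), and the choice to try every offset with at least one overlapping column are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def motif_correlation(m1, m2):
    """Maximum over all offsets of the summed column correlations,
    divided by the width of the shorter motif."""
    w1, w2 = len(m1), len(m2)
    best = float("-inf")
    # offset = index in m1 that the first column of m2 is aligned with (assumed scheme)
    for offset in range(-(w2 - 1), w1):
        total = 0.0
        for j in range(w2):
            i = offset + j
            if 0 <= i < w1:  # only overlapping columns contribute
                total += pearson(m1[i], m2[j])
        best = max(best, total)
    return best / min(w1, w2)
```

Under these assumptions a motif compared with itself scores 1.0, and values such as the 0.18 to 0.30 in the Similarity table above would fall well below the 0.60 level mentioned in the text.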
Nominal Order and Spacing

This diagram shows the nominal order and spacing of the motifs specified to MAST.

Search Results

MAST provides the following motif search results.

Top Scoring Sequences

This table summarises the top scoring sequences with a Sequence E-value better than the threshold (default 10). The sequences are sorted by the Sequence E-value from most to least significant.

Sequence
The name of the sequence. This may be linked to a search of a sequence database for the sequence name.
E-value
The E-value of the sequence. For DNA only: if strands were scored separately, there will be two E-values for the sequence, separated by a "/". The score for the provided sequence will be first and the score for the reverse-complement will be second.
Click on the ↧ link next to the E-value to show additional information about the sequence, such as a description, the combined p-value and the annotated sequence.
Block Diagram
The block diagram shows the best non-overlapping tiling of motif matches on the sequence.
  • The length of the line shows the length of a sequence relative to all the other sequences.
  • A block is shown where the positional p-value of a motif is less (more significant) than the significance threshold, which is 0.0001 by default.
  • If a significant motif match (as specified above) overlaps other significant motif matches then it is only displayed as a block if its positional p-value is less (more significant) than the product of the positional p-values of the significant matches that it overlaps.
  • The position of a block shows where a motif has matched the sequence.
  • The width of a block shows the width of the motif relative to the length of the sequence.
  • The colour and border of a block identifies the matching motif as in the legend.
  • The height of a block gives an indication of the significance of the match as taller blocks are more significant. The height is calculated to be proportional to the negative logarithm of the positional p-value, truncated at the height for a p-value of 1e-10.
  • Hovering the mouse cursor over the block causes the display of the motif name and other details in the hovering text.
  • DNA only; blocks displayed above the line are a match on the given DNA, whereas blocks displayed below the line are matches to the reverse-complement of the given DNA.
  • DNA only; when strands are scored separately then blocks may overlap on opposing strands.
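
The display rules in the list above can be illustrated with a short Python sketch. It is a simplified reading of those rules, not the code MAST uses to draw the diagram: the function names, the `max_height` scaling parameter and the way overlapping matches are passed in are hypothetical.

```python
from math import log10, prod

def block_height(p_value, max_height=1.0, floor=1e-10):
    """Height proportional to -log(p), truncated at the height for p = floor."""
    p = max(p_value, floor)  # p-values below the floor all get the maximum height
    return max_height * log10(p) / log10(floor)

def shown_as_block(p_value, overlapping_p_values, threshold=1e-4):
    """Apply the threshold rule and the overlap rule to one motif match."""
    if p_value >= threshold:
        return False  # not significant, so never drawn
    significant = [q for q in overlapping_p_values if q < threshold]
    if not significant:
        return True  # no competing significant matches
    # drawn only if more significant than the overlapped matches taken together
    return p_value < prod(significant)
```

Because the height is a ratio of two logarithms, the base of the logarithm does not affect the result.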
Additional Sequence Information

Clicking on the ↧ link expands a box below the sequence with additional information and adds two draggable buttons below the block diagram.

Description
The description appearing after the identifier in the FASTA file used to specify the sequence.
Combined p-value
The combined p-value of the sequence. DNA only; if strands were scored separately then there will be two p-values for the sequence, separated by a "/". The score for the provided sequence will be first and the score for the reverse-complement will be second.
Annotated Sequence
The annotated sequence shows a portion of the sequence with the matching motif sequences displayed above. The displayed portion of the sequence can be modified by sliding the two buttons below the sequence block diagram so that the portion you want to see is between the two needles attached to the buttons. By default the two buttons move together, but you can drag one individually by holding shift before you start the drag. If the strands were scored separately then they cannot both be displayed at once due to overlaps, so a radio button offers the choice of strand to display.

Scoring

MAST scores sequences using the following measures.

Position score calculation

The score for the match of a position in a sequence to a motif is computed by summing the appropriate entry from each column of the position-dependent scoring matrix that represents the motif. Sequences shorter than one or more of the motifs are skipped.
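
As a minimal sketch of this calculation, the following Python function scores every position of a sequence against one motif. The representation of the scoring matrix as a list of per-column dictionaries and the function name are assumptions made for illustration.

```python
def position_scores(sequence, pssm):
    """Score each start position of `sequence` against one motif.

    pssm: list with one dict per motif column, mapping residue letter -> score.
    Returns an empty list when the sequence is shorter than the motif.
    """
    width = len(pssm)
    if len(sequence) < width:
        return []  # too short for this motif
    return [
        sum(pssm[j][sequence[start + j]] for j in range(width))
        for start in range(len(sequence) - width + 1)
    ]

# Toy two-column motif over a two-letter alphabet (illustrative values only)
toy_pssm = [{"A": 1, "C": -1}, {"A": -2, "C": 3}]
scores = position_scores("ACCA", toy_pssm)  # one score per start position
```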

Position p-value

The position p-value of a match is the probability of a single random subsequence of the length of the motif scoring at least as well as the observed match.
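
One way to compute such a probability exactly is to build the score distribution column by column. The sketch below assumes integer matrix entries and a random subsequence whose letters are drawn independently from the background frequencies; both are assumptions of this example, as are the function names.

```python
from collections import defaultdict

def score_distribution(pssm, background):
    """Distribution of the match score of a random subsequence.

    pssm: list of dicts (letter -> integer score), one per motif column.
    background: dict (letter -> frequency), assumed to sum to 1.
    """
    dist = {0: 1.0}  # before any column is scored, the score is 0
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for letter, freq in background.items():
                nxt[score + column[letter]] += prob * freq
        dist = dict(nxt)
    return dist

def position_p_value(score, pssm, background):
    """Probability that a random subsequence scores at least `score`."""
    dist = score_distribution(pssm, background)
    return sum(prob for s, prob in dist.items() if s >= score)
```

With integer scores the number of distinct totals stays small, so the distribution can be tabulated even for wide motifs.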

Sequence p-value

The sequence p-value of a score is defined as the probability of a random sequence of the same length containing some match with as good or better a score.
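
If the candidate start positions are treated as independent trials, which is an approximation not stated in the text above, the sequence p-value follows from the position p-value as 1 - (1 - p)^n, where n is the number of start positions. A hedged Python sketch, with a hypothetical function name:

```python
from math import expm1, log1p

def sequence_p_value(position_p, seq_len, motif_width):
    """P(some position in a random sequence scores at least as well),
    assuming the n = seq_len - motif_width + 1 positions are independent."""
    n = seq_len - motif_width + 1
    if n <= 0:
        raise ValueError("sequence is shorter than the motif")
    if position_p >= 1.0:
        return 1.0
    # 1 - (1 - p)**n, written to stay accurate for very small p
    return -expm1(n * log1p(-position_p))
```

For example, a motif of width 29 has 32 possible start positions in a 60-residue sequence, so a very small position p-value is multiplied by roughly 32.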

Combined p-value

The combined p-value of a sequence measures the strength of the match of the sequence to all the motifs and is calculated by

  1. finding the score of the single best match of each motif to the sequence (best matches may overlap),
  2. calculating the sequence p-value of each score,
  3. forming the product of the p-values,
  4. taking the p-value of the product.
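
Steps 3 and 4 can be sketched in Python. The example uses the standard result that, for n independent p-values that are uniform under the null hypothesis, the probability of a product no larger than x is x times the sum of (-ln x)^i / i! for i from 0 to n-1. The independence assumption and the function names belong to this sketch.

```python
from math import log, prod

def p_value_of_product(product, n):
    """P(product of n independent uniform(0,1) p-values <= product)."""
    if product <= 0.0:
        return 0.0
    if product >= 1.0:
        return 1.0
    x = -log(product)
    term, total = 1.0, 1.0  # the i = 0 term of the sum
    for i in range(1, n):
        term *= x / i       # builds x**i / i! incrementally
        total += term
    return product * total

def combined_p_value(sequence_p_values):
    """Steps 3 and 4: form the product, then take its p-value."""
    return p_value_of_product(prod(sequence_p_values), len(sequence_p_values))
```

With a single motif the combined p-value equals the sequence p-value. With more motifs the correction term keeps the product from looking more significant than it is.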
Sequence E-value

The E-value of a sequence is the expected number of sequences in a random database of the same size that would match the motifs as well as the sequence does. It is equal to the combined p-value of the sequence times the number of sequences in the database.
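
The relationship is a single multiplication, shown here as a small Python sketch with hypothetical function names.

```python
def sequence_e_value(combined_p, num_sequences):
    """E-value = combined p-value times the number of sequences in the database."""
    return combined_p * num_sequences

def combined_p_from_e_value(e_value, num_sequences):
    """Invert the relationship to recover an approximate combined p-value."""
    return e_value / num_sequences
```

For the database in this report (28 sequences), the best reported E-value of 2.5e-35 corresponds to a combined p-value of roughly 8.9e-37. The reported E-values are rounded, so a value recovered this way is approximate.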