MEME motifs are represented by position-specific probability matrices
that specify the probability of each possible letter appearing at each
possible position in an occurrence of the motif. These are displayed
as "sequence LOGOS", containing stacks of letters at each position
in the motif. The total height of the stack is the "information
content" of that position in the motif in bits. The height of the
individual letters in a stack is the probability of the letter at that
position multiplied by the total information content of the stack.
Note:
The MEME LOGO differs from those produced by the
Weblogo program
because a small-sample correction is NOT applied.
However, MEME LOGOs in PNG and encapsulated postscript (EPS) formats
with small-sample correction (SSC) are available by clicking
on one of the links named "With SSC" (EPS or PNG) under
Download LOGO.
The MEME LOGOs without small sample correction are similarly available.
Error bars are included in the LOGOs with small-sample correction.
The information content of each motif position is computed
as described in the paper by Schneider and Stephens,
"Sequence Logos: A New Way to Display Consensus Sequences" but
the small-sample correction, e(n),
is set to zero for the LOGO displayed in the MEME output.
The corrected information content of position i is given by
R(i) for amino acids = log2(20) - (H(i) + e(n)) (1a)
R(i) for nucleic acids = 2 - (H(i) + e(n)) (1b)
where H(i) is the entropy of position i, summed over all letters a in the alphabet,
H(i) = - (Sum over a of f(a,i) * log2[ f(a,i) ]). (2)
Here, f(a,i) is the frequency of base or amino acid a
at position i, and e(n) is the
small-sample correction for an alignment of n sequences.
The height of letter a in column i is given by
height = f(a,i) * R(i) (3)
The approximation for the small-sample correction, e(n),
is given by:
e(n) = (s-1) / (2 * ln(2) * n), (4)
where s is 4 for nucleotides, 20 for amino acids, and
n is the number of sequences in the alignment.
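Equations (1) through (4) can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, not MEME's own code: the function names `small_sample_correction` and `column_heights` are hypothetical, the alphabet size s is assumed to be the number of entries in the frequency table, and terms with f(a,i) = 0 are treated as contributing zero to the entropy. Passing no value for n sets e(n) to zero, as in the LOGO displayed in the MEME output.

```python
import math


def small_sample_correction(s, n):
    """Equation (4): e(n) = (s - 1) / (2 * ln(2) * n)."""
    return (s - 1) / (2 * math.log(2) * n)


def column_heights(freqs, n=None):
    """Return (R, heights) for one motif position.

    freqs -- dict mapping each letter of the alphabet to f(a,i);
             it must list all 4 (or 20) letters, zeros included.
    n     -- number of sequences; None means e(n) = 0 (no correction).
    """
    s = len(freqs)  # assumed alphabet size: 4 or 20
    # Equation (2); letters with zero frequency add nothing to the sum.
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    e_n = small_sample_correction(s, n) if n is not None else 0.0
    # Equations (1a)/(1b): log2(s) is 2 for nucleic acids, log2(20) for proteins.
    info = math.log2(s) - (entropy + e_n)
    # Equation (3): each letter's height is its frequency times R(i).
    heights = {a: f * info for a, f in freqs.items()}
    return info, heights


if __name__ == "__main__":
    column = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
    print(column_heights(column))        # uncorrected
    print(column_heights(column, n=20))  # with small-sample correction
```

With a small n the corrected R(i) can drop below zero for nearly uniform positions; this sketch does not clamp such values, and how a renderer treats them is outside its scope.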
The letters in the logos are colored as follows.
For DNA sequences, the letter categories contain one letter
each. For proteins, the categories are based on the biochemical properties
of the various amino acids. The categories and their colors are:
NUCLEIC ACIDS | COLOR
A | RED
C | BLUE
G | ORANGE
T | GREEN
AMINO ACIDS | COLOR | PROPERTIES
A, C, F, I, L, V, W, M | BLUE | Most hydrophobic [Kyte and Doolittle, 1982]
N, Q, S, T | GREEN | Polar, non-charged, non-aliphatic residues
D, E | MAGENTA | Acidic
K, R | RED | Positively charged
H | PINK |
G | ORANGE |
P | YELLOW |
Y | TURQUOISE |
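For readers who want to reproduce this color scheme in their own plots, the two tables can be expressed as a lookup. The sketch below is only an illustration: the names `NUCLEOTIDE_COLORS`, `AMINO_ACID_GROUPS`, `AMINO_ACID_COLORS` and `letter_color` are hypothetical, and the color names are copied from the tables rather than from any MEME source file.

```python
# Color names as listed in the nucleic acid table.
NUCLEOTIDE_COLORS = {"A": "RED", "C": "BLUE", "G": "ORANGE", "T": "GREEN"}

# (letters, color) pairs as listed in the amino acid table.
AMINO_ACID_GROUPS = [
    ("ACFILVWM", "BLUE"),   # most hydrophobic
    ("NQST", "GREEN"),      # polar, non-charged, non-aliphatic
    ("DE", "MAGENTA"),      # acidic
    ("KR", "RED"),          # positively charged
    ("H", "PINK"),
    ("G", "ORANGE"),
    ("P", "YELLOW"),
    ("Y", "TURQUOISE"),
]

# Expand the groups into a per-residue dictionary.
AMINO_ACID_COLORS = {
    residue: color
    for letters, color in AMINO_ACID_GROUPS
    for residue in letters
}


def letter_color(letter, protein=False):
    """Return the color name for a letter, or None if it is not in the table."""
    table = AMINO_ACID_COLORS if protein else NUCLEOTIDE_COLORS
    return table.get(letter.upper())
```

The same letter can map to different colors depending on the alphabet (for example A is RED as a nucleotide and BLUE as an amino acid), which is why the lookup takes the alphabet as an explicit argument.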
J. Kyte and R. Doolittle, 1982.
"A Simple Method for Displaying the Hydropathic Character of a Protein",
J. Mol. Biol. 157, 105-132.
Note: the "text" output format of
MEME preserves the historical MEME format where LOGOS
are replaced by a simplified probability matrix,
a relative entropy plot, and a multi-level
consensus sequence.