MARGARITA VOROBEVA



Proteomes of Escherichia coli (strain K12) and Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / JCM 2805 / NCIMB 9133 / AM1)

Table 1. Proteome information
                                      M. extorquens AM1   E. coli K12
Proteome ID                           UP000009081         UP000000625
Number of proteins                    6233                4313
Total number of amino acid residues   1855162             1351622
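Counts of the kind shown in Table 1 can be reproduced from the proteome FASTA files. The Python sketch below is one possible way to do it, not necessarily the procedure used for this report. It assumes the proteomes were saved locally in FASTA format; the function names are illustrative. Every header line starting with ">" is counted as one protein, and residues are counted letter by letter.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be split over several lines
    if header is not None:
        yield header, "".join(chunks)


def proteome_stats(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (number of proteins, total residues, per-residue counts)."""
    n_proteins, residues = 0, Counter()
    for _, seq in read_fasta(lines):
        n_proteins += 1
        residues.update(seq.upper())
    return n_proteins, sum(residues.values()), residues


if __name__ == "__main__":
    # Tiny inline example; a real run would pass an open file handle instead.
    example = [">sp|P1|TEST", "MKL", "AG", ">sp|P2|TEST", "MA"]
    n, total, counts = proteome_stats(example)
    print(n, total)  # 2 7
```

For a real proteome, something like `with open("UP000000625.fasta") as fh: proteome_stats(fh)` (the file name is hypothetical) returns the protein count, the residue total and the per-letter counts on which residue percentages such as those in Table 2 are based.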
Table 2. Amino acid residue composition (% of all residues)
Residue   M. extorquens AM1   E. coli K12   Difference (M. extorquens minus E. coli), percentage points
A         13.97               9.511          4.459
L         10.20               10.67         -0.47
G         9.090               7.370          1.72
R         8.317               5.518          2.799
V         7.534               7.073          0.461
P         5.871               4.428          1.443
E         5.776               5.765          0.011
D         5.591               5.149          0.442
T         5.306               5.394         -0.088
S         5.194               5.796         -0.602
I         4.294               6.009         -1.715
F         3.356               3.892         -0.536
Q         2.775               4.443         -1.668
K         2.541               4.405         -1.864
N         2.091               3.936         -1.845
M         2.048               2.822         -0.774
Y         1.973               2.844         -0.871
H         1.950               2.267         -0.317
W         1.256               1.531         -0.275
C         0.840               1.160         -0.32
U         0                   0.00022       -0.00022
According to Table 2, the three most abundant amino acids in both proteomes are leucine, alanine and glycine. This is most likely because these residues play a structural ("skeletal") role in almost every protein. The rarest amino acids are histidine, tryptophan and cysteine (not counting selenocysteine, U, which occurs only in trace amounts in E. coli and is absent from the M. extorquens proteome). The largest difference in favour of Methylobacterium extorquens is for alanine (4.459 percentage points), and the largest difference in favour of E. coli is for lysine (1.864 percentage points).
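The percentages and differences in Table 2 come from simple arithmetic: each residue's count is divided by the proteome's total residue count and multiplied by 100, and the difference is the M. extorquens value minus the E. coli value, i.e. a difference in percentage points rather than a relative percentage. A minimal Python sketch (the function names are illustrative) that also picks out the residues with the largest difference in each direction:

```python
from collections import Counter
from typing import Dict, Tuple


def composition(counts: Counter) -> Dict[str, float]:
    """Convert raw residue counts into percentages of all residues."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def difference_pp(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point difference (first minus second) for every residue in either table."""
    residues = set(first) | set(second)
    # A residue missing from one table (e.g. U in M. extorquens) counts as 0%.
    return {aa: first.get(aa, 0.0) - second.get(aa, 0.0) for aa in residues}


def extremes(diff: Dict[str, float]) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Residues with the largest positive and the largest negative difference."""
    top = max(diff.items(), key=lambda item: item[1])
    bottom = min(diff.items(), key=lambda item: item[1])
    return top, bottom


if __name__ == "__main__":
    # A few rows copied from Table 2 (percent of all residues).
    m_extorquens = {"A": 13.97, "L": 10.20, "K": 2.541}
    e_coli = {"A": 9.511, "L": 10.67, "K": 4.405}
    top, bottom = extremes(difference_pp(m_extorquens, e_coli))
    print(f"{top[0]}: {top[1]:+.3f}")        # A: +4.459
    print(f"{bottom[0]}: {bottom[1]:+.3f}")  # K: -1.864
```

Applied to all rows of Table 2, the same `extremes` call returns alanine and lysine, in agreement with the text above.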

Wordcount and compseq comparison

The information in this section was obtained from the programs' help output and other standard tools available on the Kodomo server.

Table 3. Comparison of wordcount and compseq

Function
  wordcount: counts and extracts unique words in molecular sequence(s).
  compseq: calculates the composition of unique words in sequences.

Description
  wordcount: counts and extracts all possible unique sequence words of a specified size in one or more DNA sequences. It writes an output file listing all possible words of that size together with the count of each word in the input sequences. Optionally, only words occurring at least a specified minimum number of times are reported.
  compseq: calculates the composition of words of a specified length (dimers, trimers, etc.) in the input sequence(s). The word length is user-specified. The unique words, their observed count, observed frequency, expected frequency and observed/expected frequency ratio are written to the output file. The observed/expected ratio highlights words with unusually high (or low) occurrence in the input sequences.

Speed
  wordcount: relatively slow.
  compseq: fast (about 7 times faster), so it is preferable for processing large amounts of data.

Unique options
  wordcount: -mincount (integer, 1 or more) sets the minimum word count to report.
  compseq: -ignorebz (boolean) ignores the rarely used codes B (asparagine or aspartic acid) and Z (glutamine or glutamic acid); -reverse (boolean) also counts words in the reverse complement of a nucleic acid sequence; -zerocount (boolean) controls whether words with a zero count are displayed, so switching it off (-nozerocount) shortens the output.
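The core calculations behind both programs can be sketched in a few lines of Python. This is not the EMBOSS source code. It assumes overlapping words, the parameters `mincount` and `reverse` only imitate the options of the same names described in Table 3, and the compseq-style expected frequency assumes, for simplicity, that every letter of the alphabet is equally likely (an assumption, not a description of compseq's internals). Function names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_words(seqs: List[str], k: int, mincount: int = 1, reverse: bool = False) -> Counter:
    """wordcount-like: count overlapping words of length k.

    mincount drops words seen fewer times; reverse also counts words on the
    reverse complement (meaningful for DNA only).
    """
    counts = Counter()
    for seq in seqs:
        strands = [seq.upper()]
        if reverse:
            strands.append(seq.upper().translate(COMPLEMENT)[::-1])
        for s in strands:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
    return Counter({w: n for w, n in counts.items() if n >= mincount})


def composition_table(seqs: List[str], k: int,
                      alphabet: str = "ACGT") -> List[Tuple[str, int, float, float, float]]:
    """compseq-like rows: (word, observed count, observed freq, expected freq, obs/exp).

    Expected frequency is 1 / len(alphabet) ** k for every word (equal letter
    probabilities assumed). Words containing other letters (e.g. N) still count
    towards the total but are not listed.
    """
    counts = count_words(seqs, k)
    total = sum(counts.values())
    expected = 1.0 / len(alphabet) ** k
    rows = []
    for letters in product(alphabet, repeat=k):  # every possible word, zero counts included
        word = "".join(letters)
        observed = counts.get(word, 0)
        obs_freq = observed / total if total else 0.0
        rows.append((word, observed, obs_freq, expected, obs_freq / expected))
    return rows


if __name__ == "__main__":
    print(count_words(["ACGTAC"], 2))
    for row in composition_table(["ACGTAC"], 1):
        print(row)
```

For example, `count_words(["ACGTAC"], 2)` gives AC: 2 and CG, GT, TA: 1 each. Rows from `composition_table` whose observed/expected ratio is well above or below 1 correspond to the over- or under-represented words that compseq is meant to highlight.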