Proteomes

***Table 1.*** Proteome information
Name	Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / JCM 2805 / NCIMB 9133 / AM1)	Escherichia coli (strain K12)
Proteome ID	UP000009081	UP000000625
Number of proteins	6233	4313
Total number of amino acid residues	1855162	1351622

***Table 2.*** Amino acid residue frequencies
Residue (one-letter code)	Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / JCM 2805 / NCIMB 9133 / AM1)	Escherichia coli (strain K12)	Difference (M. extorquens minus E. coli), percentage points
A	13.97%	9.511%	4.459
L	10.20%	10.67%	-0.47
G	9.090%	7.370%	1.72
R	8.317%	5.518%	2.799
V	7.534%	7.073%	0.461
P	5.871%	4.428%	1.443
E	5.776%	5.765%	0.011
D	5.591%	5.149%	0.442
T	5.306%	5.394%	-0.088
S	5.194%	5.796%	-0.602
I	4.294%	6.009%	-1.715
F	3.356%	3.892%	-0.536
Q	2.775%	4.443%	-1.668
K	2.541%	4.405%	-1.864
N	2.091%	3.936%	-1.845
M	2.048%	2.822%	-0.774
Y	1.973%	2.844%	-0.871
H	1.950%	2.267%	-0.317
W	1.256%	1.531%	-0.275
C	0.840%	1.160%	-0.32
U	0%	0.00022%	-0.00022

According to Table 2, the three most abundant amino acids in both proteomes are alanine, leucine and glycine. Most likely this is because these amino acids play a "skeletal" role in almost every protein. The rarest of the 20 standard amino acids are histidine, tryptophan and cysteine; selenocysteine (U) occurs only in trace amounts, and only in E. coli. The largest difference in favour of Methylobacterium extorquens is observed for alanine (4.459 percentage points); the largest difference in favour of E. coli is observed for lysine (1.864 percentage points).
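The percentages and differences in Table 2 can be reproduced in principle with a short script. The following is only a minimal sketch, not the procedure actually used for this report: it assumes the proteomes are available as FASTA text, and the function names and the toy sequences are hypothetical, chosen for illustration.

```python
from collections import Counter


def read_fasta_sequences(lines):
    """Yield one sequence string per FASTA record (header lines start with '>')."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def residue_percentages(sequences):
    """Return {residue: percentage of all residues} over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {res: 100.0 * n / total for res, n in counts.items()}


def percentage_point_difference(first, second):
    """Return first minus second for every residue seen in either proteome."""
    residues = set(first) | set(second)
    return {r: first.get(r, 0.0) - second.get(r, 0.0) for r in residues}


# Toy data standing in for two proteomes (hypothetical, not real proteins).
toy_a = [">p1", "AAGL", ">p2", "AR"]
toy_b = [">q1", "ALKK"]

pct_a = residue_percentages(read_fasta_sequences(toy_a))
pct_b = residue_percentages(read_fasta_sequences(toy_b))
diff = percentage_point_difference(pct_a, pct_b)

# Print residues sorted by their share in the first proteome, as in Table 2.
for res in sorted(pct_a, key=pct_a.get, reverse=True):
    print(f"{res}\t{pct_a[res]:.3f}%\t{pct_b.get(res, 0.0):.3f}%\t{diff[res]:.3f}")
```

A residue absent from one proteome is treated as 0%, which mirrors how selenocysteine (U) is reported in Table 2.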

The information for this part was obtained using the help command and other standard tools of the Kodomo machine.

***Table 3.*** Wordcount and compseq comparison
Aspect	wordcount	compseq
Function	Count and extract unique words in molecular sequence(s)	Calculate the composition of unique words in sequences
Description	Counts and extracts all possible unique sequence words of a specified size in one or more DNA sequences. It writes an output file giving all possible words for that word size with a count of each word in the input sequences. Optionally, only words occurring a specified minimum number of times are reported.	Calculates the composition of words of a specified length (dimer, trimer etc.) in the input sequence(s). The word length is user-specified. The unique sequences (words), their observed count, observed frequency, expected frequency and (observed / expected) frequency are written to the output file. The (observed / expected) frequency highlights any words with unusually high (or low) occurrence in the input sequences.
Speed	Relatively slow	Fast (about 7 times faster; preferable for processing large amounts of data)
Unique options	-mincount (integer, 1 or more): sets the minimum word count required for a word to be reported	-ignorebz (boolean): ignores the rarely used codes B (asparagine or aspartic acid) and Z (glutamine or glutamic acid); -reverse (boolean): also counts words in the reverse complement of a nucleic acid sequence; -zerocount (boolean): controls whether words with a zero count are displayed; turning it off shortens the output.
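The core idea shared by both programs, counting overlapping words of a fixed length, can be illustrated with a few lines of Python. This is a rough sketch for illustration only and is not a reimplementation of either EMBOSS tool: the function names are hypothetical, the minimum-count filter only imitates the effect of -mincount, and compseq's expected-frequency calculation is left out.

```python
from collections import Counter


def count_words(sequences, word_size):
    """Count every overlapping word of length word_size in all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Sequences shorter than word_size contribute no words.
        for i in range(len(seq) - word_size + 1):
            counts[seq[i:i + word_size]] += 1
    return counts


def filter_by_min_count(counts, min_count=1):
    """Keep only words seen at least min_count times (similar in spirit to -mincount)."""
    return {word: n for word, n in counts.items() if n >= min_count}


def observed_frequencies(counts):
    """Return each word's share of all counted words."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Toy nucleotide sequences (hypothetical).
toy_sequences = ["ACGTAC", "AC"]

word_counts = count_words(toy_sequences, word_size=2)
frequent = filter_by_min_count(word_counts, min_count=2)
freqs = observed_frequencies(word_counts)

for word, n in sorted(word_counts.items()):
    print(f"{word}\t{n}\t{freqs[word]:.3f}")
```

With a word size of 1 and protein sequences as input, the same counting gives the per-residue composition summarised in Table 2.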

MARGARITA VOROBEVA
