Proteome

Information table

Organism name           Escherichia coli              Clostridium thermocellum
Proteome ID             UP000000625                   UP000002145
Number of sequences     4313                          3105
Number of amino acids   1351630 (with unknown (X))    1033401

Amino acid table

Amino acid   Clostridium thermocellum   Escherichia coli   Difference (C. thermocellum minus E. coli)
Total        100%                       100%               0%
L            8.65%                      10.68%             -2.02%
I            8.52%                      6.01%              2.51%
K            8.05%                      4.41%              3.65%
E            7.60%                      5.77%              1.83%
V            7.16%                      7.07%              0.09%
G            6.72%                      7.37%              -0.65%
A            6.41%                      9.51%              -3.10%
S            6.05%                      5.80%              0.25%
D            5.73%                      5.15%              0.59%
N            5.39%                      3.94%              1.45%
T            5.03%                      5.39%              -0.36%
R            4.28%                      5.52%              -1.24%
F            4.26%                      3.89%              0.37%
Y            4.21%                      2.84%              1.36%
P            3.40%                      4.43%              -1.02%
Q            2.55%                      4.44%              -1.89%
M            2.52%                      2.82%              -0.30%
H            1.41%                      2.27%              -0.86%
C            1.17%                      1.16%              0.01%
W            0.87%                      1.53%              -0.66%

Leucine is the most frequent amino acid in both bacteria. In Clostridium thermocellum the second and third places go to isoleucine and lysine, while in E. coli alanine is second and glycine is third. The largest difference overall is for lysine, which is 3.65 percentage points more frequent in Clostridium thermocellum. The largest difference in favour of E. coli is for alanine, which is 3.10 percentage points more frequent there.
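A comparison like the one in the table can be sketched in plain Python. This is only an illustration and not necessarily how the numbers above were produced: the function names (read_fasta, composition, compare) and the toy sequences are made up for the example, and it assumes the proteomes are available as FASTA text.

```python
from collections import Counter


def read_fasta(text):
    """Return a list of sequences parsed from FASTA-formatted text."""
    sequences, chunks = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if it had any residues.
            if chunks:
                sequences.append("".join(chunks))
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def composition(sequences):
    """Return {residue: percentage of all residues} for a list of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def compare(comp_a, comp_b):
    """Return rows (residue, percent_a, percent_b, percent_a - percent_b),
    sorted by percent_a in descending order."""
    rows = []
    for aa in set(comp_a) | set(comp_b):
        a = comp_a.get(aa, 0.0)
        b = comp_b.get(aa, 0.0)
        rows.append((aa, a, b, a - b))
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Toy data only (hypothetical sequences, not the real proteomes).
toy_a = ">p1\nLLKI\n>p2\nKL\n"
toy_b = ">q1\nLAAG\n"

rows = compare(composition(read_fasta(toy_a)), composition(read_fasta(toy_b)))
for aa, a, b, diff in rows:
    print(f"{aa}  {a:6.2f}%  {b:6.2f}%  {diff:+7.2f}")
```

With real data, the FASTA text of each proteome would be read from a file and passed to read_fasta in place of the toy strings. The last column is a difference in percentage points, the same quantity as the Difference column of the table.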

compseq vs. wordcount

Each programme has its own pluses and minuses. 'compseq' is much more functional: it shows observed frequencies both as absolute values and as percentages, shows expected frequencies as percentages, and even shows the ratio of the observed frequency to the expected one. It also lets us count frequencies of words of different sizes and set some extra options, such as a reverse search for words. 'wordcount' has fewer functions: it shows only absolute values of frequencies, and I could not find an easy way to count frequencies of words longer than 1. However, it sorts words by frequency rather than alphabetically, as 'compseq' does, which makes 'wordcount' the more useful tool for this practical.
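To make the difference between the two kinds of output more concrete, here is a small Python sketch that counts words of a chosen size, reports the observed and expected frequencies with their ratio, and sorts the words by frequency. It is not the code of either EMBOSS programme. The function names (count_words, word_table) are hypothetical, and the expected frequency is an assumption of this sketch: it is taken as the product of the single-letter frequencies, which may differ from what 'compseq' calculates.

```python
from collections import Counter


def count_words(sequences, size=1):
    """Count overlapping words of the given size across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - size + 1):
            counts[seq[i:i + size]] += 1
    return counts


def word_table(sequences, size=1):
    """Return rows (word, count, observed, expected, observed / expected),
    sorted with the most frequent words first."""
    words = count_words(sequences, size)
    total_words = sum(words.values())
    letters = count_words(sequences, 1)
    total_letters = sum(letters.values())
    rows = []
    for word, n in words.items():
        observed = n / total_words
        # ASSUMPTION of this sketch: the expected frequency of a word is the
        # product of the frequencies of its letters (independent positions).
        expected = 1.0
        for ch in word:
            expected *= letters[ch] / total_letters
        rows.append((word, n, observed, expected, observed / expected))
    # Sort by count (descending), then alphabetically to break ties.
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Toy sequences only (hypothetical, not taken from the proteomes).
toy = ["LLKLL", "KLA"]
for word, n, obs, exp, ratio in word_table(toy, size=2):
    print(f"{word}  {n}  {obs:.4f}  {exp:.4f}  {ratio:.2f}")
```

Sorting by count puts the most frequent words at the top, which is the property that made the 'wordcount' output convenient here. With size=1 the observed and expected frequencies coincide under this assumption, so the ratio only becomes informative for longer words.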

python1

python2

excel


© Gumerov Ruslan, 2017