The information for this part was obtained using the help command and other standard tools of the Kodomo machine.
Table 3. Wordcount and compseq comparison
| Issue | wordcount | compseq |
| --- | --- | --- |
| Function | Count and extract unique words in molecular sequence(s) | Calculate the composition of unique words in sequences |
| Description | Counts and extracts all possible unique sequence words of a specified size in one or more DNA sequences. It writes an output file giving all possible words for that word size with a count of each word in the input sequences. Optionally, only words occurring a specified minimum number of times are reported. | Calculates the composition of words of a specified length (dimer, trimer, etc.) in the input sequence(s). The word length is user-specified. The unique sequences (words), their observed count, observed frequency, expected frequency and observed/expected frequency ratio are written to the output file. The observed/expected ratio highlights any words with unusually high (or low) occurrence in the input sequences. |
| Speed | Relatively slow | Fast (about 7 times faster, so preferable for processing large amounts of data) |
| Unique options | -mincount (integer, 1 or more) sets the minimum word count to report | -ignorebz (boolean) ignores the rarely used codes B (Asparagine or Aspartic acid) and Z (Glutamine or Glutamic acid); -reverse (boolean) also counts words in the reverse complement of a nucleic acid sequence; -zerocount (boolean) controls whether words with a zero count are displayed; switching it off makes the output smaller |
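
The two behaviours compared in Table 3 can be illustrated with a short Python sketch. It is not the EMBOSS implementation: the function names `count_words` and `composition` are hypothetical, the `mincount` and `zerocount` parameters only mimic the options described above, and the expected frequency is assumed to be uniform (every possible word equally likely), which may differ from how compseq derives its expected values.

```python
from collections import Counter
from itertools import product


def count_words(seq, size, mincount=1):
    """Count overlapping words of length `size` (wordcount-like sketch).

    Only words seen at least `mincount` times are returned.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + size] for i in range(len(seq) - size + 1))
    return {word: n for word, n in counts.items() if n >= mincount}


def composition(seq, size, alphabet="ACGT", zerocount=True):
    """List (word, observed count, observed freq, expected freq, obs/exp).

    Assumption: the expected frequency is uniform, 1 / len(alphabet)**size.
    With zerocount=False, words that never occur are left out.
    """
    counts = count_words(seq, size)
    total = sum(counts.values())          # number of windows in the sequence
    expected = 1 / len(alphabet) ** size  # assumed uniform expectation
    rows = []
    for letters in product(alphabet, repeat=size):
        word = "".join(letters)
        observed = counts.get(word, 0)
        if observed == 0 and not zerocount:
            continue
        freq = observed / total if total else 0.0
        rows.append((word, observed, freq, expected, freq / expected))
    return rows


if __name__ == "__main__":
    print(count_words("ACGTACGT", 2, mincount=2))
    for row in composition("ACGTACGT", 2, zerocount=False):
        print(row)
```

For the toy sequence `ACGTACGT` and a word size of 2, `count_words` with `mincount=2` keeps only `AC`, `CG` and `GT`, while `composition` with `zerocount=False` lists the four dimers that actually occur instead of all 16 possible ones.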