EMBOSS

Exercises targeting knowledge of EMBOSS commands

For these tasks random sequences were generated using makenucseq:
makenucseq -amount 1 -length 210 seq1.fasta

1)
Task: Compose a file containing sequences from multiple .fasta files
Input:
seq1.fasta
seq2.fasta
seq3.fasta
Code:
seqret "seq*.fasta" multiseq1.fasta
Output:
multiseq1.fasta

2)
Task: Split one .fasta file containing multiple sequences into corresponding single-sequence .fasta files
Input:
multiseq1.fasta
Code:
seqretsplit multiseq1.fasta seq1.fasta
Output:
seq1.fasta
seq2.fasta
seq3.fasta

4)
Task: Translate a nucleotide sequence (.fasta) into an amino acid sequence (.fasta) in the first reading frame and with a given table of genetic code
Input:
multiseq1.fasta
Code:
transeq multiseq1.fasta protseq1.fasta -frame 1 -table 0
Other values of the -table option select other genetic codes; 0 is the standard code.
Output:
protseq1.fasta
Here, asterisks denote stop codons.
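
To make the translation step concrete, here is a minimal Python sketch of what a frame-1 translation with the standard genetic code amounts to. It is not how transeq is implemented; the function name translate_frame1 is hypothetical, and the handling of a trailing partial codon (ignored here) and of unknown codons (written as X) are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for all three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame1(nucleotides):
    """Translate a nucleotide string in reading frame 1.

    Stop codons become '*'; codons with unknown letters become 'X'
    (an assumption of this sketch). A trailing partial codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

For example, translate_frame1("ATGGCCTAA") gives "MA*", with the asterisk marking the TAA stop codon.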

5)
Task: Display all ORFs with a given minimum size
Input:
multiseq1.fasta
Code:
getorf multiseq1.fasta orf1.fasta -minsize 10 -find 3
Here the minimum size was set to 10 nucleotides, but other values could have been used. -find 3 means that the nucleic acid sequences between a start codon and a stop codon are reported, including the start codon but excluding the stop codon.
Output:
orf1.fasta
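
The idea behind reporting start-to-stop regions with a minimum size can be sketched in a few lines of Python. This is a simplified illustration, not getorf itself: it scans only the three forward frames, uses ATG as the only start codon, and does not treat sequence ends specially. The function name find_orfs is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(nucleotides, min_size=10):
    """Return (start, end, sequence) tuples for ATG..stop regions.

    Only the forward strand is scanned (a simplification). Coordinates
    are 0-based and end-exclusive; the stop codon is not included.
    """
    seq = nucleotides.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i - start >= min_size:
                    orfs.append((start, i, seq[start:i]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)
```

For the input "CCATGAAACCCGGGTAGTT" and min_size=10 this returns one region, ATGAAACCCGGG, found in the third frame.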

For the following tasks, an alignment was generated using water:
water seq1.fasta seq2.fasta -aformat fasta -outfile seq12.fasta

6)
Task: Change alignment file format from .fasta to .msf
Input:
seq12.fasta
Code:
aligncopy seq12.fasta -aformat msf seq12.msf
Output:
seq12.msf

7)
Task: Write to a file only the names and the number of identical nucleotides between the second sequence and every other aligned sequence.
Input:
seq12.fasta
Code:
infoalign seq12.fasta seq12.infoalign -only -name -idcount -refseq 2
Output:
seq12.infoalign
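
As an illustration of what an identity count measures, the following Python sketch counts the alignment columns in which a chosen reference sequence and each other sequence carry the same letter. It is a rough sketch under assumptions: columns where the reference has a gap are skipped, and the function name identity_counts and the (name, sequence) input format are hypothetical, not part of EMBOSS.

```python
GAP_CHARS = "-."


def identity_counts(aligned, reference_index=1):
    """Count identical columns between a reference and every other sequence.

    aligned is a list of (name, aligned_sequence) pairs of equal length;
    reference_index is 0-based, so 1 selects the second sequence.
    """
    _, ref_seq = aligned[reference_index]
    counts = {}
    for idx, (name, seq) in enumerate(aligned):
        if idx == reference_index:
            continue
        if len(seq) != len(ref_seq):
            raise ValueError("aligned sequences must have equal length")
        counts[name] = sum(
            1
            for a, b in zip(ref_seq.upper(), seq.upper())
            if a == b and a not in GAP_CHARS
        )
    return counts
```

With aligned = [("seq1", "ACGT-A"), ("seq2", "ACCT-A")] the result is {"seq1": 4}: four columns match and the shared gap column is not counted.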

10)
Task: Shuffle a nucleotide sequence
Input:
seq1.fasta
Code:
shuffleseq seq1.fasta seq1_shuffled.fasta
Output:
seq1_shuffled.fasta
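
Shuffling keeps the composition of a sequence while destroying its order. A minimal Python sketch of that idea follows; the function name shuffle_sequence and the optional seed parameter are hypothetical and only serve to make the example reproducible.

```python
import random


def shuffle_sequence(sequence, seed=None):
    """Return the letters of sequence in random order.

    The composition (count of each letter) is preserved; only the
    order changes. Passing a seed makes the result reproducible.
    """
    rng = random.Random(seed)
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)
```

A shuffled sequence of this kind is often used as a composition-matched control for the original.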

11)
Task: Create three random nucleotide sequences, each of length 100
Input:
N/A
Code:
makenucseq -amount 3 -length 100 randseq3.fasta
Output:
randseq3.fasta

12)
Task: Display codon abundance analysis for a CDS
Input:
orf1.fasta
Code:
cusp orf1.fasta orf1.cusp
Output:
orf1.cusp
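
A codon usage table is essentially a count of each codon in the coding sequence, usually normalised per thousand codons. The Python sketch below shows that calculation in its simplest form; it does not reproduce the cusp output format, and the function name codon_usage is hypothetical.

```python
from collections import Counter


def codon_usage(cds):
    """Return {codon: (count, frequency_per_thousand)} for a CDS.

    The sequence is read in frame 1; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {
        codon: (n, 1000.0 * n / total) for codon, n in counts.items()
    }
```

For "ATGGCCGCCTAA" this gives GCC a count of 2 (500 per thousand) and ATG and TAA a count of 1 each (250 per thousand).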

14)
Task: Delete all gaps from an alignment
Input:
seq12.fasta
Code:
degapseq seq12.fasta seq12_unaligned.fasta
Output:
seq12_unaligned.fasta
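
Removing gaps turns each aligned sequence back into a plain sequence. A minimal Python sketch, assuming that every non-alphabetic character counts as a gap; the function name degap is hypothetical.

```python
def degap(aligned_sequence):
    """Drop every non-alphabetic character (such as '-' or '.')."""
    return "".join(ch for ch in aligned_sequence if ch.isalpha())
```

For example, degap("AC--GT.A") returns "ACGTA".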

Python/bash script

This section presents the script that solves task 1. The script takes two arguments: the first is the number of random sequences and the second is their length. It uses the complete genome sequence of Chlamydia trachomatis as a database for blastn. The genome file must be named FM872306.fasta and be located in the directory from which the script is launched.

Python script
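
The script itself is not reproduced on this page. Purely as an illustration of the interface described above, the sketch below builds the two command lines such a script might run: makenucseq to generate the random sequences and blastn to search them against the genome. Everything beyond the two arguments and the genome file name is an assumption: the function names, the intermediate file names random_seqs.fasta and blast_hits.tsv, the tabular output format, and the use of blastn's -subject option (which avoids building a database with makeblastdb first).

```python
import subprocess

GENOME_FILE = "FM872306.fasta"  # must sit in the working directory


def build_commands(amount, length, genome=GENOME_FILE):
    """Return the makenucseq and blastn command lines as argument lists.

    File names other than the genome file are hypothetical.
    """
    random_fasta = "random_seqs.fasta"
    make_cmd = [
        "makenucseq",
        "-amount", str(amount),
        "-length", str(length),
        random_fasta,
    ]
    blast_cmd = [
        "blastn",
        "-query", random_fasta,
        "-subject", genome,       # assumption: search the FASTA directly
        "-outfmt", "6",           # tabular output
        "-out", "blast_hits.tsv",
    ]
    return [make_cmd, blast_cmd]


def run_pipeline(amount, length):
    """Run both commands in order; requires EMBOSS and BLAST+ on PATH."""
    for cmd in build_commands(amount, length):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline(3, 100) would then correspond to launching the script with the arguments 3 and 100.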

© Stanislav Tikhonov, 2018