EMBOSS
Exercises targeting knowledge of EMBOSS commands

For these tasks, random sequences were generated using makenucseq:
makenucseq -amount 1 -length 210 seq1.fasta

1) Task: Compose a file containing the sequences from multiple .fasta files
Input: seq1.fasta seq2.fasta seq3.fasta
Code: seqret "seq*.fasta" multiseq1.fasta
Output: multiseq1.fasta

2) Task: Split one .fasta file containing multiple sequences into the corresponding single-sequence .fasta files
Input: multiseq1.fasta
Code: seqretsplit multiseq1.fasta seq1.fasta
Output: seq1.fasta seq2.fasta seq3.fasta

4) Task: Translate a nucleotide sequence (.fasta) into an amino acid sequence (.fasta) in the first reading frame, using a given genetic code table
Input: multiseq1.fasta
Code: transeq multiseq1.fasta protseq1.fasta -frame 1 -table 0
Other values of the -table option could have been used; 0 selects the standard code.
Output: protseq1.fasta
In the output, asterisks denote stop codons.

5) Task: Display all ORFs of at least a given minimum size
Input: multiseq1.fasta
Code: getorf multiseq1.fasta orf1.fasta -minsize 10 -find 3
The minimum size was set to 10 here, but other values could have been used. -find 3 means that the nucleic acid sequences are reported, including the start codon but excluding the stop codon.
Output: orf1.fasta

For the following tasks, an alignment was generated using water:
water seq1.fasta seq2.fasta -aformat fasta -outfile seq12.fasta

6) Task: Change the alignment file format from .fasta to .msf
Input: seq12.fasta
Code: aligncopy seq12.fasta -aformat msf seq12.msf
Output: seq12.msf

7) Task: Write to a file only the sequence names and the number of nucleotides identical between the second sequence and every other aligned sequence.
Input: seq12.fasta
Code: infoalign seq12.fasta seq12.infoalign -only -name -idcount
Output: seq12.infoalign

10) Task: Shuffle a nucleotide sequence
Input: seq1.fasta
Code: shuffleseq seq1.fasta seq1_shuffled.fasta
Output: seq1_shuffled.fasta

11) Task: Create three random nucleotide sequences, each of length 100
Input: N/A
Code: makenucseq -amount 3 -length 100 randseq3.fasta
Output: randseq3.fasta

12) Task: Display a codon usage analysis for a CDS
Input: orf1.fasta
Code: cusp orf1.fasta orf1.cusp
Output: orf1.cusp

14) Task: Delete all gaps from an alignment
Input: seq12.fasta
Code: degapseq seq12.fasta seq12_unaligned.fasta
Output: seq12_unaligned.fasta

Python/bash script
This section presents the script that solves task 1. It uses this file, the complete genome sequence of Chlamydia trachomatis, as a database for blastn. The script requires two arguments: the first is the number of random sequences and the second is their length. The genome file must be named FM872306.fasta and must be located in the directory from which the script is launched.

Python script
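The script itself is not reproduced on this page, so the following is only a minimal sketch of how a script with the described interface could be organised; it is not the author's code. It assumes that EMBOSS (makenucseq) and BLAST+ (makeblastdb, blastn) are on the PATH. The file names random_seqs.fasta, genome_db and blast_hits.tsv, the tabular output format, and the function names are hypothetical choices made for illustration.

```python
import subprocess
import sys
from pathlib import Path

GENOME = "FM872306.fasta"  # file name required by the description above


def build_commands(amount, length, genome=GENOME,
                   query="random_seqs.fasta",   # hypothetical file name
                   db="genome_db",              # hypothetical database name
                   report="blast_hits.tsv"):    # hypothetical report name
    """Return the three command lines as argument lists; nothing is executed."""
    # Generate `amount` random nucleotide sequences of `length` bases each.
    makenucseq = ["makenucseq", "-amount", str(amount), "-length", str(length),
                  "-outseq", query, "-auto"]
    # Turn the genome FASTA file into a nucleotide BLAST database.
    makeblastdb = ["makeblastdb", "-in", genome, "-dbtype", "nucl", "-out", db]
    # Search the random sequences against the genome (tabular output, assumed).
    blastn = ["blastn", "-query", query, "-db", db, "-outfmt", "6", "-out", report]
    return [makenucseq, makeblastdb, blastn]


def main(argv):
    """Check the two arguments and the genome file, then run each step in order."""
    if len(argv) != 3:
        sys.exit(f"usage: {argv[0]} AMOUNT LENGTH")
    amount, length = int(argv[1]), int(argv[2])
    if not Path(GENOME).is_file():
        sys.exit(f"{GENOME} must be in the directory the script is launched from")
    for cmd in build_commands(amount, length):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Saved as a script, it would end with a call to main(sys.argv) and be launched with the two arguments, for example 3 and 100. Keeping command construction separate from execution makes it possible to inspect the command lines without EMBOSS or BLAST+ installed.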