Sanger sequencing

Last update on the 15th of October, 2017

Here we analyse chromatograms derived from the Sanger sequencing process. The analysis was done with the Chromas Lite software.

List of downloads
File Link
Forward strand Ae4_18SII_F_G05_WSBS-Seq-1-08-15.fasta
Reverse strand Ae4_18SII_R_G06_WSBS-Seq-1-08-15.fasta
Jalview project project.jvp
Forward trace Ae4_18SII_F_G05_WSBS-Seq-1-08-15.ab1
Reverse trace Ae4_18SII_R_G06_WSBS-Seq-1-08-15.ab1

General features of traces

The traces of the forward and reverse strands differ in quality. In the forward strand the unreadable flanks are 5'-1–17-3' and 5'-792–931-3'. The outer parts of the readable region also have stretched peaks, but they are easy to base-call. Along the whole readable region the signal-to-noise ratio is moderate, with frequent unidentified bases. In several places the trace suggests polymorphisms, but these ambiguities were resolved by comparison with the reverse strand. Taken together, these points indicate passable readability and suggest that numerous polymerase errors or dye blobs occurred.

The reverse strand trace is of better quality than the forward one. The unreadable ends are 3'-1–109-5' and 3'-930–948-5'. Along the readable part the trace shows a high signal-to-noise ratio and peaks of equal width. Few dye blobs are found and no polymorphisms are observed.

The left (5') flanking regions of both chromatograms show high-quality peaks starting from the 18th base, which helps to call bases in the right (3') flanking region of the complementary strand. In both traces the noise and signal levels were uniform, but purines occasionally gave more intense peaks at several points.
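The readable ranges implied by the unreadable flanks quoted above can be applied to the base-called sequences directly. The following is a minimal sketch, assuming 1-based inclusive positions as used in this text; the function name `trim_read` and the toy read are illustrative and do not come from Chromas.

```python
def trim_read(seq: str, first_good: int, last_good: int) -> str:
    """Return the readable part of a read (1-based, inclusive positions)."""
    if not 1 <= first_good <= last_good <= len(seq):
        raise ValueError("readable range must lie inside the read")
    return seq[first_good - 1:last_good]


# Readable ranges derived from the unreadable flanks given in the text:
# forward: 1-17 and 792-931 unreadable, so keep 18..791
# reverse: 1-109 and 930-948 unreadable, so keep 110..929
FORWARD_READABLE = (18, 791)
REVERSE_READABLE = (110, 929)

# Toy read (not real data): 17 unreadable bases, 20 good ones, 3 unreadable
demo = "N" * 17 + "ACGT" * 5 + "N" * 3
print(trim_read(demo, 18, 37))
```

With these ranges the readable part is 774 bases long in the forward read and 820 bases long in the reverse read.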

Processing traces

The traces were aligned by eye, and unidentified bases and other problem positions were resolved. The two reads were then aligned with the EMBOSS needle program and the alignment was exported to Jalview (fig. 1).
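A forward and a reverse read can only be compared base by base when both are written in the same orientation. The following is a minimal sketch of that step, assuming the reverse read is stored as sequenced and has to be reverse-complemented first; the function name is illustrative and is not part of EMBOSS or Jalview.

```python
# Translation table for complementing bases; N stays N, case is preserved
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCN"))  # NGCAT
```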

Fig. 1. Alignment of the two reads in Jalview, coloured by nucleotide.

The consensus sequence (below) was obtained with the water program.

>Consensus/1-908 Percentage Identity Consensus 
ATGCTTGTCTCAAGATTAAGCCATGCATGTCTAAGTACATACCTTTACACGGTGAAACCGCGAATGGCTCAT
TAAATCAGTTATGGTTCCTTAGATCGTACAATCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATG
CAACCAAGCTCCGACCTTCTGGGGAAGAGCGCTTTTATTAGATCAAAGCCAATCGGGCCGCAAGGTCCGTCC
TATTGGTGACTCTGGATAACTTTGTGCTGATCGCATGGCCTTGTGCCGGCGACGTATCTTTCAAATGTCTGC
CCTATCAACTTTCGACGGTAAGTGATATGCTTACCGTGGTTGTAACGGGTAACGGGGAATTAGGGTTCGATT
CCGGAGAGGGAGCATGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCTGG
CACGGGGAGGTAGTGACGAAAAATAACAATACGGGACTCTTTCGAGGCCCCGTAATTGGAATGAGTACACTT
TAAATCCTTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAG
CGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATCTCGGGTCCAGGCTTGTGGTCCACCTCGCG
GTGGTTACTGCTCGTCCTGACCTACCTCCCAGTTTTCCCTTGGTGCTCTTGATTGAGTGTCTCGGGTGGCTG
GAACGTTTACTTTGAAAAAATTAGAGTGTTCAAAGCAGGCAGCTCGCCTGAATAATGGTGCATGGAATAATG
GAATAGGACCTCGGTTCTATTTTGTTGGTTTTCGGAACTTGAGGTAATGATTAAGAGGGACAGACGGGGGCA
TTCGTATTACGGTGTTAGAGGTGAAATTCTTGGATCGCCGTAAG
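The idea of resolving an unidentified base in one read from the other read can be written down as a simple rule over two aligned sequences. The sketch below is an assumption about how such a merge could be done and is not the procedure used by water or Jalview; `merge_aligned` is a hypothetical helper that expects two gapped sequences of equal length in the same orientation.

```python
def merge_aligned(a: str, b: str, gap: str = "-") -> str:
    """Merge two aligned reads into one sequence.

    Rules (illustrative): a gap or N in one read is filled from the other;
    two different called bases give N; columns with two gaps are dropped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for x, y in zip(a.upper(), b.upper()):
        if x == gap:
            x = y          # only the second read covers this column
        if y == gap:
            y = x          # only the first read covers this column
        if x == gap:
            continue       # both reads have a gap here
        if x == y:
            out.append(x)
        elif x == "N":
            out.append(y)  # unidentified base resolved from the other read
        elif y == "N":
            out.append(x)
        else:
            out.append("N")  # real disagreement stays ambiguous
    return "".join(out)


print(merge_aligned("ACNT-G", "ACGTAG"))  # ACGTAG
```

Disagreements between two called bases are deliberately left as N, because they need a look at the trace rather than an automatic decision.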

Problem positions

Numerous base-calling problems occurred in the automatic software. Most of them were solved by comparison with the complementary strand. Some of them are shown below (figs. 2–6).

Fig. 2. Peaks of identical bases merged into one wide peak.
The right-hand part of the forward trace, close to the unreadable region, consists of stretched peaks. A run of six A's was not recognised by the automatic software (weak peaks, high noise), but it was identified using the complementary strand.
Fig. 3. Noise interferes with the signal.
Two spots in the forward strand were called as N by the base caller. Taken alone they look like polymorphisms, but in comparison with the complementary strand the bases are identified unambiguously. The similarity of the noise curves suggests that similar polymerase errors took place at both spots.
Fig. 4. Noise interferes with the signal.
Two problem spots occur in the forward and reverse strands. They were resolved using the complementary bases.
Fig. 5. Noise interferes with the signal.
The problem occurs in the reverse strand and is resolved in the same way as in fig. 4.
Fig. 6. Two peaks merged into one.
The left region of the reverse trace, close to the unreadable area, shows a G plateau and a skipped T, along with broadened peaks and rather intense noise. This peak pattern is typical of the region close to the 3' end.

Overall, the prevailing defects in these traces are merged peaks and high noise at several spots. The forward trace is noisier than the reverse one.

Low-quality traces

Fully unreadable traces also occur in practice. Possible causes include instrument errors, polymerase errors (and hence PCR failures), protocol violations and so on. A sample trace is shown in fig. 7.

Fig. 7. Trace of low quality, sample region.
The file NN_g10.ab1 is an example of inaccurate sequencing. Base calling is impossible because the signal cannot be distinguished from noise. The colour traces overlap with uneven spacing, which suggests either that the instrument was set up incorrectly or that two different sequences are present. Another point is the high repetitiveness of A and C peaks, along with differing peak intensities between bases. Taken together, these observations may imply an instrument failure.
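One way to put a number on "signal cannot be distinguished from noise" is to compare the strongest and the second strongest channel at each called position. The sketch below assumes the four channel intensities (A, C, G, T) have already been extracted from the trace at the peak positions; the function names, the toy values and the 0.5 threshold are illustrative assumptions, not settings of Chromas or of any sequencer.

```python
def secondary_peak_ratios(peaks):
    """For each position, return second-highest / highest channel intensity.

    peaks: list of (A, C, G, T) intensity tuples at called positions.
    A ratio near 0 means one clean peak; near 1 means two competing signals.
    """
    ratios = []
    for channels in peaks:
        ordered = sorted(channels, reverse=True)
        primary, secondary = ordered[0], ordered[1]
        # A position with no signal at all is treated as fully ambiguous
        ratios.append(secondary / primary if primary > 0 else 1.0)
    return ratios


def fraction_ambiguous(peaks, threshold=0.5):
    """Share of positions whose secondary peak reaches the threshold."""
    ratios = secondary_peak_ratios(peaks)
    if not ratios:
        return 0.0
    return sum(r >= threshold for r in ratios) / len(ratios)


# Toy intensities (not taken from the traces discussed here)
clean = [(1000, 50, 30, 20), (40, 900, 60, 10)]
noisy = [(500, 450, 100, 80), (300, 310, 290, 50)]
print(fraction_ambiguous(clean), fraction_ambiguous(noisy))
```

A high fraction over the whole read would be consistent with either a mixed template or an instrument problem; the number alone does not tell the two apart.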