Resequencing. Search for polymorphisms in humans


Last updated: 27-11-2017.

Command table

Command | Description
fastqc chr8.fastq | Runs FastQC on the reads and produces a quality control report made up of several modules, each of which helps to identify a different potential problem in the data.
java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr8.fastq chr8_trimmomatic.fastq TRAILING:20 MINLEN:50 | Trims bases from the 3' end of each read while their quality is below 20, then removes reads shorter than 50 nucleotides. The change in quality is shown in Figures 1 and 2.
hisat2-build chr8.fasta processed | Indexes the reference sequence.
hisat2 -x processed -q chr8_trimmomatic.fastq -S chr8.sam --no-spliced-alignment --no-softclip --met-file hisat2.txt | Aligns the reads to the reference sequence.
samtools view -b chr8.sam -o chr8.bam | Converts the alignment to binary (BAM) format.
samtools sort chr8.bam sorted_8.bam | Sorts the alignment by reference coordinate.
samtools index sorted_8.bam | Indexes the sorted file.
samtools mpileup -I sorted_8.bam -uf chr8.fasta -g -o poly.bcf | Generates a file containing candidate polymorphisms.
bcftools call -cv poly.bcf -Ov -o snp.vcf | Generates a file listing the differences between the reference and the reads.
perl /nfs/srv/databases/annovar/convert2annovar.pl -format vcf4 snp.vcf > snp.avinput | Converts the file to the format expected by ANNOVAR.
perl /nfs/srv/databases/annovar/annotate_variation.pl -out refgene -build hg19 snp.avinput /nfs/srv/databases/annovar/humandb/ | Annotates SNPs against RefGene.
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out dbtype -build hg19 -dbtype snp138 snp.avinput /nfs/srv/databases/annovar/humandb/ | Annotates SNPs against dbSNP.
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 -out 1000g snp.avinput /nfs/srv/databases/annovar/humandb/ | Annotates SNPs against 1000 Genomes.
perl /nfs/srv/databases/annovar/annotate_variation.pl -regionanno -build hg19 -out Gwas -dbtype gwasCatalog snp.avinput /nfs/srv/databases/annovar/humandb/ | Annotates SNPs against the GWAS catalog.
perl /nfs/srv/databases/annovar/annotate_variation.pl snp.avinput /nfs/srv/databases/annovar/humandb/ -filter -dbtype clinvar_20150629 -buildver hg19 -out Clinvar | Annotates SNPs against ClinVar.

Table 1. Commands used and their descriptions.

Part one. Reads preparation

The number of reads did not change dramatically: it went from 8367 to 8227 (140 were dropped). Given the command used, the dropped reads are those that became shorter than 50 nucleotides after low-quality bases were trimmed from their 3' ends.
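To make the trimming rule concrete, here is a minimal Python sketch of what TRAILING:20 followed by MINLEN:50 does to a single read with Phred+33 qualities. It is an illustration only, not Trimmomatic's actual code; the function names and the toy read are assumptions made for this example.

```python
def trim_trailing(seq, qual, threshold=20, offset=33):
    """Remove bases from the 3' end while their Phred quality is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_minlen(seq, min_len=50):
    """A read is kept only if at least min_len bases remain after trimming."""
    return len(seq) >= min_len


# Hypothetical 60-base read: 48 good bases ('I' = Q40), then 12 poor ones ('#' = Q2).
seq = "ACGT" * 15
qual = "I" * 48 + "#" * 12

trimmed_seq, trimmed_qual = trim_trailing(seq, qual)
print(len(trimmed_seq), passes_minlen(trimmed_seq))  # 48 False: the read is dropped
```

Note that only the 3' tail is inspected: a low-quality base in the middle of a read does not stop the read from being kept.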

Figure 1. Per base sequence quality before trimming.

Figure 2. Per base sequence quality after trimming.

Part two. Reads mapping

The commands used are listed in Table 1. All 8227 reads were unpaired: 32 of them did not align, 8195 aligned exactly once, and none aligned more than once.

8227 reads; of these:
  8227 (100.00%) were unpaired; of these:
    32 (0.39%) aligned 0 times
    8195 (99.61%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
99.61% overall alignment rate
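The percentages in this summary can be rechecked from the raw counts. The sketch below parses the summary text shown above and recomputes the overall alignment rate; the regular expressions and the helper name `parse_counts` are assumptions written for this particular output layout.

```python
import re

summary = """8227 reads; of these:
  8227 (100.00%) were unpaired; of these:
    32 (0.39%) aligned 0 times
    8195 (99.61%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
99.61% overall alignment rate"""


def parse_counts(text):
    """Extract total, unaligned, uniquely aligned and multiply aligned read counts."""
    def grab(pattern):
        return int(re.search(pattern, text, re.M).group(1))

    total = grab(r"^(\d+) reads")
    unaligned = grab(r"^\s*(\d+) \([\d.]+%\) aligned 0 times")
    once = grab(r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time")
    multi = grab(r"^\s*(\d+) \([\d.]+%\) aligned >1 times")
    return total, unaligned, once, multi


total, unaligned, once, multi = parse_counts(summary)
rate = 100 * (once + multi) / total
print(f"{rate:.2f}% overall alignment rate")
```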

Part three. SNP analysis

The commands used are listed in Table 1. Examples of polymorphisms are given in Table 2.

Coordinate | Polymorphism type | REF/ALT | DP | Quality
27467821 | SNP | C/G | 13 | 149.008
76402598 | SNP | A/G | 2 | 6.19965
76453492 | SNP | T/A | 1 | 6.98265

Table 2. Examples of polymorphisms.
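The columns of Table 2 come straight from the VCF fields POS, REF, ALT, QUAL and the DP key of INFO. Below is a hedged Python sketch of extracting them. The example lines are rebuilt from the three rows of Table 2: the chromosome name `chr8` and the reduced INFO column are assumptions, and a real snp.vcf has more INFO keys and sample columns.

```python
def parse_vcf_line(line):
    """Turn one VCF data line into the fields shown in Table 2."""
    chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags.
    info_fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_fields[key] = value
    # An INDEL flag or an allele of different length marks an indel.
    is_indel = "INDEL" in info_fields or any(
        len(allele) != len(ref) for allele in alt.split(",")
    )
    return {
        "pos": int(pos),
        "type": "indel" if is_indel else "SNP",
        "ref_alt": f"{ref}/{alt}",
        "dp": int(info_fields["DP"]),
        "qual": float(qual),
    }


def summarize(lines):
    """Parse all data lines and return them with the mean QUAL and mean DP."""
    rows = [parse_vcf_line(l) for l in lines if l.strip() and not l.startswith("#")]
    mean_qual = sum(r["qual"] for r in rows) / len(rows)
    mean_dp = sum(r["dp"] for r in rows) / len(rows)
    return rows, mean_qual, mean_dp


# Illustrative lines only (see the assumptions stated above).
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr8\t27467821\t.\tC\tG\t149.008\t.\tDP=13",
    "chr8\t76402598\t.\tA\tG\t6.19965\t.\tDP=2",
    "chr8\t76453492\t.\tT\tA\t6.98265\t.\tDP=1",
]

rows, mean_qual, mean_dp = summarize(example)
for r in rows:
    print(r["pos"], r["type"], r["ref_alt"], r["dp"], r["qual"])
```

The means returned here cover only these three example rows, so they differ from the averages over the full call set.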

[Download resulting table]

According to the .vcf file, 95 SNPs were found and no indels. The absence of indels is expected, because mpileup was run with the -I option, which skips indel calling. The average quality of the SNPs found is 66.7 and the average coverage is 14, which seem to be acceptable values.

The RefGene annotation divides SNPs into five categories: intronic, exonic, intergenic, UTR3 and UTR5 (the 3' and 5' untranslated regions of mRNA, respectively). My counts are 60 intronic, 5 exonic, 17 intergenic and 13 UTR3, with none in UTR5. The SNPs affect genes such as CLU (clusterin, a chaperone), HNF4G (hepatocyte nuclear factor 4-gamma, a zinc- and DNA-binding protein involved in transcription regulation) and TRPS1 (encodes a protein that regulates the activity of many other genes by binding to specific regions of DNA and repressing gene activity). The exonic SNPs comprise 1 synonymous SNV in CLU, 1 synonymous and 2 nonsynonymous SNVs in HNF4G, and 1 synonymous SNV in TRPS1 (sheet 'SNPs' in the Excel file).

77 of the SNPs have an rs identifier. Most of the SNPs are fairly frequent. According to the clinical annotation, the patient has a high risk of Alzheimer's disease and possibly problems with urate levels.
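As a quick consistency check on the numbers reported above, the category counts should add up to the 95 SNPs in the .vcf file. The short sketch below recomputes the total and the share of each category, using only the counts stated in the text; the variable names are made up for this example.

```python
# Counts per RefGene category, as reported in the text.
category_counts = {
    "intronic": 60,
    "exonic": 5,
    "intergenic": 17,
    "UTR3": 13,
}
total_snps = 95       # SNPs found in the .vcf file
with_rs_id = 77       # SNPs that have an rs identifier

# The categories should account for every SNP.
assert sum(category_counts.values()) == total_snps

for name, count in category_counts.items():
    print(f"{name}: {count} ({100 * count / total_snps:.1f}%)")

rs_share = 100 * with_rs_id / total_snps
print(f"with rs identifier: {with_rs_id} ({rs_share:.1f}%)")
```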

© Simon Galkin, 2016