Resequencing and SNP search.

		
Task 0: A new working directory was created: /nfs/srv/databases/ngs/nikita/. Files: chr7.fastq, chr7.fasta.
		

		
Task 1: FastQC was used to analyse the .fastq file (command: fastqc chr7.fastq). The result is presented below:
		

		
Task 2: Trimmomatic was used to trim low-quality (<20) nucleotides from the read ends; only reads at least 50 nucleotides long were kept. Command: java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7.1.fastq TRAILING:20 MINLEN:50
Reads shorter than 50 nucleotides have to be excluded because they may contain adapters, which can lead to errors and inaccuracy. Low-quality nucleotides at the read ends also have to be removed, because the reaction product peaks there are poorly resolved, which reduces accuracy.
		
Before trimming: 3752 reads
After trimming: 3650 reads
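
The effect of the TRAILING:20 and MINLEN:50 steps can be illustrated with a minimal Python sketch. This is not Trimmomatic's implementation: the function names and the three reads are hypothetical, and Phred+33 quality encoding is assumed (as selected by -phred33).

```python
def trim_trailing(seq, qual, min_quality=20, phred_offset=33):
    """Drop bases from the 3' end while their Phred quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def trim_reads(records, min_quality=20, min_length=50):
    """Trim the read ends, then keep only reads of at least min_length bases."""
    kept = []
    for name, seq, qual in records:
        seq, qual = trim_trailing(seq, qual, min_quality)
        if len(seq) >= min_length:
            kept.append((name, seq, qual))
    return kept


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2 (offset 33).
reads = [
    ("read1", "A" * 60, "I" * 60),             # high quality throughout: kept as is
    ("read2", "C" * 60, "I" * 55 + "#" * 5),   # 5 poor trailing bases: trimmed to 55
    ("read3", "G" * 60, "I" * 40 + "#" * 20),  # trimmed to 40 bases: dropped
]
trimmed = trim_reads(reads)
print([(name, len(seq)) for name, seq, _ in trimmed])
```

In this toy input one read out of three is lost; in the real data the same two rules reduced 3752 reads to 3650.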
		
Commands:
		

| Command | Description |
| --- | --- |
| fastqc chr7.fastq | Analyses the file with the FastQC program |
| java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7.1.fastq TRAILING:20 MINLEN:50 | Trims low-quality (<20) nucleotides from the read ends; keeps reads at least 50 nucleotides long |
| hisat2-build chr7.fasta chr7his | Indexes the reference sequence |
| hisat2 --no-spliced-alignment --no-softclip -x chr7his -U chr7.1.fastq -S alignment.sam | Aligns the reads to the reference sequence |
| samtools view alignment.sam -b -o alignment.bam | Converts SAM to BAM (binary) |
| samtools sort alignment.bam -T alignment.txt -o samsorted.bam | Sorts the alignment file; temporary files are written to PREFIX.nnnn.bam |
| samtools index samsorted.bam | Indexes samsorted.bam |
| samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf | Creates a .bcf file (with polymorphisms) |
| bcftools call -cv snp.bcf -o SNP.vcf | Creates a .vcf file with the differences between the reference and the reads |
| perl /nfs/srv/databases/annovar/convert2annovar.pl -format vcf4 snp_vcf.vcf > chr7.avinput | Creates the .avinput file |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out chr7.snp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/ | SNPs with rs identifiers |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -out refgene -build hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/ | refGene annotation |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out dbsnp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/ | dbSNP annotation |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 -out 1000g chr7.avinput /nfs/srv/databases/annovar/humandb/ | 1000 Genomes annotation |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -regionanno -build hg19 -out GWAS -dbtype gwasCatalog chr7.avinput /nfs/srv/databases/annovar/humandb/ | GWAS annotation |
| perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out clinvar -dbtype clinvar_20150629 -buildver hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/ | ClinVar annotation |

		
Task 3: Mapping:
The hisat2 commands from the table above were used for this task. As a result, 8 reads were not aligned, 3458 were aligned exactly once and 184 were aligned more than once.
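
As a quick consistency check, the three counts should add up to the number of reads left after trimming. The short Python sketch below does this arithmetic and derives the overall alignment rate; the function name is hypothetical and only the numbers come from this report.

```python
def alignment_summary(unaligned, aligned_once, aligned_multiple):
    """Return the total read count and the overall alignment rate in percent."""
    total = unaligned + aligned_once + aligned_multiple
    overall_rate = 100.0 * (aligned_once + aligned_multiple) / total
    return total, round(overall_rate, 2)


# Counts reported by hisat2 in this task.
total, overall_rate = alignment_summary(unaligned=8, aligned_once=3458, aligned_multiple=184)
print(total, overall_rate)
```

The total is 3650, which matches the number of reads kept by Trimmomatic, and about 99.78% of the reads were aligned at least once.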
		

		
Task 4: Alignment analysis:
Commands used:
  • samtools view alignment.sam -b -o alignment.bam
  • samtools sort alignment.bam -T alignment.txt -o samsorted.bam
  • samtools index samsorted.bam
For each read, the output file contains the read name, the name of the reference sequence, the mapping quality, the coordinate at which the read's alignment starts on the reference sequence, etc.
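
To make these fields concrete, here is a minimal Python sketch that splits one alignment record, in its text (SAM) form, into the first six mandatory columns. The record itself is hypothetical (read name, position, CIGAR and sequence are made up), and the helper names are not part of samtools.

```python
from collections import namedtuple

# First six mandatory SAM columns, in file order.
SamRecord = namedtuple("SamRecord", "qname flag rname pos mapq cigar")


def parse_sam_line(line):
    """Parse one tab-separated alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    return SamRecord(
        qname=fields[0],      # read name
        flag=int(fields[1]),  # bitwise flag
        rname=fields[2],      # reference sequence name
        pos=int(fields[3]),   # 1-based leftmost alignment position
        mapq=int(fields[4]),  # mapping quality
        cigar=fields[5],      # alignment description
    )


# Hypothetical record; real lines can be printed with `samtools view samsorted.bam`.
example = "read1\t0\tchr7\t120979500\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(example)
is_mapped = not (record.flag & 0x4)  # flag bit 0x4 marks an unmapped read
print(record, is_mapped)
```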
		

		
Task 5: SNP and indel search:
Commands used:
  • samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf
  • bcftools call -cv snp.bcf -o SNP.vcf
Polymorphisms obtained from the .vcf file are presented in the following table:
		
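| No. | Coordinate | Polymorphism type | In reference | In read | Coverage depth | Quality |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 120979525 | Insertion | CCT | CCTCTCT | 4 | 143.973 |
| 2 | 134221694 | Substitution | G | A | 2 | 7.79993 |
| 3 | 134239978 | Substitution | C | A | 10 | 29.0123 |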
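
The polymorphism type can be derived from the lengths of the reference and alternative alleles. The Python sketch below is one possible way to do it, not the procedure used by bcftools. The three lines are rebuilt from the table values; the chromosome name "chr7" and the assumption that coverage depth is the DP key of the INFO column are mine.

```python
def classify_variant(ref, alt):
    """Classify a variant by comparing reference and alternative allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "Substitution"  # single-nucleotide substitution
    if len(alt) > len(ref):
        return "Insertion"
    if len(alt) < len(ref):
        return "Deletion"
    return "Complex"  # same length, more than one base


def parse_vcf_line(line):
    """Extract position, alleles, type, depth and quality from one VCF data line."""
    chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    info_fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return {
        "position": int(pos),
        "ref": ref,
        "alt": alt,
        "type": classify_variant(ref, alt),
        "depth": int(info_fields["DP"]),
        "quality": float(qual),
    }


# Lines reconstructed from the table; all other VCF columns are simplified.
vcf_lines = [
    "chr7\t120979525\t.\tCCT\tCCTCTCT\t143.973\t.\tINDEL;DP=4",
    "chr7\t134221694\t.\tG\tA\t7.79993\t.\tDP=2",
    "chr7\t134239978\t.\tC\tA\t29.0123\t.\tDP=10",
]
variants = [parse_vcf_line(line) for line in vcf_lines]
for variant in variants:
    print(variant["position"], variant["type"], variant["depth"], variant["quality"])
```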

		
Task 6: SNP annotation:
Commands used:
  • perl /nfs/srv/databases/annovar/convert2annovar.pl -format vcf4 snp_vcf.vcf > chr7.avinput
    Result: "A total of 31 locus in VCF file passed QC threshold, representing 31 SNPs (24 transitions and 7 transversions) and 0 indels/substitutions".
  • perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out chr7.snp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/
    Result: 28 SNPs have an rs identifier.
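
The split into transitions and transversions reported by convert2annovar.pl follows a simple rule: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and any other single-base change is a transversion. A minimal Python sketch of that rule, with a hypothetical function name, is given below together with the transition/transversion ratio for the reported counts.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine single-base changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


# Counts reported by convert2annovar.pl for this data set.
transitions, transversions = 24, 7
ti_tv_ratio = round(transitions / transversions, 2)

print(is_transition("G", "A"))  # G>A exchanges two purines
print(is_transition("C", "A"))  # C>A exchanges a pyrimidine for a purine
print(ti_tv_ratio)
```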
RefGene annotation:
		
| Region | Description | Number of SNPs |
| --- | --- | --- |
| Intronic | SNPs in introns | 25 |
| Exonic | SNPs in exons | 4 |
| UTR3 | SNPs in the 3' untranslated region | 2 |
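
Counts like these can be tallied from the gene-based annotation output. The Python sketch below assumes that the file written by the refGene step (presumably refgene.variant_function, given -out refgene) is tab-separated with the region label in the first column; the sample lines, gene names and positions are hypothetical.

```python
from collections import Counter


def count_regions(lines):
    """Count variants per region label, taken from the first tab-separated column."""
    return Counter(line.split("\t", 1)[0] for line in lines if line.strip())


# Hypothetical lines in the assumed layout: region, gene, then the .avinput columns.
sample_lines = [
    "intronic\tGENE1\tchr7\t100\t100\tG\tA",
    "exonic\tGENE2\tchr7\t200\t200\tC\tT",
    "intronic\tGENE1\tchr7\t300\t300\tA\tG",
    "UTR3\tGENE3\tchr7\t400\t400\tT\tC",
]
counts = count_regions(sample_lines)
print(counts)

# The counts in this report cover every SNP found: 25 + 4 + 2 = 31.
report_total = 25 + 4 + 2
print(report_total)
```

On the real file, the lines would be read with open(...) and passed to count_regions in the same way.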

		
dbSNP annotation: the chr7.snp.hg19_snp138_dropped file contains information about the annotated SNPs. 28 SNPs have an rs identifier, 3 do not.
		
		
ClinVar annotation: none of the SNPs are annotated.
		
		
1000 Genomes annotation: the allele frequencies of the SNPs were obtained: the maximum is 0.650759, the minimum is 0.0239617.
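
The extreme frequencies can be extracted from the filter output with a few lines of Python. The sketch assumes that the *_dropped file produced by the 1000 Genomes step is tab-separated, with the database name in the first column and the allele frequency in the second; the sample lines and their frequencies are hypothetical, not taken from the real output.

```python
def frequency_range(lines):
    """Return (minimum, maximum) of the frequencies in the second column."""
    freqs = [float(line.split("\t")[1]) for line in lines if line.strip()]
    return min(freqs), max(freqs)


# Hypothetical lines in the assumed layout: database, frequency, then .avinput columns.
sample_lines = [
    "1000g2014oct_all\t0.12\tchr7\t100\t100\tG\tA",
    "1000g2014oct_all\t0.5\tchr7\t200\t200\tC\tT",
    "1000g2014oct_all\t0.03\tchr7\t300\t300\tA\tG",
]
lowest, highest = frequency_range(sample_lines)
print(lowest, highest)
```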
		
		
GWAS annotation: some of the SNPs may be associated with various diseases or physiological traits, such as type 2 diabetes, bone mineral density, longevity and cortical thickness.
		


© Popov Nikita 2016