Resequencing and SNP search.

Task 0: A new working directory, /nfs/srv/databases/ngs/nikita/, was created. Input files: chr7.fastq (reads) and chr7.fasta (reference sequence).

Task 1: FastQC was used to analyse the .fastq file. Command: fastqc chr7.fastq. The result is presented below:

Task 2: Trimmomatic was used to trim low-quality nucleotides (quality below 20) from the 3' ends of the reads and to keep only reads at least 50 nucleotides long. Command: java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7.1.fastq TRAILING:20 MINLEN:50
Reads shorter than 50 nucleotides are excluded because they may contain adapters, which can lead to errors and inaccuracy. Low-quality nucleotides at the read ends are also removed, because the signal there is poorly resolved and reduces the accuracy of the analysis.
Before trimming: 3752 reads
After trimming: 3650 reads (102 reads, about 2.7%, were discarded)
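The read counts before and after trimming can be checked directly from the files. The following is a minimal sketch, assuming the standard four-line FASTQ record layout without wrapped sequence lines; the function name count_fastq_reads is illustrative and not part of any tool used here.

```shell
# Count reads in a FASTQ file, assuming four lines per record
# (header, sequence, '+', qualities) and no wrapped sequence lines.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Possible usage on the files from Task 2:
#   count_fastq_reads chr7.fastq     # reads before trimming
#   count_fastq_reads chr7.1.fastq   # reads after trimming
```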

Task 3: Mapping:
For this task, the hisat2 commands from the previous table were used. Result: 8 reads were not aligned, 3458 were aligned exactly once, and 184 were aligned more than once.
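As a consistency check, the three counts sum to 3650, which matches the number of reads left after trimming, and they correspond to an overall alignment rate of about 99.78%. The arithmetic can be sketched as follows; the helper name alignment_rate is an assumption made for illustration.

```shell
# Print the total number of reads and the overall alignment rate (percent)
# from three counts: not aligned, aligned once, aligned more than once.
alignment_rate() {
    awk -v un="$1" -v once="$2" -v multi="$3" 'BEGIN {
        total = un + once + multi
        printf "%d %.2f\n", total, 100 * (once + multi) / total
    }'
}

# With the counts reported for this task:
#   alignment_rate 8 3458 184
```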

Task 4: Alignment analysis:
Commands used:
  • samtools view alignment.sam -b -o alignment.bam
  • samtools sort alignment.bam -T alignment.txt -o samsorted.bam
  • samtools index samsorted.bam
The output file contains, for each read, the read name, the name of the reference sequence, the mapping quality, the coordinate on the reference sequence where the read's alignment starts, and other fields.
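The fields listed above sit in fixed columns of the SAM format: read name in column 1, reference name in column 3, alignment start in column 4 and mapping quality in column 5. A minimal sketch for pulling them out of a SAM file follows; the function name sam_summary is hypothetical.

```shell
# Print read name, reference name, alignment start and mapping quality
# (SAM columns 1, 3, 4, 5), skipping header lines that start with '@'.
sam_summary() {
    awk -F '\t' '!/^@/ { print $1 "\t" $3 "\t" $4 "\t" $5 }' "$1"
}

# Possible usage:
#   sam_summary alignment.sam
#   samtools view -h samsorted.bam | sam_summary -
```

The second usage line relies on awk reading standard input when the file name is "-".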

Task 5: SNP and indel search:
Commands used:
  • samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf
  • bcftools call -cv snp.bcf -o SNP.vcf
Polymorphisms obtained from the .vcf file are presented in the following table:
No.  Coordinate  Polymorphism type  In reference  In read  Coverage depth  Quality
1    120979525   Insertion          CCT           CCTCTCT  4               143.973
2    134221694   Substitution       G             A        2               7.79993
3    134239978   Substitution       C             A        10              29.0123
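A table of this kind can be derived from the VCF columns POS, REF, ALT, QUAL and the DP= entry of the INFO column. The sketch below is one possible way to do it, not necessarily how the table above was produced; the function name vcf_summary is hypothetical, and the variant type is inferred only from the allele lengths, which ignores multi-allelic sites.

```shell
# Summarise each VCF record: position, type, REF, ALT, depth, quality.
# Type is inferred from allele lengths (a simplification); depth comes
# from the DP= entry of the INFO column, or "NA" if it is absent.
vcf_summary() {
    awk -F '\t' '!/^#/ {
        type = "Substitution"
        if (length($5) > length($4)) type = "Insertion"
        else if (length($5) < length($4)) type = "Deletion"
        depth = "NA"
        n = split($8, info, ";")
        for (i = 1; i <= n; i++)
            if (info[i] ~ /^DP=/) depth = substr(info[i], 4)
        print $2 "\t" type "\t" $4 "\t" $5 "\t" depth "\t" $6
    }' "$1"
}

# Possible usage:
#   vcf_summary SNP.vcf
```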

Task 6: SNP annotation:
Commands used:
  • perl /nfs/srv/databases/annovar/ -format vcf4 snp_vcf.vcf > chr7.avinput
    Result: A total of 31 locus in VCF file passed QC threshold, representing 31 SNPs (24 transitions and 7 transversions) and 0 indels/substitutions.
  • perl /nfs/srv/databases/annovar/ -filter -out chr7.snp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/
    Result: 28 SNPs have an rs identifier.
RefGene annotation:
Area      Description                         Number of SNPs
Intronic  SNPs in introns                     25
Exonic    SNPs in exons                       4
UTR3      SNPs in the 3' untranslated region  2
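Per-region counts like these can be tallied from the gene-based annotation output. The sketch below assumes a tab-separated file whose first column is the region label (intronic, exonic, UTR3 and so on), as in ANNOVAR's .variant_function files; the function name count_regions and the file name in the usage line are hypothetical, since the command that produced the RefGene annotation is not given in this report.

```shell
# Count variants per region label found in column 1 of a tab-separated file.
# Output: one "label<TAB>count" line per label, in C-locale sort order.
count_regions() {
    cut -f1 "$1" | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Possible usage (hypothetical file name):
#   count_regions chr7.refgene.variant_function
```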

dbSNP annotation: the chr7.snp.hg19_snp138_dropped file contains information about the annotated SNPs. 28 SNPs have an rs identifier; 3 do not.
ClinVar annotation: none of the SNPs are annotated in ClinVar.
1000 Genomes annotation: the allele frequencies of the SNPs were obtained; the maximum is 0.650759 and the minimum is 0.0239617.
GWAS annotation: some of the SNPs may be associated with diseases or physiological traits, such as type 2 diabetes, bone mineral density, longevity, and cortical thickness.
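The minimal and maximal 1000 Genomes frequencies can be read off the filter output with a short script. This is a sketch under the assumption that the file follows the layout of ANNOVAR's *_dropped files, with the database name in column 1 and the frequency in column 2; the function name freq_range and the file name in the usage line are hypothetical.

```shell
# Print the minimal and maximal value found in column 2 of a tab-separated
# file. Values are compared numerically but printed as they appear in the file.
freq_range() {
    awk -F '\t' '
        NR == 1 || $2 + 0 < min + 0 { min = $2 }
        NR == 1 || $2 + 0 > max + 0 { max = $2 }
        END { if (NR > 0) print min, max }
    ' "$1"
}

# Possible usage (hypothetical file name):
#   freq_range chr7.1000g_dropped
```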

