Popov Nikita


	Resequencing and SNP search.

Task 0: A new working directory was created: /nfs/srv/databases/ngs/nikita/. Files: chr7.fastq (reads) and chr7.fasta (reference sequence).

Task 1: FastQC was used to analyse the .fastq file. Command: fastqc chr7.fastq. The result is presented below:

Task 2: Trimmomatic was used to trim low-quality (<20) nucleotides from the 3' ends of reads; only reads at least 50 nucleotides long were kept. Command: java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7.1.fastq TRAILING:20 MINLEN:50
Reads shorter than 50 nucleotides have to be excluded because they may contain adapter sequence and are harder to map unambiguously, which leads to errors and inaccuracy. Low-quality nucleotides at the read ends also have to be removed, because the signal there is poorly resolved, so the base calls are unreliable and reduce the precision of the analysis.

	Before trimming: 3752 reads
	After trimming: 3650 reads
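
The effect of TRAILING:20 MINLEN:50 on a single read can be sketched in Python as follows. This is only an illustration of the trimming logic under the Phred+33 encoding, not Trimmomatic's own code; the function names and the example read are hypothetical.

```python
def trim_trailing(seq, qual, min_quality=20, phred_offset=33):
    """Drop bases from the 3' end while their Phred quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def passes_minlen(seq, min_len=50):
    """Keep a read only if it still has at least min_len bases."""
    return len(seq) >= min_len


# Hypothetical 60-base read: 55 bases of quality 40 ('I'), then 5 bases of quality 2 ('#').
seq = "ACGT" * 15
qual = "I" * 55 + "#" * 5

trimmed_seq, trimmed_qual = trim_trailing(seq, qual)
print(len(trimmed_seq), passes_minlen(trimmed_seq))
```

A read that falls below 50 bases after trimming is discarded, which is why the read count drops after this step.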

Commands:

fastqc chr7.fastq	Analyses the file with FastQC
java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7.1.fastq TRAILING:20 MINLEN:50	Trims low-quality (<20) nucleotides from the 3' ends of reads; keeps reads of at least 50 nucleotides
hisat2-build chr7.fasta chr7his	Indexes the reference sequence
hisat2 --no-spliced-alignment --no-softclip -x chr7his -U chr7.1.fastq -S alignment.sam	Aligns the reads to the reference sequence
samtools view alignment.sam -b -o alignment.bam	Converts SAM to BAM (binary)
samtools sort alignment.bam -T alignment.txt -o samsorted.bam	Sorts the alignment file; temporary files are written to PREFIX.nnnn.bam
samtools index samsorted.bam	Indexes samsorted.bam
samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf	Creates a .bcf file (with polymorphisms)
bcftools call -cv snp.bcf -o SNP.vcf	Creates a .vcf file with the differences between the reference and the reads
perl /nfs/srv/databases/annovar/convert2annovar.pl -format vcf4 SNP.vcf > chr7.avinput	Converts the .vcf file to ANNOVAR input (chr7.avinput)
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out chr7.snp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/	Finds SNPs that have an rs identifier
perl /nfs/srv/databases/annovar/annotate_variation.pl -out refgene -build hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/	refGene annotation
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out dbsnp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/	dbSNP annotation
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 -out 1000g chr7.avinput /nfs/srv/databases/annovar/humandb/	1000 Genomes annotation
perl /nfs/srv/databases/annovar/annotate_variation.pl -regionanno -build hg19 -out GWAS -dbtype gwasCatalog chr7.avinput /nfs/srv/databases/annovar/humandb/	GWAS annotation
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out clinvar -dbtype clinvar_20150629 -buildver hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/	ClinVar annotation

Task 3: Mapping:
For this task the hisat2 commands from the table above were used. Result: 8 reads were not aligned, 3458 were aligned exactly once and 184 were aligned more than once.
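
As a quick check on these numbers, the overall alignment rate can be computed from the three counts reported above. The variable names are illustrative; the total should equal the 3650 reads left after trimming.

```python
# Counts taken from the hisat2 summary reported in this task.
unaligned = 8
aligned_once = 3458
aligned_multiple = 184

total_reads = unaligned + aligned_once + aligned_multiple
overall_rate = (aligned_once + aligned_multiple) / total_reads

print(total_reads)            # total number of reads passed to hisat2
print(f"{overall_rate:.2%}")  # share of reads aligned at least once
```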

Task 4: Alignment analysis:
Used commands:
samtools view alignment.sam -b -o alignment.bam
samtools sort alignment.bam -T alignment.txt -o samsorted.bam
samtools index samsorted.bam
The output file contains, for each read: the read name, the name of the reference sequence, the mapping quality, the coordinate on the reference sequence where the read's alignment begins, etc.
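
The fields mentioned above are the first mandatory tab-separated columns of a SAM record (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR). A minimal sketch of reading them in Python is shown below; the example line is hypothetical and is not taken from alignment.sam.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost coordinate of the alignment
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line):
    """Parse the first six mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def is_unmapped(record):
    """Bit 0x4 of FLAG marks an unmapped read."""
    return bool(record.flag & 0x4)


# Hypothetical alignment line (header lines starting with '@' must be skipped beforehand).
example = "read1\t0\tchr7\t120979500\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.qname, record.rname, record.pos, record.mapq, is_unmapped(record))
```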

Task 5: SNP and indel search:
Used commands:
samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf
bcftools call -cv snp.bcf -o SNP.vcf
Polymorphisms obtained from the .vcf file are presented in the following table:

	Coordinate	Polymorphism type	In reference	In reads	Coverage depth	Quality
1	120979525	Insertion	CCT	CCTCTCT	4	143.973
2	134221694	Substitution	G	A	2	7.79993
3	134239978	Substitution	C	A	10	29.0123
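
The polymorphism type in the table follows from comparing the lengths of the reference and read alleles. A minimal sketch of that rule is given below; the function name is hypothetical, and the rule is a simplification that treats any equal-length change as a base replacement.

```python
def classify_variant(ref, alt):
    """Classify a variant by comparing reference and alternative allele lengths."""
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


# REF/ALT pairs taken from the table above.
for ref, alt in [("CCT", "CCTCTCT"), ("G", "A"), ("C", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```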

Task 6: SNP annotation:
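Used commands: the ANNOVAR commands (convert2annovar.pl and annotate_variation.pl) listed in the Commands table above.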
RefGene annotation:

Region	Description	Number of SNPs
Intronic	SNPs in introns	25
Exonic	SNPs in exons	4
UTR3	SNPs in the 3' untranslated region	2
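
Counts like these can be obtained by tallying the region column of the refGene output. The sketch below assumes that the first tab-separated column of the refgene.variant_function file holds the region (intronic, exonic, UTR3, ...); the example rows and gene names are hypothetical.

```python
from collections import Counter


def count_regions(lines):
    """Count variants per region, taking the region from the first tab-separated column."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts


# Hypothetical rows in the assumed layout: region, gene, then the .avinput columns.
rows = [
    "intronic\tGENE_A\t7\t100\t100\tG\tA",
    "intronic\tGENE_A\t7\t200\t200\tC\tT",
    "exonic\tGENE_B\t7\t300\t300\tC\tA",
    "UTR3\tGENE_C\t7\t400\t400\tT\tC",
]
for region, number in count_regions(rows).items():
    print(region, number)
```

For a real run the rows would be read from the file, for example with open("refgene.variant_function").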

dbSNP annotation: the chr7.snp.hg19_snp138_dropped file contains the information about the annotated SNPs. 28 SNPs have an rs identifier, 3 do not.

ClinVar annotation: none of the SNPs were annotated.

1000 Genomes annotation: the frequencies of the SNPs were obtained: the maximum is 0.650759, the minimum is 0.0239617.
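
The minimum and maximum can be extracted from the 1000 Genomes output with a few lines of Python. The sketch assumes that the second tab-separated column of the ANNOVAR "dropped" file holds the frequency; the rows below are hypothetical apart from the two extreme values reported above.

```python
def frequency_range(lines):
    """Return (minimum, maximum) of the frequencies found in the second column."""
    freqs = [float(line.split("\t")[1]) for line in lines if line.strip()]
    return min(freqs), max(freqs)


# Hypothetical rows in the assumed layout: database name, frequency, then the .avinput columns.
rows = [
    "1000g2014oct_all\t0.650759\t7\t100\t100\tG\tA",
    "1000g2014oct_all\t0.25\t7\t200\t200\tC\tT",
    "1000g2014oct_all\t0.0239617\t7\t300\t300\tC\tA",
]
lowest, highest = frequency_range(rows)
print(lowest, highest)
```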

GWAS annotation: some of the SNPs may be associated with diseases or physiological traits, such as type 2 diabetes, bone mineral density, longevity and cortical thickness.
