Part 1
0. Make a new directory and copy your .fastq and .fasta files there
New address: /nfs/srv/databases/ngs/sophia.veselova/
Files: chr7.fastq, chr7.fasta
1. Use FastQC to analyse the .fastq file
Terminal command:
java -jar /usr/share/java/trimmomatic.jar SE -phred33 chr7.fastq chr7_1.fastq TRAILING:20 MINLEN:50
Command | Notes |
---|---|
hisat2-build chr7.fasta chr7_neue | Indexing of the reference sequence |
hisat2 --no-spliced-alignment --no-softclip -x chr7_neue -U chr7_1.fastq -S alignment.sam | Alignment of the reads to the reference sequence; --no-softclip: disables soft clipping (in soft clipping, bases at the 5' and 3' ends are excluded from the alignment but remain in the read sequence); --no-spliced-alignment: disables spliced alignment, so reads are not split across introns; -x: basename of the index files; -U: file with unpaired reads; -S: SAM output file |
Output: 3650 reads; of these: 3650 (100.00%) were unpaired; of these: 8 (0.22%) aligned 0 times; 3458 (94.74%) aligned exactly 1 time; 184 (5.04%) aligned >1 times; 99.78% overall alignment rate |
Of the 3650 reads, 3458 (94.74%) aligned exactly once, 184 (5.04%) aligned more than once, and 8 (0.22%) did not align, giving an overall alignment rate of 99.78%. |
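The percentages in the hisat2 summary can be re-derived from the raw read counts. The following is only a small sketch of that arithmetic; the function name `alignment_summary` and its argument names are hypothetical and are not part of hisat2.

```python
def alignment_summary(total, unaligned, unique, multi):
    """Recompute the percentages of a single-end hisat2 summary (2 decimals)."""
    if unaligned + unique + multi != total:
        raise ValueError("read counts do not add up to the total")

    def pct(n):
        return round(100.0 * n / total, 2)

    return {
        "unaligned": pct(unaligned),
        "unique": pct(unique),
        "multi": pct(multi),
        # A read counts as aligned if it was placed at least once.
        "overall": pct(unique + multi),
    }


# Read counts taken from the hisat2 output above.
summary = alignment_summary(total=3650, unaligned=8, unique=3458, multi=184)
print(summary)
```

The overall rate is simply the share of reads aligned at least once, (3458 + 184) / 3650.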
Command | Notes |
---|---|
samtools view alignment.sam -b -o alignment.bam | Converts SAM to BAM (binary); -b: output in BAM format; -o: output file |
samtools sort alignment.bam -T alignment.txt -o samsorted.bam | Sorts the alignment by coordinate; -T: prefix for temporary files (written to PREFIX.nnnn.bam); -o: output file |
samtools index samsorted.bam | Indexing of samsorted.bam |
samtools mpileup -uf chr7.fasta samsorted.bam -o snp.bcf
bcftools call -cv snp.bcf -o SNP.vcf
# | Coordinate | Type | Reference | Read | Depth | Quality |
---|---|---|---|---|---|---|
1 | 120965718 | Insertion | taa | tAAaa | 2 | 16.5627 |
2 | 120978918 | Insertion | ATT | ATTT | 8 | 105.467 |
3 | 134250322 | Substitution | A | C | 68 | 225.009 |
perl /nfs/srv/databases/annovar/convert2annovar.pl -format vcf4 snp_neue.vcf > chr7.avinput | Result: "A total of 31 locus in VCF file passed QC threshold, representing 31 SNPs (24 transitions and 7 transversions) and 0 indels/substitutions". Note: a transition exchanges one purine for the other (A<->G) or one pyrimidine for the other (C<->T); a transversion exchanges a purine for a pyrimidine or vice versa.
perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out chr7.snp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/ | Result: number of SNPs with an rs identifier: 28
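The transition/transversion split reported by convert2annovar can be cross-checked by hand. The sketch below classifies each reference/alternative pair listed in the SNP column of the summary table at the end of this report; the helper name `classify` and the compact `SNPS` string are illustrative assumptions, not ANNOVAR output.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Reference/alternative bases of the 31 SNPs, in table order (ref first).
SNPS = (
    "GT GA CT GA CT GA CA TC AG CA GA AG AC CT CT CT "
    "CT GA GA CT TC TC CT GA CG TC CT TC GC CT CG"
).split()

counts = Counter(classify(ref, alt) for ref, alt in SNPS)
print(counts["transition"], counts["transversion"])
```

Run on the 31 pairs, the counts should agree with the 24 transitions and 7 transversions in the convert2annovar message.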
Database | Input | Output | Notes |
---|---|---|---|
Refgene | perl /nfs/srv/databases/annovar/annotate_variation.pl -out refgene -build hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/ | refgene.exonic_variant_function refgene.log refgene.variant_function | Groups: exonic/intronic/UTR3; 25 intronic SNPs, 4 exonic SNPs, 2 UTR3 SNPs (UTR3 = 3' untranslated region) |
dbsnp | perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out dbsnp -build hg19 -dbtype snp138 chr7.avinput /nfs/srv/databases/annovar/humandb/ | dbsnp.hg19_snp138_dropped dbsnp.hg19_snp138_filtered dbsnp.log | 28 of the 31 SNPs have an rs identifier (i.e. 28 SNPs are annotated in dbSNP) |
1000 genomes | perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 -out 1000g chr7.avinput /nfs/srv/databases/annovar/humandb/ | 1000g.hg19_ALL.sites.2014_10_dropped 1000g.hg19_ALL.sites.2014_10_filtered 1000g.log | Frequency data are available for all SNPs except four. Maximum: 0.650759; mean: 0.32507983; median: 0.336462 |
GWAS | perl /nfs/srv/databases/annovar/annotate_variation.pl -regionanno -build hg19 -out GWAS -dbtype gwasCatalog chr7.avinput /nfs/srv/databases/annovar/humandb/ | GWAS.hg19_gwasCatalog GWAS.log | See the table below (Type 2 diabetes, Bone mineral density, Cortical thickness and Longevity). These hits suggest that some of the SNPs may be associated with diseases or physiological traits. |
Clinvar | perl /nfs/srv/databases/annovar/annotate_variation.pl -filter -out clinvar -dbtype clinvar_20150629 -buildver hg19 chr7.avinput /nfs/srv/databases/annovar/humandb/ | clinvar.hg19_clinvar_20150629_dropped clinvar.hg19_clinvar_20150629_filtered clinvar.log | The _dropped file is empty, i.e. none of the SNPs was found in ClinVar |
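The maximum, mean and median quoted in the 1000 genomes row can be reproduced from the 27 allele frequencies listed in the summary table below (the four SNPs without frequency data are left out). This is a minimal sketch using Python's standard `statistics` module; the list name `FREQUENCIES` is an assumption made for illustration.

```python
import statistics

# Allele frequencies from the "1000 genomes" column, in table order;
# SNPs marked "-" (no frequency data) are omitted.
FREQUENCIES = [
    0.0936502, 0.105232, 0.105232, 0.510383, 0.502995, 0.442891,
    0.0241613, 0.341653, 0.336462, 0.504992, 0.503594, 0.573482,
    0.273762, 0.368011, 0.337859, 0.513578, 0.650759, 0.241813,
    0.473243, 0.0273562, 0.0239617, 0.482628, 0.024361, 0.306709,
    0.336262, 0.336462, 0.335663,
]

freq_max = max(FREQUENCIES)
freq_mean = statistics.mean(FREQUENCIES)
freq_median = statistics.median(FREQUENCIES)

print(f"n={len(FREQUENCIES)} max={freq_max} "
      f"mean={freq_mean:.8f} median={freq_median}")
```

With an odd number of values (27), the median is the 14th value of the sorted list, so it is one of the observed frequencies rather than an average of two.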
Coordinate | SNP (ref alt) | Quality | DP (read depth) | refgene | dbsnp | 1000 genomes (allele frequency) | GWAS | Clinvar |
---|---|---|---|---|---|---|---|---|
100487721 | G T | 37.5136 | 4 | UTR3 ACHE(NM_001302622:c.*114C>A,NM_015831:c.*858C>A,NM_000665:c.*114C>A,NM_001282449:c.*114C>A) | rs17228616 | 0.0936502 | - | - |
100490077 | G A | 221.999 | 34 | exonic ACHE | rs7636 | 0.105232 | Type 2 diabetes | - |
120965652 | C T | 62.0073 | 8 | intronic WNT16 | rs7782648 | 0.105232 | - | - |
120969769 | G A | 222.974 | 14 | exonic WNT16 | rs2908004 | 0.510383 | Bone mineral density | - |
120979089 | C T | 225.2 | 3 | exonic WNT16 | rs2707466 | 0.502995 | Cortical thickness | - |
134221694 | G A | 7.79993 | 2 | intronic AKR1B10 | rs706159 | 0.442891 | - | - |
134222091 | C A | 10.4247 | 2 | intronic AKR1B10 | rs138166076 | - | - | - |
134237239 | T C | 6.20226 | 1 | intronic AKR1B15 | rs73724974 | 0.0241613 | - | - |
134237294 | A G | 6.20226 | 1 | intronic AKR1B15 | rs73164857 | 0.341653 | - | - |
134239978 | C A | 29.0123 | 10 | intronic AKR1B15 | rs2113451 | 0.336462 | - | - |
134246036 | G A | 3.54557 | 1 | intronic AKR1B15 | - | - | - | - |
134248045 | A G | 11.3429 | 1 | intronic AKR1B15 | rs10261532 | 0.504992 | - | - |
134250322 | A C | 225.009 | 68 | intronic AKR1B15 | rs4732038 | 0.503594 | Longevity | - |
134251041 | C T | 5.46383 | 1 | intronic AKR1B15 | rs706201 | 0.573482 | - | - |
134252691 | C T | 26.0177 | 5 | intronic AKR1B15 | rs59136474 | 0.273762 | - | - |
134252718 | C T | 78.0075 | 6 | intronic AKR1B15 | rs17775934 | 0.368011 | - | - |
134253269 | C T | 135.015 | 9 | intronic AKR1B15 | rs56097712 | 0.337859 | - | - |
134254029 | G A | 212.009 | 47 | intronic AKR1B15 | rs3792574 | 0.513578 | - | - |
134254427 | G A | 185.999 | 23 | intronic AKR1B15 | rs782538 | 0.650759 | - | - |
134255326 | C T | 7.79993 | 1 | intronic AKR1B15 | - | - | - | - |
134259951 | T C | 87.0076 | 8 | intronic AKR1B15 | rs7788801 | 0.241813 | - | - |
134259962 | T C | 105.008 | 12 | intronic AKR1B15 | rs1465473 | 0.473243 | - | - |
134260106 | C T | 225.009 | 38 | intronic AKR1B15 | rs782539 | 0.0273562 | - | - |
134260464 | G A | 225.009 | 71 | intronic AKR1B15 | rs73724979 | 0.0239617 | - | - |
134261097 | C G | 225.009 | 66 | intronic AKR1B15 | rs2161803 | 0.482628 | - | - |
134261302 | T C | 145.008 | 26 | intronic AKR1B15 | rs59326083 | 0.024361 | - | - |
134261674 | C T | 225.009 | 53 | intronic AKR1B15 | rs6979933 | 0.306709 | - | - |
134262441 | T C | 225.009 | 71 | intronic AKR1B15 | rs10241998 | 0.336262 | - | - |
134262747 | G C | 68.0074 | 8 | intronic AKR1B15 | rs10229876 | 0.336462 | - | - |
134264286 | C T | 187.009 | 42 | exonic AKR1B15 | rs6467538 | 0.335663 | - | - |
134264546 | C G | 3.0136 | 3 | UTR3 AKR1B15(NM_001080538:c.*245C>G) | - | - | - | - |