0. Choose one file
Chosen file: chr7.2.fastq
1. Use FastQC to analyse the .fastq file
hisat2-build chr7.fasta chr7
hisat2 --no-softclip -x chr7 -U chr7.2.fastq -S al.sam
(The reads come from a transcriptome, so spliced alignments are expected; the '--no-spliced-alignment' parameter has therefore been removed from the command line.)
13701 reads; of these:
  13701 (100.00%) were unpaired; of these:
    122 (0.89%) aligned 0 times
    13529 (98.74%) aligned exactly 1 time
    50 (0.36%) aligned >1 times
99.11% overall alignment rate
As the output shows, of the 13701 reads, 13529 (98.74%) aligned exactly once, 50 (0.36%) aligned more than once, and 122 (0.89%) did not align at all.
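The percentages in the summary can be re-derived from the raw counts. The short sketch below only repeats the arithmetic on the numbers quoted above; the variable and function names are illustrative and are not part of HISAT2.

```python
# Counts copied from the HISAT2 summary in this report.
total_reads = 13701
aligned_zero = 122
aligned_once = 13529
aligned_multi = 50


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# The three categories should account for every read.
assert aligned_zero + aligned_once + aligned_multi == total_reads

# Overall alignment rate = reads aligned at least once / all reads.
overall_rate = percent(aligned_once + aligned_multi, total_reads)

print(f"unaligned: {percent(aligned_zero, total_reads)}%")
print(f"unique:    {percent(aligned_once, total_reads)}%")
print(f"multi:     {percent(aligned_multi, total_reads)}%")
print(f"overall:   {overall_rate}%")
```

The overall rate is the sum of the uniquely and multiply aligned reads, (13529 + 50) / 13701, which agrees with the 99.11% reported by the aligner.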
samtools view al.sam -b -o al.bam
samtools sort al.bam -T al.txt -o samsa.bam
samtools index samsa.bam
Command | Function |
bedtools bamtobed -i samsa.bam > samsa.bed | converts sequence alignments in BAM format into BED |
bedtools intersect -a /P/y14/term3/block4/SNP/rnaseq_reads/gencode.genes.bed -b samsa.bed -u > samsadone.bed | screens for overlaps between two sets of genomic features.
-a: BED file "A" (here, the gene annotation); -b: BED file "B" (here, the aligned reads); -u: reports each entry of A once if at least one overlap was found in B. |
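To make the BAM-to-BED conversion more concrete, here is a minimal sketch of the coordinate arithmetic involved, assuming the default behaviour in which each alignment becomes one interval spanning its whole reference footprint. SAM positions are 1-based, BED intervals are 0-based and half-open, and only the CIGAR operations M, D, N, = and X consume reference bases. The function name and the example records are hypothetical, not bedtools code.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def sam_to_bed_interval(pos, cigar):
    """Convert a 1-based SAM POS and a CIGAR string into a
    0-based, half-open BED (start, end) pair."""
    start = pos - 1  # shift from 1-based to 0-based
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return start, start + ref_len


# Hypothetical 51-base read aligned without gaps.
print(sam_to_bed_interval(120627525, "51M"))
# Hypothetical spliced read: the 100-base intron (N) extends the interval.
print(sam_to_bed_interval(101, "20M100N31M"))
```

Insertions (I) add read bases without moving along the reference, so they do not lengthen the BED interval.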
The samsadone.bed file lists the genes that are overlapped by at least one read; from this we can work out how many reads fall on each gene.
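The overlap logic can be sketched in a few lines, assuming half-open BED intervals and ignoring strand. Counting overlapping reads per gene corresponds to what `bedtools intersect -c` reports, while the genes with a non-zero count are the ones `-u` keeps. The `Interval` type, the function names and the toy coordinates below are hypothetical and are not taken from the real data.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end name")


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads_per_gene(genes, reads):
    """Map each gene name to the number of reads overlapping it."""
    return {g.name: sum(overlaps(g, r) for r in reads) for g in genes}


# Toy data (hypothetical coordinates).
genes = [
    Interval("chr7", 100, 200, "geneA"),
    Interval("chr7", 500, 600, "geneB"),
]
reads = [
    Interval("chr7", 150, 201, "r1"),  # overlaps geneA
    Interval("chr7", 190, 241, "r2"),  # overlaps geneA
    Interval("chr7", 300, 351, "r3"),  # overlaps no gene
]

counts = count_reads_per_gene(genes, reads)
covered = [name for name, n in counts.items() if n > 0]
print(counts)
print(covered)
```

This brute-force comparison is quadratic in the number of intervals, which is fine for a sketch but much slower than bedtools on real data.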
Download Excel data:
For CPED1: all 62 reads (because the other genes are pseudogenes)
Gene | Notes |
CPED1 | Cadherin-like and PC-esterase domain containing 1; protein-coding gene. CPED1 is located in the ER. Its function is unknown. |
HMGN1P18 | Pseudogene |
RNA5SP241 | Pseudogene |
RNU6-517P | Pseudogene |
Task | Command |
1. Obtain a .fq file from a .bam file | bedtools bamtofastq -i result.bam -fq result.fq |
2. Obtain a .fasta file with the nucleotide sequence of a region of the reference sequence | bedtools getfasta -fi chr7.fasta -bed nu2.bed > nu2.fasta
nu2.bed content: chr7 120627524 120627575 D00795:16:C7BV0ACXX:5:1311:14764:31328 60 + |
3. Split chr7 into equal fragments (1 million nucleotides each) | bedtools makewindows -g data.txt -w 1000000 > nu3.bed |
4. Unite overlapping reads into clusters | bedtools cluster -i samsa.bed -s > nu4.bed |
5. Pick 100 random fragments, 200 nucleotides each | bedtools random -g data.txt -n 100 -l 200 > ran.bed |
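Tasks 3 and 5 both start from a genome file (here data.txt, presumably a tab-separated chromosome name and length, as bedtools expects for `-g`). The sketch below shows the idea behind fixed-width windows and random fixed-length intervals. The chromosome length, the seed and the function names are hypothetical, and the random draws will not reproduce the output of bedtools.

```python
import random


def make_windows(chrom, chrom_len, width):
    """Tile a chromosome with half-open windows of the given width;
    the last window is clipped at the chromosome end."""
    return [
        (chrom, start, min(start + width, chrom_len))
        for start in range(0, chrom_len, width)
    ]


def random_intervals(chrom, chrom_len, n, length, seed=0):
    """Draw n intervals of fixed length that lie fully inside the chromosome."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    intervals = []
    for _ in range(n):
        start = rng.randint(0, chrom_len - length)
        intervals.append((chrom, start, start + length))
    return intervals


# Hypothetical chromosome length, NOT the real size of chr7.
CHROM_LEN = 2_500_000

windows = make_windows("chr7", CHROM_LEN, 1_000_000)
fragments = random_intervals("chr7", CHROM_LEN, n=100, length=200)

print(windows)
print(len(fragments), fragments[0])
```

With a length that is not a multiple of the window width, the final window is shorter than the others, so the fragments are equal only up to the last one.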