Home page
Term 1
Term 2
Term 3
About me
Faculty website

Bedtools

THERE IS A TABLE WITH ALL THE COMMANDS USED AT THE END OF THE WEBPAGE

Required task

First the alignment in .bam format was converted into one in .bed format. Then bedtools intersect -c was employed to count the number of reads mapped to each gene or other genome feature. The resulting file contained a host of lines with zero intersections, so regular expressions were used to extract the nonzero overlap lines into another file, wherein we can observe the already described NPM1 (see previous practical) as well as another protein, FAM172A, which is a cotranscriptional regulator involved in alternative splicing. According to this file, NPM1 is covered by 23310 reads, while the FAM172A receives 327 reads. There is also a pseudogene NPM1P27 covered by 325 reads.
The same result can be achieved with bedtools coverage -counts
According to UCSC Genome Browser, NPM1 transcript variant 1 is 22167 nt long (23181 nt if UTRs are included) and contains 11 exons and 10 introns. It is encoded in the positive strand with the coordinates hg19 chr5:170,814,953-170,837,569 (or 170,814,708-170,837,888 if UTRs are included).
RefSeq Summary (NM_002520): The protein encoded by this gene is involved in several cellular processes, including centrosome duplication, protein chaperoning, and cell proliferation. The encoded phosphoprotein shuttles between the nucleolus, nucleus, and cytoplasm, chaperoning ribosomal proteins and core histones from the nucleus to the cytoplasm. This protein is also known to sequester the tumor suppressor ARF in the nucleolus, protecting it from degradation until it is needed. Mutations in this gene are associated with acute myeloid leukemia. Dozens of pseudogenes of this gene have been identified. (provided by RefSeq, Aug 2017).
Information about FAM127A:
Transcript (Including UTRs)
   Position: hg19 chrX:134,166,333-134,167,575 Size: 1,243 Total Exon Count: 1 Strand: +
Coding Region
   Position: hg19 chrX:134,166,414-134,166,755 Size: 342 Coding Exon Count: 1

(almost) Optional tasks

I did the tasks number 1, 2, and 4. Commands I used in these tasks can be found in the table below, while here only the resulting files are posted.
1) FASTQ file
2) FASTA file
4) Clusters
For the second task I created a .bed file containing the coordinates of NPM1, using its coordinates from UCSC Genome Browser.

Table with all the commands used

CommandFunction
bedtools bamtobed -i mapped_transcriptome_sorted.bam > mapped_transcriptome.bedConverts a .bam alignment into a .bed alignment. -i is just an input file.
bedtools intersect -c -a gencode.genes.bed -b mapped_transcriptome_sorted.bed > intersection1.bedThis command, launched with -c, counts the number of intersections of parts of genes from gencode.genes.bed with the reads from mapped_transcriptome_sorted.bed
grep -vw 0 intersection1.bed > intersection1_nonzero.bedExtracts lines with nonzero numbers of intersections.
bedtools bamtofastq -i mapped_transcriptome_sorted.bam -fq mapped_transcriptome_sorted.fastqConverts a .bam file into a .fastq file
bedtools cluster -i mapped_transcriptome_sorted.bed > cluster.bedForms clusters out of the reads.
bedtools getfasta -fi chr5.fasta -bed NPM1.bed > NPM1.fasta Extracts a region of a reference sequence (-fi) into a .fasta file. The coordinates of the region are provided the file following -bed.


© Stanislav Tikhonov, 2018