Home page Term 1 Term 2 Term 3 About me Faculty website |
BedtoolsTHERE IS A TABLE WITH ALL THE COMMANDS USED AT THE END OF THE WEBPAGERequired taskFirst the alignment in .bam format was converted into one in .bed format. Then bedtools intersect -c was employed to count the number of reads mapped to each gene or other genome feature. The resulting file contained a host of lines with zero intersections, so regular expressions were used to extract the nonzero overlap lines into another file, wherein we can observe the already described NPM1 (see previous practical) as well as another protein, FAM172A, which is a cotranscriptional regulator involved in alternative splicing. According to this file, NPM1 is covered by 23310 reads, while the FAM172A receives 327 reads. There is also a pseudogene NPM1P27 covered by 325 reads.The same result can be achieved with bedtools coverage -counts According to UCSC Genome Browser, NPM1 transcript variant 1 is 22167 nt long (23181 nt if UTRs are included) and contains 11 exons and 10 introns. It is encoded in the positive strand with the coordinates hg19 chr5:170,814,953-170,837,569 (or 170,814,708-170,837,888 if UTRs are included). RefSeq Summary (NM_002520): The protein encoded by this gene is involved in several cellular processes, including centrosome duplication, protein chaperoning, and cell proliferation. The encoded phosphoprotein shuttles between the nucleolus, nucleus, and cytoplasm, chaperoning ribosomal proteins and core histones from the nucleus to the cytoplasm. This protein is also known to sequester the tumor suppressor ARF in the nucleolus, protecting it from degradation until it is needed. Mutations in this gene are associated with acute myeloid leukemia. Dozens of pseudogenes of this gene have been identified. (provided by RefSeq, Aug 2017). Information about FAM127A: Transcript (Including UTRs) Position: hg19 chrX:134,166,333-134,167,575 Size: 1,243 Total Exon Count: 1 Strand: + Coding Region Position: hg19 chrX:134,166,414-134,166,755 Size: 342 Coding Exon Count: 1 (almost) Optional tasksI did the tasks number 1, 2, and 4. Commands I used in these tasks can be found in the table below, while here only the resulting files are posted.1) FASTQ file 2) FASTA file 4) Clusters For the second task I created a .bed file containing the coordinates of NPM1, using its coordinates from UCSC Genome Browser. Table with all the commands used
|