Practicum 14

De novo genome assembly

1. The file was decompressed with the command gzip -d C.fastq.gz.

2. Removing adapters.
Command: java -jar /usr/share/java/trimmomatic.jar SE -phred33 C.fastq outC.fastq ILLUMINACLIP:adapters.fasta:2:7:7
Fig. 1. The adapter-removal command

3. Trimming low-quality bases from the read ends.
Each read is scanned with a sliding window (SLIDINGWINDOW) of length 5, and the part of the read that follows any window with an average quality below 28 is removed.
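The sliding-window rule from step 3 can be illustrated with a minimal Python sketch. This is a simplified illustration, not Trimmomatic's actual implementation: the function names are hypothetical, the cut is assumed to fall at the start of the first failing window, and reads shorter than the window are assumed to be left unchanged. Qualities are decoded as Phred+33, matching the -phred33 option used above.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=5, min_avg=28):
    """Cut the read at the start of the first window whose mean quality
    is below min_avg (assumed cut position; hypothetical helper)."""
    scores = phred33_to_scores(quality_string)
    for start in range(len(scores) - window + 1):
        # Compare sums instead of means to stay in integer arithmetic.
        if sum(scores[start:start + window]) < min_avg * window:
            return sequence[:start], quality_string[:start]
    # No failing window (or read shorter than the window): keep the read.
    return sequence, quality_string


# 'I' encodes quality 40 and '+' encodes quality 10 in Phred+33.
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII++++")
print(trimmed_seq, trimmed_qual)  # ACGT IIII
```

In the example the window starting at position 3 still averages exactly 28 and is kept; the window starting at position 4 averages 22, so the read is cut there.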
Fig. 2. The quality-filtering command
Fig. 3. The command for removing short fragments

4. Running velveth.
velveth is run to build 31-mers (i.e., the hash_length parameter must be 31). See the Velvet user manual. The reads in our case are short and unpaired (short).
Command: velveth /nfs/srv/databases/ngs/nastya_lacewing/14 31 -short -fastq goodC2.fastq
(the screenshot is outdated)
Fig. 4. Running velveth
Created: the files Log, Roadmaps and Sequences.

5. Assembling contigs with velvetg.
Command: velvetg .
Fig. 5. Running velvetg
As a result, Graph, PreGraph, LastGraph, contigs.fa and stats.txt were created. (The screenshots are outdated.)

6. Information about the velvet run.
The program output reports: "Final graph has 257875 nodes and n50 of 68, max 672, total 5410757, using 0/3525084 reads."
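Two notions from steps 4 to 6 can be made concrete with a short Python sketch: splitting a read into k-mers (k = 31 here) and computing N50 from a list of lengths. The function names are hypothetical, and the N50 below follows the common textbook definition; Velvet's own figure is computed on its graph nodes and may use different length units, so the values need not match exactly.

```python
def kmers(sequence, k=31):
    """Return all overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def n50(lengths):
    """Return the largest length L such that items of length >= L
    together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


print(kmers("ACGTAC", k=3))             # ['ACG', 'CGT', 'GTA', 'TAC']
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))    # 8
```

A read of length n yields n - k + 1 k-mers, so a read shorter than 31 bases contributes nothing to a 31-mer graph.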
7. Annotation with BLAST.
(For each contig, report the database annotation (organism and description) of the best hit, the number of alignments between the contig and this hit returned by BLAST, and the characteristics of these alignments (contig coverage, percent identity).)

a) The long contig
Fig. 6. BLAST and the long contig
Fig. 7. Hits
Fig. 8. Alignment with the best hit
Thus, the best hit is "Arabidopsis thaliana myo-inositol oxygenase 2 (MIOX2), mRNA", i.e. myo-inositol oxygenase 2 of thale cress. Link to the annotation at NCBI

b) The contig with the highest coverage
Fig. 9. BLAST and the high-coverage contig
Fig. 10. Hits
Fig. 11. Alignments with several of the best hits
Here the best hit turns out to be the NHL3 gene, located on chromosome 5 of thale cress. Moreover, a 100% match is observed for all ecotypes. Link to the annotation at NCBI ("Arabidopsis thaliana ecotype Ms-0 NHL3 (NHL3) gene, partial cds")

c) The contig with the lowest coverage
Fig. 12. BLAST and the low-coverage contig

Screenshots:
nastya_lacewing@kodomo:/nfs/srv/databases/ngs/nastya_lacewing/14$ velvetg .
[0.000001] Reading roadmap file ./Roadmaps
[13.069566] 3525084 roadmaps read
[13.074688] Creating insertion markers
[14.841841] Ordering insertion markers
[17.108252] Counting preNodes
[18.445220] 1251885 preNodes counted, creating them now
[24.595574] Sequence 1000000 / 3525084
[30.520597] Sequence 2000000 / 3525084
[36.326501] Sequence 3000000 / 3525084
[39.530705] Adjusting marker info...
[41.462523] Connecting preNodes
[43.647650] Connecting 1000000 / 3525084
[45.828360] Connecting 2000000 / 3525084
[48.420296] Connecting 3000000 / 3525084
[50.443201] Cleaning up memory
[50.483984] Done creating preGraph
[50.484066] Concatenation...
[51.201701] Renumbering preNodes
[51.201790] Initial preNode count 1251885
[51.365704] Destroyed 697242 preNodes
[51.365802] Concatenation over!
[51.365817] Clipping short tips off preGraph
[51.645467] Concatenation...
[51.931252] Renumbering preNodes
[51.931310] Initial preNode count 554643
[52.048195] Destroyed 174789 preNodes
[52.048255] Concatenation over!
[52.048266] 110648 tips cut off
[52.048275] 379854 nodes left
[52.048421] Writing into pregraph file ./PreGraph...
[54.226054] Reading read set file ./Sequences;
[55.544843] 3525084 sequences found
[61.646247] Done
[65.099933] Reading pre-graph file ./PreGraph
[65.101395] Graph has 379854 nodes and 3525084 sequences
[66.319619] Scanning pre-graph file ./PreGraph for k-mers
[67.105382] 6775454 kmers found
[68.491144] Sorting kmer occurence table ...
[71.994918] Sorting done.
[71.994984] Computing acceleration table...
[72.127158] Computing offsets...
[72.224952] Ghost Threading through reads 0 / 3525084
[72.227348] Ghost Threading through reads 1000000 / 3525084
[72.229748] Ghost Threading through reads 2000000 / 3525084
[72.236254] Ghost Threading through reads 3000000 / 3525084
[72.242212] === Ghost-Threaded in 0.021161 s
[72.242310] Threading through reads 0 / 3525084
[82.579861] Threading through reads 1000000 / 3525084
[93.653297] Threading through reads 2000000 / 3525084
[105.653964] Threading through reads 3000000 / 3525084
[114.685998] === Threaded in 42.443723 s
[114.713758] Correcting graph with cutoff 0.200000
[114.767308] Determining eligible starting points
[115.593066] Done listing starting nodes
[115.593151] Initializing todo lists
[115.842281] Done with initilization
[115.842345] Activating arc lookup table
[116.179062] Done activating arc lookup table
[116.252501] 10000 / 379854 nodes visited
[116.334936] 20000 / 379854 nodes visited
[116.444759] 30000 / 379854 nodes visited
[116.568464] 40000 / 379854 nodes visited
[116.706278] 50000 / 379854 nodes visited
[116.870702] 60000 / 379854 nodes visited
[117.075387] 70000 / 379854 nodes visited
[117.256423] 80000 / 379854 nodes visited
[117.487738] 90000 / 379854 nodes visited
[117.675395] 100000 / 379854 nodes visited
[117.811590] 110000 / 379854 nodes visited
[117.940649] 120000 / 379854 nodes visited
[118.065364] 130000 / 379854 nodes visited
[118.191347] 140000 / 379854 nodes visited
[118.323139] 150000 / 379854 nodes visited
[118.465467] 160000 / 379854 nodes visited
[118.614894] 170000 / 379854 nodes visited
[118.761414] 180000 / 379854 nodes visited
[118.907452] 190000 / 379854 nodes visited
[119.064777] 200000 / 379854 nodes visited
[119.210894] 210000 / 379854 nodes visited
[119.364163] 220000 / 379854 nodes visited
[119.520420] 230000 / 379854 nodes visited
[119.676647] 240000 / 379854 nodes visited
[119.848178] 250000 / 379854 nodes visited
[120.019173] 260000 / 379854 nodes visited
[120.205335] 270000 / 379854 nodes visited
[120.393453] 280000 / 379854 nodes visited
[120.575410] 290000 / 379854 nodes visited
[120.764343] 300000 / 379854 nodes visited
[120.958745] 310000 / 379854 nodes visited
[121.143658] 320000 / 379854 nodes visited
[121.318708] 330000 / 379854 nodes visited
[121.529430] 340000 / 379854 nodes visited
[121.746941] 350000 / 379854 nodes visited
[121.954052] 360000 / 379854 nodes visited
[122.159835] 370000 / 379854 nodes visited
[122.364553] 380000 / 379854 nodes visited
[122.577407] 390000 / 379854 nodes visited
[122.777648] 400000 / 379854 nodes visited
[122.967121] 410000 / 379854 nodes visited
[123.154931] 420000 / 379854 nodes visited
[123.329678] 430000 / 379854 nodes visited
[123.510240] 440000 / 379854 nodes visited
[123.712092] 450000 / 379854 nodes visited
[123.876773] 460000 / 379854 nodes visited
[124.025761] 470000 / 379854 nodes visited
[124.177378] 480000 / 379854 nodes visited
[124.389389] 490000 / 379854 nodes visited
[124.437872] 500000 / 379854 nodes visited
[124.485147] 510000 / 379854 nodes visited
[124.530304] 520000 / 379854 nodes visited
[124.569131] 530000 / 379854 nodes visited
[124.604144] 540000 / 379854 nodes visited
[124.639056] 550000 / 379854 nodes visited
[124.676071] 560000 / 379854 nodes visited
[124.705553] 570000 / 379854 nodes visited
[124.720811] Concatenation...
[124.815571] Renumbering nodes
[124.815630] Initial node count 379854
[124.855660] Removed 76313 null nodes
[124.855722] Concatenation over!
[124.855752] Clipping short tips off graph, drastic
[125.374001] Concatenation...
[125.498273] Renumbering nodes
[125.498325] Initial node count 303541
[125.546596] Removed 45666 null nodes
[125.546658] Concatenation over!
[125.546690] 257875 nodes left
[125.546922] Writing into graph file ./Graph...
[127.276814] WARNING: NO COVERAGE CUTOFF PROVIDED
[127.276876] Velvet will probably leave behind many detectable errors
[127.276886] See manual for instructions on how to set the coverage cutoff parameter
[127.276947] Removing contigs with coverage < -1.000000...
[127.303856] Concatenation...
[127.384021] Renumbering nodes
[127.384077] Initial node count 257875
[127.385052] Removed 0 null nodes
[127.385069] Concatenation over!
[127.409058] Concatenation...
[127.491046] Renumbering nodes
[127.491096] Initial node count 257875
[127.492094] Removed 0 null nodes
[127.492113] Concatenation over!
[127.492137] Clipping short tips off graph, drastic
[127.515145] Concatenation...
[127.594205] Renumbering nodes
[127.594262] Initial node count 257875
[127.595283] Removed 0 null nodes
[127.595299] Concatenation over!
[127.595308] 257875 nodes left
[127.595327] WARNING: NO EXPECTED COVERAGE PROVIDED
[127.595335] Velvet will be unable to resolve any repeats
[127.595344] See manual for instructions on how to set the expected coverage parameter
[127.595362] Concatenation...
[127.673245] Renumbering nodes
[127.673312] Initial node count 257875
[127.674321] Removed 0 null nodes
[127.674337] Concatenation over!
[127.674349] Removing reference contigs with coverage < -1.000000...
[127.699514] Concatenation...
[127.777535] Renumbering nodes
[127.777603] Initial node count 257875
[127.778583] Removed 0 null nodes
[127.778599] Concatenation over!
[127.809934] Writing contigs into ./contigs.fa...
[128.403973] Writing into stats file ./stats.txt...
[129.358465] Writing into graph file ./LastGraph...
Final graph has 257875 nodes and n50 of 68, max 672, total 5410757, using 0/3525084 reads
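The last line of the velvetg output carries the summary discussed in step 6. As a convenience, it could be pulled out of a saved log with a small Python sketch like the one below. The regular expression is written against the exact wording of the line shown in this report; other Velvet versions may phrase it differently, and the function and key names are hypothetical.

```python
import re

# Pattern modelled on the summary line printed in this report (an assumption
# about the format, not a documented Velvet interface).
SUMMARY_RE = re.compile(
    r"Final graph has (?P<nodes>\d+) nodes and n50 of (?P<n50>\d+), "
    r"max (?P<max>\d+), total (?P<total>\d+), "
    r"using (?P<used>\d+)/(?P<reads>\d+) reads"
)


def parse_summary(text):
    """Return the summary numbers as a dict of ints, or None if absent."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = ("Final graph has 257875 nodes and n50 of 68, max 672, "
        "total 5410757, using 0/3525084 reads")
summary = parse_summary(line)
print(summary["nodes"], summary["n50"], summary["max"])  # 257875 68 672
```

Because the pattern is searched rather than anchored, the whole log text can be passed in and only the summary line will match.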
© Cherkashina Anastasia 2017