Tuning Tophat2
Belinda Giardine, Hardison Lab

Tophat2
Aligns reads from RNA to the genome. Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes.
Handles gaps in the alignments by breaking the reads into small pieces (~20 bases) and reassembling the reads after mapping.
Although the new version is more parallel, it is still slow (more than 4 days for recent runs).
It uses Bowtie to do the actual mapping.
[Figure: RNA-seq overview, image from Wikipedia]

fastq file, a single read:
@DGM97JN1:330:C3EW0ACXX:1:1101:2723:1993 1:N:0:NAAGGC
GAATGCCCCCGGCCGTCCCTCTTAATCATGGCCTCAGTTCCGAAAACCANCAAAATAGAACCGCGGTCCTATTNN
+
CCCFFFFFHHHHGIIFGIIIIJJIIJIFGIJEHIIJIGHIJHAGHHFEE#,;;BACEEDDDDDD@B>BBDCDC##

Tophat2 Pipeline
Written in C++ (34,351 lines of code in 63 files), with a wrapper written in Python.
3 of the programs use Boost threads:
    long_spanning_reads.cpp
    segment_juncs.cpp
    tophat_reports.cpp
The programs are compiled as one unit under autoconf and automake; communication between programs is through temporary files.
Many prerequisites: zlib, Boost, samtools, Bowtie. These, plus the amount of file I/O, make running on the MIC alone infeasible.

Data files
Reads in fastq format, 20-200 million reads (2 x 20 GB for my test).
Reference sequence and indexes used for mapping: 6 GB for mouse.
Final output: 14 GB for my test.

Work from last time
Compiling: started with gcc, then icc, then added -mmic (this failed in trying to build all the prerequisites).
Test runs on the host, using Tophat's log of the run for timing:
    biostar (Xeon), 8 threads: 26 hours
    stampede (host), 16 threads: 19 hours 40 minutes
    stampede (host), 32 threads: 24 hours

New work
The Python wrapper and the long run times make the code difficult to profile with gprof and VTune.
Going from my experience on biostar, I am starting with the segment_juncs executable. Keeping the temporary files that are used for passing data between programs, I ran just segment_juncs.
Time for segment_juncs run alone:
    8 threads: 2 hours 13 minutes
    16 threads: 1 hour 15 minutes (2½ out of the 19½ hours total); 76% of this is spent in the parallel section
    32 threads: 2 hours 12 minutes

Failed attempts
Running VTune on segment_juncs: it times out on the full data, and there are license errors.
Checking the loops that the par_report lists as having assumed dependences: the lines of code indicated are not loops, or not inside loops? The report lines contradict each other.
Offloading the threaded section of code in segment_juncs.cpp: would it actually improve speed, or is there too much file I/O? There are lots of variables to copy, plus the file I/O. (A hedged sketch of an offload appears at the end of this report.)

vec_report3
segment_juncs.cpp(135): (col. 32) remark: loop was not vectorized: existence of vector dependence.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed ANTI dependence between r.92068 line 135 and r.92068 line 135.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed FLOW dependence between r.92068 line 135 and r.92068 line 135.
Line 135: left_seg.left = max(0, T.right() - 2);

opt_report
REMOVED VAR left_mismatches.201433.0_V$78b
REMOVED PACK left_mismatches.201433.0
REMOVED VAR right_mismatches.201433.0_V$78d
REMOVED PACK right_mismatches.201433.0
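The ANTI and FLOW dependences above are assumed, not proven: the compiler cannot rule out that the segment being written on line 135 overlaps the data being read. If inspection shows the iterations really are independent, icc's #pragma ivdep tells the vectorizer to ignore assumed (unproven) dependences; proven dependences are still honored. A minimal sketch on a hypothetical reduction of the line-135 statement (the struct and function names here are mine, not TopHat2's):

    #include <algorithm>

    // Hypothetical stand-ins for the segment types used at segment_juncs.cpp:135.
    struct Seg {
        int left;
        int right_end;
        int right() const { return right_end; }
    };

    void trim_left_edges(Seg* segs, const Seg* anchors, int n)
    {
    #pragma ivdep  // icc: ignore assumed dependences in this loop
        for (int i = 0; i < n; ++i)
            segs[i].left = std::max(0, anchors[i].right() - 2);  // same shape as line 135
    }

An alternative is to declare the pointers __restrict__ so the compiler can prove the non-aliasing itself; either way, regenerating the vec_report shows whether the loop now vectorizes.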
gprof output for segment_juncs
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.01     0.01      0.01                             extend_from_seeds(std::vector<SeedExtension, std::allocator<SeedExtension> >&, PackedSplice const&, std::vector<std::vector<ReadHit, std::allocator<ReadHit> >, std::allocator<std::vector<ReadHit, std::allocator<ReadHit> > > > const&, std::string const&, std::string const&, unsigned long, unsigned long, int)
  0.00     0.01      0.00    89528     0.00     0.00  pack_splice(std::string const&, int, int, unsigned int)
  0.00     0.01      0.00        3     0.00     0.00  pack_right_splice_half(std::string const&, unsigned int, unsigned int)
  0.00     0.01      0.00        2     0.00     0.00  __do_global_dtors_aux

Parallel section of code:

    vector<boost::thread*> threads;
    for (int i = 0; i < num_threads; ++i) {
        SegmentSearchWorker worker;
        worker.rt = &rt;
        worker.reads_fname = left_reads_fname;
        worker.segmap_fnames = &left_segmap_fnames;
        worker.partner_reads_map_fname = right_reads_map_fname;
        worker.seg_partner_reads_map_fname = right_seg_fname_for_segment_search;
        worker.juncs = &vseg_juncs[i];
        worker.deletions = &vdeletions[i];
        worker.insertions = &vinsertions[i];
        worker.fusions = &vfusions[i];
        worker.read = READ_LEFT;
        worker.partner_hit_offset = 0;
        worker.seg_partner_hit_offset = 0;
        if (i == 0) {
            worker.begin_id = 0;
            worker.seg_offsets = vector<int64_t>(left_segmap_fnames.size(), 0);
            worker.read_offset = 0;
        } else {
            worker.begin_id = read_ids[i-1];
            worker.seg_offsets.insert(worker.seg_offsets.end(),
                                      offsets[i-1].begin()+1, offsets[i-1].end());
            worker.read_offset = offsets[i-1][0];
            if (partner_offsets.size() > 0)
                worker.partner_hit_offset = partner_offsets[i-1];
            if (seg_partner_offsets.size() > 0)
                worker.seg_partner_hit_offset = seg_partner_offsets[i-1];
        }
        worker.end_id = (i+1 < num_threads) ? read_ids[i] : std::numeric_limits<uint64_t>::max();

        //Geo debug:
        //fprintf(stderr, "Worker %d: begin_id=%lu, end_id=%lu\n", i, worker.begin_id, worker.end_id);

        if (num_threads > 1 && i + 1 < num_threads)
            threads.push_back(new boost::thread(worker));
        else
            worker();
    }
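The loop hands each worker a disjoint read-id range and a set of file offsets, spawns num_threads - 1 boost::threads, and runs the last worker on the calling thread, so the main thread does a share of the work instead of idling. A minimal, self-contained sketch of the same dispatch pattern (all names here hypothetical), with the join step that any caller of this pattern needs added at the end:

    #include <boost/thread.hpp>
    #include <cstdio>
    #include <vector>

    struct Worker {
        int id;
        // boost::thread copies the functor, so each thread owns its own Worker.
        void operator()() const { std::printf("worker %d running\n", id); }
    };

    int main()
    {
        const int num_threads = 4;
        std::vector<boost::thread*> threads;
        for (int i = 0; i < num_threads; ++i) {
            Worker w;
            w.id = i;
            if (num_threads > 1 && i + 1 < num_threads)
                threads.push_back(new boost::thread(w));  // spawn the first n-1 workers
            else
                w();                                      // last worker runs inline
        }
        for (size_t i = 0; i < threads.size(); ++i) {     // wait for the spawned workers
            threads[i]->join();
            delete threads[i];
        }
        return 0;
    }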
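On the offload question raised under Failed attempts, this is roughly what moving a kernel to the coprocessor would look like with Intel's LEO offload pragmas; the kernel and all names here are hypothetical stand-ins, not TopHat2 code. Every in()/out() buffer is copied across PCIe, which is why the amount of worker state and the temporary-file I/O in segment_juncs make this unattractive:

    #include <cstdlib>

    // Compile the function for both the host and the MIC (icc language extension).
    __attribute__((target(mic)))
    void search_segments(const char* reads, int* juncs, long n)
    {
        for (long i = 0; i < n; ++i)       // toy stand-in for the per-read search
            juncs[i] = (reads[i] != 'N');
    }

    int main()
    {
        const long n = 1 << 20;
        char* reads = (char*) std::malloc(n);
        int*  juncs = (int*)  std::malloc(n * sizeof(int));
        for (long i = 0; i < n; ++i) reads[i] = 'A';

        // Copy reads to the card, run on the MIC, copy juncs back.
    #pragma offload target(mic) in(reads : length(n)) out(juncs : length(n))
        search_segments(reads, juncs, n);

        std::free(reads);
        std::free(juncs);
        return 0;
    }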