2010 and beyond, the decade of high-performance computing for next-generation sequence analysis

Mary Qu Yang, Ph.D.
United States National Human Genome Research Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20852, and Oak Ridge, D.O.E.
yangma[at]mail.NIH.gov

With the advent of high-throughput next-generation sequencing technologies, high-performance computing algorithms are in demand to process the sheer volume of data. Effective analysis of next-generation sequence data relies on intelligent algorithms for handling metagenome, epigenetic, and disease-genome data, as well as immune-genome, viral profiling, transcriptome profiling, and de novo genome sequencing data. Next-generation sequence analysis is therefore a very important but challenging task in bioinformatics, owing to extremely large-scale “noisy” datasets. The many approaches to handling next-generation sequencing data deal with essentially two types of problems: alignment, for which a reference sequence is available, and de novo assembly, for which no reference sequence is available. Each sequencing technology strikes a different balance among speed of data generation, cost efficiency, read length, and data volume. Consequently, the development of high-performance computing techniques for biological applications will depend on the type of sequencing technology used to generate the data and on the availability of the data.

In this lecture, we propose a key high-performance computing technology for successfully meeting searching and alignment challenges. We focus mainly on the development of high-performance genetic algorithms based on multi-core technology as an example, while remaining open to fast-moving competing platforms as they emerge. The most fundamental problem is the effectiveness of sequence alignment and sequence search. Because next-generation sequencing datasets are so large, running thousands of queries challenges the capacity of current supercomputers.
Parallelizing thousands of sequence-search queries on multi-core processors is an important but difficult task. In this lecture, we propose a new way to formulate the parallelization problem by developing a systematic method that uses genetic algorithms on multi-core processors. Furthermore, the method makes it possible to view the parallelization problem as a variant of the Traveling Salesman Problem (TSP). We combine many heuristic methods based on genetic algorithms to solve the TSP for the task parallelization problem. The algorithm provides a viable alternative for handling vast next-generation sequencing data efficiently and opens up a wide range of applications in translational biomedical sequence analysis.

Dr. Mary Yang received MSECE, MS, and Ph.D. degrees, all from Purdue University, West Lafayette main campus, and postdoctoral training at the NIH main campus in Maryland. She also completed research specialist training at NIH, U.S. Department of Health and Human Services, and Oak Ridge, DOE, and received training in biostatistics and bioinformatics from Johns Hopkins University. She was a visiting scholar in Dr. Jun S. Liu's statistical and computational genomics laboratory at Harvard University in Cambridge. Dr. Yang was an Outstanding Interdisciplinary Bilsland Dissertation Fellow for dual degrees in Computer Engineering (Advisor: Dr. Okan K. Ersoy) and Biophysics (Advisor: Dr. Albert W. Overhauser) and an NIH Fellow at the National Human Genome Research Institute. Dr. Yang works in both engineering practice and translational medicine and was trained as a combined experimental and computer scientist, with more than 15 years of teaching, research, and engineering practice experience. She is Editor-in-Chief of the International Journal of Computational Biology and Drug Design and Consulting Editor of the International Journal of Functional Informatics and Personalized Medicine, both official journals of the International Society of Intelligent Biological Medicine.
She has been an editor of a number of journals, including the Journal of Supercomputing (Springer) and the International Journal of Pattern Recognition and Artificial Intelligence (World Scientific). Dr. Yang has co-authored 30 PubMed-indexed articles in journals including Science, PLoS Computational Biology (official journal of the International Society for Computational Biology), Endocrine Pathology (official journal of the Endocrine Pathology Society), BMC Bioinformatics, International Journal of Data Mining and Bioinformatics, and BMC Genomics. She has also co-authored 40 DBLP-indexed papers and has delivered a number of keynote and invited lectures to promote the emerging fields of translational bioinformatics and personalized medicine. She specializes in genomics and high-performance computing.
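
The abstract above does not spell out how search queries are mapped onto a TSP instance, so the following is only a minimal sketch under stated assumptions. Each query is treated as a "city", a hypothetical matrix `cost[i][j]` gives the overhead of running query `j` immediately after query `i`, and a permutation-encoded genetic algorithm (tournament selection, order crossover, swap mutation, elitism) searches for a low-cost ordering. All function names and parameters are illustrative and are not taken from the lecture.

```python
import random


def tour_cost(order, cost):
    """Cost of running tasks in `order` as a closed tour (last task links back to first)."""
    n = len(order)
    return sum(cost[order[i]][order[(i + 1) % n]] for i in range(n))


def order_crossover(p1, p2, rng):
    """Order crossover (OX): keep a slice of p1, fill the gaps in p2's relative order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child


def swap_mutation(order, rng, rate):
    """With probability `rate`, swap two positions; always returns a new list."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def tournament(pop, cost, rng, k=3):
    """Pick the cheapest of k randomly chosen individuals."""
    return min(rng.sample(pop, k), key=lambda o: tour_cost(o, cost))


def genetic_tsp(cost, pop_size=60, generations=200, mutation_rate=0.2, seed=0):
    """Search for a low-cost task ordering; requires len(cost) >= 2 and pop_size >= 3."""
    rng = random.Random(seed)
    n = len(cost)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(pop, key=lambda o: tour_cost(o, cost))
    for _ in range(generations):
        nxt = [best]  # elitism: the best ordering found so far always survives
        while len(nxt) < pop_size:
            child = order_crossover(
                tournament(pop, cost, rng), tournament(pop, cost, rng), rng
            )
            nxt.append(swap_mutation(child, rng, mutation_rate))
        pop = nxt
        best = min(pop, key=lambda o: tour_cost(o, cost))
    return best, tour_cost(best, cost)
```

Calling `genetic_tsp(cost)` returns an ordering of task indices together with its total cost. One possible next step, again an assumption rather than something the abstract states, is to cut that ordering into contiguous chunks and hand one chunk to each core, so that queries with low switching cost between them tend to run on the same core.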
