
2010 and beyond: the decade of high-performance computing
for next-generation sequence analysis
Mary Qu Yang, Ph.D.
United States National Human Genome Research Institute,
National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20852
and Oak Ridge, DOE
yangma[at]mail.NIH.gov
With the advent of high-throughput next-generation
sequencing technologies, high-performance computing
algorithms are needed to process the sheer volume of
data. Effective analysis of next-generation sequence data relies on
intelligent algorithms for handling metagenomic, epigenetic, disease-genome,
immune-genome, viral-profiling, transcriptome-profiling, and de novo genome
sequencing data. Next-generation sequence analysis is therefore a very
important but challenging task in bioinformatics, because the datasets are
extremely large and “noisy”. Handling next-generation sequencing data
essentially involves two types of problems: alignment, for which a reference
sequence is available, and de novo assembly, for which no reference sequence
is available. Each sequencing technology strikes a different balance among
speed of data generation, cost efficiency, read length, and data volume.
Consequently, the development of high-performance computing techniques for
biological applications will depend on the type of sequencing technology
used to generate the data and on the availability of the data. In this
lecture, we propose a key
high-performance computing technology for meeting sequence search and
alignment challenges. As an example, we focus mainly on developing
high-performance genetic algorithms for multi-core technology, while
remaining open to fast-moving competing platforms as they emerge. The most
fundamental problem is the effectiveness of sequence alignment and sequence
search. Because next-generation sequencing datasets are so large, running
thousands of queries strains even the current caliber of supercomputers. We
consider parallelizing thousands of
sequence-search queries on multi-core processors an important but
difficult task. Here we propose a new way to formulate the
parallelization problem by developing a systematic method that applies
genetic algorithms on multi-core processors. Furthermore, the method makes
it possible to view the parallelization problem as a variant of the
Traveling Salesman Problem (TSP). We combine multiple heuristic methods
based on genetic algorithms to solve this TSP formulation of the
task-parallelization problem. The algorithm provides a viable alternative
for handling vast next-generation sequencing datasets efficiently and opens
up wide applications in translational biomedical sequence analysis.
Dr. Mary Yang received
MSECE, MS, and Ph.D. degrees, all from Purdue
University's main campus in West Lafayette, and postdoctoral training at the NIH main campus in Maryland.
She also completed research specialist training at
NIH, U.S. Department of Health and Human Services,
and at Oak Ridge, DOE, and received training in
biostatistics and bioinformatics from Johns Hopkins
University. She was a visiting scholar in Dr. Jun S. Liu's
statistical and computational genomics laboratory at
Harvard University in Cambridge. Dr. Yang was a
recipient of the Outstanding Interdisciplinary Bilsland
Dissertation Fellowship for her dual degrees in Computer Engineering
(advisor: Dr. Okan K. Ersoy) and Biophysics (advisor: Dr. Albert
W. Overhauser), and was an NIH Fellow at the
National Human Genome Research Institute. Dr. Yang works in
both engineering practice and translational medicine. She
was trained as both an experimental and a computer
scientist and has more than 15 years of teaching, research
and engineering experience. She is Editor-in-Chief of the International Journal of Computational Biology
and Drug Design and Consulting Editor of the International
Journal of Functional Informatics and Personalized
Medicine, both official journals of the International Society
of Intelligent Biological Medicine. She has been an
editor of a number of journals, including the Journal of
Supercomputing (Springer) and the International Journal of
Pattern Recognition and Artificial Intelligence (World
Scientific). Dr. Yang has co-authored 30 PubMed-indexed
articles in journals including Science, PLoS
Computational Biology (official journal of the International
Society for Computational Biology), Endocrine Pathology
(official journal of the Endocrine Pathology Society), BMC
Bioinformatics, International Journal of Data Mining
and Bioinformatics, and BMC Genomics. She has co-authored 40 DBLP-indexed papers and has delivered a
number of keynote and invited lectures to promote the
emerging fields of translational bioinformatics and
personalized medicine. She specializes in genomics and
high-performance computing.