Introduction

CpSc 845: Bioinformatics Algorithms
Introduction
Copyright Notice
Most slides in this presentation are adapted from various sources. The copyright belongs to the original authors.
Thanks!
General Information
Instructor:
Dr. Feng Luo, Associate Professor
School of Computing
210 McAdams Hall
Office phone: (864) 656-4793
E-mail: luofeng@clemson.edu
Class Hours/Room:
2:00 PM ~ 3:15 PM, TTH, 119 McAdams Hall
Office Hours/Room:
2:00 PM ~ 3:00 PM, MW, and by appointment, 310 McAdams Hall
Course Webpage:
www.cs.clemson.edu/~luofeng/course/2014fall/845/bioinfo.html
Textbook
Textbook:
Neil C. Jones, Pavel A. Pevzner, "An Introduction to Bioinformatics Algorithms"; The MIT Press; ISBN: 0262101068
Reference books:
Jonathan Pevsner, "Bioinformatics and Functional Genomics", First edition (October 2003); Publisher: John Wiley & Sons; ISBN: 0471210048
Grading
Grading:
Mid-term exam: 25%
Final exam: 25%
Term Project: 50%
The “old” biology
The most challenging task for a scientist is to get good data
The “new” biology
The most challenging task for a scientist is to make sense of lots of data
Old vs. New: What’s the difference? 1) Economics
Miniaturize – less cost
Multiplex – more data
Parallelize – save time
Automate – minimize human intervention
Thus, you must be able to deal with large amounts of data and trust the process that generated it
What’s the difference? 2) Scale
From gene sequencing (~1 kb) to genome sequencing (many Mb, even Gb)
From picking several genes for expression studies to analyzing the expression patterns of all genes
From a catalog of key genes in a few key species to a catalog of all genes in many species
Analyzing your data in isolation makes less sense when you can make much more powerful statements by including data from others
What’s the difference? 3) Logic
From hypothesis-driven research to data-driven research
Expertise-driven approach versus information-driven approach
Reductionist versus integrationist approach
How to answer the question becomes how to question an answer
Algorithmic approaches for filtering, normalizing, analyzing, and interpreting data become increasingly important
Data-driven Science
Must have some hypothesis: data is not the end goal of science
Finding patterns in the data is where analysis starts, not where it ends
Must understand the limits of high-throughput technology (e.g., microarrays measure transcription only, a single genome does not reveal species variation, etc.)
Must understand or explore the limits of your algorithm
Data is being collected faster and in greater amounts
[Figure: estimated number of databases by year, 1980 to 2005 (y-axis 0 to 700)]
Growth in microarray publications
[Figure: number of microarray papers per year, 1998 to 2005 (y-axis 0 to 14,000)]
Plummeting Cost of Sequencing
[Figure: cost in dollars on a log scale, 1980 to 2010, for memory ($/Mbyte), CPU ($/MFLOP), and sequencing ($/base pair), each with a fitted trend line]
[Greenbaum et al., Am. J. Bioethics ('08)]
Growth in information & knowledge
MEDLINE spans:
>4,800 journals
>16,000,000 records
672,000 new papers in 2005 (~1,840 per day)
[Figure: number of articles in MEDLINE (millions), 1973 to 2005]
The use of software & algorithms is becoming more common in biomedical research
The Biomedical Information Science and Technology Initiative (BISTI)
Prepared by the Working Group on Biomedical Computing
Advisory Committee to the Director
National Institutes of Health
http://www.nih.gov/about/director/060399.htm
What is Bioinformatics?
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. (BISTIC Definition Committee, July 2000)
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. (BISTIC Definition Committee, July 2000)
Bioinformatics is Interdisciplinary
[Figure: fields contributing to bioinformatics: Mathematics, Statistics, Computer Science, Biomedicine, Molecular Biology, Structural Biology, and Ethical, legal and social implications]
[Figure: Bioinformatics, Biophysics, and Evolution (Patrice Koehl)]
Bioinformatics Opportunity
“From the Principal Investigators who understand how to use computers to solve biomedical problems to the people who keep the computers running, there is a shortfall of trained, educated, competent people. The NIH needs a program of workforce development for biomedical computing that encompasses every level, from the technician to the Ph.D. The National Programs of Excellence in Biomedical Computing would provide a structure for developing expertise among biomedical researchers in using computational tools.” (BISTI, 1999)
NATURE MEDICINE • VOLUME 5 • NUMBER 7 • JULY 1999
NATURE|VOL 400 | 8 JULY 1999 |www.nature.com
Nature 410, 293 (15 March 2001)
Nature Biotechnology 22, 933 (2004)
Published online: 27 July 2004 | doi:10.1038/nbt0804-933
Industry Opportunity
Aris Persidis, Industry Trends: Bioinformatics, NATURE BIOTECHNOLOGY VOL 17 AUGUST 1999
Top ten challenges for bioinformatics
[1] Precise models of where and when transcription will occur in a genome (initiation and termination)
[2] Precise, predictive models of alternative RNA splicing
[3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli
[4] Determining protein:DNA, protein:RNA, protein:protein recognition codes
[5] Accurate ab initio protein structure prediction
[6] Rational design of small-molecule inhibitors of proteins
[7] Mechanistic understanding of protein evolution
[8] Mechanistic understanding of speciation
[9] Development of effective gene ontologies: systematic ways to describe gene and protein function
[10] Education: development of bioinformatics curricula
Source: Ewan Birney, Chris Burge, Jim Fickett
Topics Covered
Sequence Analysis
Microarray Analysis
Protein Structure
Biological Network
More …