Lecture 1 (Genomics? Computational Biology?) #1_Aug20 BCB 444/544

BCB 444/544
Lecture 1
What is Bioinformatics?
(Genomics? Computational Biology?)
#1_Aug20
Thanks to Mark Gerstein (Yale)
& Eric Green (NIH)
for many borrowed & modified PPTs
BCB 444/544 F07 ISU
Dobbs #1 - What is Bioinformatics?
8/20/07
1
BCB 444/544
Introduction to Bioinformatics
Instructors:
Drena Dobbs
ddobbs@iastate.edu
Michael Terribilini
terrible@iastate.edu
Jae-Hyung Lee
jhlee777@iastate.edu
TAs:
Jeff Sander
jdsander@iastate.edu
Pete Zaback
petez@iastate.edu
Lab: MBB 106, 4-4991
BCB 444/544 - Website
http://bindr.gdcb.iastate.edu/bcb544
• Syllabus
• Lecture & Lab Schedules
(with Homework Assignments)
• Lecture PPTs
• Lab Exercises
• Practice Exams
• Grading Policy
• Project Guidelines, etc.
• Links
• Check regularly for updates!
BCB 444/544 - Computer Lab
Meets in 1304 MBB every week
EXCEPT this week:
1st Lab meets in Library Rm 32
Current schedule: Thurs 1-3 PM
Conflicts?
Alternatives?
BCB 444/544 - Required Textbook
Essential Bioinformatics
Jin Xiong, Cambridge, 2006
ISBN-13: 9780521600828
Textbook Companion Website:
Not much of one for Xiong: Xiong resources
but check out companion sites for optional texts
(next slide - URLs also provided on class website)
BCB 444/544 - Optional Textbooks
Please don't buy these yet! Completely optional, perhaps
useful, references. All are available from ISU Bookstore
but are cheaper from online booksellers.
• Mount - good reference for both "biologists" and "computer scientists" - but a bit out of date; Online resources include lists of applications with URLs.
• Pevsner - great overview, esp. for those with little biology background; Online resources excellent: many links & PPTs.
• Jones & Pevzner - good introduction to basic algorithms, esp. for biologists with little computer science background; Online resources very good: problems, links & PPTs.
Required Reading
(after today, must read before lecture)
Wed Aug 22 - for Lecture #2
• Xiong Textbook:
• Chp 1 - Introduction
• Chp 2 - Biological Databases
Thurs Aug 23 - for Lab #1:
• Literature Resources for Bioinformatics
Andrea Dinkelman, see Lab Schedule for URL
Fri Aug 24
• Genomics & Its Impact on Science & Society:
Genomics & Human Genome Project Primer
see Lecture Schedule for URL
Xiong: Chps 1 & 2
SECTION I
INTRODUCTION AND BIOLOGICAL DATABASES
1 Introduction
What Is Bioinformatics?
Goal
Scope
Applications
Limitations
New Themes
Further Reading
2 Introduction to Biological Databases
What Is a Database?
Types of Databases
Biological Databases
Pitfalls of Biological Databases
Information Retrieval from Biological Databases
Summary
Further Reading
Assignment #1:
Tell us about you
Due: Wed, Aug 22
1- Complete HW1_Aug20 for Drena
Assignment #2 (& for Fun):
DNA Interactive "Genomes"
http://www.dnai.org/c/index.html
A tutorial on genomic sequencing, gene structure,
gene prediction
Howard Hughes Medical Institute (HHMI)
Cold Spring Harbor Laboratory (CSHL)
1. Take the Tour
2. Read about the Project
3. Do some Genome Mining with:
Nothing to turn in - just do it!
What is Bioinformatics?
Wikipedia:
• Bioinformatics and computational biology
involve the use of techniques including:
applied mathematics
informatics
statistics
computer science
artificial intelligence
chemistry & biochemistry
(& engineering)
to solve biological problems usually on the molecular level
• Research in computational biology often overlaps with
systems biology (& genomics)
What is Systems Biology?
Genomics?
Wikipedia:
• Systems Biology - a term used very widely in the
biosciences, particularly from the year 2000 onwards,
and in a variety of contexts...
• Genomics - is the study of an organism's entire genome
Hmmm -- these aren't very useful!
What is Bioinformatics?
Gerstein (Yale):
• Bioinformatics is conceptualizing biology in terms of
molecules & applying “informatics” techniques - derived
from disciplines such as mathematics, computer science,
and statistics - to organize and understand information
associated with these molecules, on a large scale
Modified from Mark Gerstein
What is the Information?
Biological Sequences, Structures, Processes
Central Dogma of Molecular Biology
• DNA Sequence (1 gene)
-> mRNA
-> Protein
-> Phenotype
• Molecules
- Sequence
- Structure
- Function
• Processes
- Mechanism
- Specificity
- Regulation
Central Paradigm for Computational Biology
• DNA Sequence (entire genome)
-> mRNA, rRNA, tRNA, snRNAs
-> regulatory RNAs, e.g. miRNAs
-> Proteins
-> Phenotype
• Molecules & Systems
- Sequence, Structure, Function
- Interactions
- Pathways & Networks
• Large Amounts of Information
- Standardized ontologies
- Statistical analyses
Modified from Mark Gerstein
Explosion of "Omes" & "Omics!"
Genome, Transcriptome, Proteome…
• Genome - complete collection of DNA
(genes and "non-genes") of an
organism
• Transcriptome - complete collection
of RNAs (mRNAs & others)
expressed in an organism*
• Proteome - complete collection of
proteins expressed in an organism*
Genome = Constant
(more or less…)
Transcriptome & Proteome = Variable
* Note:
• Although DNA is "identical" in all
cells of a single organism, both
types and amounts of RNAs &
proteins vary greatly in different
cells & tissues
• Expression patterns depend on
variables such as developmental
stage, age, disease state,
environmental conditions, etc.
Molecular Biology Information:
DNA & RNA Sequences
Functions:
• Genetic material
• Information transfer (mRNA)
• Protein synthesis (rRNA/tRNA)
• Catalytic & regulatory activities
(some very recently discovered!)
Information:
• 4 letter alphabet: A C G T
of DNA nucleotides (nt)
• ~ 1000 base pairs (bp) in avg gene
(in bacteria)
• ~ 3 x 10^9 bp in human genome
Modified from Mark Gerstein
DNA sequence:
atggcaattaaaattggtatcaatggttttggtcgtat
gcacaacaccgtgatgacattgaagttgtaggtattaa
atggcttatatgttgaaatatgattcaactcacggtcg
aaagatggtaacttagtggttaatggtaaaactatccg
Gcaaacttaaactggggtgcaatcggtgttgatatcgctttaactg
atgaaactgctcgtaaacatatcactgcaggcgcaaaaaaagtt
RNA sequence has "U" instead of "T"
• Where are the genes?
• Which DNA sequences encode
RNA?
• Which genomic DNA is "junk"?
• Which RNA sequences encode
proteins?
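The questions above about which sequences encode RNA and protein can be made concrete with a small sketch. It assumes (the slide does not say so) that the first DNA fragment shown is a coding strand read in frame from its first base. Under that assumption, the Python below transcribes it by swapping T for U and translates it with the standard genetic code. The function names are illustrative only, not part of any course software.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Return the RNA version of a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate DNA in frame 1 until a stop codon or the last full codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First DNA fragment from the slide (assumed: coding strand, reading frame 1).
fragment = "atggcaattaaaattggtatcaatggttttggtcgtat"
print(transcribe(fragment))
print(translate(fragment))  # MAIKIGINGFGR
```

Real gene finding is much harder than this: the correct strand and reading frame are usually unknown, and eukaryotic genes are interrupted by introns.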
Molecular Biology Information:
Protein Sequences
Functions: Most cellular functions are either performed by or regulated by proteins
• Biocatalysis
• Cofactor transport/storage
• Mechanical motion/support
• Immune protection
• Regulation of growth and differentiation
Information:
• 20 letter alphabet:
ACDEFGHIKLMNPQRSTVWY
of amino acids (aa)
• ~ 300 aa in an average protein
(in bacteria)
• > 3 x 10^6 known protein sequences
Protein sequences:
d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS
d4dfra_ ISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTL
d3dfr__ TAFLWAQDRDGLIGKDGHLPWHLPDDLHYFRAQTV
Modified from Mark Gerstein
• What is this protein?
• Which amino acids are most
important for folding, activity, or
interaction with other proteins?
• Which sequence variations are
harmful (or beneficial)?
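One simple way to start on the question of which amino acids matter is to compare related sequences position by position. The sketch below does this for the first two protein fragments on the slide (d1dhfa_ and d8dfr__). It assumes the two fragments are already aligned with no gaps, which seems reasonable only because they have the same length. The function name is hypothetical.

```python
def compare_aligned(seq_a, seq_b):
    """Count identical positions and list the 1-based positions that differ.

    Assumes the sequences are pre-aligned and gap-free (same length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    differing = [
        i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b
    ]
    identical = len(seq_a) - len(differing)
    return identical, differing


# Fragments copied from the slide.
d1dhfa = "LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT"
d8dfr = "LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS"

identical, differing = compare_aligned(d1dhfa, d8dfr)
print(f"{identical}/{len(d1dhfa)} identical positions")  # 29/36 identical positions
print("differing positions:", differing)  # [3, 8, 16, 18, 28, 29, 36]
```

Positions that stay the same across many related sequences are candidates for functional importance, but identity counts alone cannot prove that. The other two fragments on the slide have a different length, so they would need a proper alignment (with gaps) before a comparison like this makes sense.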
Molecular Biology Information:
Macromolecular Structures
DNA/RNA/Protein Structures
• How does a protein (or RNA)
sequence fold into an active 3-D
structure?
• Can we predict structure from
sequence?
• Can we predict function from
structure (or perhaps, from
sequence alone?)
Modified from Mark Gerstein
We don't understand the protein folding code yet but we try to engineer proteins anyway!
Modified from Mark Gerstein
Molecular Biology Information:
Biological Processes
Genomics & Systems Biology
• How do patterns of gene expression
determine phenotype?
• Which genes and proteins are
required for differentiation
during development?
• How do proteins interact in
biological networks?
• Which genes and pathways have
been most highly conserved during
evolution?
"On a Large Scale?"
Whole Genome Sequencing
1st a complete bacterial genome,
then yeast
Genome sequences now accumulate so
quickly that, in less than a week, a single
laboratory can produce more bits of data
than Shakespeare managed in a lifetime,
although the latter make better reading.
-- G A Pekso, Nature 401: 115-116 (1999)
Modified from Mark Gerstein
Genome Projects:
Rapid Automated Sequencing
Another recent improvement: rapid & high resolution separation of
fragments in capillaries instead of gels (E Yeung, Ames Lab, ISU)
Modified from Eric Green
More recently?
Pyro-sequencing
454 sequencing http://www.454.com/
$1000 genomes?
1st Draft Human Genome:
"Finished" in 2001
Modified from Eric Green
Human Genome Sequencing
Two approaches:
• Public (government) - International Consortium
(mainly 6 countries, NIH-funded in US)
• Hierarchical cloning & BAC-to-BAC sequencing
• Map-based assembly
• Private (industry) - Celera, Craig Venter, CEO
• Whole genome random "shotgun" sequencing
• Computational assembly
(took advantage of public maps & sequences, too)
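To give a feel for what "computational assembly" of shotgun reads involves, here is a toy sketch of greedy overlap merging: repeatedly find the two reads with the longest suffix-prefix overlap and join them. This is a teaching simplification, not the method used by either genome project; it ignores sequencing errors, reverse complements, and repeats. The reads and the `min_len` parameter are made up for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; return remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [
            r for k, r in enumerate(reads) if k not in (best_i, best_j)
        ] + [merged]
    return reads


# Hypothetical reads sampled from the made-up sequence ATGGCATTCAGGTCAA.
reads = ["CAGGTCAA", "ATGGCATTC", "CATTCAGGT"]
print(greedy_assemble(reads))  # ['ATGGCATTCAGGTCAA']
```

With repeated sequences, a greedy approach like this can join the wrong reads, which is one reason map information helps during assembly.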
Guess which human genome they sequenced? Craig's
How many genes?
~ 20,000 (Science, May 2007)
Public Sequencing:
International Consortium
Modified from Eric Green
So, having a list of parts is not enough!
BIG QUESTION?
How do parts work together to form a
functional system?
SYSTEMS BIOLOGY
What is a system? Macromolecular complex, pathway,
network, cell, tissue, organism, ecosystem…
Is this Bioinformatics?
• Creating digital libraries: YES
• Automated bibliographic search and text comparison
• Knowledge bases for biological literature
• Methods for structure determination: YES
• Computational X-ray crystallography
• NMR structure determination
• Distance geometry
• Metabolic pathway simulation: YES
Modified from Mark Gerstein
Is this Bioinformatics?
• Gene identification by sequence inspection: YES
• Prediction of splice sites, promoters, etc.
• DNA methods in forensics: YES
• Modeling populations of organisms: YES
• Ecological Modeling
• Genomic sequencing methods: YES
• Assembling contigs
• Physical and genetic mapping
• Linkage analysis: YES
• Linking specific genes to various traits
Modified from Mark Gerstein
Is this Bioinformatics?
• Rational drug design
• RNA structure prediction
• Protein structure prediction
• Artificial life simulations
YES
• Artificial immunology
• Computer security
Modified from Mark Gerstein
So, this is Bioinformatics
What is it good for?
Just a few examples…
Designing drugs
• Understanding how proteins bind other molecules
• Structural modeling & ligand docking
• Designing inhibitors or modulators of key proteins
(Figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web
page at Scripps, and from Computational Chemistry Page at Cornell Theory Center).
Modified from Mark Gerstein
Finding homologs of "new" human genes
Modified from Mark Gerstein
Finding WHAT?
Homologs - "same genes" in different organisms
• Human vs Mouse vs Yeast
• Much easier to do experiments on yeast to determine function
• Often, function of an ortholog in at least one organism is known
Best Sequence Similarity Matches to Date Between Positionally Cloned
Human Genes and S. cerevisiae Proteins
Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter
Modified from Mark Gerstein
Comparative Genomics:
Genome/Transcriptome/Proteome/Metabolome
Databases, statistics
• Occurrence of specific genes
or features in a genome
• How many kinases in yeast?
• Compare Tissues
• Which proteins are expressed
in cancer vs normal tissues?
• Diagnostic tools
• Drug target discovery
Modified from Mark Gerstein
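A question like "How many kinases in yeast?" comes down to counting annotated features in a database. As a rough sketch only, the Python below counts kinases among the yeast genes listed in the human/yeast homolog table earlier in this lecture, using a keyword match on the gene descriptions. The dictionary is a hand-copied subset of annotations, not a genome database, so the result says nothing about the real number of kinases in yeast.

```python
# Yeast genes and descriptions copied from the homolog table in this lecture.
# A real analysis would query whole-genome annotation instead.
yeast_matches = {
    "MSH2": "DNA repair protein",
    "MLH1": "DNA repair protein",
    "YCF1": "Metal resistance protein",
    "CCC2": "Probable copper transporter",
    "GUT1": "Glycerol kinase",
    "SGS1": "Helicase",
    "PXA1": "Peroxisomal ABC transporter",
    "TEL1": "PI3 kinase",
    "SOD1": "Superoxide dismutase",
    "YPK1": "Serine/threonine protein kinase",
    "YIL002C": "Putative IPP-5-phosphatase",
    "IRA2": "Inhibitory regulator protein",
    "GDI1": "GDP dissociation inhibitor",
    "SUL1": "Sulfate permease",
    "MET30": "Methionine metabolism",
    "GEF1": "Voltage-gated chloride channel",
    "FZF1": "Sulphite resistance protein",
    "IPL1": "Serine/threonine protein kinase",
}

# Crude keyword match on free-text descriptions (an assumption of this sketch).
kinases = sorted(
    gene for gene, description in yeast_matches.items()
    if "kinase" in description.lower()
)
print(len(kinases), kinases)  # 4 ['GUT1', 'IPL1', 'TEL1', 'YPK1']
```

Keyword searches on free text miss genes whose descriptions use different wording, which is one reason standardized ontologies matter for this kind of query.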