Biology and computers

Southern California Bioinformatics Summer Institute
Wendie Johnston, Jamil Momand, Sandra Sharp, Nancy Warter-Perez
SoCalBSI Mission
To identify and educate college students for successful careers in bioinformatics
Student Requirements:
• Junior year undergraduate through 2nd year of graduate school
• Majoring in a computer-related field or a molecular life science field
• 3.0 GPA minimum
• Desires a career in the bioinformatics field
Goals of SoCalBSI
1) Familiarize computer science and molecular life science students with bioinformatics software programs
2) Introduce programming skills that will enable students to write programs independently
3) Explore the social, moral and ethical issues associated with the human genome sequence
4) Offer career counseling and expose students to career opportunities
5) Provide research experiences with professional bioinformaticists
6) Create opportunities for interactions with bioinformaticists
7) Foster long-lasting professional relationships
Molecular Life Science
Central Dogma
Molecular Basis of Disease
DNA Primary Sequence
Protein Primary, Secondary, Tertiary Structure
Molecular Evolution
Signal Transduction
Statistics
Probability Theory
Hidden Markov Models
Bayes’ Theorem
Expect Value (E-value)
Rules of Counting
Computer Science
Structured Programming
Data Structures
Algorithms
Complexity Analysis
Software Engineering
Core Bioinformatics
Literature Searching
NCBI Model
Scoring Matrices
Dynamic Programming
Global vs. Local Sequence Alignment
Multiple Sequence Alignment
Phylogenetic Trees
Molecular Life Science Databases
Protein Modeling
Microarrays
Ethics
Proteomics
Privacy
Security
Human Genome
Achievement of Program Goals
[Organization chart]
People: S. Sharp (co-PI), N. Warter-Perez (co-PI), W. Johnston (co-PI), Jackie Leung, B. Krilowitz
Areas: Research Training, Post-Summer Followup, Graduate School Opportunities, Didactic Training, Professional Development, Project Evaluation, Coordination of Research Mentors
Special Topics instructors: J. Faust, E. Torres, S. Heubach
Research Seminar Speakers: T. Chen, M. Pelligrini, B. Hoff, D. Robbins, G. Larson, S. Tavare, E. Mjolsness, B. Wold, M. Nordborg, T. Yeates
Institutions: BioDiscovery, Caltech, City of Hope, Protein Pathways, UCLA, USC, ViaLogy
Program Coordinator: Student Recruitment, WebPage Development, Speaker Travel, Budget, Scheduling, Student Housing, Field Trips
Who are you?
8 Males, 6 Females
7 graduate students and 7 undergraduates
11 from CA, 1 from MA, 1 from TX, 1 from NM
Majors in Biochemistry, Biology, Biostatistics, Computer Science, Cybernetics, Engineering, Mathematics
Open hours
Computer Workstations open from 8 am-8 pm
SoCalBSI office PS504: 8-12 noon M-Th, 8-4 F.
Eating on campus
Campus Escort
Welcome to SoCalBSI
June 16, 2003
Review of course prerequisites
Course objectives
• Become proficient at using existing bioinformatics software
• Understand the statistics behind bioinformatics
• Write an algorithm that answers a specific bioinformatics problem
• Become acquainted with some ethical issues surrounding the science of the Human Genome Project
Rationale for offering SoCalBSI Program
1. Give participants a chance to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, BLIMPS).
2. Give the students the opportunity to create software that uses concepts from bioinformatics history.
3. Give the students insight into bioinformatics so they can make an informed decision about how to achieve their education and career goals.
Learning Goals of Didactic Portion of the Program
• Introduce software and databases currently used by bioinformaticists
• Introduce the organization of the data: how data is gathered and how it is annotated
• Introduce statistics of data analysis
• Introduce the concept of dynamic programming
• Give the students an opportunity to create an algorithm that analyzes sequence data
• Encourage the student to research a bioinformatics company and report on its products
Course logistics
Course Website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
• PowerPoint presentation
• In-class Workshop
• References
• Writing assignment
Definition of Bioinformatics
Many definitions at the moment:
• Use of computers to catalog and organize biological information into meaningful entities.
• Conceptualization of biology in terms of molecules and the application of “informatics” techniques (from disciplines such as applied math, computer science and statistics) to understand and organize the information associated with these molecules.
Bioinformatics is Multidisciplinary
Genomics
Drug Design
Computer Science
Molecular Life Sciences
Phylogenetics
Structural Biology
Math
Statistics
How much of the genome is defined?
[Figure not reproduced; surviving label: Unknown Function]
How is Bioinformatics Used?
Bioinformatics is often used to help “focus” the experiments of the benchtop scientist.
Bioinformatics isn’t going to replace lab work anytime soon.
Experimental proof is still the “Gold Standard”.
Useful textbooks on the subject
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition
by Andreas D. Baxevanis (Editor), B. F. Francis Ouellette
ISBN: 0471383910
• GenBank database
• Sequence alignment programs
Useful textbooks on the subject (2)
Computational Molecular Biology
by Pavel A. Pevzner
ISBN: 0262161974
• Discussion of dynamic programming
• Needleman-Wunsch method
• Smith-Waterman method
• Recursive functions
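As an illustration of the dynamic programming idea behind the Needleman-Wunsch method, here is a minimal sketch of global alignment scoring. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not values taken from the textbook or the course.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of two sequences.

    Scoring values are illustrative assumptions, not course-specified.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell reuses the three neighbouring sub-solutions.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align the two residues
            up = score[i - 1][j] + gap              # gap in seq_b
            left = score[i][j - 1] + gap            # gap in seq_a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The Smith-Waterman method uses the same table but clamps every cell at zero and reports the highest cell instead of the bottom-right one, which is what turns a global alignment into a local one.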
Useful textbooks on the subject (3)
Discovering Genomics, Proteomics & Bioinformatics
by Campbell and Heyer
ISBN: 0805347224
• Genome sequence acquisition and analysis.
• Basic and applied research with DNA microarrays.
• Proteomics.
• Modeling whole-genome circuits.
• Transition from genetics to genomics: medical case studies.
Basis of molecular biology
Hierarchy of relationships (some exceptions):
Genome
Gene 1 -> Protein 1 -> Function 1
Gene 2 -> Protein 2 -> Function 2
Gene 3 -> Protein 3 -> Function 3
Gene X -> Protein X -> Function X
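The gene-to-protein step of this hierarchy can be sketched in a few lines. The example below assumes the standard genetic code and reads the coding strand directly; the toy genome, its gene names, and the `translate` helper are hypothetical and exist only for illustration.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate a coding DNA sequence into one-letter amino acids.

    Translation stops at the first stop codon ('*' in the table).
    """
    protein = []
    for start in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy genome: gene name -> made-up coding sequence.
    genome = {"gene_1": "ATGGCCTAA", "gene_2": "ATGTTTGGG"}
    for name, sequence in genome.items():
        print(name, "->", translate(sequence))
```

Going from protein to function has no comparable lookup table, which is one reason the slide notes that the hierarchy has exceptions.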
How can one use bioinformatics to link diseases to genes?
Old days: functional cloning of genes
Disease -> Function -> Gene -> Map
1. Careful description of disease
2. Establish link between disease and metabolic defect
3. Isolate protein
4. Isolate cDNA
5. Determine if DNA is mutated in human
How can one use bioinformatics to link diseases to genes?
New days: positional cloning of genes
Disease -> Map -> Gene -> Function
1. Find genetic markers associated with disease
2. Sequence DNA in close proximity to the markers
3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
4. Find abnormality
5. Predict gene function from sequence information
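The comparison of DNA from afflicted and normal individuals can be sketched as a position-by-position scan. This is a simplified illustration that assumes the two sequences are already aligned and of equal length; the function name and the sequences are made up for the example.

```python
def find_differences(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length, which real data rarely is without an alignment
    step first.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    normal = "ATGGCC"      # hypothetical database sequence
    afflicted = "ATGACC"   # hypothetical patient sequence
    print(find_differences(normal, afflicted))
```

A difference found this way is only a candidate abnormality; it still has to be shown to track with the disease.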
What is the approach used to sequence genomes?
Divide and conquer
• Split the genome into fragments
• Clone into vectors that can accept large fragments: yeast artificial chromosomes (YAC Library)
• Landmarks within the genome can be obtained using Sequence Tagged Sites (STS)
• Sequences of YAC clones are matched with each other.
• Sequences that overlap form contigs.
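The last two steps, matching clone sequences and merging overlaps into contigs, can be sketched with a greedy merge. This is a toy illustration under strong assumptions (error-free reads, all on the same strand, a hypothetical minimum overlap of 3 bases) and is not how production assemblers work.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble_contigs(fragments, min_overlap=3):
    """Greedily merge the pair with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            contig for k, contig in enumerate(contigs)
            if k not in (best_i, best_j)
        ] + [merged]


if __name__ == "__main__":
    # Made-up fragments; the last one overlaps nothing and stays separate.
    fragments = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGGGG"]
    print(assemble_contigs(fragments))
```

In the example the first three fragments chain into one contig, while the fragment with no overlap remains a contig of its own. Landmarks such as STSs are what allow separate contigs like these to be ordered along the genome.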
History of the Human Genome Project
1953: Watson, Crick: DNA structure
1972: Berg: 1st recombinant DNA
1977: Maxam, Gilbert, Sanger sequence DNA
1980: Botstein, Davis, Skolnick, White propose to map human genome with RFLPs
1982: Wada proposes to build automated sequencing robots
1984: MRC publishes first large genome: Epstein-Barr virus (170 kb)
1985: Sinsheimer hosts meeting to discuss HGP at UC Santa Cruz; Kary Mullis develops PCR
1986: DOE begins genome studies with $5.3 million
1987: Gilbert announces plans to start company to sequence and copyright DNA; Burke, Olson, Carle develop YACs; Donis-Keller publishes first map (403 markers)
History of the Human Genome Project (continued)
1987 (cont): Hood produces first automated sequencer; Dupont develops fluorescent dideoxynucleotides
1988: NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989: Hood, Olson, Botstein, Cantor propose using STS’s to map the human genome
1990: Proposal to sequence 20 Mb in model organism by 2005; Lipman, Myers publish the BLAST algorithm
1991: Venter announces strategy to sequence ESTs. He plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene finding program
1992: Simon develops BACs; US and French teams publish first physical maps of chromosomes; first genetic maps of mouse and human genome published
1993: Collins is named director of NCHGR; revise plan to complete seq of human genome by 2005
1995: Venter publishes first sequence of free-living organism: H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996: Yeast genome is sequenced (S. cerevisiae)
History of the Human Genome Project (continued)
1997: Blattner, Plunkett complete E. coli sequence; a capillary sequencing machine is introduced.
1998: SNP project is initiated; rice genome project is started; Venter creates new company called Celera and proposes to sequence HG within 3 years; C. elegans genome completed
1999: NIH proposes to sequence mouse genome in 3 years; first sequence of chromosome 22 is announced
2000: Celera and others publish Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence puffer fish; Arabidopsis sequence is completed
2001: Celera publishes human sequence in Science; the HGP consortium publishes the human sequence in Nature
2003: Completed genomes: 112 Microbial, 18 Eukaryotes, 1275 Viruses
Public funding vs. Private funding
Public: taxpayers’ money, international effort.
Private: companies that invest money hope to provide access to their information on a fee basis. Celera also gives small research groups some information for free.
Both groups have published the sequence of the human genome.
Useful Websites that give information on Bioinformatics courses
http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
http://gaia.ecs.csus.edu/~mei/bio296c.html
http://www.smi.stanford.edu/projects/helix/bmi214/
http://cmgm.stanford.edu/biochem218/ - Syllabus
http://www.sdsc.edu/~gribskov/bimm140/bimm140_lectures.html
http://linkage.rockefeller.edu/wli/bioinfocourse/