Biology and computers - Cal State LA

Welcome to Chem 434
Bioinformatics
March 29, 2005
Review of course prerequisites
Course objectives
• Become proficient at using existing bioinformatics software
• Understand the statistics behind bioinformatics
• Write an algorithm that answers a specific bioinformatics problem
Rationale for offering an introductory
course in bioinformatics
1. Give the students a chance to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, Scan Prosite).
2. Give the students a programming assignment that is a unique component of the course.
3. Give the students insight into bioinformatics so they can make an intelligent determination of whether or not to pursue this field.
Learning Goals of CSULA
Intro to Bioinformatics Course
• Introduce software and databases currently used by bioinformaticists.
• Introduce the organization of the data: how data is gathered and how it is annotated.
• Introduce statistics of data analysis.
• Introduce the concept of dynamic programming.
• Give the students an opportunity to create an algorithm that analyzes sequence data.
• Encourage the student to analyze the primary literature in the field and write about current research.
Course logistics
Course Website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
• Syllabus
• PowerPoint presentation
• Homework
• In-class Workshop
• References
Course logistics (continued)
Grading
• Homework (100 pts)
• Quizzes (4 x 25 pts)
• Writing Assignment (100 pts)
• Final Project (200 pts)
Course logistics (continued)
Prerequisites
• Upper division or graduate student status
• Majoring in a computer-related field or Molecular Life Science
• If computer-related, you must have previously completed a Molecular Life Science course with a grade of C or better.
• If Molecular Life Science, you must have previously completed an object-oriented programming course or C++ programming course with a grade of C or better.
Definition of Bioinformatics
Many definitions at the moment:
• Use of computers to catalog and organize biological information into meaningful entities.
• Conceptualization of biology in terms of molecules and the application of “informatics” techniques (from disciplines such as applied math, computer science and statistics) to understand and organize the information associated with these molecules.
Bioinformatics is Multidisciplinary
[Diagram of related fields: Genomics; Drug Design; Computer Science; Molecular Life Sciences; Phylogenetics; Structural Biology; Math; Statistics]
How much of the genome is defined?
[Chart not recovered; surviving label: Unknown Function]
How is Bioinformatics Used?
Bioinformatics is used to help “focus”
the experiments of the benchtop scientist
Bioinformatics isn’t going to replace
lab work anytime soon
Experimental proof is still the
“Gold Standard”.
Textbook
Bioinformatics and Functional
Genomics
by Jonathan Pevsner
ISBN: 0471210048
• Genome sequence acquisition
and analysis.
• Basic and applied research with
DNA microarrays.
• Proteomics.
Useful textbooks on the subject (2)
Python Programming for the
Absolute Beginner by M. Dawson
ISBN: 1592000738
Useful textbooks on the subject (3)
Learning Python, Second Edition
by David Ascher and
Mark Lutz
ISBN: 0596002815
Useful textbooks on the subject (4)
Bioinformatics: A Practical Guide to
the Analysis of Genes and Proteins,
Second Edition by Andreas D.
Baxevanis (Editor), B. F. Francis
Ouellette
ISBN: 0471383910
• GenBank database
• Sequence alignment programs
Basis of molecular life sciences
Hierarchy of relationships (some exceptions):
Genome
Gene 1 -> Protein 1 -> Function 1
Gene 2 -> Protein 2 -> Function 2
Gene 3 -> Protein 3 -> Function 3
Gene X -> Protein X -> Function X
How can one use bioinformatics
to link diseases to genes?
Positional cloning of genes: Disease -> Map -> Gene -> Function
1. Find genetic markers associated with disease
2. Sequence DNA next to the markers
3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
4. Find abnormality
5. Predict gene function from sequence information
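The comparison of DNA from afflicted and normal individuals, followed by finding the abnormality, can be sketched in a few lines of Python. This is only a toy illustration: the function name `find_differences` and both sequences are assumptions made for this example, and the code assumes the two sequences are already aligned and the same length, which real data rarely is.

```python
def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sam_base)
        for pos, (ref_base, sam_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sam_base
    ]


# Hypothetical short sequences, for illustration only.
normal = "ATGGTGCACCTGACTCCTGAGGAG"
afflicted = "ATGGTGCACCTGACTCCTGTGGAG"

for pos, ref_base, sam_base in find_differences(normal, afflicted):
    print(f"position {pos}: {ref_base} -> {sam_base}")
```

A real comparison would first align the sequences, since insertions and deletions shift every later position.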
What is the approach used to
sequence genomes?
Divide and conquer
• Split the genome into fragments
• Clone into vectors that can accept large fragments: yeast artificial chromosomes (YAC library)
• Landmarks within the genome can be obtained using Sequence Tagged Sites (STS)
• Sequences of YAC clones are matched with each other.
• Sequences that overlap form contigs.
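The last two steps, matching sequences against each other and joining the ones that overlap into contigs, can be sketched as a greedy overlap merge. This is a minimal toy version, not the method used by the genome project: the function names, the `min_overlap` parameter, and the example fragments are all assumptions, and the code ignores sequencing errors, repeats, and reverse-complement strands.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; keep the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical fragments, for illustration only.
print(assemble(["ATGCGT", "CGTACC", "ACCGGA"]))
```

Fragments that share no overlap of at least `min_overlap` bases are returned as separate contigs, which mirrors the gaps left between contigs in a real map.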
History of the Human Genome
Project
1953: Watson, Crick: DNA structure
1972: Berg: 1st recombinant DNA
1977: Maxam, Gilbert, Sanger sequence DNA
1980: Botstein, Davis, Skolnick, White propose to map human genome with RFLPs
1982: Wada proposes to build automated sequencing robots
1984: MRC publishes first large genome, Epstein-Barr virus (170 kb)
1985: Sinsheimer hosts meeting to discuss HGP at UC Santa Cruz; Kary Mullis develops PCR
1986: DOE begins genome studies with $5.3 million
1987: Gilbert announces plans to start company to sequence and copyright DNA; Burke, Olson, Carle develop YACs; Donis-Keller publishes first map (403 markers)
History of the Human Genome
Project (continued)
1987 (cont): Hood produces first automated sequencer; Dupont develops fluorescent dideoxynucleotides
1988: NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989: Hood, Olson, Botstein, Cantor propose using STSs to map the human genome
1990: Proposal to sequence 20 Mb in model organism by 2005; Lipman, Myers publish the BLAST algorithm
1991: Venter announces strategy to sequence ESTs. He plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene finding program
1992: Simon develops BACs; US and French teams publish first physical maps of chromosomes; first genetic maps of mouse and human genome published
1993: Collins is named director of NCHGR; revised plan to complete sequence of human genome by 2005
1995: Venter publishes first sequence of free-living organism: H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996: Yeast genome is sequenced (S. cerevisiae)
History of the Human Genome
Project (continued)
1997: Blattner, Plunkett complete E. coli sequence; a capillary sequencing machine is introduced.
1998: SNP project is initiated; rice genome project is started; Venter creates new company called Celera and proposes to sequence HG within 3 years; C. elegans genome completed
1999: NIH proposes to sequence mouse genome in 3 years; first sequence of chromosome 22 is announced
2000: Celera and others publish Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence puffer fish; Arabidopsis sequence is completed
2001: Celera publishes human sequence in Science; the HGP consortium publishes the human sequence in Nature
2003: Completed genomes: 112 Microbial, 18 Eukaryotes, 1275 Viruses
Public funding vs. Private
funding
Public: taxpayers’ money, international effort.
Private: companies that invest money hope to provide access to their information on a fee basis. Celera also allows some free information to small research groups.
Both groups have published the sequence of the human genome.
Useful Websites that give information on
Bioinformatics courses
http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
http://gaia.ecs.csus.edu/~mei/bio296c.html
http://www.smi.stanford.edu/projects/helix/bmi214/
http://www.sdsc.edu/~gribskov/bimm140/bimm140_lectures.html
http://www.nslij-genetics.org/bioinfotraining/