First Day Overview Power-point

advertisement
The Elements of Molecular Biology
A principal goal is to understand cells and
organisms as molecular systems / machines. The
basic classes of molecules are:
• DNA in the chromosomes of the genome contains all the
information to develop an organism and operate all its cell
types.
• RNA serves both short-term informational roles and
structural roles.
• Proteins execute the functions of a cell and provides its
structural integrity.
• Small metabolites (fats, sugars, etc.) provide energy, raw
materials, and serve some limited structural roles.
Cells As Molecular Machines
Cell
Nucleus
Genome
Polymerase
Gene
TBF
mRNA
Transcription
Splicing
Transport
Signal
Cascade
Ribosome
Protein
Translation
Activation
Receptor
Secretion
Metabolics:
Synthesis
Degradation
Energy
Understanding Cells at the Molecular Level
 Determining the DNA sequences of the chromosomes of a species.
Sequencing
 An accurate parts list of all the proteins and RNAs in the cell.
Annotation
 A graph of all the interactions taking place between these agents.
Pathways
 What is happening during each interaction.
Function
 Where each interaction is taking place.
Subcellular Localization
 Simulating the system to predict behavior.
Cellular Dynamics
Current State

We can sequence the euchromatic portions of genomes.

We can recognize 75% of the genes but not accurately unless they have
been experimentally verified. We don’t know much about alternate
splicing.

We can crudely observe expression of mRNAs and with even greater
difficulty observe the more abundant proteins.

Most accurate molecular biological information is still being verified one
hypothesis at a time.

We must either coordinate efforts or reduce experimental costs to the point
where each investigator is greatly empowered.
Current Technologies

Sequencing: Randomly sample and sequence 600bp stretches from the ends of
segments of a given length and assemble, followed by a directed finishing phase.

Expression Assays: High density arrays where each spot is a set of 18-50bp DNAs
complementary to the RNA sequence to be measured, or geometric amplification
from a pair of DNA probes complementary to the RNA sequence (quantitative
PCR).

Proteomics: Mass spectrometers can measure the amount and atomic weight of
ionized protein pieces (peptides) allowing complex mixtures to be analyzed.

Light Microscopy: With confocal microscopes and antibody, or RNA, or organometallic staining, phenomenon involving but a few particles are being observed.

All of these technologies involve interesting problems in the interpretation of the
data.
Data Analysis vs. Data Mining
On Systems Biology

The goal is to understand cells and organisms as molecular systems. An
objective that can be reached in the next 10-25 years is the elucidation of
the topology of the “circuit” diagram of a species’ cells.


A graph whose nodes are proteins, RNAs, DNAs, and metabolites
Edges or hyper edges describe relationships, e.g.
•



Catalyzes A->B, Kinase (adds phosphate group), Complexes with, etc.
Nodes carry attributes describing where and when the agent is present
Believe that such knowledge will result in tremendous “understanding” in
that it will generate many hypotheses that are true.
Many “circuits” when modeled as stochastic diff. eq.s are so robust to their
parameters that they have only a handful of operating ranges and the
biologically relevent one(s) is readily evident.
Apoptosis: Programmed cell death
A Proposal based on Drosophila

Goal: Develop a comprehensive set of accurate genomics-based data
and biological resources for the advancement of cellular biology.
Make them openly and readily accessible and available to all.
1. Sequence 10-20 species of Drosophila, 3-5 to a nearly-finished state and
10-15 to an assembled state.
2. Understand the evolution of the genomes and accurately annotate the
genes and regulatory signals using comparative informatics.
3. Develop a comprehensive, experimentally verified annotation of all
possible transcripts of the nearly-finished species. Capture each in a
gate-way clone library.
4. Build state of the art expression assays for these genomes and apply them
in a number of surveys.
5. Utilize mass spec. technology to begin systematically exploring complex
formation and signalling cascades of Drosophila cells.
Why Drosophila?

Sufficiently interesting model: development, differentiation,
behavior

Easy to stock lines and many experimental methods.

Compact genome: 20 Dros. Genomes = 1 mammalian genome

Excellent evolutionary and molecular model for humans

Large community: coordinated efforts are possible
Comparative Gene Finding
Require ab initio models to predict orthologs
consistent with the evolutionary ladder
amongst all strains.
40M
30M
20M
sechellia
mauritiana
simulans
melanogaster
orena
teissieri
yakuba
mimetica
pseudoobscura
10M
0M
Given a ladder of 5-10 Drosophila genomes we should be able to
accurately discern most protein encoding genes, many RNA
genes, and a host of regulatory signals
The Role of Informatics
 We need to make computers easier to program – i.e. we need to put
scientific computing in the hands of the scientists.
 Our information management technologies are inadequate – huge data
sets, semi-structured, data contains errors, not integrated – we need to
model these and develop flexible data mining capabilities over them.
 There will be a continued need for new algorithms and tools as driven by
new technologies and protocols.
 Physical simulations systems of various types will be needed – docking,
ligand binding, stochastic differential equations.
 Experimental design, driven by analysis and simulation, should be a part
of our discipline and is an area where we can but are not contributing.
Download