The Human Genome Project

The Human Genome Project
at UC Santa Cruz
Phoenix Eagleshadow
November 9, 2004
The Human Genome Project
Began in 1990
• The Mission of the HGP: The quest to
understand the human genome and the
role it plays in both health and disease.
“The true payoff from the
HGP will be the ability to
better diagnose, treat, and
prevent disease.”
--- Francis Collins, Director of the HGP
and the National Human Genome
Research Institute (NHGRI)
The genome is our Genetic Blueprint
• Nearly every human
cell contains 23 pairs
of chromosomes
– 1 - 22 and XY or XX
• XY = Male
• XX = Female
• Length of chr 1-22, X,
Y together is ~3.2
billion bases (about 2
meters diploid)
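The "about 2 meters" figure can be checked with a quick calculation. This is a rough sketch: the 0.34 nm spacing between neighboring base pairs is a standard textbook value that is not stated on the slide, and doubling the ~3.2 billion bases is only an approximation of the diploid count.

```python
# Rough check of the "about 2 meters diploid" figure.
HAPLOID_BASES = 3.2e9            # from the slide: chr 1-22, X, Y together
RISE_PER_BASE_PAIR_M = 0.34e-9   # assumption: ~0.34 nm per base pair

def dna_length_m(bases, copies=2):
    """Stretched-out length in meters of `copies` copies of the genome."""
    return bases * copies * RISE_PER_BASE_PAIR_M

diploid_length_m = dna_length_m(HAPLOID_BASES)
print(f"Diploid DNA per cell: about {diploid_length_m:.1f} m")
```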
The Genome is
Who We Are on the inside!
• Chromosomes
consist of DNA
– molecular strings of A,
C, G, & T
– base pairs, A-T, C-G
• Genes
– DNA sequences that
encode proteins
– less than 3% of
human genome
Information coded
in DNA
5000 bases per page
CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATG AGAAAAATACAGTGATTCC
AAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAATC
AGAGCATCTGC TACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAA
CTGATGTACCTCTGAAAAGAGA AGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCC
CTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAAT
GAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATC ATTTTCAGGTT
CTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGT
CATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAA
ATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGA
TTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTACC
AGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCT
AAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCT
CAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAAG
AACACAGTGAA ACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATAT
TTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAA AACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAAT
TAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAAA
TTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAGCATTGTCATCCAAGTAGATGTG
TCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTAC
ATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGCT
ATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGCC
ATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTGG
TACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATCA
TTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCTC
ATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGAA
CATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAAG
AACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCATG
CTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAACT
GGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGTT
CATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATTC
TGTAGGTTGTCTGTTTATTTTGTTAATAGTTTCTTTTGCTATGCAGAAGCTCTTAATAAGTTTAATGAGATCCTGATATGTTAGGCTTTGTGTCCCCACCCAAATCTCATCTTGAATTATA
TCTCCATAATCACCACATGGAGAGACCAGGTGGAGGTAATTGAATCTGGGGGTGGTTTCACCCATGCTGTTCTTGTGATAGTGAATGAGTTCTCACGAGATCTAATGGTTTTATGAGG
GGCTCTTCCCAGCTTTGCCTGGTACTTCTCCTTCCTGCCGCTTTGTGAAAAAGGTGCATTGCGTCCCTTTCACCTTCTTCTATAATTGTAAGTTTCCTGAGGCCTTCCCAGCCATGCTGAA
CTTCAAGTCAATTAAACCTTTTTCTTTATAAATTACTCAGTCTCTGGTGGTTCTTTATAGCAGTGTGAAAATGGACTAATGAAGTTCCCATTTATGAATTTTTGCTTTTGTTGCAATTGCTT
TTGACATCTTAGTCATGAAATCCTTGCCTGTTCTAAGTACAGGACGGTATTGCCTAGGTTGTCTTCCAGGGTTTTTCTAATTTTGTGTTTTGCATTTAAGTGTTTAATCCATCTTGAGTTGA
TTTTTGTATATTGTGTAAGGAAGGGGTCCAGTTTCAATCTTTTGCATATGGCTAGTTAGTTATCCCAGTACCATTTATTGAAAAGACAGTCTTTTCCCCATCGCTCGTTTTTGTCAGTTTT
ATTGATGATCAGATAATCATAGCTGTGTGGCTTTATTTCTGGGTTCTTTATTCTGTTCTATTGGTTTATGTCCCTGTTTTTGTGCCAGTACCATGCTGTTTTGGTTAACATAGCCCTGTAGT
ATAGTTTGAGGTCAGATAGCCTGATGCTTCCAGCTTTGTTCTTTTTCTTAAGATTGCCTTGGCTATTTGGCCTCTTTTTTGGTTCCACATGAATTTTAAAACAGTTGTTTCTAGTTTTTGAA
GAATGTCATTGGTAGTTTGATAGAAATAGCATTTAATCTGTAAATTGATTTGTGCAGTATGGCCTTTTAATGATATTGATTCTTCCTATCCATGAGCATGATATGTTTTCCATTTTGTTTG
TATCCTCTCTGATTTCTTTGTGCAGTGTTTTGTAATTCTCAT TGTAGAGATTTTTCACCTCCCTGGTTAGTTGTATTTTACCCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
TGCCTTCCTGATTTGACTGC CAGCTTGGTTACTGTTGGTTTATAGAAATGCTAGTGATTTTTGTACATTG ATTTTCTTTCTAAAACTTTGCTGAAGTTTTTTTTATTAGCAGAAGGAGCT
TTGGGGCTGAGACTATGGGGTTTTCTAGATATAGAATCATGTCAGCTTCAAATAGGGATAATTTTACTTCCTCTCTTCCTATTTGGATGCCCTTTATTTCTTTCTCTTGCCTGATTACTCTG
GCTGGGATTTCCTATGTTGAATAGGAGT CATGAGAGAGGGCATCAAATCTACACATATCAAATACTAACCTTGAATGTCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
How much data make up the
human genome?
• 3 pallets with 40 boxes per pallet x 5000
pages per box x 5000 bases per page =
3,000,000,000 bases!
• To get accurate
sequence requires
6-fold coverage.
• Now: Shred 18 pallets
and reassemble.
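The pallet arithmetic above can be written out as a short sketch. All numbers come from the slide; the variable names are only for illustration.

```python
# Paper version of the genome, using the numbers on the slide.
PALLETS = 3
BOXES_PER_PALLET = 40
PAGES_PER_BOX = 5000
BASES_PER_PAGE = 5000
COVERAGE = 6  # each base is read about 6 times for accuracy

genome_bases = PALLETS * BOXES_PER_PALLET * PAGES_PER_BOX * BASES_PER_PAGE
pallets_to_shred = PALLETS * COVERAGE
bases_to_read = genome_bases * COVERAGE

print(f"{genome_bases:,} bases in one copy")
print(f"{pallets_to_shred} pallets, {bases_to_read:,} bases to read")
```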
The Beginning of the Project
• Most of the first 10 years of the project were
spent improving the technology to
sequence and analyze DNA.
• Scientists all around the world worked to
make detailed maps of our chromosomes
and sequence model organisms, like
worm, fruit fly, and mouse.
UC Santa Cruz gets Involved
Because of the work Professor David
Haussler was doing in the field of
computational biology, UC Santa Cruz
was invited to participate in the HGP in
late 1999.
Computational biology (or
Bioinformatics) is a research
field that uses computers to
help solve biological problems
The Tech Awards honored the UCSC Genome Bioinformatics Group in 2003!
The Challenges were
Overwhelming
• First there was the
Assembly
The DNA sequence is so long that
no technology can read it all at
once, so it was broken into
pieces.
There were millions of clones
(small sequence fragments).
The assembly process included
finding where the pieces
overlapped in order to put the
draft together.
3,200,000 piece puzzle
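Finding where pieces overlap can be sketched in a few lines. This is a toy greedy approach, not the method UCSC actually used for the working draft, and the three short reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two pieces with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps; the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags)
                 if k not in (best_i, best_j)] + [merged]
    return frags

# Hypothetical reads cut from one short stretch of sequence.
reads = ["ACCTTGG", "TTGGCTAG", "CTAGGCT"]
contigs = greedy_assemble(reads)
print(contigs)
```

A real assembly also has to cope with sequencing errors and repeated sequence, which this sketch ignores.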
The “Working Draft” of the
human genome
ACCTTGG
CCTGAAT
CTAGGCT
TTGCATC
CCTAGTC
CTGATCG
Sequence: freeze of sequence data generated by NCBI
Clone maps: clone layouts generated by Washington University
Working draft assembly: assembly generated by UCSC
UCSC put the human
genome sequence on the
web July 7, 2000
UCSC put the
human genome
sequence on CD in
October 2000, with
varying results
Cyber geeks searched for hidden messages, and “GATTACA”
The Completion of the Human
Genome Sequence
• June 2000 White House
announcement that the majority
of the human genome (80%)
had been sequenced (working
draft).
• Working draft made available
on the web July 2000 at
genome.ucsc.edu.
• Publication of 90 percent of the
sequence in the February 2001
issue of the journal Nature.
• Completion of 99.99% of the
genome as finished sequence in
July 2003.
The Project is not Done…
• Next there is the Annotation:
The sequence is like a topographical map;
the annotation adds the cities, towns,
schools, libraries and coffee shops!
So, where are the genes?
How do genes work?
And, how do scientists use
this information for scientific
understanding and to
benefit us?
What do genes do anyway?
• We only have ~27,000 genes, so that means that
each gene has to do a lot.
• Genes make proteins that make up nearly all we
are (muscles, hair, eyes).
• Almost everything that happens in our bodies
happens because of proteins (walking, digestion,
fighting disease).
Eye Color and Hair Color
are determined by genes
Of Mice and Men:
It’s all in the genes
Humans and Mice have about the same
number of genes. But we are so different
from each other, how is this possible?
Did you say
cheese?
Mmm,
Cheese!
One human gene can make many different
proteins while a mouse gene can only
make a few!
Genes are important
• By selecting different pieces of a gene, your
body can make many kinds of proteins. (This
process is called alternative splicing.)
• If a gene is “expressed” that means it is turned
on and it will make proteins.
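The idea of selecting different pieces of a gene (alternative splicing) can be illustrated with a toy sketch. The exon sequences are invented, and always keeping the first and last piece is a simplifying assumption; real splicing is tightly controlled by the cell.

```python
from itertools import combinations

def splice_variants(exons):
    """List every message that keeps the first and last exon plus any
    subset of the middle exons, always in their original order."""
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):
            variants.append("".join([first, *kept, last]))
    return variants

# Hypothetical gene with four short exons.
exons = ["ATGGCC", "GAAACT", "TTCGGA", "CCTTAA"]
variants = splice_variants(exons)
for v in variants:
    print(v)
```

With only two optional pieces this toy gene already gives four different messages, and each extra optional piece doubles the count.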
What we’ve learned from our
genome so far…
• There is a relatively small number of human
genes, fewer than 30,000, but they have a complex
architecture that we are only beginning to
understand and appreciate.
-We know where 85% of genes are in the
sequence.
-We don’t know where the other 15% are
because we haven’t seen them “on” (they may only
be expressed during fetal development).
-We only know what about 20% of our genes
do so far.
• So it is relatively easy to locate genes in the
genome, but it is hard to figure out what they
do.
How do scientists find genes?
• The genome is so large that useful
information is hard to find.
• Researchers at UCSC decided to make a
computational microscope to help
scientists search the genome.
• Just as you would use “Google” to find
something on the internet, researchers
can use the “UCSC Genome Browser”
to find information in the human genome.
Explore it at http://genome.ucsc.edu
The UCSC Genome Browser
The browser takes you from
early maps of the genome . . .
. . . to a multi-resolution view . . .
. . . at the gene cluster level . . .
. . . the single gene level . . .
. . . the single exon level . . .
. . . and at the single base level
caggcggactcagtggatctggccagctgtgacttgacaag
caggcggactcagtggatctagccagctgtgacttgacaag
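The two lines shown at the single base level are almost identical. A minimal sketch for spotting where they differ (the function name is just for illustration):

```python
def differences(seq_a, seq_b):
    """Return (position, base_a, base_b) wherever two equal-length
    sequences disagree. Positions count from 0."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]

top = "caggcggactcagtggatctggccagctgtgacttgacaag"
bottom = "caggcggactcagtggatctagccagctgtgacttgacaag"

diffs = differences(top, bottom)
print(diffs)
```

The two sequences differ at exactly one base: a "g" in the first line where the second has an "a".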
The Continuing Project
• Finding the complete set of genes and annotating
the entire sequence. Annotation is like detailing;
scientists annotate sequence by listing what has
been learned experimentally and computationally
about its function.
• Proteomics is studying the structure and function
of groups of proteins. Proteins are really important,
but we don’t really understand how they work.
• Comparative Genomics is the process of
comparing different genomes in order to better
understand what they do and how they work, for
example comparing humans, chimpanzees, and mice, which
are all mammals but very different from one another.
Who works on this stuff anyway?
• Biologists and Chemists understand the
physical sciences-they take biology and
chemistry classes.
• Computer Scientists program the computers
(the same people who make video games!)-they
take math and computer classes.
• Computer Engineers try to build better, faster,
smarter computers-they take math, physics and
computer classes.
• Social Scientists try to understand how this new
information and technology will impact our lives-they
take sociology and philosophy classes.
UCSC Summer Workshop on
Human Genome Research
• Held annually in July
• It’s a free event for
students and teachers
• Workshops by faculty and
researchers on a wide
array of topics
• Tours of our laboratories
and kilocluster
• Free breakfast and lunch
• Travel funds are available
• RSVP: 831-459-1702 or
phoenix@soe.ucsc.edu
How can I work on this project, or
something like it?
• Read about it, online at www.genome.gov,
or in Nature, Science, or other scientific
magazines.
• Take biology, chemistry, math, physics,
and English classes in high school.
• OR take classes at your local community
college or University-Extension in biology,
bioinformatics, or genetics.
• Go to college and get a degree in science,
engineering, math, or social sciences.
Bioinformatics Opportunities
Degrees: PhD, MS (MA), BS (BA)
Majors: Bioinformatics, Biochemistry, Biology, Computer Science,
Computer Engineering, Mathematics, Ocean Sciences, Physics
(Education, Sociology, Philosophy, Psychology, Community Studies)
A research degree in
any of these majors
will take you far!
Director/Professor: University, Company, National Laboratory, Research Foundation
Research Staff: Company/University, National Laboratory, Research Foundation
Teaching: Community College, Public Schools
Entry-Level: Company, National Laboratory
Teaching: Private Schools
Thank you for letting us come
talk to you today and share
what we do!
Bye!
Come to
UCSC, Slugs
are cool!