HUMAN GENOME PROJECT

What is the Human Genome Project?
Goal: Sequence all of the nucleotides in human DNA (the “genome”).

Why:
A. To understand how genes work.
B. To understand why some genes don’t work.

Who:
A. National Institutes of Health, Dept. of Energy
B. International project

When:
A. Started in 1990, with a goal of finishing in 15 years
B. First chromosome sequenced (22): 1999
C. 1/3 of genome completed: 1999

“Cracking the Code of Life” Chapter 7, first segment
NOTE: All Cracking the Code segments can be found at http://www.pbs.org/wgbh/nova/genome/program.html

Celera
A. Private company founded by Craig Venter, former NIH scientist
B. Finish project in 2 years?
“Cracking the Code of Life” Chapter 4 (6:29)

How do you profit from sequencing the human genome?
“Cracking the Code of Life” Chapter 8 (4:06)
Sell information for scientists to look at.
Eventually, the public project will complete the HGP, so what do you sell then?
“Cracking the Code of Life” Chapter 7, second segment (49:10)

How do you profit? (continued)
Patenting DNA sequences: is this right?
Whose data is it?
Does patenting DNA sequences encourage or discourage research?

Who won?
Both groups shared credit for “finishing” the HGP in 2001.
Competition sped up the sequencing process.
“Cracking the Code of Life” link?

What Have We Learned From HGP?
Humans are 99.9% identical.
Total number of genes: ~30,000. This doesn’t match the number of proteins (over 100,000), so each gene must be able to code for more than one protein.
Over 50% of genes have unknown functions.
Less than 2% of DNA codes for genes.
Most genes are clustered in “urban centers” (not randomly spread out).
Over 50% of DNA is “not human”: hitchhiking “junk” DNA.

What’s next?
Gene regulation: how do genes know when to turn on and off?
Proteome: what proteins do these genes code for, and what do the proteins do?
Personalized medicine: medications to treat you based on your genetics.

What’s next?
Copy Number Variants: reading
SNPs: reading
Epigenetics: reading
STRs: lab

How does sequencing work?
The Key
Missing oxygen = deoxyribonucleic acid
Missing oxygen #2 = dideoxyribonucleic acid
This is a nucleotide called a dideoxynucleotide.

Why are dideoxynucleotides important?
Since there is no oxygen (no -OH group) on the 3’ end, no additional nucleotides can be added.
DNA synthesis is stopped.

What is needed for a Sequencing Reaction?
Original DNA
Nucleotides
Primer
DNA Polymerase
“Detectable” dideoxynucleotides (radioactivity or fluorescence)

Now it’s your turn to sequence!

How does a Sequencing Reaction work?
www.dnai.org - manipulation - techniques - sorting and sequencing - cycle sequencing
Three steps:
1. Denaturing: 95°C
2. Annealing: 50°C
3. Extension: 60°C
Only one cycle, so there is no need to use expensive Taq polymerase.
Nucleotides are randomly selected by DNA Polymerase.
Synthesis is stopped when a ddNTP is randomly selected.
Sequences of varying lengths are produced.
How would we separate these differently sized pieces?
Gel electrophoresis
A laser detects the fluorescence of each ddNTP.
A computer records the order of the colors (the order of the bases).
Results are presented as an “electropherogram”.
www.dnai.org - manipulation - techniques - Interview “Inside an automated sequencer”

Sequencing Process Review
Sequencing Animation

Now it’s your turn to sequence, Part 2!

How do you sequence so many letters so quickly?
Shotgun sequencing: divide many copies of the genome into small bits.
Sequence each fragment.
Use computers to align the sequences.
www.dnai.org - genome - The Project - Putting It Together - Animations - Whole Genome Shotgun (private)
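The “use computers to align” step of shotgun sequencing can be illustrated with a minimal sketch. It assumes a simple greedy strategy (repeatedly merge the two fragments with the longest overlap), which is only one possible approach and not necessarily what the public project or Celera used. The function names `overlap` and `greedy_assemble`, the 15-letter toy “genome”, and the minimum overlap of 3 letters are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best match is shorter than min_len letters.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Merge fragments pairwise, always taking the largest overlap first."""
    # Drop exact duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; the pieces stay separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)

    # Return the longest assembled piece.
    return max(frags, key=len)


# Hypothetical toy example: three overlapping reads, given out of order.
reads = ["CCTTAGA", "ATGGCGT", "CGTACCT"]
print(greedy_assemble(reads))  # prints ATGGCGTACCTTAGA
```

The sketch only works because every 3-letter overlap in the toy sequence is unique. With repeated stretches, the same overlap fits in more than one place and a greedy merge can join the wrong pieces.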
Sequencing Game: www.dnai.org - genome - The Project - Putting It Together - Sequencing Game

So what can you conclude about shotgun sequencing?
Overlapping provides a context (unlike the first Mouse and Cookie sentence fragments).
Requires multiple copies, each copy cut with a different restriction enzyme, to generate overlapping pieces.
Up to 8% of the human genome remains unsequenced due to highly repetitive sections (especially the ends and middles of chromosomes: telomeres and centromeres).

Whose DNA was sequenced?
Public: a random couple from Buffalo, NY
Celera: random, nameless volunteers (though Dr. Venter’s DNA was “randomly” selected)

What’s next?
To learn which sequences lead to genetic disorders, many different human genomes need to be sequenced.
Which is more important to studying genetic disease?
Sequences that are the same?
Sequences that are different?
WHY?

What are those differences called?
SNPs: single nucleotide polymorphisms; a DNA sequence that is one letter different.
Develop “personalized medicine” based on the exact SNP causing a genetic disorder.

Are SNPs the whole story?
CNVs: copy number variants; not everyone has two copies of each gene.
The higher the number of gene copies, the higher the level of protein that might be produced; this is not necessarily good.
Ex. EGFR copy number can be higher than normal in some types of lung cancer cells.

Copy Number Variants

What else is next?
Epigenome: changes made to DNA structure without altering the sequence of bases.
These changes quite often involve a methyl (-CH3) group that tags or marks a gene.
The cell normally uses these methyl tags to “turn off” a gene.
DNA or histones are methylated.

Does this mean that identical twins don’t have to be . . . identical?
YES!
Think of the Agouti mice. The only difference is what the mom ate prior to conception and birth.

Other epigenome examples?
Let’s go to the video!

So what’s the sequencing “revolution”?
Original sequencing reactions used radioactive ddNTPs, not fluorescent ones.
Results looked like: [figure not reproduced]

Problems with Radioactive Sequencing
Very difficult to read results
Cannot reuse a machine that has been exposed to radioactivity

Again, what’s the “revolution”?
Computers and fluorescent ddNTPs
Machines can automatically run a sequencing reaction.
Computers can store sequencing data.
Fluorescent ddNTPs make machines reusable.
“100 letters in a day vs. 1000 letters every second”

More HGP Info
Cracking the Code of Life, Chapters 4, 5, 6, and 16 (http://www.pbs.org/wgbh/nova/genome/program.html)
In order to sequence all of the DNA, Celera relied on freely available DNA sequence from the public research group.
Who finished first: the public or the private research group? When?

ELSI?
Ethical, Legal, and Social Issues
At the beginning of the project, genetic privacy was one of the major concerns. As we learn more about our own DNA sequences:
- Who should have access to that information?
- How do you help someone interpret that information and decide how to act on it?
ELSI Video

Human Genome Project
Cracking the Code of Life, Chapter Two “Getting the Letters Out”
http://www.pbs.org/wgbh/nova/genome/program.html
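The “100 letters in a day vs. 1000 letters every second” comparison from the “revolution” slide can be turned into a rough back-of-the-envelope estimate. This is only a sketch: the genome size of about 3 billion letters is an assumption that does not appear in these slides, and the estimate covers reading each letter once on a single machine, ignoring the repeated, overlapping reads that shotgun sequencing needs.

```python
# Assumption (not stated in the slides): the human genome is ~3 billion letters.
GENOME_LETTERS = 3_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Rates quoted on the slide, both converted to letters per day.
manual_letters_per_day = 100
automated_letters_per_day = 1000 * SECONDS_PER_DAY


def days_to_read(letters: int, letters_per_day: float) -> float:
    """Days needed to read every letter once at a constant rate."""
    return letters / letters_per_day


speedup = automated_letters_per_day / manual_letters_per_day
manual_years = days_to_read(GENOME_LETTERS, manual_letters_per_day) / 365
automated_days = days_to_read(GENOME_LETTERS, automated_letters_per_day)

print(f"Speedup: {speedup:,.0f}x")
print(f"Manual: about {manual_years:,.0f} years")
print(f"Automated: about {automated_days:.1f} days")
```

Under these assumptions the automated rate is 864,000 times faster: roughly 82,000 years of manual reading versus about 35 days for one machine.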