Unit 6 Bioinformatics: Genes, Proteins, and Evolutionary Relationships

Lesson 1
• 1A – Introduction – Neanderthal Webquest
• 1B – Lecture: PCR
  Activity – PCR Lab

Lesson 1A - Introduction
• Computer Webquest: Neanderthal Genome
• Research the Smithsonian Genetics website.
• http://humanorigins.si.edu/evidence/genetics/ancient-dna-and-neanderthals
• Respond to questions.
• Whole-class discussion about the DNA sequences used and the types of biotechnology procedures used in DNA identification.

Lesson 1A – What you need to know
• By comparing the following types of genetic material, what can we conclude about relationships between Neanderthals and modern humans?
  - mtDNA
  - Nuclear DNA
  - Genes for red hair and pale skin
  - FOXP2 gene
  - O allele
  - TAS2R38 gene
  - Microcephalin

Lesson 1B - PCR
• Before Neanderthal DNA could be sequenced and compared with the genome of modern humans, one problem had to be solved: the amount of DNA collected from bones, hair, etc. was too small for sequencing.
• The DNA had to be amplified in order to have sufficient quantity for further testing.
• A procedure called the polymerase chain reaction (PCR) was used to amplify the DNA.

PCR background
• Polymerase chain reaction (PCR) is a rapid technique for copying (amplifying) specific DNA fragments.
• The technique revolutionized biotechnology with its many applications.
• Among these applications are its use in forensic testing and as a replacement for DNA libraries, since it is much faster than building and screening a library.

PCR Technique
• Target DNA is put into a PCR test tube.
• The DNA is mixed with DNA polymerase, deoxyribonucleotides (dATP, dGTP, dCTP, and dTTP), and buffer.
• A pair of primers (short single-stranded DNA sequences) is added. The primers are complementary to the nucleotides at the ends of the target DNA.
• The test tube is placed in a thermocycler, a sophisticated heating block capable of changing temperatures over short time periods.
• The thermocycler takes the sample through a series of reactions called the PCR cycle.
• Each PCR cycle has 3 stages:
  - Denaturation – The sample is heated to 94-96 degrees C. This causes the DNA to separate into single strands.
  - Hybridization – The sample is cooled to 55-65 degrees C. This allows the primers to hydrogen bond to complementary bases at opposite ends of the target sequence.
  - Extension – The sample is heated to 70-75 degrees C. The DNA polymerase copies the target sequence by adding nucleotides to the 3' end of each primer.
• At the end of one cycle, the amount of DNA has doubled.
• Researchers usually run 20-30 cycles of PCR.
• After 20 cycles, there are about 1 million copies of the target DNA (2^20 = 1,048,576).

• One of the keys to PCR is the type of DNA polymerase used.
• Most DNA polymerases would denature during the heating and cooling steps of PCR.
• Taq DNA polymerase is used in PCR.
• It is isolated from Thermus aquaticus, a bacterial species that thrives in the hot springs of Yellowstone National Park.
• Taq is stable at high temperatures.

Cloning PCR products
• If you wish to clone a gene made by PCR:
  - Thermostable polymerases like Taq add a single adenine nucleotide to the 3' end of all PCR products (it's a quirk).
  - PCR products can be ligated to T vectors, which are plasmids that have a single-stranded thymine nucleotide at each end.
  - Once ligated, the recombinant plasmid can be introduced into bacteria.

• http://highered.mcgrawhill.com/sites/0072556781/student_view0/chapter14/animation_quiz_6.html
• PCR animation

PCR Activity - Lab
• We will conduct the PCR procedure and check for PCR products with gel electrophoresis.
• Refer to your lab handout for directions.

1B – What you need to know
• Be able to explain, in detail, the PCR technique.
• What is the importance of Taq polymerase?

Lesson 2 - DNA Sequencing and The Human Genome Project
• 2A – Lecture: Early History of DNA Sequencing and the Sanger Method.
  Activity – DNA Sequencing by Sanger Method
• 2B – Lecture: Human Genome Project and Automated DNA Sequencing.
  Activity – Video: Cracking the Code of Life

DNA Sequencing and Human Genome Project
Lesson 2A - Early History
• Techniques to sequence DNA were developed many years before the Human Genome Project.
• In the 1970s and early 1980s, DNA sequencing was done manually and required methods that were expensive, labor intensive, and used dangerous chemicals and radioactive reagents.
• Only short sequences were identified.
• The chain termination method, developed by Fred Sanger in 1977, became the method of choice for DNA sequencing.

Original Sanger method
• Four separate reaction tubes are set up.
• Each tube contains:
  - Identical DNA of interest
  - Radioactively labeled primer to get DNA synthesis started
  - Deoxyribonucleotide phosphates to be used in DNA synthesis (dNTPs)
  - Small amount of one dideoxyribonucleotide phosphate (ddNTP)
  - DNA polymerase
• All four test tubes have each of the four nucleotides (dNTPs), but each tube also has a different one of the four ddNTPs.
• Example
  - "G" tube: all four dNTPs, ddGTP, DNA polymerase, and primer
  - "A" tube: all four dNTPs, ddATP, DNA polymerase, and primer
  - "T" tube: all four dNTPs, ddTTP, DNA polymerase, and primer
  - "C" tube: all four dNTPs, ddCTP, DNA polymerase, and primer

Sanger Method
• DNA strands are separated.
• The radioactive primer binds to the 3' end of the template strand.
• DNA polymerase synthesizes a complementary DNA sequence.
• Every time a ddNTP is incorporated into the complementary strand, DNA synthesis halts.
• This creates fragments of different lengths.

• EX: On the right are the contents of the "A" tube. It has ddATP in it.
• The ddATP is used in place of dATP at some A positions. Which A position receives the ddATP is random for each copy, so across the whole tube synthesis stops at every possible A site. This generates fragments of different lengths, each ending in an A.

Sanger Method
• The same process that occurred in the A tube occurs in the C, G, and T tubes.
• The DNA from each tube is run in gel electrophoresis. The banding pattern allows you to sequence the DNA.
• The sequence on the right is ATGCCAGTA.
• How do you figure this out?

Sanger method animations
• http://highered.mcgrawhill.com/sites/0072556781/student_view0/chapter15/animation_quiz_1.html
• http://www.dnalc.org/resources/animations/sangerseq.html

Sanger method activity
• Complete the DNA sequencing activity as explained in your handout.

DNA Sequencing
• There are a variety of techniques in use or being explored.
• Pyrosequencing – Uses DNA on a bead to sequence complementary DNA strands.
• SOLiD – Sequencing by oligonucleotide ligation and detection, which generates 6 billion base pairs/reaction.
• http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
• https://www.youtube.com/watch?v=4XMO5VfLIKs
• Nanotechnology – to sequence DNA without fluorescent tags.

Lesson 2A – What you need to know
• How many reaction tubes are used?
• What is added to each reaction tube?
• Describe the Sanger method procedure.
• Explain how gel electrophoresis enables the determination of DNA sequence.

DNA Sequencing and The Human Genome Project
Lesson 2B – Human Genome Project
• In January 1989, a group of biologists, ethicists, industry scientists, engineers, and computer scientists met and announced the Human Genome Project.
• The Human Genome Project was a 13-year research effort (1990-2003) staffed by scientists from all over the world from both the private and public sectors.
• There was stiff competition between the public and private sector scientists to sequence the human genome.
• In 2001, the leaders of both groups announced the first version of the human genome and published their research simultaneously.

• The purposes of the Human Genome Project were to:
  1. Sequence the entire human genome.
  2. Analyze genetic variations among humans.
  3. Map and sequence the genomes of model organisms, including bacteria, yeast, roundworms, fruit flies, mice, and others.
  4. Develop new laboratory technologies such as automated sequencers and computer databases.
  5. Disseminate genome information among scientists and the general public.
  6. Consider the ethical, legal, and social issues that accompany the HGP and genetic research.

• When the Human Genome Project started, results were slow to come.
• DNA sequencing was done manually by a combination of the Sanger method and the shotgun cloning method.
• In the shotgun method, very large pieces of DNA are cut into smaller overlapping fragments by restriction enzymes.
• Overlapping fragments can later be joined into contiguous sequences (contigs).
• DNA fragments were inserted into bacterial plasmids and replicated along with the bacteria.
• Scientists then used the Sanger method to sequence each fragment.
• Using computer analysis, the fragments were reassembled into the entire genome sequence.

• In 1998, Celera Genomics focused on the development of automated DNA sequencing technologies.
• Celera announced that, with this new technology, it could sequence the entire genome by 2001 for $200 million.
• A competition between the private and public sectors began, and this had a positive impact on the work to be done.
• In time, the entire genome project adopted Celera's automated DNA sequencing equipment.
• Celera's original sequencing machines could sequence about 500 nucleotides in a single reaction, a vast improvement over the 200 nucleotides sequenced by the manual Sanger method.
• Today's automated DNA sequencing technologies can sequence about 1 billion nucleotides in a single reaction.

What did we learn from the Human Genome?
• The human genome consists of about 3.1 billion base pairs.
• The genome is 99.9% the same among all humans.
• Single nucleotide polymorphisms (SNPs) account for the genomic diversity among humans and serve as markers to identify disease.
• Less than 2% of the total genome codes for protein.
• 98% of the total genome is composed of non-protein-coding DNA, with 50% of it being repetitive DNA sequences and the other 50% transposons.
• The genome has approximately 20,000 coding genes.
• Many genes make more than one protein; about 20,000 genes make about 100,000 proteins.
• The functions of about half of all human genes are unknown.
• Chromosome 1 has the highest number of genes. The Y chromosome has the fewest.
• Many of the genes in the human genome show a high degree of similarity to genes in other organisms.
• Thousands of human diseases have been identified and mapped to their chromosomal locations.

Video – Cracking the Code of Life
• http://video.pbs.org/video/1841308959/
• NOVA – Cracking the Code of Life, 153 minutes

Lesson 2B – What you need to know
• List the purposes of the Human Genome Project.
• Describe how DNA was sequenced at the beginning of the genome project.
• How did Celera change DNA sequencing?
• What did we learn from the Human Genome Project?

Lesson 3 Bioinformatics
• 3A – Lecture: Overview of Bioinformatics
  Activity: Bioinformatics Exercise 1
• 3B – Lecture: Gene Databases
  Activity: Bioinformatics Exercises 2-9
  Activity: Using GEO database to research disease.
• 3C – Lecture: Protein Databases
  Activity: Bioinformatics amino acid sequences

3A Bioinformatics - Overview
• The work of the Human Genome Project and significant developments in molecular biology have generated staggering amounts of biological information.
• The huge number of nucleic acid and protein sequences has led to the development of an interdisciplinary field called bioinformatics.
• Bioinformatics uses computational models to store, organize, analyze, and manipulate vast amounts of biological data.
• Those involved in the field of bioinformatics make use of biology, computer science, mathematics, genetics, and statistics to analyze DNA and protein sequences, to study genomes, and to predict the structure and function of DNA and proteins.
• This marriage of biological data and computer science allows scientists to manipulate and analyze gene and protein data via many different computer databases.

Bioinformatics
• A biological database is a collection of biological data organized in a specific and useful way.
• Bioinformatics databases are very large, accessible over the Internet, and continuously updated with new research information.
• To acquire information from a database, a request called a query is made, and the information obtained is called the result.

• The three comprehensive databases on the Internet are:
  - GenBank at the National Center for Biotechnology Information (in the U.S.)
  - EMBL: European Molecular Biology Laboratory (in the U.K.)
  - DDBJ: DNA Database of Japan
• The major databases share and integrate data from different sources, and each database provides links to other online sources.
• Researchers and the public can take advantage of these vast online resources.

• There are many types of biological databases:
  1. Nucleic acid sequences
  2. Genome
  3. Gene expression
  4. Protein sequences
  5. Protein structure
  6. Gene and protein comparisons for evolutionary relationships

Bioinformatics Activity
• Refer to your handout for Exercise 1: Introduction to Bioinformatics.

3A – What you need to know
• What is bioinformatics?
• Why did bioinformatics develop as a field of work?
• What are the 3 comprehensive databases?
• What types of databases are available for use?

3B Bioinformatics – Gene Databases
Nucleic Acid Sequence Database
• The BLAST program (Basic Local Alignment Search Tool) is a basic research tool for finding similar nucleic acid sequences.
• If a researcher has an unknown DNA sequence, the sequence can be entered into nucleotide BLAST as a query.
• The program will compare it against all DNA sequences in the database and find similar, already identified sequences.
• The similar sequences are assigned a Max Score and a Total Score, which indicate how well each sequence aligns with the query. The higher the scores, the better the match.
• The Query Coverage gives the percentage of the query sequence that is covered by the alignment. The higher the percentage, the better the match.
• An Expect value (E value) is also assigned to each sequence. This is a measure of the number of matches with a similar score that can be expected by random chance for a particular query. The lower the E value, the more likely the sequences are related to each other.

Bioinformatics Activity
• Refer to your handout.
• We will complete Exercises 2, 3, 4, 5, and 9.

Genome Databases
• Whole genome databases have been developed for more than 1,000 organisms.
• Database tools allow scientists to identify genes and gene families within specific genomes, to find the locations of genes in the genome, and to analyze evolutionary relationships.
• These databases, like BLAST, are linked to other sequence and literature databases.

Bioinformatics Activity
• A quick tour of dbVAR
• Go to the NCBI site: http://www.ncbi.nlm.nih.gov/
• Click on Genomes and Maps in the left column.
• Click on the Database for Genomic structural variation.
• Click on Genome Browser.
• You can query by gene, chromosome number, or gene location.
• Enter PTEN (a gene) and GO.
• You will see a map of chromosome 10.
• Below you can find the location of the gene (blue line).
• Click on it and select GENBANK VIEW.
• Click on Gene in the right column.

Gene Expression Databases
• The phenotype of an organism (its physical characteristics, the way it interacts with the environment, and its diseases) depends on which genes are expressed.
• Gene expression varies among different cell types and within any cell over time, based on its developmental stage and the environment.
• One of the ways that all expressed genes in a particular cell at a particular time can be measured is through microarray analysis.

• A microarray tray holds all the DNA from a genome. The DNA is cut into fragments and placed in wells on a plastic tray.
• All mRNA from a cell of interest is converted to cDNA and tagged with a fluorescent dye. It is also fragmented.
• The DNA on the tray and the fluorescent cDNA hybridize on the tray.
• The tray is scanned by a laser that causes the dye to fluoresce wherever cDNA has bound to gene DNA on the tray.
• The fluorescent spots indicate which genes are expressed in the cells of interest.

• Data collected from these tests show the relationship between a particular phenotype and which genes are being expressed.
• Microarrays and several other types of tests are used by scientists in researching gene expression in cells.
• Their research findings can be submitted to the Gene Expression Omnibus (GEO) database.
• This database has gene expression information from more than 120,000 samples from 200 organisms.

Bioinformatics - Activity
• GEO Database
• We will research Merkel cell carcinoma online.
• Then we will visit the GEO database and summarize research literature about gene expression in this type of cancer.
• See your handout for details.

Gene Expression
• As we learned in an earlier unit, each gene has a promoter region, sometimes an enhancer, the actual gene to be transcribed, and a stop region.
• Depending on the various factors present, the gene can be expressed or it can be shut off.
• Sometimes in gene expression research, scientists need to locate a promoter region or the actual coding region of the gene.
• The following activity shows you how this is accomplished.

Bioinformatics Activity
• Refer to your handout.
• We will complete Exercises 6, 7, and 8.

Lesson 3B - What you need to know
• What is the function of nucleotide BLAST?
• What do BLAST scores indicate?
• What does the query coverage indicate?
• What does an E value indicate?
• What type of information is provided by dbVAR?
• Describe the microarray procedure.
• What information does a microarray provide?
• Information on your BLAST handout and how each search is conducted.

3C – Bioinformatics Proteins
Protein Sequence Databases
• Information about DNA sequences is important, but proteins determine the structure, function, and behavior of an organism.
• Mutations in genes, for example, can lead to the production of defective proteins.
• The major goals in protein research are to identify, catalog, and understand the functions of proteins.
• The fact that there are far fewer genes than proteins indicates that the protein complement of an organism cannot be fully characterized by gene analysis alone. The study of proteins is necessary to understand the complexities of living cells.

• Scientists use protein sequence databases to identify amino acid sequences based on the submission of nucleotide sequences.
• Protein sequences can be compared to other protein sequences, and this might reveal the same or a similar function.
• Evolutionary relationships can also be discovered in the comparison of protein sequences.
• BLAST allows researchers to run protein comparisons or to translate a nucleotide sequence into a protein product.
• The BLAST results can help scientists determine the direction of their research.

Protein Structures
• The function of a protein depends on the 3D structure adopted when the linear chain of amino acids folds.
• Bioinformatics provides tools that permit scientists to perform online protein modeling (i.e., predict the shapes of proteins).
• When an amino acid sequence is known, it is possible to compare the unknown sequence with identified protein structures in the database.
• As novel proteins become known and protein folding patterns are discovered, online protein modeling will become an important skill for biologists studying protein function.

Bioinformatics Activity
• Refer to your handout for activity directions.
• We will translate a nucleotide sequence into an amino acid sequence, compare amino acid sequences to identify proteins from different species, and finally take a look at a protein structure database.

Lesson 3C – What you need to know
• What is the major goal of protein research?
• What does blastx do?
• What does protein blast do?
• What does online protein modeling enable a researcher to do?
• Information from your BLAST search and how each search is conducted.

Lesson 4 Evolutionary Relationships
• Lecture – Evolutionary Relationships
• Activity – Construct a phylogenetic tree using amino acid sequences.
• Activity – Use bioinformatics to study phylogenetic relationships.

Evolutionary Relationships
• Speciation is associated with changes in the genetic structures of populations and with genetic divergence of those populations.
• Therefore, we are able to use genetic differences among species to reconstruct evolutionary histories and relationships (phylogenies).
• After an ancestral species diverges into two separate species, each branch gives rise to an independent lineage that accumulates different mutations by chance, usually small base pair changes in its DNA.
• Comparing nucleotide sequences and looking for these differences is one way to establish evolutionary relationships.

• The redundancy of the genetic code means that these nucleotide changes might not lead to a change in an amino acid, and a change in an amino acid does not always change the function of a protein.
• Scientists look at evolving proteins to study relationships between organisms.
• In general, with both nucleotide sequences and amino acid sequences, the more similar the sequences are when compared, the more likely it is that the two organisms are closely related.

Nucleotide Sequences
• There are many genes and non-coding regions of DNA that can be compared when establishing evolutionary relationships.
• Scientists have used nuclear, mitochondrial, and genomic DNA, along with mRNA, tRNA, rRNA, SINEs, and the Y chromosome, to name a few.
• The choice of which sequences to compare depends on the types of relationships in which a scientist is interested.

• When looking for relationships among several organisms over very long periods of time, a DNA sequence that is highly conserved is used.
• 16S rRNA is a good example of this. It is used to show evolutionary relationships (and speciation) among prokaryotes, and a version of the gene is also found on mitochondrial DNA.
• The gene is present in all bacteria, it mutates slowly, its essential functions have been conserved, and random mutations can be correlated to evolution between species.

• On the other hand, when looking at more closely related species, such as at the family, genus, or species level, a DNA sequence that mutates faster is used.
• The protein-coding genes in mitochondrial DNA (for some proteins in the electron transport chain) are good examples of these types of evolutionary markers.
• The Y chromosome is a good example of both types of DNA sequences:
• http://www.dnalc.org/view/15092-Studying-the-Y-chromosome-tounderstand-population-origins-and-migration-Michael-Hammer.html

Proteins
• Proteins with significant amino acid sequence similarity are conserved and are predicted to be members of the same protein family.
• Scientists compare amino acid sequences, as well, to determine evolutionary relationships.
• Generally, these comparisons are used to study relationships over long periods of time.
• Changes in DNA sequences do not always result in changes in a protein. Thus protein evolution is a slower process than DNA evolution.
• A limitation of protein comparisons is that they only cover the protein-coding sections of DNA.

• http://www.youtube.com/watch?v=mA7BE3mEb64
• Molecular Evolution Video Clip

Phylogenetic Tree
• Once genetic or protein differences are found, the relationships among organisms are often presented in the form of phylogenetic trees.
• The groups can be from the same species or from larger groups, and the branches of the tree represent lineages over time.
• Points at which the lines diverge are called nodes and show when a species splits into two or more species.
• Each node represents a common ancestor of the species diverging at that node.
• The root of the phylogenetic tree represents the oldest common ancestor of all the groups shown on the tree.

Evolutionary Relationships Activity
• Construct a phylogenetic tree based on amino acid differences.
• See your handout for directions.

Evolutionary Relationships Activity
• We will be using bioinformatics to study phylogenetic relationships among bears.
• It is Exercise 11 in your handout.
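
Worked sketch – counting amino acid differences
The first step in building a tree from amino acid differences can be sketched in Python as below. This is only an illustration, not the procedure from the handout: the four species names and the ten-residue sequences are made up, and the sequences are assumed to be already aligned and of equal length. The pair with the fewest differences is the pair that would be joined first on the tree.

```python
from itertools import combinations

# Hypothetical, pre-aligned amino acid fragments (invented for illustration,
# not real protein data). One letter = one amino acid.
sequences = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",
    "species_C": "MKSAYLAKQK",
    "species_D": "MRSGYLSKHK",
}


def count_differences(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


def difference_table(seqs):
    """Return {(name1, name2): number of differences} for every pair."""
    return {
        (n1, n2): count_differences(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }


def closest_pair(table):
    """Return the pair of species with the fewest differences."""
    return min(table, key=table.get)


table = difference_table(sequences)

# Print the pairs from most similar to least similar.
for (n1, n2), diff in sorted(table.items(), key=lambda item: item[1]):
    print(f"{n1} vs {n2}: {diff} differences")

print("Join first on the tree:", closest_pair(table))
```

With these made-up sequences, species_A and species_B differ at only one position, so they would share the most recent node, while species_D differs from every other species at four or more positions and would branch off closest to the root.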

Lesson 4 – What you need to know
• What two methods are used to compare evolutionary relationships?
• How are DNA sequences selected for the study of evolutionary relationships?
• When are amino acid sequences used in the study of evolutionary relationships?
• How do you construct a phylogenetic tree?
• Information on your bioinformatics search regarding how the search was conducted.