CHEM 114A | Key Concepts Lectures 6-7

Key Concepts from Lecture 6

I. Restriction Enzymes
a. What are they? What do they recognize? What can a cleavage pattern tell you?
- Restriction enzymes recognize a specific sequence in double-stranded DNA and cleave both strands.
- The cleavage sites possess twofold rotational symmetry.
- The recognized sequence is palindromic.
- Hundreds of restriction enzymes have been characterized.
- Name = three-letter abbreviation of the host organism, followed by the strain and a Roman numeral (e.g., EcoRI from Escherichia coli).
- These enzymes cleave large DNA molecules into smaller fragments that are easier to analyze.
- The cleavage pattern, the set of fragments produced by a restriction enzyme, can serve as a "fingerprint" of that specific DNA molecule.
b. Palindromic
- The recognized sequence is palindromic: read 5'→3', it is the same on both strands (e.g., 5'-GAATTC-3' pairs with 3'-CTTAAG-5').

II. Blotting
a. Northern Blotting = Identifying RNA
i. A radiolabeled, single-stranded complementary DNA probe binds to the target RNA. The process is the same as Southern blotting, but it detects an RNA sequence instead of a DNA sequence.
b. Southern Blotting = Identifying DNA
i. A radiolabeled, single-stranded complementary DNA probe binds to the DNA on the nitrocellulose sheet if the target sequence is present, and the band appears on the autoradiogram. Used to detect a specific DNA molecule among many other DNA molecules.
c. Western Blotting = Identifying Protein
i. A radiolabeled antibody binds to the protein of interest after the proteins have been separated on an SDS-PAGE gel and transferred to a polymer sheet. Allows the detection and identification of specific proteins separated by gel electrophoresis.
Steps: SDS-polyacrylamide gel → transfer proteins to polymer sheet → add radiolabeled specific antibody → overlay film and develop.

III. DNA Sequencing
a. The importance of Sanger sequencing
i. What is the role of the 2',3'-dideoxy analog?
- The analog lacks the 3'-hydroxyl group that would form a phosphodiester bond with the next nucleotide, so replication stops wherever it is incorporated. This allows controlled termination of replication.
ii. How does Sanger sequencing work?
- Invented by Frederick Sanger.
- Based on generating DNA fragments whose length is determined by the identity of the last base.
- Such a collection of fragments is obtained through the controlled termination of replication.
- This is done using a dideoxy analog of each dNTP.
- Four separate reaction mixtures are used, each containing a small amount of only one dideoxy analog. Each mixture also contains all four dNTPs, radioactively labeled.
- DNA polymerase extends a primer annealed to the denatured single-stranded template, synthesizing its complement. This produces strands of varying length, depending on where a dideoxy analog was incorporated.
- The products of each reaction are run in a separate lane of a denaturing polyacrylamide gel.
iii. *Reading a Sanger sequencing gel to determine the original DNA sequence
- The sequence is read from the pattern of chain termination, from bottom to top (short strands to long strands).
- The sequence read this way is that of the newly synthesized strand, which is complementary to the template.
- Fluorescence is more commonly used nowadays than radioactive labeling.
- Modern DNA sequencing instruments can sequence more than 10^6 bases per day.

IV. DNA Synthesis
a. How does DNA synthesis work? Is there a size limitation?
- DNA strands can be chemically synthesized by the sequential addition of activated monomers.
- Synthesis occurs in the 3' to 5' direction.
- Size limitation of about 100 bases.
- Allows the generation of short DNAs (primers) that can be used to amplify genes by PCR.
- Can be used to make DNA probes for the blotting techniques above.
- Recent development: multiple DNAs corresponding to parts of a much larger sequence are synthesized and joined to form new tailor-made genes.

V. Polymerase Chain Reaction (PCR)
a. Why is PCR so important?
- Developed by Kary Mullis in 1984 (Nobel Prize in 1993).
- A method of greatly amplifying small quantities of a specific DNA: millions of copies can be made from a starting DNA molecule.
- The flanking sequences of the target DNA must be known.
- Short DNAs complementary to these flanking sequences must be synthesized; these are called primers.
b. Know the components that are needed for PCR:
i. Pair of primers (know the purpose of a primer!)
- The primers hybridize to, and are complementary to, the flanking sequences. DNA polymerase needs a primer with a free 3'-OH end to start synthesis, so the primers define which region is copied.
ii. All four dNTPs: A, C, T, and G
iii. Heat-stable DNA polymerase (why would it need to be heat stable?)
- From a thermophilic bacterium, such as Taq polymerase from Thermus aquaticus.
- It must survive the 95 °C strand-separation step repeated in every cycle.
iv. Thermal cycler
- A machine that cycles between the different temperatures.
c. Know the three steps of PCR
1) Strand separation: the two strands of the target DNA molecule are separated by heating to 95 °C.
2) Hybridization of primers: the solution is quickly cooled to 54 °C so the primers can hybridize to the complementary flanking sequences at each end of the target DNA.
3) DNA synthesis: the solution is heated to 72 °C, the optimal temperature for DNA synthesis by Taq DNA polymerase.
- This cycle is repeated about 20-35 times.
- Ideally, after n cycles the sequence is amplified 2^n-fold: about millionfold after 20 cycles and billionfold after 30 cycles.

PCR Real-Life Uses:
- Medical diagnostics: very small amounts of bacterial and viral DNA can be detected by PCR; for example, HIV can be detected at an early stage.
- Forensics: DNA from a crime scene can be greatly amplified for identification.
- Molecular archaeology: DNA from extinct organisms can be amplified for evolutionary studies (e.g., the Neanderthal genome).

VI. DNA Ligase
a. What is the purpose of DNA ligase? In what situation would you use DNA ligase?
- DNA ligase catalyzes the ligation (joining) of two DNA duplexes that have compatible ends, such as the overhangs produced by restriction enzymes.
- DNA ligase requires a free 3'-hydroxyl group on one end and a 5'-phosphoryl group on the other.
- Both DNAs must be double helical.
- An energy source such as ATP is required for the joining of DNAs.
b. Overhangs
- Restriction enzymes can produce either 5' or 3' overhangs. The overhangs on the two ends must be complementary (compatible) so that they base-pair; ligase then joins the 3'-OH of one strand to the 5'-phosphate of the other.
c. Both DNAs must be double helical
d. ATP dependent
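To make the restriction-enzyme ideas in section I concrete, here is a minimal Python sketch (not from the lecture) that checks whether a site is palindromic, finds every EcoRI site (5'-G^AATTC-3', cut between G and A) in a made-up sequence, and lists the fragment lengths of the digest, i.e., its cleavage pattern. The sequence and all function names are hypothetical. The sketch assumes linear DNA and tracks only top-strand cut positions, so it ignores the overhangs.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """A DNA site is palindromic if it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths (bp) after cutting a linear DNA at every site.

    `cut_offset` is the top-strand cut position measured from the start of
    the site; EcoRI cuts G^AATTC, so its offset is 1.
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    boundaries = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    dna = "TTGAATTCAAGGCCGAATTCTT"  # made-up sequence with two EcoRI sites
    print(is_palindromic("GAATTC"))     # True
    print(find_sites(dna, "GAATTC"))    # [2, 14]
    print(digest(dna, "GAATTC", 1))     # [3, 12, 7]
```

Two sites give three fragments (3, 12, and 7 bp); on a gel, that set of band sizes is the "fingerprint" of this sequence for EcoRI.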
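Reading a Sanger gel (section III.a.iii) comes down to sorting the bands from all four lanes by length. The sketch below assumes hypothetical band data: for each ddNTP lane, the lengths (in nucleotides added after the primer) at which chains terminated. It reads short to long, as on the gel, then reverse-complements the result to recover the template strand. The function names are illustrative, not from any sequencing software.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_sanger_gel(lanes: dict[str, list[int]]) -> str:
    """Return the newly synthesized strand (5'->3') from ddNTP lane band lengths.

    `lanes` maps each terminating base (A, C, G, T) to the lengths of the
    fragments in that lane, counted from the end of the primer.
    """
    base_at_length: dict[int, str] = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes have a band at length {length}")
            base_at_length[length] = base
    ordered = sorted(base_at_length)
    # Every length from 1 to n should have exactly one band; a gap means a missing band.
    if ordered != list(range(1, len(ordered) + 1)):
        raise ValueError("band lengths are not contiguous")
    # Bottom of the gel (shortest fragment) is the first base added after the primer.
    return "".join(base_at_length[length] for length in ordered)


def template_strand(synthesized: str) -> str:
    """The template is the reverse complement of the synthesized strand (both 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


if __name__ == "__main__":
    gel = {"A": [2, 5], "C": [1], "G": [3, 6], "T": [4]}
    new_strand = read_sanger_gel(gel)
    print(new_strand)                   # CAGTAG
    print(template_strand(new_strand))  # CTACTG
```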
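The ideal 2^n amplification in section V.c can be checked with a few lines of Python. This sketch adds a per-cycle `efficiency` parameter that is not in the notes (1.0 means perfect doubling, the ideal case) and ignores the depletion of primers and dNTPs in late cycles.

```python
# Temperatures for one PCR cycle, as given in the notes.
CYCLE_STEPS = (
    ("strand separation", 95),
    ("primer hybridization", 54),
    ("DNA synthesis by Taq polymerase", 72),
)


def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after `cycles` rounds of PCR.

    Each cycle copies a fraction `efficiency` of the existing duplexes, so the
    total grows by a factor of (1 + efficiency) per cycle; efficiency = 1.0 is
    the ideal doubling that gives 2**n-fold amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    copies = float(start_copies)
    for _ in range(cycles):
        # One pass through CYCLE_STEPS: denature, anneal primers, extend.
        copies += copies * efficiency
    return copies


if __name__ == "__main__":
    for step, temp_c in CYCLE_STEPS:
        print(f"{step}: {temp_c} deg C")
    for n in (20, 30):
        print(n, "cycles:", f"{pcr_copies(1, n):,.0f}", "copies from one molecule")
```

With perfect efficiency, 20 cycles give 1,048,576 copies (about millionfold) and 30 cycles give 1,073,741,824 (about billionfold), matching the notes.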
VII. Cloning
- DNA ligase can be used to insert novel DNA sequences into a DNA vector.
- The vector is introduced into a host, where it replicates autonomously (independently of the host chromosome).
- Two commonly used vectors are plasmids and phages.
- Vectors are prepared for cloning by cutting with a suitable restriction enzyme, followed by ligation with the target DNA. The vector and target DNA must have compatible ends.
- Vectors allow the production of large quantities of a DNA of interest.
a. Vectors
i. Plasmids
1. Circular dsDNA molecules that occur naturally in some bacteria
- Plasmids range in size from 2 to several hundred kilobases (kb).
2. Antibiotic resistance
- Plasmids carry genes for a selectable marker, such as antibiotic resistance.
3. Site that tolerates insertion of a new DNA sequence
- Plasmids contain a site that tolerates the insertion of a new DNA sequence.
4. pUC18 plasmid example
a. Origin of replication for propagation in the host organism
- Plasmids have an origin of replication that is required for propagation in the host organism.
b. Amp resistance and β-galactosidase as selectable markers
- pUC18 carries ampicillin resistance as a selectable marker.
- The β-galactosidase gene encodes an enzyme that cleaves the sugar analog X-gal into galactose and a dark blue compound, so colonies turn blue.
- This gene contains a polylinker with many restriction sites. An insert in the polylinker disrupts the β-galactosidase gene, so colonies stay white.
- Thus pUC18 has two selectable markers: ampicillin resistance selects for bacterial cells that contain the plasmid, and blue/white screening shows which of those cells contain a plasmid with the DNA insert (white colonies).
ii. Phages (bacteriophages)
1. What are phages?
- Phages (bacteriophages) are viruses that infect bacterial cells and replicate in them.
- They inject their DNA into a bacterial cell, resulting in the production of more viral particles.
2. What are the two modes of infection for lambda phage?
a. Lytic: viral functions are fully expressed, leading to destruction of the host cell and release of hundreds of virus particles.
b. Lysogenic: the phage DNA is integrated into the host genome and is replicated together with the host DNA.
3. Lambda phage as a cloning vector
a. What are the benefits of phage cloning?
- Large segments of the 48-kb lambda genome can be deleted and replaced with foreign DNA.
- Mutant phages have been made that contain extra restriction sites into which new DNA can be inserted.
- After digestion with EcoRI, the two remaining pieces (arms) of lambda DNA together equal 72% of a normal genome length.
- Only DNA measuring 75 to 105% of a normal lambda genome is packaged into a viral particle, so the arms alone are too short; only recombinants carrying a suitably sized insert are packaged.
- Phages can tolerate larger DNA insertions than plasmids (>10 kb).
- These modified viruses enter bacteria much more easily than plasmid vectors do.
b. Genomic library
- Specific genes can be cloned from very large genomes.
- Genomic DNA is first digested into large fragments.
- Fragments about 15 kb long are isolated by gel electrophoresis.
- These fragments are ligated to lambda DNA through compatible ends.
- E. coli bacteria are infected with these recombinant phages.
- The phages replicate and lyse (kill) their bacterial hosts.
- The resulting lysate contains a large number of phage particles carrying fragments from the entire genome. This is known as a genomic library.

Key Concepts from Lecture 7

I. Mutagenesis of DNA
- Proteins with new functions can be created through directed changes (mutations) in DNA.
a. Deletions
- A specific sequence within a larger DNA can be excised using restriction enzymes, and the remaining ends are joined together by DNA ligase.
- PCR can be used to make targeted deletions of any size (the professor's preferred method).
- Overlap extension PCR: the primers for the first round of PCR carry single-stranded extensions that are complementary to each other, so the two first-round products anneal through these extensions and are joined in a second round of PCR, leaving out the deleted region.
b. Substitutions
i. Site-directed mutagenesis: developed by Michael Smith (Nobel Prize in 1993)
1. How does it work?
- Mutant proteins containing a single amino acid substitution can be made using oligonucleotides (primers) that carry the desired mutation.
- The sequence of the gene to be altered must be known.
- The mutant primer is annealed to the DNA template and elongated by DNA polymerase.
- The original parental DNA is then degraded with DpnI, leaving only mutant DNA, which produces the mutant protein.
2. What is the purpose of DpnI?
- DpnI cleaves only methylated DNA, so it degrades the original parental DNA (methylated when it was replicated in bacteria) while sparing the newly synthesized, unmethylated mutant DNA.
c. Insertions
i. Cassette mutagenesis
- Plasmid DNA is cut with two different restriction enzymes to remove a specific region, and the large fragment is purified.
- A newly synthesized DNA fragment with compatible ends is then ligated into the plasmid.
- Allows the swapping of one gene for another.

Gene Synthesis:
- Completely new proteins with novel functions can be designed and synthesized.
- No starting DNA template is needed.
- Many oligonucleotides corresponding to the desired sequence are synthesized.
- These 40-100 base oligonucleotides are annealed and joined together to form the final DNA sequence encoding the protein.
- This final sequence is cloned into a plasmid to produce the protein.
- Workflow: design amino acid sequence → design and synthesize gene → produce and characterize protein.

Genome Sequencing:
- The complete genomes of many organisms have been sequenced, including bacteria, fungi, plants, insects, worms, and humans.
- This has been made possible by automated DNA sequencers and high-speed computers for data analysis and sequence assembly.
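Site-directed mutagenesis (section I.b) starts from a primer that matches the template everywhere except at the codon being changed. Below is a minimal sketch of designing such a primer, assuming a made-up open reading frame and a hypothetical flank length; real primer design also considers melting temperature and GC content, which this ignores. The example changes codon 3 from AAA (Lys) to AGA (Arg), a single-base, conservative change. The primer is written on the coding strand, and the later DpnI digestion of the methylated parent is a lab step that is not modeled.

```python
def mutagenic_primer(gene: str, codon_number: int, new_codon: str, flank: int = 12) -> str:
    """Return a coding-strand primer with codon `codon_number` (1-based) replaced.

    `flank` bases of unchanged template sequence are kept on each side so the
    primer still anneals to the parental DNA despite the mismatch.
    """
    gene, new_codon = gene.upper(), new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    start = (codon_number - 1) * 3
    end = start + 3
    if codon_number < 1 or end > len(gene):
        raise ValueError("codon_number is outside the gene")
    left = gene[max(0, start - flank):start]
    right = gene[end:end + flank]
    return left + new_codon + right


def count_mismatches(primer: str, gene: str, codon_number: int, flank: int = 12) -> int:
    """Compare the primer with the template region it was designed against."""
    start = (codon_number - 1) * 3
    region = gene.upper()[max(0, start - flank):start + 3 + flank]
    return sum(1 for p, t in zip(primer, region) if p != t)


if __name__ == "__main__":
    orf = "ATGGCTAAACGTCTGGAATGGTAA"  # hypothetical gene: Met-Ala-Lys-Arg-Leu-Glu-Trp-stop
    primer = mutagenic_primer(orf, 3, "AGA", flank=6)
    print(primer)                                     # ATGGCTAGACGTCTG
    print(count_mismatches(primer, orf, 3, flank=6))  # 1
```

The single mismatch in the middle of the primer is what the polymerase copies into every new strand.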
The First Complete Genome Sequence:
- The diagram in lecture 7, slide 9 depicts the genome sequence of the bacterium Haemophilus influenzae, the first genome sequenced from a free-living organism.
- This sequence was determined using a "shotgun" approach: the genomic DNA is broken into many small pieces, and these fragments are sequenced.
- A computer searches the random fragments for overlapping regions, which determine how they fit together into the full genome sequence.

Human Genome:
- The human genome contains 3 billion base pairs of DNA distributed among 23 pairs of chromosomes (46 in total).
- Humans were originally thought to have 100,000 genes, but this was incorrect; humans have only ~25,000 genes.
- The proteome is more complex than the gene count suggests because of alternative splicing and post-translational modification.
- The human genome contains large amounts of non-coding DNA, made up of introns and mobile genetic elements.
- Only 1.5% of the human genome codes for proteins.
- More than 90% of the genome is transcribed into RNA at high levels!
- Noncoding RNAs most likely play an important role in eukaryotes; this is an area of intense study.

Comparative Genomics:
- Comparing the genomes of different organisms can lead to the following insights:
o Identification of novel genes: comparison of the human and pufferfish genomes led to the identification of 1,000 previously unknown genes.
o Evolutionary relationships: comparisons among the human, chimpanzee, and Neanderthal genomes give insights into our own evolutionary history.

II. Gene Expression Analysis
- Most genes are present as one copy per genome, but their expression into mRNA varies widely.
- Gene expression varies from cell type to cell type, and also at different times or stages of development.
- The complete genome sequence allows us to systematically examine the expression levels of all the individual genes in an organism.
- This approach assumes that the level of an mRNA indicates the level of the corresponding protein being produced in the cell.
- High-density arrays of oligonucleotides complementary to the mRNAs produced by the various genes can be constructed.
- Binding of a (fluorescently labeled) mRNA extract to this DNA microarray, or "gene chip," produces fluorescence that can be quantitated to determine gene expression.
- Red corresponds to gene induction and green to gene repression.
- E.g., monitoring yeast for changes in gene expression under different environmental conditions.
- Allows the determination of gene function and reveals networks of genes.

III. Recombinant Protein Expression
- Natural levels of a specific protein are usually low.
- Eukaryotic genes can be introduced into bacteria, which can then serve as factories for producing eukaryotic proteins.
- New genes can be introduced into plants to give them new properties such as pest resistance.
- E.g., production of the Bacillus thuringiensis (bacterial) toxin in peanut plants protects them from damage caused by European corn borer larvae.
a. Why do we need to start with an mRNA sequence and not the genomic sequence?
- To isolate a gene for protein expression, one must start with the mRNA sequence, not the genomic sequence. Genomic DNA contains introns, which are removed only when the gene is expressed and processed into mRNA; bacteria cannot remove introns.
b. Know the process:
mRNA → converted into cDNA using reverse transcriptase → cDNA ligated into a protein expression plasmid → E. coli bacteria transformed with this plasmid produce the protein.
c. Know the steps of cDNA production
- A synthetic oligo(dT) primer is annealed to the poly(A) tail of the mRNA.
- Reverse transcriptase uses the primer's free 3'-OH end to initiate cDNA synthesis.
- Treatment with alkali (NaOH, high pH) degrades the RNA strand.
- Terminal transferase adds a string of dGs to the 3' end of the newly synthesized cDNA, creating a second primer site of known sequence.
- PCR then amplifies the cDNA using oligo(dT) and oligo(dC) primers.
- Note: reverse transcriptase comes from retroviruses, which use it to copy their RNA genome into DNA for integration into the host's genomic DNA. In the lab, cDNA is made by in vitro reverse transcription.

Protein Expression Vectors:
- The cDNA is inserted into a plasmid directly after a plasmid-encoded transcription promoter.
- A ribosome-binding site (Shine-Dalgarno sequence) is located just before the start codon of the gene to be expressed (the cDNA).
- The resulting cDNA clones can be screened for expression of the protein of interest.
d. Bacteria lack the enzymes required for post-translational modification of eukaryotic proteins.
i. Eukaryotic cells add carbohydrate groups to the surface of proteins as a post-translational modification.
ii. Eukaryotic cells have chaperone proteins that assist in the proper folding of newly synthesized proteins.
iii. Since bacterial cells do not have these, a eukaryotic host must be used to express the target gene in some cases.
e. Introduction of recombinant genes into eukaryotes
- Recombinant DNA can be introduced into eukaryotes by several methods:
i. Microinjection
- DNA is injected directly into the nucleus of a cell using a micropipette.
ii. Electroporation
- A high-voltage pulse makes the cell membrane permeable to DNA molecules.
iii. Viral vectors
- Retroviruses are the most efficient vectors for delivering foreign DNA into eukaryotic cells.
- Retroviruses can integrate the DNA version of their RNA genomes into the host's chromosomal DNA, where it is expressed and replicated by the host cell machinery.
- Retroviruses can accept DNA inserts of up to 6 kb.
- Baculovirus is used to express proteins in insect cells.

IV. Gene Disruption (Knockout)
a. Why would we need to knock out a gene?
- The function of a gene can be determined by inactivating the gene and looking for the effect on the organism. This is called a gene "knockout."
b. How would we knock out a gene?
- Can be done in diverse organisms such as bacteria, yeast, and mice.
- Gene knockouts are made by homologous recombination with a mutant version of the gene.
Gene Disruption by Homologous Recombination:
o A mutant version of the target gene is designed. It keeps sequence similarity to the wild-type (WT) gene, especially at the 5' and 3' ends.
o When this gene is introduced into embryonic cells, recombination between the similar regions replaces the WT gene with the inactive mutant version.
o Then look for phenotypic (visible) effects on the organism.

V. RNA Interference (RNAi)
- Discovered in 1998 by Andy Fire and Craig Mello (Nobel Prize 2006).
a. Know the process of the RNAi pathway
- C. elegans: a free-living, transparent nematode about 1 mm long that lives in temperate soil environments.
- dsRNA can be introduced into C. elegans easily by feeding the worms E. coli bacteria that produce the dsRNA.
- Large-scale screens can be done in which many genes are knocked down one by one to determine their functions.
- PROCEDURE:
- Introducing a specific dsRNA into a cell disrupts the mRNA of genes whose sequence corresponds to the dsRNA.
- The enzyme Dicer cuts the dsRNA into 21-nucleotide fragments (siRNAs). Each fragment consists of 19 base pairs with 2 unpaired nucleotides at each 3' end.
- The two strands of each fragment are separated, and a single strand is incorporated into the RISC complex.
- The single-stranded 21-nt RNA guides RISC to a complementary mRNA, which is then degraded.

VI. Recombinant DNA and Plants
- Recombinant genes can be introduced into plants using the following methods:
a. Tumor-inducing plasmids (Ti plasmids)
- Integrate into the genome and can express foreign DNA.
- The common soil bacterium Agrobacterium tumefaciens infects plants and introduces foreign DNA.
- A tumor, known as a crown gall, grows at the site of infection.
- Crown galls synthesize opines, which are metabolized by the bacteria; the plant cell's metabolism is diverted to produce food for the Agrobacterium.
- Ti plasmids carried by the Agrobacterium are responsible for the shift to the tumor state and the synthesis of opines.
- A small portion of the Ti plasmid, called T-DNA, is integrated into the plant cell genome.
- Foreign DNA can be inserted into the T-DNA region and is expressed after infection of a plant.
- Only works with dicots (broad-leaved plants such as grapes) and some monocots.
b. Electroporation
- A high-voltage electrical pulse makes the cell membrane permeable to DNA.
- Electroporation allows foreign DNA to be introduced into a wider variety of plants, including cereal grains.
- The cellulose cell wall is first degraded with cellulase to form protoplasts.
- A mixture of plasmid DNA and protoplasts is subjected to high-voltage electrical pulses.
- The DNA enters the cells, and the foreign DNA is expressed.
c. Gene gun
- DNA is coated onto tungsten pellets, which are fired into plant cells at high velocity.

Benefits of Recombinant DNA Technology:
- Human gene therapy
- Drought-resistant plants
- Genetically engineered microbes for bioremediation
- Production of drugs

GMO (Genetically Modified Organism) Foods:
- Myth: Eating foreign DNA will turn you into a mutant!!!
- Fact: You eat grams of foreign DNA every day.
- The following desired properties can be engineered: pest resistance, drought tolerance, salt tolerance, cold/heat tolerance, nutrition, and quantity/crop yield.
- With a growing world population, this is a much-needed technique.
- Genetically modified plants contain engineered or transplanted proteins that are broken down during digestion into amino acids, like any other dietary protein.
- Organic and conventional produce both contain pesticides that can cause human health problems, though at relatively low concentrations.
- GMO food can greatly increase crop yields to feed a growing global population.
- Greater food production per acre of land results in less environmental damage.

Molecular Evolution:
- Evolution is the foundation for all of biology.
- Molecular evolution is the study of how proteins, nucleic acids, and other molecules have changed through time.
- Two molecules are homologous if they are derived from a common ancestor and later diverged from this ancestral sequence.
a. Homology
i. Paralogs
- Homologs that are present within one species.
ii. Orthologs
- Homologs that are present in different species and have very similar functions.
iii. What's the difference between paralogs and orthologs?
- The 3D structures of bovine ribonuclease, human ribonuclease, and human angiogenin are very similar.
- Homology can be detected by significant sequence similarity resulting in a common 3D structure.
- Bovine and human ribonuclease are orthologs (different species, same function). Human ribonuclease and angiogenin are paralogs (same species).
iv. Homology can be used to infer function
- Large-scale sequencing has led to the discovery of many new genes.
- These can be compared with genes of known function.
- Sequence similarity most likely indicates a similar function in different organisms.
- Sequence alignments are performed to make these comparisons.
b. Sequence alignments
- Sequences are aligned in regions of similarity, either in the specific amino acid sequence or in the physical character of the amino acids.
- Example: hemoglobin (oxygen-carrying protein in blood) and myoglobin (binds oxygen in muscle).
- At first glance, there seems to be little sequence similarity.
- The sequences are slid past each other to find the windows of greatest similarity.
- There are two good hits: one at the N-terminus and the other at the C-terminus.
- Both hits can be combined into one alignment by introducing a gap in one of the sequences.
- Adding a gap allows all regions of similarity to be included in the alignment.
- The gap is needed because one protein has gained or lost amino acids during evolution.
c. Scoring alignments
- How do you test whether a group of sequence identities occurred by chance alone? For example, inserting many gaps into sequences can produce almost any alignment.
- A scoring system guards against this possibility.
- E.g., each sequence identity is given +10 points, and each gap is assessed a penalty of -25 points. The hemoglobin-myoglobin alignment, with 38 identities and 1 gap, therefore scores 38 × 10 - 25 = 355.
d. Shuffling
- How do you know whether the similarity is statistically significant, i.e., better than a random hit?
- The amino acid sequence of one of the proteins in the alignment is shuffled, the shuffled sequence is realigned and scored, and this is repeated many times.
- In the resulting distribution of scores, the red bar for the original alignment (e.g., hemoglobin-myoglobin) lies far to the right of the shuffled scores, so the similarity is very unlikely to have arisen by chance.
e. Conservative substitutions
i. Know what a conservative substitution of amino acids is.
- Some substitutions replace an amino acid with one that has similar physical properties (similar size and/or chemical properties). This is called a conservative substitution.
- Conservative substitutions must be accounted for in scoring alignments, and they receive a positive score.
ii. Substitution matrices
- Conservative and single-nucleotide substitutions are more likely than substitutions with more radical changes.
- Substitutions that have already taken place in existing protein sequences can be examined, and substitution matrices can be deduced from this analysis.
- A large positive score means a substitution occurs frequently, while a large negative score indicates a rare substitution.
- In the matrix figure, the starting amino acid is listed at the top, and shaded amino acids require only a single base mutation for the change.
- Conservative substitutions have high scores.
- Amino acids such as cysteine and tryptophan are more conserved than amino acids such as serine and alanine.
- Structurally conservative mutations, such as lysine for arginine or isoleucine for valine, have relatively high scores.
- This type of scoring system is much better at detecting homology between sequences than an exact-identity approach.
- BLAST (Basic Local Alignment Search Tool): an online tool for searching databases of genome and protein sequences for sequences similar to a query.
- The expect value (E) should be less than 10^-5 for a hit to be considered statistically significant. The E value is the number of sequences with this level of similarity expected to be found in the database by chance, so it should be much less than 1 to be statistically significant.
- http://www.nci.nih.gov
f. Structural homology
- 3D structure is much more closely associated with function than the primary sequence is.
- α-Hemoglobin, myoglobin, and leghemoglobin have very similar structures, even though the sequence similarity between human myoglobin and lupin leghemoglobin is only 15.6%, which is not statistically significant.
- These proteins were expected to be related because of their similar biochemical function of binding oxygen.
- Structural homology can also be found between proteins with unrelated biochemical functions: actin is a major component of the cytoskeleton, while heat shock protein (Hsp) assists in the folding of proteins inside cells. Their similar structures suggest that they are paralogs, descended from a common ancestor and adapted to different roles.
- Knowledge of 3D structures can aid in the proper alignment of sequences.
- In a given family of proteins, residues that are critical for function are highly conserved.
i. Can proteins with similar 3D structures be considered homologs even if their sequence alignment is not statistically significant?
- Often, yes. The conservation of functionally critical residues can be used as a signal to detect similar proteins even when the complete sequence alignment is not statistically significant.
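The scoring and shuffling ideas in sections b-d can be tried out in Python. This sketch uses the notes' example scores (+10 per identity, -25 per gap) and counts a run of consecutive gap characters as one gap. As a simplification, the shuffle test compares best ungapped sliding-window scores (sliding one sequence past the other, as in section b) instead of realigning with gaps, and the sequences are toy strings, not real hemoglobin or myoglobin. All names are illustrative.

```python
import random

IDENTITY_SCORE = 10  # points per identical aligned residue (example values from the notes)
GAP_PENALTY = 25     # points subtracted per gap


def score_alignment(top: str, bottom: str) -> int:
    """Score a finished alignment; '-' marks a gap and each run of '-' is one gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for a, b in zip(top, bottom) if a == b != "-")
    gaps = sum(
        1
        for row in (top, bottom)
        for i, ch in enumerate(row)
        if ch == "-" and (i == 0 or row[i - 1] != "-")
    )
    return IDENTITY_SCORE * identities - GAP_PENALTY * gaps


def best_ungapped_score(a: str, b: str) -> int:
    """Slide `b` past `a` and return the best identity score over all offsets."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for j, residue in enumerate(b)
            if 0 <= offset + j < len(a) and a[offset + j] == residue
        )
        best = max(best, IDENTITY_SCORE * matches)
    return best


def shuffle_test(a: str, b: str, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled copies of `b` that score at least as well as `b` itself.

    A small fraction means the observed similarity is unlikely to be chance.
    """
    rng = random.Random(seed)
    observed = best_ungapped_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, random order
        if best_ungapped_score(a, "".join(residues)) >= observed:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 38 identities and 1 gap, as in the hemoglobin-myoglobin example.
    print(score_alignment("A" * 38 + "C", "A" * 38 + "-"))  # 355
    toy = "ACDEFGHIKLMNPQRSTVWY"
    print(shuffle_test(toy, toy, trials=200))
```

Shuffling keeps the amino acid composition fixed, so the test asks only whether the order of residues explains the score.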
g. Convergent evolution
- Occurs when two unrelated proteins, not descended from a common ancestor, evolve the same structure and/or function.
- E.g., chymotrypsin and subtilisin both cleave peptide bonds by hydrolysis. Their active sites are almost identical, but their overall 3D structures are very different, making it unlikely that they are evolutionarily related.
- Convergent evolution also happens at the macroscopic scale, for example in bats and birds, or in sharks and dolphins.
i. What's the difference between convergent evolution and paralog/ortholog homology?
- Convergent evolution describes how unrelated proteins converge on a similar function/structure, while paralog/ortholog homology describes how proteins from a common ancestor have diverged.
- Convergent evolution: similar function/structure arising in two unrelated proteins.
- Paralog/ortholog homology: proteins descended from a common ancestral protein, which accounts for their similar structure/function.
h. What is a motif?
- More than 10% of all proteins contain motifs that are repeated within the same protein.
- Repeated motifs can be detected by aligning a protein with itself.
- They are the result of gene duplication events.
i. RNA structural homology
- RNA folds back on itself to form elaborate secondary structures containing both double- and single-stranded regions.
- Comparing a conserved RNA sequence from multiple organisms can reveal the secondary structure of that RNA.
i. Compensatory mutations
- Compensatory mutations alter the sequence but maintain base pairing within the secondary structure: a change on one side of a stem is matched by a change on the other side.
- MFOLD: a web server for predicting RNA secondary structures. Input an RNA sequence of interest, and MFOLD outputs multiple possible secondary structures arranged from most likely to least likely.
- http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form

Carl Woese (1928-2012):
- 1977: discovered that ribosomal RNA is highly conserved and can be aligned across multiple species.
- Protein synthesis by the ribosome is an ancient reaction carried out by all living things, so rRNA is an excellent barometer of evolution.
- Discovered a new domain of life, the Archaea.
- First to show that all life on Earth is related.
j. Phylogenetic trees
- Sequence alignments of proteins or nucleic acids can be used to construct evolutionary trees; this is called phylogenetic analysis.
- The length of the branch connecting each pair of proteins is proportional to the number of amino acid differences between their sequences.
- Analysis of ancient DNA from Neanderthals allows us to determine their position on the evolutionary tree. Approximately 1-4% of human DNA is the result of interbreeding with Neanderthals.
i. Know how to read the tree.
- The closer the branches, the more closely related the two species/proteins are.
- Branch length is proportional to the amount of sequence difference.
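Since branch lengths reflect the number of differences between aligned sequences (section j), the first step of building a tree is a pairwise difference table. This sketch assumes hypothetical, already-aligned protein fragments and simply counts mismatched positions. A clustering method such as UPGMA would join the closest pair first and then repeat on the merged groups; that clustering step is not shown.

```python
from itertools import combinations


def differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise difference counts for every pair of named, aligned sequences."""
    return {
        (m, n): differences(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """The pair with the fewest differences (the pair UPGMA-style clustering joins first)."""
    matrix = distance_matrix(seqs)
    return min(matrix, key=matrix.get)


if __name__ == "__main__":
    aligned = {  # hypothetical aligned protein fragments
        "species_A": "MKTAYIAK",
        "species_B": "MKTAYIAR",
        "species_C": "MKSAFIGR",
    }
    for pair, d in distance_matrix(aligned).items():
        print(pair, d)
    print(closest_pair(aligned))  # ('species_A', 'species_B')
```

Here A and B differ at 1 position, B and C at 3, and A and C at 4, so on the tree A and B would sit on the closest branches.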
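The compensatory mutations described under RNA structural homology (section i) can be checked mechanically: a stem is intact if every paired position can still base-pair. This sketch assumes a hypothetical hairpin with a 3-bp stem and an AAAA loop, and it allows the G-U wobble pair in addition to Watson-Crick pairs (wobble is an added assumption, not stated in the notes). It is not how MFOLD predicts structures; it only tests pairing at positions you already know are paired.

```python
# Watson-Crick pairs plus the G-U wobble pair (wobble is an added assumption).
BASE_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_is_paired(rna: str, stem: list[tuple[int, int]]) -> bool:
    """True if every (i, j) position pair in the stem can base-pair."""
    return all((rna[i], rna[j]) in BASE_PAIRS for i, j in stem)


def is_compensatory(reference: str, variant: str, i: int, j: int) -> bool:
    """True if both bases of pair (i, j) changed yet still pair in both sequences."""
    both_changed = reference[i] != variant[i] and reference[j] != variant[j]
    return (
        both_changed
        and (reference[i], reference[j]) in BASE_PAIRS
        and (variant[i], variant[j]) in BASE_PAIRS
    )


if __name__ == "__main__":
    stem = [(0, 9), (1, 8), (2, 7)]  # hypothetical hairpin: 3-bp stem, AAAA loop
    reference = "GGCAAAAGCC"
    double_mutant = "GACAAAAGUC"     # G1->A and C8->U: pairing kept
    single_mutant = "GACAAAAGCC"     # G1->A only: A-C mismatch
    print(stem_is_paired(reference, stem))                  # True
    print(stem_is_paired(double_mutant, stem))              # True
    print(stem_is_paired(single_mutant, stem))              # False
    print(is_compensatory(reference, double_mutant, 1, 8))  # True
```

Seeing the same pair of positions covary like this across many organisms is the evidence that those positions are base-paired in the real structure.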