Jumping Genes Lead the Way
Dr. Uwe Hilgert, BIO5 Institute/iPlant Collaborative, U of A
hilgert@email.arizona.edu; (520) 626-1367

Part A. Fetch PV92 Primer Sequences

Example Sequence: PV92 Forward and Reverse Primer Sequences from Wetlab
Tool(s): Google
Concept(s): Sequences are accessible online
AZCTE BioScience Standard(s): 12.1

PCR: Polymerase Chain Reaction; a method to amplify a defined stretch of DNA from within a larger target DNA complex.
PCR Primers: Short oligonucleotides that match the sequences that frame the target sequence to be amplified by PCR.
Amplicon: Amplified sequence.
Cycle: PCR requires cycles of varying temperatures at which the reaction melts the strands apart (T1), allows primers to hybridize to target sequences (T2), and allows the polymerase enzyme to extend the sequences (T3).

I. Find sequences
1. Open browser to http://www.geneticorigins.org/.
2. Click on the ‘PV-92 Alu Insertion’ image.
3. Click on ‘Continue…’
4. Click on ‘Recipes.’
5. Click on [icon missing from source].
6. Copy both PV92 PCR primer sequences.

II. Store sequences
1. Open a text document.
2. Paste the primer sequences into the document.
3. Save the document in a place that you’ll remember.

Questions:
Q.1: What are the lengths of the two PV92 primer sequences in nucleotides [nt]?
a. Forward PV92 primer: ________
b. Reverse PV92 primer: ________
Q.2: Describe the function of the two primers to amplify DNA by PCR in the wet lab. How does it work?
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
Q.3: What is the significance of the two PV92 primer sequences in relation to the human genome?
(Tip: Draw)
_________________________________________________________________________________________________
Sketch how the primers appear in the human genome in relation to the amplicon.

Additional Investigation: Write out the sequences in the human genome to which the primers anneal. Use the sequences from the Lab Manual and write out the nucleotides on the respective other strand; indicate orientations (sequences are usually written in the 5’ to 3’ direction).

Part B. Conduct PCR Electronically

Example Sequence: PV92 Forward and Reverse Primer Sequences from Wetlab
Tool(s): BLAST (blastn)
Concept(s): BLAST sequence (text) queries are equivalent to primers hybridizing to target sequences in PCR
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.5, 12.7, 12.9

BLAST: Basic Local Alignment Search Tool; an algorithm to search databases of biological sequence information (e.g. DNA, RNA, or amino acid sequence) and return matches.
blastn: An algorithm that uses nucleotide queries to search databases that contain nucleotide sequences.
Query: Sequence used for the BLAST search.
Subject: Matching sequence identified by the BLAST search.
Score: Measure for how many matches were present in query and subject together. (Higher scores are better.)

III. Conduct a BLAST search (Query)
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter both PV92 primer sequences into the search field.
5. Enter ‘Organism’ ‘Homo sapiens’ (taxid: 9606).
6. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
7. Check ‘Show results in new window.’
8. Click ‘BLAST.’

IV. Examine results in the ‘Nucleotide Sequence’ section.
Q.4: How many query sequences did BLAST use? How long?
___________
E-value: Measure for the likelihood that a match could have arisen just by chance. (Small E-values = high significance)
Q.5: How does the BLAST query sequence relate to the 2 primer sequences? ________
Q.6: Would you expect the query sequence used by BLAST to be present as such in the human genome? _____

V. Examine results in the ‘Graphic Summary’ section.
Q.7: What is the color key for higher-scoring matches (alignments)? What for lower-scoring? ________________
Q.8: What are the scores for “better” matches for your BLAST search with the PV92 primer sequences? What for “lesser” matches? _____________________

VI. Examine results in the ‘Descriptions’ section.
Q.9: In the “E value” column, what numerical value does “4e-04” represent? _______________
Q.10: Small E values are more significant - are there matches with E values of less than 0.01? _______________
Q.11: Do these matches represent the “better” or the “lesser” matches in the graphic section above? __________
Q.12: Identify in the “Description” column information about the origin and location of the “best” match. _______

VII. View more detailed information about the best match
Click on the URL in the ‘Description’ column for the best match.
Q.13: What is the human chromosome that contains the subject sequence(s) matching the query? ________
Q.14: What is the length of the DNA that contains the sequence(s) that matched the query? ________
Q.15: What is the “Number of Matches:” that the query sequence matches on this human genome sequence? ___
Q.16: What is the percentage of similarity between the query and the matches? _________
Q.17: Match up the nucleotides in the query sequence with those in the human genome sequence. (Record the coordinates, then draw a sketch of how the matches align to the query sequences.)
_________________________________________________________________________________________________
Sketch out how the primers match the human genome sequence
How far are the two matches apart from each other?
_________________________________________________________________________________________________
What sequence length do the two matches span? (Include the matching query sequences in the count.)
_________________________________________________________________________________________________
Q.18: Does the answer to Q.17 confirm your answer to Q.6? How so? (If it doesn’t, think again about Q.6.)
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
Q.19: Predict the size of the amplicon sequence in the wet lab. (Refer to your sketch from Q.17.) ___________

VIII. View the GenBank data sheet for the sequence that holds the best matches
Click on the URL ‘AC009028’ in the ‘Accession’ column for the best match to view the GenBank entry for this sequence.
Q.20: Identify the following information about the GenBank data sheet for the sequence that contains the sequences that matched the query:
Length of sequence in GenBank data sheet: ___________
Year of publication in GenBank: ___________
Definition: ___________
Organism from which sequence is derived: ___________
Authors: ______________________
Title: ___________
Journal: ___________
Q.21: Identify the DNA that matches the primer sequences used as query. (Tip: use coordinates from Q.17.)

IX. Isolate the PV92 amplicon sequence
Using the coordinates from the sketch above (Q.17), copy the nucleotide sequence that the two PV92 primer sequences span in the human genome and paste it into your text document. Delete all non-nucleotide characters (e.g. numbers), spaces and line breaks.
Q.22: How long is this sequence? (There’s a count function in Microsoft Word…) ___________
Q.23: What is the equivalent of this sequence in the PCR wet lab experiment? ___________

Additional Investigation: Identify the sequences in the amplicon to which the primers hybridize and compare them to the PV92 primer sequences used for the BLAST search as well as for the wet lab.
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________

Part C. Examine the PV92 Locus in the Human Genome

Example Sequence: Amplicon sequence from Part B, IX.
Tool(s): NCBI Genome Viewer, BLAST
Concept(s): Genome/Chromosome/Gene organization, gene structure, introns/exons, genome search engines and browsers, genes and function/phenotypes/disorders
AZCTE BioScience Standard(s): 12.1, 12.2, 12.5, 12.7, 12.9

Q.24a: How many chromosomes are in the human genome? (Write down your answer.)

X. View the human genome
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘Genome.’
3. Click on ‘Map Viewer.’
4. Find human in the ‘Scientific Name’ column.
5. Click on the most recent ‘Annotation Release’ in the ‘Build’ column.

OMIM: Online Mendelian Inheritance in Man.
A database that contains all known loci in the human genome that have been found associated with human phenotypes, including diseases and disorders.
Track: The individual regions of the display where information of certain types is mapped, such as genes, genetic distances, RNAs, etc.

Q.24b: How many chromosomes are in the human genome? (Does the number of chromosomes displayed in the Homo sapiens (human) genome view match your response above? If not, what causes the differences?) ________

XI. BLAST search the human genome
1. Click on ‘BLAST search the human genome.’
2. Paste the amplicon sequence from Step IX. into the search field.
3. Select ‘Choose Search Set’ drop-down ‘Genome (all assemblies…).’
4. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
5. Check ‘Show results in new window.’
6. Click ‘BLAST.’

XII. Examine results
Click on ‘[Human genome view].’
Q.25: What is the color key for higher-scoring matches? What for lower-scoring? ___________
Q.26: Locate the highest scoring match and describe the location. ___________

XII. Move into the single-chromosome view
Click on the number for the chromosome that carries the highest scoring match.
Q.27: Approximately, how long is the chromosome (in nucleotides)? ___________
Q.28: Approximately, at what nucleotide position does the BLAST hit map? ___________

XII. Move into the sub-chromosome view
Set the ‘Region Shown’ utility on the left to the 10,000,000-nucleotide window that surrounds the match identified in Q.28. (E.g., if the match maps approximately at 83 M, set ‘Region Shown’ from 78M to 88M. Click ‘Go.’)
Q.29: Approximately, at what nucleotide position does the BLAST hit map now? ___________

XIII. Rearrange the view
1. Click on [icon missing from source].
2. Remove from ‘Tracks Displayed’ anything but the ‘Gene’ track, using the ‘-’ icon at the right.
3. Add a ruler to ‘Gene’ by clicking the ‘R’ icon to the left of the word ‘Gene.’
4. Click on ‘Cytogenetic Maps’ in ‘Available Tracks.’
5. Click on the ‘+’ icon next to ‘Ideogram.’ (This will add ‘Ideogram’ to ‘Tracks Displayed.’)
6. Click on ‘OK.’
7. Identify the red indicator for the match in the overview. (Tip: use the ruler to find the position from Q.29.)
Q.30: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (=intergenic region). ___________
Q.31: Approximately, at what nucleotide position does the BLAST hit map? ___________

XIV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 4,000,000-nucleotide window that surrounds the match identified in Q.28. (E.g., if the match maps approximately at 82.85 M, set ‘Region Shown’ from 80.85M to 84.85M.)
Q.32: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (=intergenic region). ___________
Q.33: Approximately, at what nucleotide position does the BLAST hit map now? ___________

XV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 2,000,000-nucleotide window that surrounds the match identified in Q.28. (E.g., if the match maps approximately at 82.85 M, set ‘Region Shown’ from 81.85M to 83.85M.)
Q.34: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (=intergenic region). ___________
Q.35: Approximately, at what nucleotide position does the BLAST hit map now? ___________
Q.36: Approximately, how long is the gene? ______________________
Q.37: What is the name of the gene? ___________
Q.38: Determine whether the gene is a spliced gene or not. ______________________
Q.39: Is the PV92 locus in an exon or an intron of the gene?
(Zoom in until you can decide that question.) ___________

XVI. Determine the function of a gene
Click on the ‘OMIM’ link for the gene.
Q.40: What is the name of the gene? ___________
Q.41: What is the function of the gene? ________________________________________________________________
_________________________________________________________________________________________________
Q.42: What is the length of the protein that the gene encodes? ___________
Q.43: What is the length of the gene’s coding sequence? ___________
Q.44: What disease(s) has the gene been found associated with? ___________
Q.45: Would you anticipate a change in phenotype/health if a small (ca. 300 bp) transposon were inserted into the PV92 locus?
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________

Additional Investigation: Use the amplicon and NCBI map viewer to identify the PV92 locus in the chimpanzee genome. Follow the procedure from X. through XVI. and answer Q.24-Q.45 accordingly.
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________

Part D. Identify All States of PV92 Known to Man

Example Sequence: Amplicon sequence from Part B, IX.
Tool(s): BLAST, DNA Subway, MUSCLE
Concept(s): Transposons, human evolution, human relatedness, DNA behavior, Junk DNA
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.7, 12.9

Not sure what could go here – by now you are all so smart…

XVII. Conduct a BLAST search (Query)
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter the PV92 amplicon sequence into the search field. (From Part B., IX.)
5. Enter ‘Organism’ ‘Homo sapiens’ (taxid: 9606).
6. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
7. Check ‘Show results in new window.’
8. Click ‘BLAST.’

XVIII. Examine results in the ‘Nucleotide Sequence’ section.
Q.46: How does the BLAST query sequence relate to the PV92 wet lab? ___________
Q.47: In whose genome would you expect the query sequence to be present as a contiguous DNA stretch?
_________________________________________________________________________________________________
_________________________________________________________________________________________________

XIX. Examine results in the ‘Graphic Summary’ section.
Q.48: What are the scores for “better” matches for your BLAST search with the PV92 amplicon sequence? What for “lesser” matches? ______________________

XX. Examine results in the ‘Descriptions’ section.
Q.49: How many matches have E values of less than 0.00000001? Provide some information about these. _______
Description | E-Value (see Q.9) | Number of matches (see Q.15) | Accession | Sequence Length of GenBank entry (see Q.20) | Published (see Q.20)
Q.50: Which of these matches is the same as the match in the PV92 primer BLAST? (Part B.) ___________
Q.51: What do the other matches represent that have E values of less than 0.00000001? (Tip: Start with the finding that was published in 1990, then the one from 2001.
Use the more detailed information and alignments that are presented further down in the BLAST result page, as well as the information presented in the GenBank data sheets. Draw out the three different alleles.)
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
Sketch an alignment of the three matching sequences

Q.52: How are the three best matches related to each other? Which is ancestral? Which most recent? (Tip: Comparing the sequences from the three original GenBank sheets is very complicated. Instead, combine the drawings for the amplicons from Q.51 into one drawing. Alternatively, open the file that contains the three amplicon sequences for the PV92 locus and align the sequences using DNA Subway and/or MEGA.)
_______________________________________________________
MUSCLE Alignment from DNA Subway: Sketch alignment of the three amplicons
MEGA Phylogenetic Analysis: Sketch the relationship between the amplicons
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________

Additional Investigation: Determine the sizes of the amplicons that would be amplified from the three PV92 alleles if the same primers were used as in the wet lab. Explain how you arrived at your numbers.
PV92 = _____ bp
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
PV92::Alu = _____ bp
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
(PV92::Alu)::Alu = ______ bp
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________

Students Discover Biological Concepts Using Bioinformatics

Genomes and Repetitive DNA
A genome is an organism’s entire complement of DNA.
DNA is a directional molecule composed of two anti-parallel strands.
The genetic code is read in a 5’ to 3’ direction, referring to the 5’ and 3’ carbons of deoxyribose.
Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
Transposons can be located in intergenic regions (between genes) or in introns (within genes).
Genes and transposons are directional, and can be encoded on either DNA strand.
Repeats are non-directional and, in effect, occur on both strands.
Transposons can mutate like any other DNA sequence.

Genes and Proteins
Protein-coding information in DNA and RNA begins with a start codon, is followed by codons, and ends with a stop codon.
Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
The DNA strand that is equivalent to mRNA is called the “coding strand.” The complementary strand is called the “template strand,” because it serves as the template for synthesizing mRNA.
Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
Even in a spliced gene, the protein-coding information may be organized as an Open Reading Frame (ORF).
Most eukaryotic genes are spliced, whereby intervening segments (introns) are removed and the remaining segments (exons) are spliced together.
Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
Over 90% of eukaryotic introns have “canonical splice sites,” whereby introns begin with GT (mRNA: GU) and end in AG (mRNA: AG).
The protein coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
UTRs hold information for the half-lives of mRNAs and for regulatory purposes.
Gene > mRNA > CDS. CDS = nucleotides that encode amino acid sequence. In mRNA: CDS = ORF.

BLAST Searches
Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
Gene or protein homologs share sequence similarities due to descent from a common ancestor.
Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNAseq). Protein sequence data are available, too, but much less common.
Many ESTs and cDNAs are disrupted by “introns” when they are aligned against genomic DNA.
ESTs & cDNAs may be incomplete.
The BLAST algorithm does not resolve intron/exon boundaries.
The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches) but, instead, matches query subsequences as well (“local” matches).
The BLAST algorithm matches sequences to the fullest extent possible and, often, realigns the same sequence twice.
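
The idea from Part B, that two primers used as a text query behave like primers hybridizing in PCR, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how BLAST itself works: it uses exact string matching instead of scored local alignment, and the template and both primers are invented toy sequences (they are not the PV92 primers or any real genome sequence). The function names `reverse_complement` and `in_silico_pcr` are hypothetical helpers written for this sketch.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template, forward, reverse):
    """Locate both primers on the template's top strand.

    Returns (start, end, amplicon) with 1-based inclusive coordinates,
    or None if either primer site is not found. Exact matches only.
    """
    template = template.upper()
    forward = forward.upper()
    fwd_start = template.find(forward)
    if fwd_start == -1:
        return None
    # The reverse primer anneals to the top strand, so the top strand
    # carries the reverse complement of the primer sequence.
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start + len(forward))
    if rev_start == -1:
        return None
    end = rev_start + len(rev_site)
    return fwd_start + 1, end, template[fwd_start:end]


# Toy data (assumption: invented for illustration only).
TEMPLATE = ("TTGACC" "GATTACGGCA" "TGCAAGTCCTAGGAT" "CCAATGCCGT" "TAGC")
FORWARD = "GATTACGGCA"
REVERSE = "ACGGCATTGG"

result = in_silico_pcr(TEMPLATE, FORWARD, REVERSE)
start, end, amplicon = result
print("Forward primer site starts at:", start)  # 7
print("Reverse primer site ends at:", end)      # 41
print("Amplicon length (primers included):", len(amplicon))  # 35

# The two primers pasted together do not occur as one contiguous stretch;
# only the two subsequences match, each at its own location.
print("Joined query found as a whole:", (FORWARD + REVERSE) in TEMPLATE)  # False
```

The amplicon length is simply the end coordinate minus the start coordinate plus one (41 - 7 + 1 = 35), which is the same arithmetic used when reading the coordinates off a BLAST alignment. Real primer matches may contain mismatches, which exact string search would miss.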