
Jumping Genes Lead the Way
ACTEZ Bioinformatics Workshop, July 19, 2015
Dr. Uwe Hilgert, BIO5 Institute/iPlant Collaborative, U of A
hilgert@email.arizona.edu; (520) 626-1367

Part A. Fetch PV92 Primer Sequences

Example Sequence: PV92 forward and reverse primer sequences from the wet lab
Tool(s): Google
Concept(s): Sequences are accessible online.
AZCTE BioScience Standard(s): 12.1

PCR: Polymerase Chain Reaction; a method to amplify a defined stretch of DNA from within a larger target DNA complex.
PCR Primers: Short oligonucleotides that match the sequences that frame the target sequence to be amplified by PCR.
Amplicon: Amplified sequence.
Cycle: PCR requires cycles of varying temperatures at which the reaction melts the strands apart (T1), allows the primers to hybridize to their target sequences (T2), and allows the polymerase enzyme to extend the sequences (T3).

I. Find sequences
1. Open a browser to http://www.geneticorigins.org/.
2. Click on the ‘PV-92 Alu Insertion’ image.
3. Click on ‘Continue…’.
4. Click on ‘Recipes.’
5. Click on [image not reproduced].
6. Copy both PV92 PCR primer sequences.

II. Store sequences
1. Open a text document.
2. Paste the primer sequences into the document.
3. Save the document in a place that you’ll remember.

Questions:
Q.1: What are the lengths of the two PV92 primer sequences in nucleotides [nt]?
a. Forward PV92 primer: 25 nucleotides, 5’-GGATCTCAGGGTGGGTGGCAATGCT-3’
b. Reverse PV92 primer: 26 nucleotides, 5’-GAAAGGCAAGCTACCAGAAGCCCCAA-3’

Q.2: Describe the function of the two primers to amplify DNA by PCR in the wet lab. How does it work?
PCR is a method to amplify specific DNA fragments. The primers hybridize to the sequences that frame the region to be amplified (the “amplicon”). Once hybridized, they serve as primers: they provide the polymerase with free 3’ ends to which it can add nucleotides, extending the primers into strands complementary to the template strand.
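The primer lengths in Q.1, the GC content that the annealing step of the PCR cycle depends on, and the genome strands asked for in the Additional Investigation can all be checked with a short Python sketch. The function names are hypothetical, and the Wallace rule (Tm = 2·(A+T) + 4·(G+C)) is a swapped-in rule of thumb that the handout does not use; it is meant for short oligonucleotides, so for 25 to 26 nt primers treat its result as a rough estimate only.

```python
# Primer sequences from Part A, Q.1 (written 5' -> 3').
FP = "GGATCTCAGGGTGGGTGGCAATGCT"
RP = "GAAAGGCAAGCTACCAGAAGCCCCAA"

# Base-pairing table used to build the complementary strand.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C (Wallace rule, an assumption
    here): Tm = 2*(A+T) + 4*(G+C). Intended for short oligonucleotides."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Complementary strand, reversed so that it reads 5' -> 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in (("FP", FP), ("RP", RP)):
        print(f"{name}: {len(primer)} nt, GC {gc_fraction(primer):.0%}, "
              f"Wallace Tm ~{wallace_tm(primer)} C")
        # Genome strand the primer anneals to, written 5' -> 3'.
        print(f"  binds 5'-{reverse_complement(primer)}-3'")
```

Reading the complement backward, rather than only swapping bases, is what turns the 3’ → 5’ genome strand into the conventional 5’ → 3’ notation.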
A PCR cycle uses three steps, each at a different temperature, to carry out the three functions required for amplification:
PCR cycle step 1: separate the DNA strands at T > 94 °C.
PCR cycle step 2: allow the primers to hybridize at T = 60–80 °C (depends on the GC content of the primers).
PCR cycle step 3: allow the primers to be extended at T = 72 °C (depends on the polymerase used).

Q.3: What is the significance of the two PV92 primer sequences in relation to the human genome? (Tip: Draw a sketch.)
The primer sequences frame the region to be amplified. They hybridize to the ends of the amplicon and become part of it.
[Sketch: the two primers sit at the 5’ end and the 3’ end of the amplicon.]

Additional Investigation: Write out the sequences in the human genome to which the primers anneal.

Forward primer
5’-GGATCTCAGGGTGGGTGGCAATGCT-3’ (forward primer; FP)
3’-CCTAGAGTCCCACCCACCGTTACGA-5’ (human genome; HG)
5’-AGCATTGCCACCCACCCTGAGATCC-3’ (human genome; HG)

Reverse primer
5’-GAAAGGCAAGCTACCAGAAGCCCCAA-3’ (reverse primer; RP)
3’-CTTTCCGTTCGATGGTCTTCGGGGTT-5’ (human genome; HG)
5’-TTGGGGCTTCTGGTAGCTTGCCTTTC-3’ (human genome; HG)
(Nucleotide sequences are usually written out in the 5’ → 3’ direction.)

PCR:
RP:                                            3’-AACCCCGAAGACCATCGAACGGAAAG-5’
HG: 5’-GGATCTCAGGGTGGGTGGCAATGCTNNNNNNNN……NNNNNNNNTTGGGGCTTCTGGTAGCTTGCCTTTC-3’
    3’-CCTAGAGTCCCACCCACCGTTACGANNNNNNNN……NNNNNNNNAACCCCGAAGACCATCGAACGGAAAG-5’
FP: 5’-GGATCTCAGGGTGGGTGGCAATGCT-3’

Part B. Conduct PCR Electronically

Example Sequence: PV92 forward and reverse primer sequences from the wet lab
Tool(s): BLAST (blastn)
Concept(s): BLAST sequence (text) queries are equivalent to primers hybridizing to target sequences in PCR.
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.5, 12.7, 12.9

BLAST: Basic Local Alignment Search Tool; an algorithm to search databases of biological sequence information (e.g., DNA, RNA, or amino acid sequences) and return matches.
blastn: A BLAST algorithm that uses a nucleotide query to search databases that contain nucleotide sequences.
Query: The sequence used for a BLAST search.
Subject: A matching sequence identified by a BLAST search.
Score: A measure of how well query and subject match, based on the matching nucleotides minus penalties for mismatches and gaps (higher scores are better).
E-value: A measure of how likely it is that a match could have arisen just by chance (small E-values = high significance).

III. Conduct a BLAST search (Query)
1. Open a browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter both PV92 primer sequences into the search field.
5. Enter ‘Homo sapiens’ (taxid: 9606) in the ‘Organism’ field.
6. Select the ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
7. Check ‘Show results in new window.’
8. Click ‘BLAST.’

IV. Examine results in the ‘Nucleotide Sequence’ section.
Q.4: How many query sequences did BLAST use?
One, of 51 nucleotides.
Q.5: How does the BLAST query sequence relate to the two primer sequences?
QS = FP + RP (the two primers joined end to end: 25 + 26 = 51 nt).
Q.6: Would you expect the query sequence used by BLAST to be present as such in the human genome?
No.

V. Examine results in the ‘Graphic Summary’ section.
Q.7: What is the color key for higher-scoring matches (alignments)? What for lower-scoring?
Red for higher-scoring, black for lower-scoring.
Q.8: What are the scores for “better” matches for your BLAST search with the PV92 primer sequences? What for “lesser” matches?
Blue (score 40–50) for the better matches, black (score below 40) for the lesser.

VI. Examine results in the ‘Descriptions’ section.
Q.9: In the “E value” column, what numerical value does “4e-04” represent?
0.0004 (4 × 10^-4).
Q.10: Small E values are more significant. Are there matches with E values of less than 0.01?
Yes, one.
Q.11: Do these matches represent the “better” or the “lesser” matches in the graphic section above?
The better matches.
Q.12: Identify in the “Description” column information about the origin and location of the “best” match.
Hs16 (Homo sapiens, chromosome 16).

VII. View more detailed information about the best match
Click on the URL in the ‘Description’ column for the best match.
Q.13: What is the human chromosome that contains the subject sequence(s) matching the query?
Chromosome 16.
Q.14: What is the length of the DNA that contains the sequence(s) that matched the query?
166,271 bp.
Q.15: What is the “Number of Matches” for the query sequence on this human genome sequence?
Two.
Q.16: What is the percentage of similarity between the query and the matches?
100%; the matching sequences and the primer sequences are identical.
Q.17: Match up the nucleotides in the query sequence with those in the human genome sequence. (Record the coordinates, then draw a sketch of how the matches align to the query sequences.)
FP (= query 1–25): subject 56722 to 56746
RP (= query 26–51): subject 57137 to 57112 (the RP matches the opposite strand, so its subject coordinates run backward)
[Sketch: FP match at 56722–56746 and RP match at 57112–57137; 365 nt lie between the two matches, and the whole span is 416 nt.]
How far apart are the two matches?
3’ border of subject match 1: 56746; 5’ border of subject match 2: 57112 → distance between matches = 365 nt (= 57112 - 56746 - 1)
What sequence length do the two matches span? (Include the matching query sequences in the count.)
5’ border of subject match 1: 56722; 3’ border of subject match 2: 57137 → match span = 416 nt (= 57137 - 56722 + 1)

RP:                                            3’-AACCCCGAAGACCATCGAACGGAAAG-5’
       56722                                                           57137
HG: 5’-GGATCTCAGGGTGGGTGGCAATGCTNNNNNNNN……NNNNNNNNTTGGGGCTTCTGGTAGCTTGCCTTTC-3’
    3’-CCTAGAGTCCCACCCACCGTTACGANNNNNNNN……NNNNNNNNAACCCCGAAGACCATCGAACGGAAAG-5’
FP: 5’-GGATCTCAGGGTGGGTGGCAATGCT-3’

Q.18: Does the answer to Q.17 confirm your answer to Q.6? How so? (If it doesn’t, think again about Q.6.)
Yes, it does: even the best BLAST hit for the 51 nt query made by concatenating the two primer sequences contains the primer sequences separated from each other by 365 base pairs. BLAST did not identify a match in which the two primers occur adjacent to each other.
Q.19: Predict the size of the amplicon sequence in the wet lab. (Refer to your sketch from Q.17.)
416 base pairs.

VIII. View the GenBank data sheet for the sequence that holds the best matches
Click on the URL ‘AC009028’ in the ‘Accession’ column for the best match to view the GenBank entry for this sequence.
Q.20: Identify the following information in the GenBank data sheet for the sequence that contains the sequences that matched the query:
Length of sequence in GenBank data sheet: 166,271 bp
Year of publication in GenBank: 2000
Definition: Homo sapiens chromosome 16 clone RP11-131F3, complete sequence
Organism from which sequence is derived: Homo sapiens
Authors: DOE Joint Genome Institute and Stanford Human Genome Center
Title: Direct Submission
Journal: Unpublished
Q.21: Identify the DNA that matches the primer sequences used as the query. (Tip: Use the coordinates from Q.17.)
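Using the Q.17 coordinates, the gap and span arithmetic, together with the copy-and-clean step that section IX describes, can be sketched in Python. The helper names are hypothetical; BLAST coordinates are taken as 1-based and inclusive, and `clean_sequence` assumes the pasted text holds only GenBank-style numbered sequence lines (a header word such as “ORIGIN” would leave stray G and N letters behind).

```python
import re

# Subject coordinates for the best hit (Part B, Q.17), 1-based, inclusive.
# The RP hit is reported as 57137 -> 57112 (minus strand); stored low-to-high.
FP_HIT = (56722, 56746)
RP_HIT = (57112, 57137)


def gap_between(left_end: int, right_start: int) -> int:
    """Number of nucleotides lying strictly between two matches."""
    return right_start - left_end - 1


def span(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval (both ends counted)."""
    return end - start + 1


def clean_sequence(text: str) -> str:
    """Remove digits, spaces, and line breaks: keep only A, C, G, T, N."""
    return re.sub(r"[^ACGTNacgtn]", "", text).upper()


def extract(genome: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of a cleaned sequence."""
    return genome[start - 1:end]


if __name__ == "__main__":
    print("distance between matches:", gap_between(FP_HIT[1], RP_HIT[0]))
    print("amplicon span:", span(FP_HIT[0], RP_HIT[1]))
```

The minus one and plus one in the two helpers are the whole difference between “nucleotides between the matches” (365) and “nucleotides spanned including the matches” (416).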
RP:                                            3’-AACCCCGAAGACCATCGAACGGAAAG-5’
       56722                                                           57137
HG: 5’-GGATCTCAGGGTGGGTGGCAATGCTNNNNNNNN……NNNNNNNNTTGGGGCTTCTGGTAGCTTGCCTTTC-3’
    3’-CCTAGAGTCCCACCCACCGTTACGANNNNNNNN……NNNNNNNNAACCCCGAAGACCATCGAACGGAAAG-5’
FP: 5’-GGATCTCAGGGTGGGTGGCAATGCT-3’

IX. Isolate the PV92 amplicon sequence
Using the coordinates from your sketch in Q.17, copy the nucleotide sequence that the two PV92 primer sequences span in the human genome and paste it into your text document. Delete all non-nucleotide characters (e.g., numbers), spaces, and line breaks.
Q.22: How long is this sequence? (There’s a count function in Microsoft Word…)
416 nucleotides.
Q.23: What is the equivalent of this sequence in the PCR wet lab experiment?
The amplicon.
Additional Investigation: Identify the sequences in the amplicon to which the primers hybridize, and compare them to the PV92 primer sequences used for the BLAST search as well as for the wet lab.

Part C. Examine the PV92 Locus in the Human Genome

Example Sequence: Amplicon sequence from Part B, IX.
Tool(s): NCBI Genome Viewer, BLAST
Concept(s): Genome/chromosome/gene organization, gene structure, introns/exons, genome search engines and browsers, genes and function/phenotypes/disorders
AZCTE BioScience Standard(s): 12.1, 12.2, 12.5, 12.7, 12.9

OMIM: Online Mendelian Inheritance in Man. A database that contains all known loci in the human genome that have been found to be associated with human phenotypes, including diseases and disorders.
Track: An individual region of the display where information of a certain type is mapped, such as genes, genetic distances, RNAs, etc.

Q.24a: How many chromosomes are in the human genome? (Write down your answer.)

X. View the human genome
1. Open a browser to http://ncbi.nlm.nih.gov.
2. Click on ‘Genome.’
3. Click on ‘Map Viewer.’
4. Find human in the ‘Scientific Name’ column.
5. Click on the most recent ‘Annotation Release’ in the ‘Build’ column.
Q.24b: How many chromosomes are in the human genome? (Does the number of chromosomes displayed in the Homo sapiens (human) genome view match your response above? If not, what causes the difference?)
25. The genome view lists the 22 autosomes, the X and Y sex chromosomes, and the mitochondrial genome (MT), whereas a diploid human cell contains 46 chromosomes (23 pairs).

XI. BLAST search the human genome
1. Click on ‘BLAST search the human genome.’
2. Paste the amplicon sequence from Step IX into the search field.
3. In the ‘Choose Search Set’ drop-down, select ‘Genome (all assemblies…).’
4. Select the ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
5. Check ‘Show results in new window.’
6. Click [button not reproduced].

XII. Examine results
Click on ‘[Human genome view].’
Q.25: What is the color key for higher-scoring matches? What for lower-scoring?
Same as in regular BLAST: red for higher scores, black for lower scores.
Q.26: Locate the highest-scoring match and describe its location.
Close to the telomere of the long arm of chromosome 16.

XII. Move into the single-chromosome view
Click on the number of the chromosome that carries the highest-scoring match.
Q.27: Approximately how long is the chromosome (in nucleotides)?
90,000,000 base pairs (or 90 Mbp).
Q.28: Approximately at what nucleotide position does the BLAST hit map?
The match maps at about 83 M.

XII. Move into the sub-chromosome view
Set the ‘Region Shown’ utility on the left to the 10,000,000-nucleotide window that surrounds the match identified in Q.28. (E.g., if the match maps at approximately 83 M, set ‘Region Shown’ from 78M to 88M and click ‘Go.’)
Q.29: Approximately at what nucleotide position does the BLAST hit map now?
The match maps at 82.9 M.

XIII. Rearrange the view
1. Click on [icon not reproduced].
2. Remove everything except the ‘Gene’ track from ‘Tracks Displayed,’ using the ‘-’ icon at the right.
3. Add a ruler to ‘Gene’ by clicking the ‘R’ icon to the left of the word ‘Gene.’
4. Click on ‘Cytogenetic Maps’ in ‘Available Tracks.’
5. Click on the ‘+’ icon next to ‘Ideogram.’ (This will add ‘Ideogram’ to ‘Tracks Displayed.’)
6. Click on ‘OK.’
7. Identify the red indicator for the match in the overview. (Tip: Use the ruler to find the position from Q.29.)
Q.30: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (intergenic region).
Hard to say at this resolution.
Q.31: Approximately at what nucleotide position does the BLAST hit map?
The match maps at 82.85 M.

XIV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 4,000,000-nucleotide window that surrounds the match identified in Q.31. (E.g., if the match maps at approximately 82.85 M, set ‘Region Shown’ from 80.85M to 84.85M.)
Q.32: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (intergenic region).
It could be in a gene.
Q.33: Approximately at what nucleotide position does the BLAST hit map now?
The match maps at 82.85 M.

XV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 2,000,000-nucleotide window that surrounds the match identified in Q.33. (E.g., if the match maps at approximately 82.85 M, set ‘Region Shown’ from 81.85M to 83.85M.)
Q.34: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region between genes (intergenic region).
In a gene.
Q.35: Approximately at what nucleotide position does the BLAST hit map now?
82.87 M.
Q.36: Approximately how long is the gene?
About 1,200,000 bp (83.8 M - 82.63 M = 1.17 Mbp).
Q.37: What is the name of the gene?
CDH13.
Q.38: Determine whether the gene is a spliced gene or not.
A spliced gene with 13 exons (dots) and 12 introns (lines).
Q.39: Is the PV92 locus in an exon or an intron of the gene? (Zoom in until you can decide.)
The 3rd intron.

XVI. Determine the function of a gene
Click on the ‘OMIM’ link for the gene.
Q.40: What is the name of the gene?
CDH13, or cadherin 13 (H-cadherin).
Q.41: What is the function of the gene?
The function of CDH13 is not fully understood. The CDH13 protein may act as a co-receptor for a signaling receptor through which adiponectin transmits metabolic signals.
Q.42: What is the length of the protein that the gene encodes?
731 amino acids.
Q.43: What is the length of the gene’s coding sequence?
2,196 nt.
Q.44: What disease(s) has the gene been found to be associated with?
Various cancers.
Q.45: Would you anticipate a change in phenotype/health if a small (ca. 300 bp) transposon were inserted into the PV92 locus?
Mutations in introns generally have no phenotypic effect. However, this is not always the case: a) mutations in introns can influence splicing, and b) if a gene is alternatively spliced (as the majority of human genes are), an intron in one splice form may become part of an exon in an alternative splice form, in which case an alteration of the intron can have a significant effect on the resulting protein and its function.

Additional Investigation: Use the amplicon and the NCBI Map Viewer to identify the PV92 locus in the chimpanzee genome. Follow the procedure from X. through XVI. and answer Q.24–Q.45 accordingly.
The chimp genome view shows 26 chromosomes. The human 416 nt PV92 amplicon probe matches a sequence on the long arm of chromosome 16, close to the telomere. The match is located in the 2nd intron of a gene called cadherin 13. The chimp CDH13 protein is determined to have 713 amino acids.

Part D. Identify All States of PV92 Known to Man

Example Sequence: Amplicon sequence from Part B, IX.
Tool(s): BLAST, DNA Subway, MUSCLE
Concept(s): Transposons, human evolution, human relatedness, DNA behavior, junk DNA
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.7, 12.9

Not sure what could go here – by now you are all so smart…

XVII. Conduct a BLAST search (Query)
1. Open a browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter the PV92 amplicon sequence (from Part B, IX.) into the search field.
5. Enter ‘Homo sapiens’ (taxid: 9606) in the ‘Organism’ field.
6. Select the ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
7. Check ‘Show results in new window.’
8. Click ‘BLAST.’

XVIII. Examine results in the ‘Nucleotide Sequence’ section.
Q.46: How does the BLAST query sequence relate to the PV92 wet lab?
QS = amplicon.
Q.47: In whose genome would you expect the query sequence to be present as a contiguous DNA stretch?
Human and other primates. However, the PV92 locus is in an intron, so it may not be conserved enough to be discovered by BLAST.

XIX. Examine results in the ‘Graphic Summary’ section.
Q.48: What are the scores for “better” matches for your BLAST search with the PV92 amplicon sequence? What for “lesser” matches?
The score for one match is above 200, for two others it is 80–200, and for another one it is 50–80.

XX. Examine results in the ‘Descriptions’ section.
Q.49: How many matches have E values of less than 0.00000001? Provide some information about these.
Three:

Description                                   E value  Matches  Accession   Length      Published
Homo sapiens chromosome 16 clone RP11-131F3…  0.0      One      AC009028.3  166,271 bp  No
Homo sapiens isolate BAS101 AluPV92 repeat…   7e-35    Two      AF302689.1  788 bp      Yes, 2001
Human Alu repeat                              2e-15    Two      M57427.1    1,002 bp    Yes, 1990

(E value: see Q.9. Matches = number of matches, see Q.15. Length = length of the GenBank entry, and Published: see Q.20.)
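BLAST reports E values in e-notation, so “4e-04” in Q.9 and the cutoff of 0.00000001 in Q.49 are the same kind of number written two ways. A minimal Python sketch, using the three E values from the Q.49 table; the `significant` helper and the `THRESHOLD` name are hypothetical.

```python
# E values as listed in the Q.49 'Descriptions' table, keyed by accession.
hits = {
    "AC009028.3": "0.0",
    "AF302689.1": "7e-35",
    "M57427.1": "2e-15",
}

THRESHOLD = 1e-8  # 0.00000001, the cutoff used in Q.49


def significant(hit_table: dict[str, str], cutoff: float) -> list[str]:
    """Accessions whose E value (an e-notation string) is below the cutoff."""
    return [acc for acc, e_value in hit_table.items() if float(e_value) < cutoff]


if __name__ == "__main__":
    print(float("4e-04"))                 # the Q.9 value in decimal notation
    print(significant(hits, THRESHOLD))   # the matches counted in Q.49
```

Because a smaller E value means a less likely chance match, the filter keeps values *below* the cutoff; tightening the cutoff (e.g., to 1e-20) would drop the 2e-15 match first.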
Q.50: Which of these matches is the same as the match in the PV92 primer BLAST (Part B)?
The best match, AC009028.
Q.51: What do the other matches with E values of less than 0.00000001 represent? (Tip: Use the more detailed information and alignments presented further down on the BLAST results page, as well as the information in the GenBank data sheets. Draw out the three different alleles.)
The GenBank data sheets for the other two matches contain ‘Alu’ in their titles, so they must both relate to the state of the PV92 locus that carries the Alu insertion. While the PV92 locus was entered in 2000 as part of the 166,271 bp human genome clone sequence gb|AC009028.3|AC009028, the other two entries were published in 2001 (AF302689) and 1990 (M57427). Both also contain much shorter sequences than gb|AC009028.3|AC009028, indicating that these two entries were generated specifically in association with work on the PV92::Alu form of the PV92 locus.
Closer analysis reveals that gb|AF302689.1|AF302689 describes a 306 bp Alu insertion into the PV92 locus that is accompanied by a 7 bp duplication of the original PV92 sequence GAAAGAA. This form of the PV92 locus, if amplified by the primers listed above, would yield a 731 bp amplicon (416 + 308 + 7 = 731).
Close inspection of the gb|M57427.1|M57427 match reveals that the PV92 locus in this GenBank entry contains a 335 bp Alu plus an 11 bp duplication, inserted into the Alu described in gb|AF302689.1|AF302689. Amplified by PCR, this PV92 allele would yield a 1,075 bp amplicon.
The following graph compares the amplicons that would be amplified from the three different alleles using the PV92 primers from Part A.
[Graph not reproduced.]
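The allele comparison in Q.51 rests on two ideas: an insertion that duplicates a short target site adds (insert + duplication) nucleotides to the amplicon, and each derived allele equals its predecessor with one contiguous block inserted. A minimal Python sketch of both; the function names are hypothetical, the size example reuses the values from the handout’s own sum (416 + 308 + 7), and the toy alleles are made-up strings, not the GenBank sequences. It assumes, as the handout does, that alleles differ by insertions rather than deletions.

```python
def amplicon_with_insertion(base_bp: int, insert_bp: int, tsd_bp: int) -> int:
    """Amplicon size after an insertion that also duplicates a target site
    of tsd_bp nucleotides (one extra copy of the site ends up in the amplicon)."""
    return base_bp + insert_bp + tsd_bp


def is_single_insertion(ancestor: str, derived: str) -> bool:
    """True if `derived` equals `ancestor` with one contiguous block inserted."""
    if len(derived) <= len(ancestor):
        return False
    prefix = 0  # length of the shared start
    while prefix < len(ancestor) and ancestor[prefix] == derived[prefix]:
        prefix += 1
    suffix = 0  # length of the shared end
    while suffix < len(ancestor) and ancestor[-1 - suffix] == derived[-1 - suffix]:
        suffix += 1
    # Shared start and end must together cover the whole ancestor.
    return prefix + suffix >= len(ancestor)


# Hypothetical toy alleles: target site GAAAGAA, primary insert "TTTT",
# and a secondary insert "GG" placed in the middle of the primary one.
EMPTY = "CCC" + "GAAAGAA" + "GGG"
FILLED = "CCC" + "GAAAGAA" + "TTTT" + "GAAAGAA" + "GGG"
NESTED = "CCC" + "GAAAGAA" + "TT" + "GG" + "TT" + "GAAAGAA" + "GGG"

if __name__ == "__main__":
    print(amplicon_with_insertion(416, 308, 7))
    print(is_single_insertion(EMPTY, FILLED), is_single_insertion(FILLED, NESTED))
    print(is_single_insertion(FILLED, EMPTY))  # wrong direction: not an insertion
```

The check only says that one allele *can* be derived from another by a single insertion; ordering the alleles from oldest to most recent, as Q.52 does, still relies on the assumption that Alu elements insert but are not precisely removed.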
Q.52: How are the three best matches related to each other? Which is ancestral? Which is the most recent? (Tip: Comparing the sequences from the three original GenBank sheets is very complicated. Instead, combine the drawings for the amplicons from Q.51 into one drawing. Alternatively, open the file that contains the three amplicon sequences for the PV92 locus and align the sequences using DNA Subway and/or MEGA.)
A combined analysis using DNA Subway and MEGA reveals these relationships between the PV92 alleles:
[Figure not reproduced.]
From oldest to most recent: PV92 > PV92::Alu > (PV92::Alu)::Alu
PV92 is the ancestral locus in the human genome. PV92::Alu came about through an insertion of the retrotransposon Alu into the PV92 locus. This form of the locus is present in all human populations around the world and must therefore have arisen in the population that gave rise to all modern humans, prior to the migration out of Africa. This insertion is not fixed, however (meaning that the PV92 locus is also present in its ancestral form), which makes PV92 useful for population studies.
(PV92::Alu)::Alu came about through an Alu insertion into a PV92::Alu allele. Close analysis reveals that the secondary Alu insertion is located almost in the middle of the original Alu. Alu elements can be grouped into a number of different families, and the two Alu elements in (PV92::Alu)::Alu belong to different Alu families. This secondary insertion is estimated to have occurred in a person living somewhere in the Basque region of Europe, in the mountainous region between northern Spain and southern France.
Despite the neutral character of this insertion, it has been passed on and can now sometimes be seen in people who can trace their origin to ancestors from the Basque region. (PV92::Alu)::Alu can therefore be used to study a single population.

Additional Investigation: Determine the sizes of the amplicons that would be amplified from the three PV92 alleles if the same primers were used as in the wet lab.
PV92 = 416 bp
As calculated in Q.17, the PV92 amplicon is 416 bp long.
PV92::Alu = 731 bp
The data in GenBank entry gb|AF302689.1|AF302689 reveal a 306 bp Alu insertion into the PV92 locus that is accompanied by a 7 bp duplication of genomic sequence at the end of the insertion (GAAAGAA). This PV92::Alu form of the PV92 locus, if amplified by the primers listed above, generates a 416 + 308 + 7 = 731 bp amplicon.
(PV92::Alu)::Alu = 1,075 bp
GenBank entry gb|M57427.1|M57427 reveals that the PV92 allele (PV92::Alu)::Alu contains a 335 bp Alu insertion in the primary Alu plus an 11 bp duplication (TACCAAAAATT), inserted into the Alu described in gb|AF302689.1|AF302689. Amplified by PCR, the (PV92::Alu)::Alu allele generates a 1,075 bp amplicon.

Students Discover Biological Concepts Using Bioinformatics

Genomes and Repetitive DNA
• A genome is an organism’s entire complement of DNA.
• DNA is a directional molecule composed of two antiparallel strands.
• The genetic code is read in the 5’ to 3’ direction, referring to the 5’ and 3’ carbons of deoxyribose.
• Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
• Transposons can be located in intergenic regions (between genes) or in introns (within genes).
• Genes and transposons are directional and can be encoded on either DNA strand.
• Repeats are non-directional and, in effect, occur on both strands.
• Transposons can mutate like any other DNA sequence.

Genes and Proteins
• Protein-coding information in DNA and RNA begins with a start codon, is followed by codons, and ends with a stop codon.
• Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
• The DNA strand that is equivalent to the mRNA is called the “coding strand.” The complementary strand is called the “template strand,” because it serves as the template for synthesizing mRNA.
• Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
• Even in a spliced gene, the protein-coding information may be organized as an open reading frame (ORF).
• Most eukaryotic genes are spliced: intervening segments (introns) are removed, and the remaining segments (exons) are spliced together.
• Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
• Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
• Over 90% of eukaryotic introns have “canonical splice sites”: introns begin with GT (mRNA: GU) and end in AG (mRNA: AG).
• The protein-coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
• In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
• UTRs hold information for the half-lives of mRNAs and for regulatory purposes.
• Gene > mRNA > CDS.
• CDS = the nucleotides that encode the amino acid sequence.
• In mRNA: CDS = ORF.

BLAST Searches
• Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
• Gene or protein homologs share sequence similarities due to descent from a common ancestor.
• Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
• Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNA-seq). Protein sequence data are available, too, but much less common.
• Many ESTs and cDNAs appear disrupted by “introns” when they are aligned against genomic DNA.
• ESTs and cDNAs may be incomplete.
• The BLAST algorithm does not resolve intron/exon boundaries.
• The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches); it also matches query subsequences (“local” matches).
• The BLAST algorithm matches sequences to the fullest extent possible and often realigns the same sequence twice.
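The difference between “global” and “local” matches can be made concrete with the Smith-Waterman dynamic-programming recurrence, which scores the best-matching sub-region of two sequences. This is a swapped-in illustration, not BLAST’s own method: BLAST uses a faster seed-and-extend heuristic that approximates this kind of local alignment. The scores (match +2, mismatch -3) are assumptions chosen to resemble common blastn settings, and the linear gap penalty of -5 is a simplification of BLAST’s gap costs.

```python
def local_alignment_score(query: str, subject: str,
                          match: int = 2, mismatch: int = -3,
                          gap: int = -5) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Each cell may restart at 0, so the best score can come from any
    sub-region of query and subject instead of their full lengths.
    """
    cols = len(subject) + 1
    prev = [0] * cols  # scores for the previous query position
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align query[i-1] with subject[j-1]
            up = prev[j] + gap          # gap in the subject
            left = curr[j - 1] + gap    # gap in the query
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    fp = "GGATCTCAGGGTGGGTGGCAATGCT"   # PV92 forward primer
    subject = "TTTT" + fp + "AAAA"     # toy subject with the primer embedded
    print(local_alignment_score(fp, subject))
```

The embedded primer scores 25 × 2 = 50 even though the flanking TTTT and AAAA do not match, which a global alignment forced to cover both full sequences would have to penalize.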
