TGAC * Sequence Polymorphisms Module

Jumping Genes Lead the Way
Dr. Uwe Hilgert, BIO5 Institute/iPlant Collaborative, U of A
hilgert@email.arizona.edu; (520) 626-1367
Part A. Fetch PV92 Primer Sequences
Example Sequence:
PV 92 Forward and Reverse Primer Sequences from Wetlab
Tool(s):
Google
Concept(s):
Sequences are accessible online
AZCTE BioScience Standard(s): 12.1
PCR: Polymerase Chain Reaction; a method to amplify a defined stretch of DNA from within a larger target DNA complex.
I. Find sequences
1. Open browser to http://www.geneticorigins.org/.
2. Click on ‘PV-92 Alu Insertion’ image
PCR Primers: Short oligonucleotides that match the sequences that frame the target sequence to be amplified by PCR.
3. Click on ‘Continue…’
4. Click on ‘Recipes.’
5. Click on
Amplicon: Amplified sequence.
6. Copy both PV92 PCR primer sequences.
Cycle: PCR requires cycles of varying temperatures at which the reaction melts the strands apart (T1), allows primers to hybridize to target sequences (T2), and allows the polymerase enzyme to extend the sequences (T3).
II. Store sequences
1. Open a text document.
2. Paste the primer sequences into the document.
3. Save the document in a place that you’ll remember.
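The storing step above, and the length count asked for in Q.1 below, can also be done in a few lines of Python. This is only a sketch: the two sequences are made-up placeholders (not the real PV92 primers), and the file name and helper names are hypothetical.

```python
import os
import tempfile

# Placeholder sequences for illustration only; replace them with the
# primer sequences copied from the 'Recipes' page.
primers = {
    "forward": "ACGTACGTACGTACGTACGT",
    "reverse": "TTGGCCAATTGGCCAA",
}


def save_primers(path, primers):
    """Write one 'name<TAB>sequence' line per primer to a text file."""
    with open(path, "w") as handle:
        for name, seq in primers.items():
            handle.write(f"{name}\t{seq}\n")


def load_primers(path):
    """Read the file back into a dictionary of name -> sequence."""
    loaded = {}
    with open(path) as handle:
        for line in handle:
            name, seq = line.rstrip("\n").split("\t")
            loaded[name] = seq.upper()
    return loaded


# Hypothetical file name inside the system's temporary folder.
path = os.path.join(tempfile.gettempdir(), "pv92_primers.txt")
save_primers(path, primers)
for name, seq in load_primers(path).items():
    print(name, len(seq), "nt")
```

Counting with `len()` gives the primer length in nucleotides, which is the number Q.1 asks for once the real sequences are pasted in.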
Questions:
Q.1:
What are the lengths of the two PV92 primer sequences in nucleotides [nt]?
a. Forward PV92 primer: ________
b. Reverse PV92 primer: ________
Q.2:
Describe the function of the two primers to amplify DNA by PCR in the wet lab. How does it work?
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
Q.3:
What is the significance of the two PV92 primer sequences in relation to the human genome? (Tip: Draw)
_________________________________________________________________________________________________
Sketch how the primers appear in the human genome in relation to the amplicon.
Additional Investigation: Write out the sequences in the human genome to which the primers anneal.
Use the sequences from the Lab Manual and write out the nucleotides on the respective complementary strand; indicate orientations (sequences are usually written in 5’ → 3’ direction).
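Writing out the other strand by hand can be checked with a short script. The sketch below assumes plain A/C/G/T sequences; the example sequence is a made-up placeholder, not a PV92 primer.

```python
# Watson-Crick base pairs for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-paired strand, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the other strand written in its own 5' -> 3' direction."""
    return complement(seq)[::-1]


primer = "GGATCCA"  # placeholder sequence for illustration
print("5'-" + primer + "-3'                   (strand as written)")
print("3'-" + complement(primer) + "-5'                   (other strand, aligned)")
print("5'-" + reverse_complement(primer) + "-3'                   (other strand, 5' -> 3')")
```

The second and third printed lines describe the same strand; they differ only in the direction in which it is written.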
Part B. Conduct PCR Electronically
Example Sequence:
PV 92 Forward and Reverse Primer Sequences from Wetlab
Tool(s):
BLAST (blastn)
Concept(s):
BLAST sequence (text) queries are equivalent to primers hybridizing to target
sequences in PCR
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.5, 12.7, 12.9
BLAST: Basic Local Alignment Search Tool; an
algorithm to search databases of biological
sequence information (e.g. DNA, RNA, or
amino acid sequence) and return matches.
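The concept stated above, that a text query finds its target much as a primer finds its binding site, can be illustrated with a toy "electronic PCR". This is a simplified sketch, not how BLAST works internally: it uses exact string matching only, and the template and primer sequences are invented for the example.

```python
def reverse_complement(seq):
    """Return the opposite strand written 5' -> 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon framed by the two primers, or None.

    The forward primer is searched as written on the given strand.
    The reverse primer anneals to that strand, so its reverse
    complement is what appears in the template text.
    Only exact matches are considered (an assumption of this sketch).
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented sequences for illustration only.
template = "TTTTGGATCCAAACCCGGGTTACGTTGCAAAA"
forward = "GGATCCA"
reverse = "GCAACGT"

amplicon = in_silico_pcr(template, forward, reverse)
print(amplicon, len(amplicon))
```

Note that the reverse primer itself never appears in the template text; only its reverse complement does, which is worth keeping in mind for Q.6.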
III. Conduct a BLAST search (Query)
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter both PV92 primer sequences into the search field.
blastn: An algorithm that uses nucleotide sequences to query databases that contain nucleotide sequences.
Query: Sequence used for BLAST search.
5. Enter ‘Organism’ ‘Homo sapiens’ (taxid: 9606).
6. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
Subject: Matching sequence identified by BLAST search.
7. Check ‘Show results in new window.’
8. Click
Score: Measure for how many matches were present in query and subject together. (Higher scores are better.)
IV. Examine results in the ‘Nucleotide Sequence’ section.
Q.4: How many query sequences did BLAST use? How long? ___________
E-value: Measure for likelihood that a match could have arisen just by chance. (Small E-values = high significance)
Q.5: How does the BLAST query sequence relate to the 2 primer sequences? ________
Q.6: Would you expect the query sequence used by BLAST to be present as such in the human genome? _____
V. Examine results in the ‘Graphic Summary’ section.
Q.7: What is the color key for higher-scoring matches (alignments)? What for lower-scoring? ________________
Q.8: What are the scores for “better” matches for your BLAST search with the PV92 primer sequences? What for
“lesser” matches? _____________________
VI. Examine results in the ‘Descriptions’ section.
Q.9: In the “E value” column, what numerical value does “4e-04” represent? _______________
Q.10: Small E values are more significant - are there matches with E values of less than 0.01? _______________
Q.11: Do these matches represent the “better” or the “lesser” matches in the graphic section above? __________
Q.12: Identify in the “Description” column information about the origin and location of the “best” match. _______
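E values are shown in scientific ("e") notation. The sketch below converts such text into ordinary decimal numbers and compares them with the 0.01 cut-off from Q.10. The listed values are hypothetical examples, not results from an actual search.

```python
def is_significant(e_value_text, threshold=0.01):
    """True if the E value is smaller than the chosen threshold."""
    return float(e_value_text) < threshold


# Hypothetical E values as they might appear in the 'E value' column.
e_values = ["3e-05", "2.1", "7e-12", "0.02"]

for text in e_values:
    value = float(text)  # "3e-05" means 3 x 10^-5
    print(f"{text:>6} = {value:.12f}  below 0.01: {is_significant(text)}")
```

Reading "e-05" as "times ten to the minus five" is all that is needed to answer Q.9 by hand.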
VII. View more detailed information about the best match
Click on the URL in the ‘Description’ column for the best match.
Q.13: What is the human chromosome that contains the subject sequence(s) matching the query? ________
Q.14: What is the length of the DNA that contains the sequence(s) that matched the query? ________
Q.15: What is the “Number of Matches:” that the query sequence matches on this human genome sequence? ___
Q.16: What is the percentage of similarity between the query and the matches? _________
Q.17: Match up the nucleotides in the query sequence with those in the human genome sequence. (Record the
coordinates then, draw a sketch of how the matches align to the query sequences.)
_____________________________________________________________________________________________________
Sketch out how the primers match the human genome sequence
How far are the two matches apart from each other?
_____________________________________________________________________________________________________
What sequence length do the two matches span? (Include the matching query sequences in the count.)
_____________________________________________________________________________________________________
Q.18: Does the answer to Q.17 confirm your answer to Q.6? How so? (If it doesn’t, think again about Q.6.)
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Q.19: Predict the size of the amplicon sequence in the wet lab. (Refer to your sketch from Q.17.) ___________
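The distance and span asked for in Q.17 and Q.19 follow directly from the subject coordinates of the two matches. The sketch below assumes 1-based, inclusive coordinates and uses invented numbers; a match on the opposite strand may be listed with its start larger than its end, which the function tolerates.

```python
def match_summary(first, second):
    """Return (gap, span) for two matches on the same subject sequence.

    Each match is a (start, end) pair of 1-based, inclusive coordinates.
    gap  = nucleotides strictly between the two matches
    span = nucleotides from the first base of one match to the last
           base of the other, matches included
    """
    left, right = sorted([tuple(sorted(first)), tuple(sorted(second))])
    gap = right[0] - left[1] - 1
    span = right[1] - left[0] + 1
    return gap, span


# Hypothetical coordinates for illustration only.
forward_match = (5001, 5020)
reverse_match = (5300, 5281)  # listed high-to-low, as for a minus-strand hit

gap, span = match_summary(forward_match, reverse_match)
print("gap between matches:", gap, "nt")
print("span including matches:", span, "nt")
```

The "+ 1" and "- 1" terms are there because both end coordinates are counted as part of their matches.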
VIII. View the GenBank data sheet for the sequence that holds the best matches
Click on the URL ‘AC009028’ in the ‘Accession’ column for the best match to view the GenBank entry for this
sequence.
Q.20: Identify the following information about the GenBank data sheet for the sequence that contains the
sequences that matched the query:
Length of sequence in GenBank data sheet: ___________
Year of publication in GenBank: ___________
Definition: ___________
Organism from which sequence is derived: ___________
Authors: ______________________
Title: ___________
Journal: ___________
Q.21: Identify the DNA that matches the primer sequences used as query. (Tip: use coordinates from Q.17.)
IX. Isolate the PV92 amplicon sequence
Using the coordinates from the sketch above (Q.17), copy the nucleotide sequence that the two PV92 primer sequences span in the human genome and paste it into your text document. Delete all non-nucleotide characters (e.g. numbers), spaces and line breaks.
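Removing the position numbers, spaces and line breaks can be tedious by hand. The following sketch does the clean-up with a regular expression; the pasted text is an invented fragment in the usual GenBank layout, and the function keeps only the letters A, C, G, T and N (other ambiguity codes would be dropped, which is an assumption of this sketch).

```python
import re


def clean_sequence(raw):
    """Keep only nucleotide letters; drop digits, spaces and line breaks."""
    return re.sub(r"[^ACGTNacgtn]", "", raw).upper()


# Invented fragment in GenBank-style layout (position number, then
# blocks of ten nucleotides separated by spaces).
raw = """
       61 ggatccaaac ccgggttacg
       81 ttgcaaaa
"""

amplicon = clean_sequence(raw)
print(amplicon)
print(len(amplicon), "nt")
```

The printed length is the same count that a word processor's character count would give for the cleaned sequence.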
Q.22:
How long is this sequence? (There’s a count function in Microsoft Word…) ___________
Q.23:
What is the equivalent of this sequence in the PCR wet lab experiment? ___________
Additional Investigation: Identify the sequences in the amplicon to which the primers hybridize and compare them to the PV92
primer sequences used for the BLAST search as well as for the wet lab.
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
_________________________________________________________________________________________________
Part C. Examine the PV92 Locus in the Human Genome
Example Sequence:
Amplicon sequence from Part B, IX.
Tool(s):
NCBI Genome Viewer, BLAST
Concept(s):
Genome/Chromosome/Gene organization, gene structure, introns/exons, genome
search engines and browsers, genes and function/phenotypes/disorders
AZCTE BioScience Standard(s): 12.1, 12.2, 12.5, 12.7, 12.9
Q.24a: How many chromosomes are in the human genome? (Write down your answer.)
X. View the human genome
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘Genome.’
3. Click on ‘Map Viewer.’
4. Find human in the ‘Scientific Name’ column.
5. Click on the most recent ‘Annotation Release’ in the ‘Build’ column.
OMIM: Online Mendelian Inheritance in Man. A database that contains all known loci in the human genome that have been found associated with human phenotypes, including diseases and disorders.
Track: The individual regions of the display where information of certain types is mapped, such as genes, genetic distances, RNAs, etc.
Q.24b: How many chromosomes are in the human genome? (Does the number of chromosomes displayed in the
Homo sapiens (human) genome view match your response above? If not, what causes the differences?)
________
XI. BLAST search the human genome
1. Click on ‘BLAST search the human genome.’
2. Paste the amplicon sequence from Step IX. into the search field.
3. Select ‘Choose Search Set’ drop-down ‘Genome (all assemblies…).’
4. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
5. Check ‘Show results in new window.’
6. Click
XII. Examine results
Click on ‘[Human genome view].’
Q.25: What is the color key for higher-scoring matches? What for lower-scoring? ___________
Q.26: Locate the highest scoring match and describe the location. ___________
XII. Move into the single-chromosome view
Click on the number for the chromosome that carries the highest scoring match.
Q.27: Approximately, how long is the chromosome (in nucleotides)? ___________
Q.28: Approximately, at what nucleotide position does the BLAST hit map? ___________
XII. Move into the sub-chromosome view
Set the ‘Region Shown’ utility on the left to the 10,000,000-nucleotide window that surrounds the match identified
in Q.28. (E.g., if the match maps approximately at 83 M, set ‘Region Shown’ from 78M to 88M. Click ‘Go.’)
Q.29: Approximately, at what nucleotide position does the BLAST hit map now? ___________
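The ‘Region Shown’ boundaries are simply the match position minus and plus half the window width. A minimal sketch of that arithmetic, using the example positions given in this worksheet (the function name and the optional chromosome-length argument are hypothetical additions):

```python
def region_window(position, width, chromosome_length=None):
    """Return (start, end) of a window of the given width centred on position.

    The window is clipped at position 1 and, if a chromosome length is
    supplied, at the end of the chromosome.
    """
    half = width // 2
    start = max(1, position - half)
    end = position + half
    if chromosome_length is not None:
        end = min(end, chromosome_length)
    return start, end


# 10,000,000-nucleotide window around a match at about 83 M.
print(region_window(83_000_000, 10_000_000))
# 4,000,000- and 2,000,000-nucleotide windows around 82.85 M.
print(region_window(82_850_000, 4_000_000))
print(region_window(82_850_000, 2_000_000))
```

The same function covers the narrower windows used in the later, more detailed views.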
XIII. Rearrange the view
1. Click on
2. Remove from ‘Tracks Displayed’ anything but the ‘Gene’ track, using the ‘-’ icon at the right.
3. Add a ruler to ‘Gene’ by clicking the ‘R’ icon to the left of the word ‘Gene.’
4. Click on ‘Cytogenetic Maps’ in ‘Available Tracks.’
5. Click on the ‘+’ icon next to ‘Ideogram.’ (This will add ‘Ideogram’ to ‘Tracks Displayed.’)
6. Click on ‘OK.’
7. Identify the red indicator for the match in the overview. (Tip: use the ruler to find the position from Q.29.)
Q.30: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region
between genes (=intergenic region). ___________
Q.31: Approximately, at what nucleotide position does the BLAST hit map? ___________
XIV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 4,000,000-nucleotide window that surrounds the match identified in Q.28.
(e.g., if the match maps approximately at 82.85 M, set ‘Region Shown’ from 80.85M to 84.85M.)
Q.32: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region
between genes (=intergenic region). ___________
Q.33: Approximately, at what nucleotide position does the BLAST hit map now? ___________
XV. Move into an even more detailed view
Set the ‘Region Shown’ utility to the 2,000,000-nucleotide window that surrounds the match identified in Q.28.
(e.g., if the match maps approximately at 82.85 M, set ‘Region Shown’ from 81.85M to 83.85M.)
Q.34: Using the graphs in the ‘Gene_Seq’ column, determine whether PV92 is located in a gene or in the region
between genes (=intergenic region). ___________
Q.35: Approximately, at what nucleotide position does the BLAST hit map now? ___________
Q.36: Approximately, how long is the gene? ______________________
Q.37: What is the name of the gene? ___________
Q.38: Determine whether the gene is a spliced gene or not. ______________________
Q.39: Is the PV92 locus in an exon or an intron of the gene? (Zoom in until you can decide that question.)
___________
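Deciding between exon, intron and intergenic region (Q.34 and Q.39) amounts to comparing one position with a list of exon coordinates. The sketch below shows that comparison with invented coordinates; it considers a single gene only, so "intergenic" here just means "outside this gene".

```python
def locate(position, exons):
    """Classify a position relative to one gene.

    exons is a list of (start, end) pairs, inclusive, on the same
    coordinate system as position. Returns 'exon', 'intron' or
    'intergenic' (meaning: outside this particular gene).
    """
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    if position < gene_start or position > gene_end:
        return "intergenic"
    for start, end in exons:
        if start <= position <= end:
            return "exon"
    return "intron"


# Hypothetical three-exon gene for illustration only.
exons = [(1000, 1200), (5000, 5300), (9000, 9400)]

for pos in (500, 1100, 3000, 9400):
    print(pos, locate(pos, exons))
```

A position that lies inside the gene but in none of its exons must be in an intron, which is why a spliced gene shows gaps between its exon blocks in the ‘Gene_Seq’ column.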
XVI. Determine the function of a gene
Click on the ‘OMIM’ link for the gene.
Q.40: What is the name of the gene? ___________
Q.41: What is the function of the gene? ________________________________________________________________
_____________________________________________________________________________________________________
Q.42: What is the length of the protein that the gene encodes? ___________
Q.43: What is the length of the gene’s coding sequence? ___________
Q.44: What disease(s) has the gene been found associated with? ___________
Q.45: Would you anticipate a change in phenotype/health if a small (ca. 300 bp) transposon were inserted into
the PV92 locus?
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Additional Investigation: Use the amplicon and NCBI Map Viewer to identify the PV92 locus in the chimpanzee genome. Follow the
procedure from X. through XVI. and answer Q.24-Q.45 accordingly.
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Part D. Identify All States of PV92 Known to Man
Example Sequence:
Amplicon sequence from Part B, IX.
Tool(s):
BLAST, DNA Subway, MUSCLE
Concept(s):
Transposons, human evolution, human relatedness, DNA behavior, Junk DNA
AZCTE BioScience Standard(s): 12.1, 12.2, 12.3, 12.7, 12.9
XVII. Conduct a BLAST search (Query)
Not sure what could go here –
by now you are all so smart…
1. Open browser to http://ncbi.nlm.nih.gov.
2. Click on ‘BLAST.’
3. Click on ‘nucleotide blast.’
4. Enter the PV92 amplicon sequence into the search field. (From Part B., IX.)
5. Enter ‘Organism’ ‘Homo sapiens’ (taxid: 9606).
6. Select ‘Optimize for’ radio button ‘Somewhat similar sequences (blastn).’
7. Check ‘Show results in new window.’
8. Click
XVIII. Examine results in the ‘Nucleotide Sequence’ section.
Q.46: How does the BLAST query sequence relate to the PV92 wet lab? ___________
Q.47: In whose genome would you expect the query sequence to be present as a contiguous DNA stretch?
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
XIX. Examine results in the ‘Graphic Summary’ section.
Q.48: What are the scores for “better” matches for your BLAST search with the PV92 amplicon sequence? What for
“lesser” matches?
______________________
XX. Examine results in the ‘Descriptions’ section.
Q.49: How many matches have E values of less than 0.00000001? Provide some information about these. _______
Description | E-Value (see Q.9) | Number of matches (see Q.15) | Accession | Sequence Length of GenBank entry (see Q.20) | Published (see Q.20)
Q.50: Which of these matches is the same as the match in the PV92 primer BLAST? (Part B.)
___________
Q.51: What do the other matches represent that have E values of less than 0.00000001? (Tip: Start with the finding
that was published in 1990, then the one from 2001. Use the more detailed information and alignments
that are presented further down in the BLAST result page, as well as the information presented in the
GenBank data sheets. Draw out the three different alleles.)
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Sketch an alignment of the three matching sequences
Q.52: How are the three best matches related to each other? Which is ancestral? Which most recent? (Tip:
Comparing the sequences from the three original GenBank sheets is very complicated. Instead, combine
the drawings for the amplicons from Q.51 into one drawing. Alternatively, open the file that contains the
three amplicon sequences for the PV92 locus and align the sequences using DNA Subway and/or MEGA.)
_______________________________________________________
MUSCLE Alignment from DNA Subway
MEGA Phylogenetic Analysis
Sketch alignment of the three amplicons
Sketch the relationship between the amplicons
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Additional Investigation: Determine the sizes of the amplicons that would be amplified from the three PV92 alleles if the same
primers were used as in the wet lab. Explain how you arrived at your numbers.
PV92 = _____ bp
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
PV92::Alu = _____ bp
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
(PV92::Alu)::Alu = ______ bp
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
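One way to reason about the three sizes is to add the length of each inserted element to the size of the amplicon without an insertion. The sketch below shows only that arithmetic. Both input numbers are placeholders (the 300 bp figure echoes the "ca. 300 bp" transposon mentioned in Q.45), and the calculation ignores any extra nucleotides an insertion may add or remove at its boundaries.

```python
def amplicon_sizes(empty_site_bp, insert_bp, max_insertions=2):
    """Return amplicon sizes for 0, 1, ... max_insertions nested insertions.

    Assumes every insertion adds exactly insert_bp nucleotides between
    the two primer sites (a simplification).
    """
    return [empty_site_bp + n * insert_bp for n in range(max_insertions + 1)]


# Placeholder values for illustration; use your own numbers from
# Q.19 and from the alignments in Q.51.
alleles = ["PV92", "PV92::Alu", "(PV92::Alu)::Alu"]
sizes = amplicon_sizes(empty_site_bp=400, insert_bp=300)

for name, size in zip(alleles, sizes):
    print(f"{name} = {size} bp")
```

Comparing such estimates with the lengths of the actual sequences shows how well the simplifying assumption holds.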
Students Discover Biological Concepts Using Bioinformatics
Genomes and Repetitive DNA
• A genome is an organism’s entire complement of DNA.
• DNA is a directional molecule composed of two anti-parallel strands.
• The genetic code is read in a 5’ to 3’ direction, referring to the 5’ and 3’ carbons of deoxyribose.
• Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
• Transposons can be located in intergenic regions (between genes) or in introns (within genes).
• Genes and transposons are directional, and can be encoded on either DNA strand.
• Repeats are non-directional, and, in effect, do occur on both strands.
• Transposons can mutate like any other DNA sequence.
Genes and Proteins
• Protein-coding information in DNA and RNA begins with a start codon, is followed by codons, and ends with a stop codon.
• Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
• The DNA strand that is equivalent to mRNA is called the “coding strand.” The complementary strand is called the “template strand,” because it serves as the template for synthesizing mRNA.
• Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
• Even in a spliced gene, the protein-coding information may be organized as an Open Reading Frame (ORF).
• Most eukaryotic genes are spliced, whereby intervening segments (introns) are removed and the remaining segments (exons) are spliced together.
• Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
• Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
• Over 90% of eukaryotic introns have “canonical splice sites,” whereby introns begin with GT (mRNA: GU) and end in AG (mRNA: AG).
• The protein coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
• In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
• UTRs hold information for the half-lives of mRNAs and for regulatory purposes.
• Gene > mRNA > CDS.
• CDS = nucleotides that encode amino acid sequence.
• In mRNA: CDS = ORF.
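The relationships listed above between coding strand, template strand, mRNA and ORF can be traced on a short made-up sequence. This is a simplified sketch: it looks only at the first AUG and its reading frame, and the example sequence is invented.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reverse_complement(seq):
    """Return the complementary DNA strand written 5' -> 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def transcribe(coding_strand):
    """mRNA has the sequence of the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def first_orf(mrna):
    """Return the ORF from the first AUG to the first in-frame stop codon.

    Returns None if there is no AUG or no in-frame stop codon after it.
    Later start codons are not examined (a simplification).
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None


coding = "GGATGAAACCCTAGTT"  # invented coding strand, 5' -> 3'
template = reverse_complement(coding)
mrna = transcribe(coding)

print("coding strand  :", coding)
print("template strand:", template)
print("mRNA           :", mrna)
print("ORF            :", first_orf(mrna))
```

The bases before the AUG and after the stop codon in this toy mRNA stand in for the 5’- and 3’-untranslated regions.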
BLAST Searches
• Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
• Gene or protein homologs share sequence similarities due to descent from a common ancestor.
• Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
• Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNAseq). Protein sequence data are available, too, but much less common.
• Many ESTs and cDNAs are disrupted by “introns” when they are aligned against genomic DNA.
• ESTs & cDNAs may be incomplete.
• The BLAST algorithm does not resolve intron/exon boundaries.
• The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches) but, instead, matches query subsequences as well (“local” matches).
• The BLAST algorithm matches sequences to the fullest extent possible and, often, realigns the same sequence twice.
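The idea of "local" matches can be illustrated by looking for short words that a query and a subject have in common. The sketch below is a deliberately simplified stand-in, not the BLAST algorithm itself: it reports exact shared words only, and the sequences and word size are invented for the example.

```python
def shared_words(query, subject, word_size=6):
    """Return (query_pos, subject_pos, word) for every exact shared word.

    Positions are 0-based. Every word of length word_size in the subject
    is indexed first; the query is then scanned word by word.
    """
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)

    hits = []
    for j in range(len(query) - word_size + 1):
        word = query[j:j + word_size]
        for i in index.get(word, []):
            hits.append((j, i, word))
    return hits


# Invented sequences: the query is two short pieces joined by TTTTT,
# while the subject carries the same two pieces separated by AAAAAA.
query = "GGATCCTTTTTACGTTGC"
subject = "AAAGGATCCAAAAAAACGTTGCAAA"

for query_pos, subject_pos, word in shared_words(query, subject):
    print(f"query {query_pos:>2}  subject {subject_pos:>2}  {word}")
```

The query as a whole does not occur in the subject, yet both of its ends are found: parts of a query can match locally even when the full query does not match globally.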