Lesson/Week 3 - Middle Tennessee State University

Genetics 3250 Sequence Analysis
Cahoon – Genetics 3250 Lab
Middle Tennessee State University
Chloroplast Genome Project – Week 3
Interpreting sequence information
Our goal is to eventually sequence an entire chloroplast genome. To begin, we need some clues
which will enable us to take these first sequences and fragments and begin the process of piecing
it all together. To do this we have to make some assumptions about our experimental genome.
The initial design of the project was based on the maize chloroplast genome, so I will ask
you to make the first comparison based on corn. Subsequent searches will tell you whether you really
have chloroplast DNA, whether the sequence is more closely related to a species other than maize, and
whether there are any genes in your fragment.
Please keep in mind that we have no idea what we’ll find and there is no guarantee that any clone
will even be a piece of the chloroplast genome. Random pieces could have been inserted into the
vector or no piece at all. Also, portions of the genome may be arranged in ways we can’t predict.
If you would like an analogy…
We are attempting to piece together a clear landscape photograph while using an
impressionistic painting of a similar landscape as our reference picture (happy clouds
included).
STEP 1: Is the sequence really from the chloroplast?
You may recall that the piece of DNA you are sequencing was randomly cloned. Although I
thought we were working with chloroplast DNA, it’s always possible that some other DNA can sneak
in, or that the cloning vector closed on itself without an insert at all.
Your first hypothesis is…
My clone carries a piece of chloroplast DNA.
Test your hypothesis…
First, open your sequence file. The data file is in .txt format and should open into any text editor.
If you double click on the file name with a PC it will probably open in the “Notepad” program.
Next, go to the National Center for Biotechnology Information website.
http://www.ncbi.nlm.nih.gov/
We want to perform a BLAST (Basic Local Alignment Search Tool) search. BLAST is a
program which compares DNA and protein sequences and aligns them by similarity. Similar
sequences that share a common ancestor are called homologues.
[Screenshot callout: Choosy geneticists pick BLAST]
You want to compare your DNA sequence to others so choose nucleotide blast.
[Screenshot callout: Choose nucleotide blast]
Go to your sequence file (the one you opened in the text editor program) and highlight the entire
sequence.
Copy the sequence.
Move your cursor within the highlighted area and press the right-hand button (right-click) on
the mouse so a menu pops up, then click Copy in the list of commands.
Paste the sequence into the box at the top of the nucleotide blast page.
Left-click inside the Enter Query Sequence box and then right-click to open the list of
commands, then choose Paste. Your string of As, Ts, Gs, and Cs should now appear in the
box.
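If you would rather check the pasted text with a short script than by eye, the sketch below shows one way to do it in Python. It is only an illustration: the function names and the example string are made up for this handout, and it assumes your file holds plain sequence text, possibly with spaces, digits, or a header line that starts with ">".

```python
def clean_sequence(raw_text):
    """Join the sequence lines and keep only the letters, in uppercase."""
    # Skip any header line (FASTA-style headers start with ">").
    lines = [ln for ln in raw_text.splitlines() if not ln.startswith(">")]
    # Drop spaces, digits, and anything else that is not a letter.
    return "".join(ch for ch in "".join(lines) if ch.isalpha()).upper()


def unexpected_letters(seq, allowed="ACGTN"):
    """Return the letters in seq that are not A, C, G, T, or N."""
    return sorted(set(seq) - set(allowed))


if __name__ == "__main__":
    example = "ATGC atgc\nNNAT 12\n"  # invented text, not real data
    seq = clean_sequence(example)
    print(seq, len(seq), unexpected_letters(seq))
```

An empty list from unexpected_letters means the text contains only base letters (N stands for a base the sequencer could not call).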
For this first test I want you to compare your sequence to corn (Zea mays). Beside “database”
choose “Others”. In the “organism” box type in “Zea mays”. Now click on the “BLAST” button to
begin your search.
1. Paste your sequence here
2. Choose “Others” database
3. Type “Zea mays” in this box
4. Click on BLAST button
You will get a search screen that refreshes periodically until the search and comparison are
completed.
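For the curious, the form you just filled in can also be written out as a set of named fields. The sketch below builds those fields in Python but does not send anything to NCBI. The field names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) are the ones used by NCBI's BLAST URL API; the database name "nt" and the function name are assumptions for illustration and may not match exactly what the “Others” choice on the web page searches.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(sequence, organism=None, database="nt"):
    """Return the URL-encoded form fields for a nucleotide BLAST search."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide query against nucleotide database
        "DATABASE": database,  # assumed database name
        "QUERY": sequence,
    }
    if organism:
        # Limit the search to one organism, like typing it in the organism box.
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return urlencode(params)


if __name__ == "__main__":
    # Invented short query, just to show the encoded fields.
    print(BLAST_URL + "?" + build_blast_request("ATGCATGC", organism="Zea mays"))
```

Leaving out the organism gives the unrestricted search, which is the web form's default.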
When the search is completed the results page will pop up. Scroll down until you get an image
similar to the one shown below. Inside the boxed area is a number line, which represents your
piece of DNA. The colored lines below it represent pieces of DNA that are similar to your
sequence.
The most likely hit will be “Zea mays complete chloroplast genome”.
[Figure callouts: the number line is your sequence (aka query); the colored lines are the
sequences which matched the one you entered.]
Find the lineup that has this header…
>gi|11990232|emb|X86563.2|ZMA86563
Zea mays complete chloroplast genome
Length = 140384
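The header line packs several identifiers together, separated by "|" characters. As a rough sketch, the Python below splits the header shown above into its parts. The field names in the dictionary are my own labels, and the layout is assumed from this one example; headers on current NCBI pages may be formatted differently.

```python
def parse_header(line):
    """Split a header such as '>gi|11990232|emb|X86563.2|ZMA86563' into named parts."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) < 5 or fields[0] != "gi":
        raise ValueError("header does not follow the gi|number|db|accession|locus layout")
    return {
        "gi_number": fields[1],  # NCBI's older numeric identifier
        "source_db": fields[2],  # 'emb' marks an EMBL database record
        "accession": fields[3],  # accession number with its version
        "locus": fields[4],      # short locus name
    }


if __name__ == "__main__":
    print(parse_header(">gi|11990232|emb|X86563.2|ZMA86563"))
```

The accession (here X86563.2) is the most useful part to write down, since you can use it to look the record up again later.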
How to Make Your Report Document
1. Start a new document in Word or your favorite word processing program.
2. At the top of the first page put your name and the name of the sequence you are analyzing.
3. Paste the sequence you are analyzing in your document.
4. Skip a few lines and make a subject heading called “Zea mays Sequence Homology Results”.
If you DO NOT have a chloroplast homologue then…
5. Put NO MAIZE CHLOROPLAST HOMOLOGUES in your report document
There are several other possibilities at this stage…
a. Your sequence is not found in the corn chloroplast genome.
b. DNA other than chloroplast DNA was inserted into the cloning vector.
c. Your clone is the cloning vector without an insert.
It’s not time to reject the original hypothesis yet. To run the next test, skip the rest of this
section and go to the portion below where it says “Search the Whole Database”.
If you DO have a chloroplast homologue then…
Scroll down until you see nucleotide alignments like the one shown in the screen shot below.
This alignment shows how your sequence (Query) matches with the sequence found in the
database (Sbjct). These are two different species so the match will not be perfect, but it is close
enough to tell that they are related.
5. Copy and paste the alignment from the “Zea mays complete chloroplast genome”
match. Do not use the one with B73 in the title line.
[Figure callout: The two sequences in the lineup are the one you entered (Query) and its
closest match (Sbjct).]
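One simple way to put a number on “close enough” is percent identity: the share of alignment columns where Query and Sbjct carry the same base. BLAST reports its own Identities figure for each alignment; the Python sketch below only shows how such a number can be counted. The two rows are invented for illustration, and a dash stands for a gap.

```python
def percent_identity(query_row, sbjct_row):
    """Percent of alignment columns where both rows show the same base."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must be the same length")
    if not query_row:
        raise ValueError("aligned rows must not be empty")
    q, s = query_row.upper(), sbjct_row.upper()
    # A column counts as a match only if the bases agree and it is not a gap.
    matches = sum(1 for a, b in zip(q, s) if a == b and a != "-")
    return 100.0 * matches / len(q)


if __name__ == "__main__":
    query = "ATGCC-TAGG"  # invented aligned rows, not real data
    sbjct = "ATGCCATTGG"
    print(percent_identity(query, sbjct))
```

Gap columns are counted in the alignment length but never as matches, so gaps lower the percentage.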
Was the sequence you found homologous to a maize region between bases 82,000 and 105,000? If
so, then the sequence falls in a region called the inverted repeat, which has been found in nearly all
chloroplast genomes. The genes in this region are duplicated in reverse order in a nearby region.
If your clone came from the inverted repeat then include both regions in your report. The second
region will probably fall between bases 117,000 and 140,000.
6. Do you think your sequence is from the inverted repeat region?
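The check in step 6 is a comparison of coordinates, which can be sketched in a few lines of Python. The two ranges are the approximate ones given above for maize, not exact boundaries, and the function names are made up for this example. Sbjct coordinates can run high-to-low when the match is on the opposite strand, so the sketch sorts them first.

```python
# Approximate maize inverted repeat ranges quoted in this handout.
IR_COPY_1 = (82_000, 105_000)
IR_COPY_2 = (117_000, 140_000)


def overlaps(start, end, region):
    """True if the match from start to end touches the given region."""
    low, high = sorted((start, end))  # handles reversed coordinates
    return low <= region[1] and high >= region[0]


def inverted_repeat_copy(sbjct_start, sbjct_end):
    """Say which inverted repeat copy, if any, the Sbjct coordinates fall in."""
    if overlaps(sbjct_start, sbjct_end, IR_COPY_1):
        return "first copy"
    if overlaps(sbjct_start, sbjct_end, IR_COPY_2):
        return "second copy"
    return None


if __name__ == "__main__":
    # Invented coordinates: a reversed match, then one outside both ranges.
    print(inverted_repeat_copy(90_500, 90_100))
    print(inverted_repeat_copy(5_000, 5_400))
```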
Search the Whole Database
Now repeat your search using the whole database. This time when you enter your sequence do
not limit your search to Zea mays, instead allow the program to search all organisms (this is the
default setting so do not enter a name in the “organism” box).
Your Report Document
7. Make another subject heading called “Whole Database Homology Search”.
The results page may resemble the one shown on the right. You may notice that the top hit is not
Zea mays but Oryza sativa (rice).
Your Report Document
9. Just under the box with the number line there’s a list of similar sequences. Copy the top 2 (or
so) hits and paste them into your report document. Just copy the names for these hits, not the
alignments.
10. If you didn’t get a maize chloroplast homologue but you did get a chloroplast
homologue from another species in this step, go to the base-for-base lineup described above
(Step 6 for “Your Report Document”) and copy and paste the top one or two alignments into
your report.
What if my sequence doesn’t have a chloroplast homologue?
If there are no chloroplast homologues at all, take a close look at the matches that do appear.
If the matches are a list of cloning vectors, then your plasmid contained no insert.
Reject the original hypothesis.
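The reasoning above can be written as a small decision rule. The Python sketch below sorts hit titles by keyword and then states what the list suggests about the hypothesis. The keywords, the function names, and the example titles are assumptions for illustration; real hit titles vary, so read the list yourself rather than trusting a keyword match.

```python
def classify_hit(title):
    """Label one hit title as 'chloroplast', 'vector', or 'other' by keyword."""
    text = title.lower()
    if "chloroplast" in text or "plastid" in text:
        return "chloroplast"
    if "vector" in text:
        return "vector"
    return "other"


def judge_hypothesis(titles):
    """Apply the handout's reasoning to a list of hit titles."""
    kinds = [classify_hit(t) for t in titles]
    if "chloroplast" in kinds:
        return "supported: at least one chloroplast match"
    if kinds and all(k == "vector" for k in kinds):
        return "rejected: only cloning vector matches, so no insert"
    return "undecided: check the E-values of the remaining matches"


if __name__ == "__main__":
    # Illustrative titles only.
    print(judge_hypothesis(["Zea mays complete chloroplast genome"]))
    print(judge_hypothesis(["Cloning vector pUC19, complete sequence"]))
```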
If the sequence is novel then the BLAST program will try to make your sequence fit other
sequences even if they have very little in common. For this reason another number (the E-value)
is included so that you (the human with the critical thinking skills) can determine if the match
represents a bona fide homologous region or was forced by the program.
The E-value that is assigned to each potential homologue is the number of matches this good that
you would expect to find by random chance in a database of this size. The lower the E-value, the
more confident you can be that you are looking at two similar sequences with a common ancestry.
Typically, matches with E-values equal to or greater than 10^-4 (which BLAST writes as 1e-04) are
not similar enough to represent true homologues.
If you get a match with human DNA that has an E-value less than 10^-4, then we probably cloned a
contaminant (a piece of DNA from me or someone else from the class). No matter what you find, copy and
paste the top hits into your report so the analysis can continue.
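To see the cutoff in action, the Python sketch below keeps only the hits whose E-value is under the threshold described above. The hit titles and E-values are invented for illustration, and the names are my own; the only thing taken from this handout is the cutoff.

```python
E_VALUE_CUTOFF = 1e-4  # threshold described in this handout


def is_credible(evalue, cutoff=E_VALUE_CUTOFF):
    """True if the E-value is low enough to suggest a real homologue."""
    return evalue < cutoff


def credible_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep the (title, evalue) pairs that pass the cutoff, best first."""
    kept = [(title, e) for title, e in hits if is_credible(e, cutoff)]
    return sorted(kept, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Invented hits and E-values, not real search results.
    hits = [
        ("hit A", 3.2),
        ("hit B", 2e-50),
        ("hit C", 1e-4),
    ]
    for title, evalue in credible_hits(hits):
        print(title, evalue)
```

A hit sitting exactly at the cutoff is treated as not credible, matching the “equal to or greater than” wording.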
Your Report Document
11. What is the status of your hypothesis after running two experiments with your sequence?
12. What is the significance of any matches you found (chloroplast, non-chloroplast, or vector)?
13. If sequences from two different species are very similar what can we infer about their
relationship / ancestry?