Genetics 2120 Sequence Analysis Extra Credit Assignment – Spring 2005
Genetics 2120 Genome Project – Part VI
Interpreting sequence information
Our goal is to eventually sequence the entire Fescue chloroplast genome. What we need initially
is a set of clues which will enable us to take these first sequences and fragments and begin the
process of piecing it all together. To do this we have to make some assumptions about our
experimental genome. All of the initial design was based on the maize chloroplast genome,
so I will ask you to make all initial comparisons against that one. Some of your searches
will tell you that other plants may be more closely related to Fescue than maize, which is important in
the long haul, but for the moment we need to get an idea of how to build our map and it’s easiest
to stick with a single reference.
Please keep in mind that we have no idea what we’ll find and there is no guarantee that any clone
will even be a piece of the chloroplast genome. Random pieces could have been amplified and
cloned. Also, portions of the genome will be arranged in ways no one can predict.
If you would like an analogy…
We are attempting to piece together a clear landscape photograph while using an
impressionistic painting of a similar landscape as our reference picture (happy clouds
included).
STEP 1: Is the sequence homologous to any other known sequences?
Welcome to the world of Bioinformatics. (note the lack of an exclamation point)
Go to the National Center for Biotechnology Information website.
http://www.ncbi.nlm.nih.gov/
We want to perform a BLAST (Basic Local Alignment Search Tool) search. BLAST is a
program which compares DNA and protein sequences and aligns them by similarity. Two
sequences that are similar because they share a common ancestor are called homologues.
Discerning geneticists choose BLAST.
You want to compare your DNA sequence to others so choose a nucleotide-nucleotide BLAST
(blastn).
Choose: Nucleotide-nucleotide BLAST (blastn)
You were provided a sequence in a text file. Open the file using Word and highlight the sequence.
With your cursor in the highlighted area, press the right-hand button (right-click) on the mouse so
a menu pops up, then click Copy in the list of commands. Back at the NCBI page, left-click inside
the Search box, right-click to open the list of commands, and choose Paste. For the initial
search I want you to search only against the corn database (Zea mays).
1. Paste your sequence here
2. Choose Zea mays [org] from the pull-down menu.
3. Click on BLAST! button
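If you would rather see what the web form is doing behind the scenes, the three steps above can be sketched as a request to NCBI's BLAST URL API. This is only a sketch under assumptions: the endpoint address is the current public one (not the 2005 page), the database name "nt" and the function name build_blast_request are my choices for illustration, and the code only builds the request body without sending it.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public NCBI BLAST URL API (not part of the original handout).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(sequence, program="blastn", database="nt", organism=None):
    """Return the form-encoded body for a BLAST 'Put' (submit) request.

    organism mirrors the pull-down menu: pass "Zea mays" to limit the search,
    or leave it as None to search all organisms.
    """
    params = {
        "CMD": "Put",                # submit a new search
        "PROGRAM": program,          # blastn = nucleotide vs. nucleotide
        "DATABASE": database,        # assumed database name
        "QUERY": sequence.strip(),   # step 1: paste your sequence
    }
    if organism:
        # step 2: restrict the search to a single organism
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return urlencode(params)


# Hypothetical short sequence, for illustration only.
body = build_blast_request("ATGGCCATTGTAATGGGCCGC", organism="Zea mays")
print(body)
```

Step 3 (pressing BLAST!) would correspond to POSTing this body to BLAST_URL. For the assignment itself, use the web page as described.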
You will get a formatting screen. We will use the default settings.
Before you click on the Format! button, I want you to copy the request ID number and paste it
into the report you will hand in. I want specific information in your report so follow the format I
outline below.
How to Make Your Report Document
1. Start a new document in Word.
2. At the top of the first page put your name and the name of the sequence you are analyzing.
3. Paste the sequence you are analyzing in your document.
4. Skip a few lines and make a subject heading called “Zea mays Sequence Homology Results”
5. Copy and paste the request ID number from your formatting page.
Press Format! and wait.
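The request ID is worth saving because it is the handle for retrieving the same results later. As a rough sketch, assuming the current public BLAST URL API and a made-up request ID (the endpoint, the function name build_results_url and the ID below are all illustrative, not from this handout), a results address can be assembled like this:

```python
from urllib.parse import urlencode

# Assumed endpoint of the public NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_results_url(request_id, format_type="Text"):
    """Build the URL that asks BLAST for the results of an earlier search."""
    params = {
        "CMD": "Get",                # retrieve results rather than submit
        "RID": request_id,           # the request ID you pasted into your report
        "FORMAT_TYPE": format_type,  # plain-text report
    }
    return BLAST_URL + "?" + urlencode(params)


# "ABC123XYZ" is a hypothetical request ID, not a real search.
print(build_results_url("ABC123XYZ"))
```

Results are only kept on the server for a limited time, which is one more reason to paste the alignments themselves into your report.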
When the search is completed the results page will pop up in place of the formatting page. Scroll
down until you get an image similar to the one shown below. Inside the boxed area is a number
line; this represents your piece of DNA. The colored lines below represent pieces of DNA that
are similar to your piece.
Below the boxed area is a list of the homologues. The most likely hit will be the complete
chloroplast genome of Zea mays. If this is not listed you either have something very
interesting or a mistake was made in the cloning process.
If you have a chloroplast homologue, scroll down until you get to the actual base-for-base lineups
like the one shown below (if you don’t have a chloroplast homologue then skip to the portion
below where it says, “Search the Whole Database”).
Find the lineup that has this header…
>gi|11990232|emb|X86563.2|ZMA86563
Zea mays complete chloroplast genome
Length = 140384
The two sequences in the lineup are the one you entered (Query) and its closest match (Sbjct).
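To get a feel for how close the Query and Sbjct lines are, you can count the positions where the two aligned rows agree. The sketch below is only an illustration: the function name percent_identity and the two short aligned strings are invented, and a dash stands for a gap as in the BLAST lineup.

```python
def percent_identity(query_row, sbjct_row):
    """Percent of alignment columns where Query and Sbjct carry the same base.

    Both rows must be the same length, exactly as they appear in the lineup,
    with '-' marking a gap. Gap columns count as mismatches.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must be the same length")
    if not query_row:
        raise ValueError("aligned rows must not be empty")
    matches = sum(
        1
        for q, s in zip(query_row.upper(), sbjct_row.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(query_row)


# Invented ten-column alignment: one gap and one mismatch.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

BLAST reports this same figure as "Identities" in the header of each lineup, so you can use the printed value to check your reading of the alignment.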
Your Report Document
6. Copy the text starting with the header and ending with the final base pairing and paste it in
your report document below the request ID.
Search the Whole Database
Now repeat your search using the whole database. This time when you enter your sequence do
not limit your search to Zea mays, instead allow the program to search all organisms (this is the
default setting so you just enter your sequence and press BLAST!).
Your Report Document
7. Make another subject heading called “Whole Database Homology Search”.
8. Copy the request ID.
The results page may resemble the one shown on the right. You may notice that the top hit is not
Zea mays but Oryza sativa (rice).
Your Report Document
9. Just under the box with the number line the similar sequences are listed. Copy the top 5 (or
so) and paste them into your report document.
10. If you didn’t get a maize chloroplast homologue but you did get a chloroplast
homologue from another species, go to the base-for-base lineup described above (Step 6 for
“Your Report Document”), then copy and paste that sequence into your report.
What if my sequence doesn’t have a chloroplast homologue?
If there are no chloroplast homologues at all take a close look at the matches that do appear.
The BLAST program will try to make your sequence fit other sequences even if they have very
little in common. For this reason another number (the E-value) is included so that you (the
human with the critical thinking skills) can determine if the match represents a bona fide
homologous region or was forced by the program.
The E-value assigned to each potential homologue is the number of matches at least this good that
you would expect to find in the database by random chance. The lower the E-value, the more confident
you can be that you are looking at two similar sequences with a common ancestry. Typically, matches
with E-values equal to or greater than 10^-4 are not similar enough to represent true homologues.
If you get a match with human DNA with an E-value less than 10^-4, then we probably cloned a contaminant
(a piece of DNA from me or someone else from the class). No matter what you find, copy and
paste the top hits into your report so the analysis can continue. Who knows, it may actually be a
part of the fescue genome!
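The cutoff rule can be written down in a few lines. This is a minimal sketch, assuming you have typed your hits into a list of (description, E-value) pairs by hand; the function name significant_hits and the example hits and values are made up for illustration and are not real search results.

```python
E_VALUE_CUTOFF = 1e-4  # the handout's threshold: keep hits with E-value below this


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E-value is strictly below the cutoff, best hit first."""
    kept = [hit for hit in hits if hit[1] < cutoff]
    return sorted(kept, key=lambda hit: hit[1])


# Invented example hits (description, E-value); not real BLAST output.
example_hits = [
    ("hypothetical weak match", 0.52),
    ("hypothetical chloroplast genome match", 3e-87),
    ("hypothetical borderline match", 1e-4),
    ("hypothetical partial match", 2e-10),
]

for description, e_value in significant_hits(example_hits):
    print(f"{e_value:.0e}  {description}")
```

Note that a hit sitting exactly at 10^-4 is dropped, matching the "equal to or greater than" wording above.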
Your Report Document
11. What is the significance of any similar regions you found, either chloroplast or non-chloroplast?
STEP 2: Are there any genes in your sequence?
The first step is interesting and essential in our analysis but the genes are the most exciting part.
A variant of BLAST called BlastX will take your sequence, automatically translate it using every
possible reading frame in both directions, and then compare these hypothetical proteins to every
known protein.
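To see what "every possible reading frame in both directions" means, here is a small sketch of a six-frame translation: three frames on the strand you entered and three on its reverse complement. It assumes the standard codon assignments and invented names (six_frame_translation and friends); it is not the code BlastX itself runs, and the short sequence is made up.

```python
from itertools import product

# Standard codon assignments, listed in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a DNA sequence in frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Invented six-base sequence, just to show the six frames.
for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```

BlastX then compares each of these hypothetical proteins against the protein database, which is why a hit can turn up on either strand.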
From the BLAST page choose the Translated query vs. protein database (blastx).
Choose: Translated query vs. protein database (blastx)
Perform the same steps you did with the blastn search before.
Note: If you found maize chloroplast hits before then you want to begin by limiting your search
to the maize database using the pull-down menu under “Options”. After that search, repeat the
search using all species. If you found hits to another species you should limit your search to that
species’ database before searching the entire database.
Key to monocot species names:
Oryza sp. – Rice
Triticum sp. – Wheat
Saccharum sp. – Sugarcane
Zea mays – Corn
Your Report Document
12. Start a new subject heading titled “BlastX homology search”.
13. For each search include the Request ID and the top matches (use the same criterion of an
E-value less than 10^-4).
Your results page will resemble this one.
Notice that the lineup is now between amino acids instead of nucleotides.
There may be pieces of more than one gene on the fragment you are analyzing. Be sure to
include all of the hits in your report.
Your Report Document
14. What is the significance of any homologues you found during the searches?
Report Document Checklist
Summary of Report Document Requests, Include…
General Info.
1. Start a new document in Word.
2. At the top of the first page put your name and the name of the sequence you are analyzing.
3. Paste the sequence you are analyzing in your document.
BLAST Search Results
4. Skip a few lines and make a subject heading called “Zea mays Sequence Homology Results”
5. Copy and paste the request ID number from your formatting page.
6. Copy the text starting with the header and ending with the final base pairing and paste it in
your report document below the request ID.
7. Make another subject heading called “Whole Database Homology Search”.
8. Copy the request ID.
9. Just under the box with the number line the similar sequences are listed. Copy the top 5 (or
so) and paste them into your report document.
10. If you didn’t get a maize chloroplast homologue but you did get a chloroplast
homologue from another species, go to the base-for-base lineup described above (Step 6 for
“Your Report Document”), then copy and paste that sequence into your report.
11. What is the significance of any similar regions you found, either chloroplast or non-chloroplast?
BLASTX Search Results
12. Start a new subject heading titled “BlastX homology search”.
13. For each search include the Request ID and the top matches (use the same criterion of an
E-value less than 10^-4).
14. What is the significance of any homologues you found during the searches?
Your report is DUE APRIL 15TH BY 3:00 P.M.
Late submissions will be accepted but no credit will be awarded.