Genetics 3250 Sequence Analysis
Cahoon – Genetics 3250 Lab
Middle Tennessee State University

Chloroplast Genome Project – Week 3
Interpreting sequence information

Our goal is to eventually sequence an entire chloroplast genome. To begin, we need some clues that will let us take these first sequences and fragments and start piecing them together. To do this we have to make some assumptions about our experimental genome. The project was originally designed around the maize chloroplast genome, so I will ask you to make the first comparison against corn. Later searches will tell you whether you really have chloroplast DNA, whether the sequence is more closely related to a species other than maize, and whether there are any genes in your fragment.

Please keep in mind that we have no idea what we'll find, and there is no guarantee that any clone will even contain a piece of the chloroplast genome. Random pieces of DNA could have been inserted into the vector, or no piece at all. Also, portions of the genome may be arranged in ways we can't predict. If you would like an analogy… we are trying to piece together a clear landscape photograph while using an impressionistic painting of a similar landscape as our reference picture (happy clouds included).

STEP 1: Is the sequence really from the chloroplast?

You may recall that the piece of DNA you are sequencing was randomly cloned. Although I thought we were working with chloroplast DNA, it's always possible that some other DNA sneaked in, or that the cloning vector closed on itself without any insert at all.

Your first hypothesis is… My clone carries a piece of chloroplast DNA.

Test your hypothesis… First, open your sequence file. The data file is in .txt format and should open in any text editor. On a PC, double-clicking the file name will probably open it in the “Notepad” program.

Next, go to the National Center for Biotechnology Information (NCBI) website:
http://www.ncbi.nlm.nih.gov/

We want to perform a BLAST (Basic Local Alignment Search Tool) search. BLAST is a program that compares DNA or protein sequences and aligns them by similarity. Sequences that are similar because they share a common ancestor are called homologues.

[Screenshot: NCBI home page. "Choosy geneticists pick BLAST"]

You want to compare your DNA sequence to other DNA sequences, so choose nucleotide blast.

[Screenshot: BLAST page. "Choose nucleotide blast"]

Go to your sequence file (the one you opened in the text editor) and highlight the entire sequence. To copy it, move your cursor within the highlighted area, press the right-hand mouse button (right-click) so that a menu pops up, and click Copy.

Paste the sequence into the box at the top of the nucleotide blast page: left-click inside the “Enter Query Sequence” box, right-click to open the menu, and choose Paste. Your string of As, Ts, Gs, and Cs should now appear in the box.

For this first test I want you to compare your sequence to corn (Zea mays). Beside “Database”, choose “Others”. In the “Organism” box, type “Zea mays”. Then click the “BLAST” button to begin your search.

[Screenshot callouts: 1. Paste your sequence here. 2. Choose “Others” database. 3. Type “Zea mays” in this box. 4. Click on BLAST button.]

A status page will update itself repeatedly until the search and comparison are complete.

When the search is finished, the results page will appear. Scroll down until you see an image similar to the screenshot below. Inside the boxed area is a number line that represents your piece of DNA. The colored lines below it represent pieces of DNA that are similar to your sequence. The most likely hit will be “Zea mays complete chloroplast genome”.
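BLAST looks for local alignments. These are stretches where part of your query lines up well with part of a database sequence, and they are what the colored lines under the number line mark. The Python sketch below illustrates the idea with the classic Smith-Waterman local-alignment algorithm on two short, made-up sequences. It is not how BLAST itself searches: BLAST uses a much faster heuristic that approximates this kind of alignment across a whole database. The scores used here (match +2, mismatch -3, gap -5) are assumptions chosen for illustration, not BLAST's settings. Coordinates are 1-based, like the Query and Sbjct numbers on the results page.

```python
from typing import NamedTuple


class LocalHit(NamedTuple):
    score: int
    query_start: int    # 1-based, inclusive (like the "Query" coordinates)
    query_end: int
    subject_start: int  # 1-based, inclusive (like the "Sbjct" coordinates)
    subject_end: int
    aligned_query: str
    aligned_subject: str


def smith_waterman(query: str, subject: str,
                   match: int = 2, mismatch: int = -3, gap: int = -5) -> LocalHit:
    """Best local alignment of query against subject with a linear gap penalty.

    The scoring values are illustrative assumptions, not BLAST's defaults.
    """
    q, s = query.upper(), subject.upper()

    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    # H[i][j] = best score of a local alignment ending at q[i-1] and s[j-1].
    # Row 0 and column 0 stay 0, so an alignment may start anywhere.
    H = [[0] * (len(s) + 1) for _ in range(len(q) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(q) + 1):
        for j in range(1, len(s) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair(q[i - 1], s[j - 1]),  # align two bases
                          H[i - 1][j] + gap,                           # gap in subject
                          H[i][j - 1] + gap)                           # gap in query
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best-scoring cell until the score drops to 0.
    i, j = best_i, best_j
    aq, asub = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair(q[i - 1], s[j - 1]):
            aq.append(q[i - 1])
            asub.append(s[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aq.append(q[i - 1])
            asub.append("-")
            i -= 1
        else:
            aq.append("-")
            asub.append(s[j - 1])
            j -= 1
    return LocalHit(best, i + 1, best_i, j + 1, best_j,
                    "".join(reversed(aq)), "".join(reversed(asub)))


def percent_identity(hit: LocalHit) -> float:
    """Percentage of aligned columns where Query and Sbjct carry the same base."""
    if not hit.aligned_query:
        return 0.0
    same = sum(a == b for a, b in zip(hit.aligned_query, hit.aligned_subject))
    return 100.0 * same / len(hit.aligned_query)


if __name__ == "__main__":
    # Made-up sequences for illustration, not real chloroplast data.
    hit = smith_waterman("TTTTGGATCCAAGCTTAAAA", "CCCCCGGATCCTAGCTTCCCCC")
    print(hit.score, hit.query_start, hit.query_end)  # 19 5 16
    print(hit.aligned_query)
    print(hit.aligned_subject)
    print(f"{percent_identity(hit):.1f}% identity")    # 91.7% identity
```

The flanking Ts, As and Cs do not match, so the local alignment ignores them. It reports only the 12-base stretch at query positions 5 to 16. The single A/T mismatch inside that stretch keeps the identity below 100%. In the same way, a real Query/Sbjct lineup between two species shows scattered mismatches.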
[Screenshot: BLAST graphic summary. The number line is your sequence (aka query). The lines beneath it are sequences that matched the one you entered.]

Find the lineup that has this header…

>gi|11990232|emb|X86563.2|ZMA86563 Zea mays complete chloroplast genome
Length = 140384

How to Make Your Report Document

1. Start a new document in Word or your favorite word processing program.
2. At the top of the first page, put your name and the name of the sequence you are analyzing.
3. Paste the sequence you are analyzing into your document.
4. Skip a few lines and make a subject heading called “Zea mays Sequence Homology Results”.

If you DO NOT have a chloroplast homologue then…

5. Put NO MAIZE CHLOROPLAST HOMOLOGUES in your report document.

There are several other possibilities at this stage…
a. Your sequence is not found in the corn chloroplast genome.
b. Something other than chloroplast DNA is in the cloning vector.
c. Your clone is the cloning vector without an insert.

It's not time to reject the original hypothesis yet. To run the next test, skip the rest of this section and go to the portion below titled “Search the Whole Database”.

If you DO have a chloroplast homologue then…

Scroll down until you see the base-by-base nucleotide alignments. Each alignment shows how your sequence (Query) matches the sequence found in the database (Sbjct). Because these are two different species, the match will not be perfect, but it should be close enough to show that they are related.

5. Copy and paste the alignment from the “Zea mays complete chloroplast genome” match into your report. Do not use the one with B73 in the title line. The two sequences in the lineup are the one you entered (Query) and its closest match (Sbjct).

Was the sequence you found homologous to a maize region between 82,000 and 105,000? If so, the sequence falls in a region called the inverted repeat, which is found in most chloroplast genomes.
The genes in this region are duplicated in reverse order in a nearby region. If your clone came from the inverted repeat, include both regions in your report. The second region will probably fall between bases 117,000 and 140,000.

6. Do you think your sequence is from the inverted repeat region?

Search the Whole Database

Now repeat your search using the whole database. This time, when you enter your sequence, do not limit your search to Zea mays. Instead, let the program search all organisms. This is the default setting, so do not enter a name in the “Organism” box.

Your Report Document

7. Make another subject heading called “Whole Database Homology Search”.

The results page may resemble the one shown in the screenshot. You may notice that the top hit is not Zea mays but Oryza sativa (rice).

Your Report Document

9. Just under the box with the number line there's a list of similar sequences. Copy the top 2 (or so) hits and paste them into your report document. Copy only the names of these hits, not the alignments.

10. If you didn't get a maize chloroplast homologue but did get a chloroplast homologue from another species, go to the base-for-base lineup described above (under “If you DO have a chloroplast homologue”). Copy and paste the top one or two alignments into your report.

What if my sequence doesn't have a chloroplast homologue?

If there are no chloroplast homologues at all, take a close look at the matches that do appear. If the matches are a list of cloning vectors, your plasmid contained no insert: reject the original hypothesis. If the sequence is novel, the BLAST program will still try to make your sequence fit other sequences, even ones that have very little in common with it.
For this reason another number, the E-value, is included so that you (the human with the critical thinking skills) can decide whether a match represents a bona fide homologous region or was forced by the program.

The E-value assigned to each potential homologue is the number of matches at least this good that you would expect to find purely by random chance in a search of a database this size. The lower the E-value, the more confident you can be that you are looking at two similar sequences with a common ancestry. Typically, matches with E-values equal to or greater than 10⁻⁴ (1e-4) are not similar enough to represent true homologues. If you get a match with human DNA whose E-value is less than 10⁻⁴, then we probably cloned a contaminant (a piece of DNA from me or someone else in the class).

No matter what you find, copy and paste the top hits into your report so the analysis can continue.

Your Report Document

11. What is the status of your hypothesis after running two experiments with your sequence?
12. What is the significance of any matches you found (chloroplast, non-chloroplast, or vector)?
13. If sequences from two different species are very similar, what can we infer about their relationship / ancestry?
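The decision rules spread through this handout can be gathered into one small Python sketch. It may help when answering questions 11 and 12. The sketch assumes you have copied each hit's title and E-value by hand from the results page, plus the first Sbjct coordinate for maize hits. `Hit`, `triage` and `in_maize_ir` are hypothetical names. The keyword tests on the title ("vector", "Homo sapiens", "chloroplast") are rough assumptions about how hit descriptions are worded, so always read the actual hit as well. The numbers come from the handout. E-values of 10⁻⁴ or more are treated as not significant. The maize inverted repeat is taken as roughly bases 82,000 to 105,000 and 117,000 to 140,000.

```python
from dataclasses import dataclass
from typing import Optional

# Handout rule: E-values of 1e-4 or greater are not treated as true homologues.
E_VALUE_CUTOFF = 1e-4

# Approximate maize inverted-repeat windows quoted in the handout (Sbjct coordinates).
MAIZE_IR_WINDOWS = [(82_000, 105_000), (117_000, 140_000)]


@dataclass
class Hit:
    """One BLAST hit, copied by hand from the results page (hypothetical helper)."""
    title: str                         # hit description line
    e_value: float                     # E-value column (very strong hits can show 0.0)
    sbjct_start: Optional[int] = None  # first Sbjct coordinate of the alignment, if copied


def in_maize_ir(position: int) -> bool:
    """True if a maize Sbjct coordinate falls in either approximate inverted-repeat window."""
    return any(low <= position <= high for low, high in MAIZE_IR_WINDOWS)


def triage(hit: Hit) -> str:
    """Classify a hit with the handout's rules; the title keyword checks are a rough heuristic."""
    title = hit.title.lower()
    if hit.e_value >= E_VALUE_CUTOFF:
        return "not significant: possibly forced by BLAST"
    if "vector" in title:
        return "cloning vector: plasmid may lack an insert"
    if "homo sapiens" in title or "human" in title:
        return "human DNA: probable contaminant"
    if "chloroplast" in title:
        if "zea mays" in title and hit.sbjct_start is not None and in_maize_ir(hit.sbjct_start):
            return "chloroplast homologue (maize inverted repeat)"
        return "chloroplast homologue"
    return "significant non-chloroplast match: inspect by hand"


if __name__ == "__main__":
    # Invented example hits; real titles and numbers come from your own results page.
    examples = [
        Hit("Zea mays complete chloroplast genome", 2e-50, sbjct_start=90_500),
        Hit("Oryza sativa chloroplast genome", 1e-40),
        Hit("Example cloning vector, complete sequence", 1e-30),
        Hit("Homo sapiens chromosome clone", 3e-10),
        Hit("Unrelated sequence", 0.5),
    ]
    for h in examples:
        print(f"{h.title} (E = {h.e_value:g}) -> {triage(h)}")
```

The checks run in a deliberate order. A hit that fails the E-value cutoff is not classified any further, because the handout treats such matches as possibly forced by the program. Vector and human matches are flagged before chloroplast ones, since either points to a problem with the clone rather than a chloroplast insert.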