Task 1.2 Background

“The taxonomy browser is an NCBI-derived search tool that allows an individual to search the taxonomy database. Using the taxonomy browser, information may be retrieved on available nucleotide, protein, and structure records for a particular species or higher taxon. The taxonomy database contains the names and lineages of every organism that is represented by at least one nucleotide or protein sequence in the NCBI genetic databases.” [i]

Task 1.2 Scenario

One of your fictitious coworkers at Clearsighted Biotechnologies, Inc., Dr. Paul E. Morrfism, is studying insect genes. He hopes to better understand the genetic changes that take place as an insect population evolves resistance to toxins (pesticides). In addition, he is interested in the possibility of using this understanding to develop insects that can clean up toxins in polluted environments. Currently he is studying how insect genes have changed over very long periods of time.

In 1993 it was reported that DNA had been recovered from an extinct organism, Libanorhinus succinus. [ii] This insect was preserved for more than 120 million years in amber. One of the genes from this extinct organism was amplified and sequenced, and then submitted to the GenBank database. Dr. Morrfism would like to know whether, and how, this gene has changed over time.

Your task is to use the Taxonomy Browser at NCBI to learn more about this organism and the DNA sequence that was recovered from it. Then you are to find matching sequences from extant insects and describe any differences between the genes of the extinct and modern species.

Step A

To get started, you need to learn a little more about the taxonomy of Libanorhinus succinus. Use the Taxonomy Browser at the NCBI website to locate the list of Extinct Organisms in the Taxonomy Database.

Hints

There are many ways to find this information; here is one method.
(1) Navigate to the Taxonomy database home page by selecting “Taxonomy” from the pull-down database menu at the top of the NCBI homepage and clicking “Go”. Scroll down and look for the link to “Taxonomy Home” along the left-hand side of the page. Follow this link to the home page, then look for the Extinct Organisms link on the lower left side of that page.

Step B

Find and view the taxonomic lineage of an extinct “beetle from Lebanese amber”. This will show you all the categories (taxa) that are used to classify Libanorhinus succinus.

Hints
(1) Scroll down the list of extinct organisms until you find the section labeled Insects.
(2) Click on the link for Libanorhinus succinus.

Step C

Dr. Morrfism would like an update on your progress. He is curious to know more about this extinct organism and other organisms that are closely related to it. In the space below, provide the name for each of the following taxonomic categories of the extinct Libanorhinus succinus.

Superkingdom:
Kingdom:
Phylum:
Class:
Order:
Family:
Genus:

List the common names of 3 other organisms in this same phylum.
1.
2.
3.

Hints (taxonomic categories)
(1) Click on the word “Lineage” to alternate between the abbreviated and the full lineage. Note that the lineage of an organism may contain extra descriptive categories that are not assigned a taxonomic rank.
(2) Hover over each word in the lineage to see a pop-up label describing its taxonomic rank. In the example below, the superkingdom (domain) is the category Eukaryota.

Hints (organisms in the same phylum)
(1) Identify the phylum and click on this word.
(2) Click on the phylum link depicted below. Then select 3 common names of organisms that are listed as subheadings below the Arthropoda heading.

Step D

In order to learn about the gene sequence that was recovered from the amber, visit the GenBank report to get a description of this DNA and learn about the gene’s product.

Hints
(1) There are many ways to get to the GenBank summary report.
One method is to click the direct link to the Nucleotide database, which is provided in the window you should still have open. See the image below for help. Then click the GenBank accession number, which is a link to the report summary and nucleotide sequence.
(2) Another method to find the GenBank record is to search the nucleotide sequence database for L. succinus. This can be done easily from the homepage. The search bar is at the top of the page; see the image below.

Step E

Note the definition line in this summary report; it provides some information about the product of this DNA sequence. Follow the PubMed link to read the abstract of the scientific paper published about the DNA extracted from the amber.

Hints
(1) Click on the link depicted in the image below.

Step F

It's time to report to Dr. Morrfism what you have learned from the GenBank record and the abstract provided by PubMed. Record your answers on the handout.

Hints
(1) Refer to the GenBank report or the abstract provided by PubMed for this information.

From the GenBank record:
1. Describe the product or function of this gene.
From the abstract provided by PubMed:
2. What is different about the age of this DNA when compared to DNA previously extracted from extinct organisms?

Step G

Now it's time to determine which modern organisms contain a similar gene, using a nucleotide BLAST search. Another way to search the Nucleotide database is to use the accession number instead of the nucleotide sequence. Use the accession number to complete a BLASTn search. This will allow you to discover sequences that closely match the DNA recovered from the extinct L. succinus.

Hints
(1) Copy the accession number from the GenBank summary report, navigate to the BLASTn page, and paste the number into the BLAST query text box. Select “Nucleotide Collection” from the database pull-down menu and enter a job title before clicking the BLAST button.
(2) Copy the accession number, then click the double helix icon at the top of the page to return to the NCBI homepage, and click the BLAST link in the menu bar. Start the nucleotide BLAST program by clicking its link. Paste the accession number into the query text box. Change the search name if you would like. Select the Nucleotide Collection from the pull-down menu of databases. Click the BLAST button at the bottom of the page.

Step H

Visit the Taxonomy Report to determine the type of organisms that have DNA similar to that recovered from the amber. You might be able to find images of these organisms with a Google image search.

Hints
(1) From the BLAST results page, click on the link to the Taxonomy Reports. It will open in a new window or tab.
(2) See the image below to find the link to the Taxonomy Reports.

Step I

View the alignment data for several of the closest matching sequences to see how closely they are aligned. Recall that high bit scores and low E-values indicate significant alignment. E-values are often given in scientific notation. For example, 2e-163 = 2 x 10^-163.

Hints
(1) Close the Taxonomy Report window or tab to see the BLAST results again. Click on the Max Score number in the table for one of the hits near the top of the table.
(2) View the image below to get a sense of which links to follow in order to view the detailed alignment data.

Step J

Dr. Morrfism has some questions about the results of your BLAST search.

Hints
(1) Refer to the Taxonomy Report for help with this question.
(2) Look on the report for the common names of the organisms that are listed there, especially those with the highest alignment scores.

1. What group of organisms (common name) with DNA in the database is the source of the closest matching sequences?

Step K

1. Look at the alignment data for the first hit in the table. Notice how the index numbers in both sequences are perfectly aligned. Why is the first hit in the BLAST results table a perfect match for the DNA you submitted?
Hints
(1) Look closely at the first record; the definition or description will give you some hints about the source of the DNA. Then click the Max Score for that record to link to the alignment data.
(2) Look at the image below to see that the two sequences are the same DNA.

Step L

Find the alignment data for the sequence with accession number AJ841532.1 (it is in the top 10 hits). Describe how well it aligns with the DNA from L. succinus.

Hints
(1) Comparing the number of identities (identical matches) with the total number of letters in the query sequence will also tell you how many mismatches there are. The number of gaps is also given in these details.
(2) Look at the image below for help finding the data to answer these questions. Recall that the E-value is the number of chance matches expected to exist in a database that would have the same (or higher) alignment score.

1. How many gaps and mismatches are there?
2. How does the hit score compare to that of a perfect match?
3. How many matches of this quality would be expected in a randomly generated database of the same size?

Step M

Congratulations! You have completed this task for Dr. Morrfism. Now it's time for Assessment 1.

[i] National Library of Medicine and National Institutes of Health. National Center for Biotechnology Information Website. http://www.ncbi.nlm.nih.gov/About/primer/phylo.html
[ii] Nature. 1993 Jun 10; 363(6429):536-8.
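
Optional supplement to steps I and L: the arithmetic behind E-values and mismatch counts can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original task. The function names are hypothetical, the example numbers are made up (they are not the values for AJ841532.1), and it assumes that identities and gaps are both counted over the same aligned length shown in the alignment details.

```python
def parse_evalue(text: str) -> float:
    """Convert an E-value written in scientific notation, e.g. '2e-163', to a number."""
    return float(text)


def count_mismatches(aligned_length: int, identities: int, gaps: int) -> int:
    """Mismatches are the aligned positions that are neither identical nor gaps.

    Assumption: identities and gaps are reported over the same aligned length.
    """
    return aligned_length - identities - gaps


def percent_identity(aligned_length: int, identities: int) -> float:
    """Share of aligned positions that match exactly, as a percentage."""
    return 100.0 * identities / aligned_length


if __name__ == "__main__":
    # 2e-163 is shorthand for 2 x 10^-163, a very small number of chance matches.
    evalue = parse_evalue("2e-163")
    print("E-value is below 1e-100:", evalue < 1e-100)

    # Hypothetical alignment details: 300 identities and 2 gaps over 315 positions.
    print("Mismatches:", count_mismatches(315, 300, 2))
    print("Percent identity:", round(percent_identity(315, 300), 1))
```

To use it with your own results, replace the hypothetical numbers with the identities, gaps, and aligned length you read from the alignment details on the BLAST results page.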