Learning to Use PubMed
pp. 22-25
Starting link: http://www.ncbi.nlm.nih.gov

It is the responsibility of scientists to communicate their findings to their peers and also to the world at large. The major means of doing so is the publication of scientific papers and, based upon these, textbooks. One of the most difficult challenges for the modern researcher is to keep up with the current research literature. Researchers must relate appropriate publications to the problem at hand and use the findings of others to help direct their own progress. It is not an easy task. Even if someone had all day, every day, to sit and read, it would be a formidable accomplishment to read all the relevant material within their own field, let alone that of others. The scientific literature is growing so rapidly that computer searches of various types are indispensable. In addition, computer searches increasingly give direct access to electronic publications.

You have probably already become familiar with searching the databases at your institution’s library. In this tutorial we will take a closer look at the research literature database at NCBI. Within the Entrez program at NCBI is the PubMed database, which is produced in collaboration with the National Library of Medicine MEDLINE database. Most importantly, all of the publications at PubMed are cross-indexed to gene and protein sequences and all of the other databases at NCBI. They are also cross-referenced to textbooks in the form of key word searches. Notable in the list of accessible textbooks is Griffiths et al., Modern Genetic Analysis (MGA).

(1) Click PubMed on the top menu.

Let’s try a number of searches to demonstrate the potential of this system. Let’s start with the words in the title of Chapter 6 of MGA: ‘Genetic Recombination in Eukaryotes.’

(2) First type recombination in the search window and press Go.
The result is a daunting 120,000-plus research articles that mention the word in the title, abstract, or key words. It is a popular and important area, but this is far too general a search term to be useful.

(3) How about genetic recombination? Try it. We still get more than 110,000 returns. We are retrieving articles that actually study the process of recombination, but also many that are of more peripheral interest to us, for instance, those that simply use recombination as a method to insert genes into chromosomes.

(4) Try the whole title genetic recombination in eukaryotes. Now the output is cut to fewer than one thousand, which is certainly far fewer publications than are actually concerned with recombination in eukaryotes. The problem is probably with the term eukaryotes. The key words of a publication might name a phylogenetic group, but not one as broad as all eukaryotes.

(5) Try genetic recombination in mammals. We get almost 50,000 entries. Many of these clearly do not have recombination as the primary subject of the paper.

Let’s have a look at how our terms might be related in the formal subject headings within the National Library of Medicine. These Medical Subject Headings (MeSH) are used to classify publications.

(6) Click on MeSH Browser under PubMed Services on the left panel.

(7) At the new screen, type recombination in the search window and press Go. Note the definition of the term and the display of various subheadings. This is more like the list of topics relevant to recombination that you would learn in a genetics course. Looking at these terms sometimes helps narrow your search by presenting you with some alternative (and narrower) terms. If we searched using the MeSH term Recombination, Genetic, we would get more than 100,000 returns. When we look at the subheadings covered, we can see why this might be so.

(8) For example, transfection is included. Click on it.
The definition shows that it includes all papers that use transfection of DNA or plasmids into cells for experimental purposes. Notice that it is included in the techniques tree as well as the recombination tree. Many of the papers would be using it solely as a technique, and the publication would not be about the process of recombination itself.

(9) Click Recombination, Genetic, in the lower tree to go back to our original search.

(10) Now let’s try another term by clicking Crossing Over (Genetics). This is narrower in scope and focuses on the chromosomal recombination process. Notice that it is listed under Cytogenetics in one tree, and under Recombination, Genetic in the other.

(11) Press the ‘Add’ button to open a PubMed search window for this term. Press PubMed Search. This yields a few thousand entries, and a glance at the titles shows us that most are directly concerned with chromosomal recombination.

We could also focus our search by excluding some of the subheadings in the Recombination tree.

(12) Open the MeSH Browser again (from the left-hand menu) and search Genetic Recombination.

(13) Click Add this term to the search.

(14) Now click Gene Transfer, Horizontal in the tree, and then toggle the drop-down menu to the right of Add to NOT, then click Add.

(15) Next press PubMed Search. This returns an extensive list of papers dealing with various aspects of chromosomal recombination.

(16) Now click the Preview/Index link in the bar under the search text box. This feature shows you a list of the searches you have run and the number of results for each. We could add more terms at this point.

(17) Toggle the drop-down menu at the bottom of the page to Author and type Hartwell LH in the text box. Press ‘AND’, then ‘Preview’. A new result appears in your list of searches with only a few entries. If you click on the link for the number of results, you will see a subset of Hartwell papers dealing with recombination processes.

There is a lesson here.
It is relatively easy to define a narrowly focused search. For instance, we could search recombination AND Hartwell LH in PubMed and get a dozen or so journal references. It is often difficult, however, to define a broad topic that does not include a substantial amount of material that is irrelevant (at least to you). If the search is narrowed to keep the amount of irrelevant material low, then it will almost invariably exclude material that you would want included in your list. This means that you must use a somewhat broader search and then make a judicious choice of material to keep.

(18) Let’s do a new PubMed search for phylogeography AND vertebrates. Over 500 entries are returned.

(19) Move down the list and choose the paper “Phylogeography of a stream-dwelling frog (Pseudacris cadaverina) in southern California.”

(20) You can view the related articles in the box to the right.

(21) For your selected article, check the little box to the left of the reference. You could then send the citation to yourself via email.

(22) Open the Entrez database.

(23) Type “Pseudacris cadaverina” into the Entrez search box and press Go. The power of Entrez should become immediately evident. The species is cross-listed to several different databases. For example:

(a) There are three articles in PubMed (one of which is the paper chosen in step 19).
(b) There are 73 nucleotide sequences in GenBank.
(c) There are 67 protein sequences.
(d) There is one listing in the taxonomy database (which you would expect, since there is only one species with this specific name).
(e) If we were to delete “cadaverina” we would still get one entry for taxonomy, but all species in the genus Pseudacris would be listed.
(f) If you click on taxonomy for Pseudacris cadaverina, you recover the entire classification for this species.
(g) There are 13 entries for PopSet. Here you will see the entire dataset for the paper from step 19 (and some similar studies that include the same species).
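The cross-database counts listed above can also be requested without the web interface. The following is only a minimal sketch, assuming NCBI's E-utilities ESearch service, which this tutorial does not itself use. The database names in DATABASES are assumed to correspond to the Entrez databases named in (a) through (g), and the helper names build_count_url and fetch_count are made up for illustration. The counts returned today will differ from those quoted in this tutorial.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's ESearch service (an assumption; not part of the tutorial).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# E-utilities database names assumed to match the databases listed above.
DATABASES = ["pubmed", "nuccore", "protein", "taxonomy", "popset"]


def build_count_url(db, term):
    """Return an ESearch URL that asks only for the number of matching records."""
    # retmax=0 requests no record IDs, so the reply carries just the count.
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def fetch_count(db, term):
    """Send the query to NCBI and return the hit count (needs network access)."""
    with urlopen(build_count_url(db, term)) as response:
        reply = json.load(response)
    return int(reply["esearchresult"]["count"])


if __name__ == "__main__":
    # Print the URLs only; call fetch_count(db, term) to contact NCBI.
    for db in DATABASES:
        print(db, build_count_url(db, "Pseudacris cadaverina"))
```

The same helper accepts Boolean searches such as phylogeography AND vertebrates from step 18, because the search term is passed to PubMed as written.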
(24) In the PopSet database click on the first entry.

(25) Click on the hyperlinked GI#13386615.

(26) Scroll down a bit to the heading “The sequences in the study are”. Under it you will see each nucleotide sequence associated with this study. The hyperlinked numbers to the left are called GenBank accession numbers. Click on one.

(27) Here you will see everything you need and want to know about this sequence: the authors, the publication, the taxonomy, the genes, the protein sequence, and the nucleotide sequence.

(28) Go back to where the sequences are listed under the heading “The sequences in the study are”. Click here and then press “generate alignment”.

(29) You have just generated a multiple sequence alignment for the entire data set. There are a total of 61 sequences, and each aligned sequence is 1098 characters long.

(30) At the top, under viewing options, select “show variations only” and re-do the “generate alignment”. The first sequence is used as a reference, and wherever there is a deletion in another sequence you will see a gap “-” inserted. If the nucleotide residue is identical to the reference sequence you will see a dot. If there was a mutation (relative to the reference sequence), that is, a different nucleotide is present, you will see the letter code for that particular nucleotide. So in sequence 20, column (or base #) 68, there is a “T” instead of the “C” present in most other sequences. This is called a substitution. What kind? Transition or transversion?

(31) At the top of the page, select “fasta” under the Display menu (near the top on the left). Copy the alignment into a Word document and save it. We will use this file later.

(32) Now let’s examine the links to books from the main PubMed page. Click on the Books link at the upper right. You could open any of the books by clicking on the book cover. From the book’s table of contents you can also open the chapters directly. You will see various terms underlined in the text. Click on one of these.
A glossary appears to provide you with a precise definition.

There is nothing to turn in for this tutorial. Just make sure you save the alignment of the frog dataset; we will use it later. You should also obtain the paper associated with this dataset. CSUB subscribes to Molecular Phylogenetics and Evolution via Academic Search Elite. Save the PDF for this paper.
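As a preview of what can be done with a saved FASTA alignment, the sketch below reproduces the “show variations only” idea from step 30 and classifies a substitution as a transition or a transversion. It is an illustration only, not the method used by the NCBI viewer. The function names are hypothetical, and the two short sequences are made-up toy data, not sequences from the frog dataset.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def variations_only(reference, sequence):
    """Show a dot where the sequence matches the reference, else the character itself."""
    return "".join(
        "." if s == r and s != "-" else s for r, s in zip(reference, sequence)
    )


def classify_substitution(ref_base, new_base):
    """Return 'transition', 'transversion', or 'identical' for two nucleotides."""
    ref_base, new_base = ref_base.upper(), new_base.upper()
    valid = PURINES | PYRIMIDINES
    if ref_base not in valid or new_base not in valid:
        raise ValueError("expected one of A, C, G, T")
    if ref_base == new_base:
        return "identical"
    pair = {ref_base, new_base}
    # A transition stays within purines or within pyrimidines.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    toy = ">ref\nACGTAC\n>seq2\nACATA-\n"  # made-up toy alignment
    (_, ref), (_, seq) = parse_fasta(toy)
    print(variations_only(ref, seq))
    print(classify_substitution("G", "A"))
```

To answer the question posed in step 30, pass the reference base and the observed base from that column to classify_substitution.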