Learning to Use PubMed pp. 22-25
Starting link: http://www.ncbi.nlm.nih.gov
It is the responsibility of scientists to communicate their findings to their peers
and also to the world at large. The major means of doing so is the publication of
scientific papers and, based upon these, textbooks. One of the most difficult challenges for
the modern researcher is to keep up with the current research literature.
Researchers must relate appropriate publications to the problem at hand and use the
findings of others to help direct their own progress. It is not an easy task. Even if
someone had all day every day to sit and read, it would be a formidable accomplishment
to say that they had read all relevant material within their own field, let alone that of
others. The scientific literature is growing in size so rapidly that computer searches of
various types are indispensable. In addition, searching by computer increasingly
gives direct access to electronic publications.
You have probably already become familiar with searching the databases at your
institution’s library. In this tutorial we will take a closer look at the research literature
database at NCBI. Within the Entrez program at NCBI is the PubMed database, which is
produced in collaboration with the National Library of Medicine MEDLINE database.
Most importantly, all of the publications at PubMed are cross-indexed to gene and protein
sequences and all of the other databases at NCBI. They are also cross-referenced to
textbooks in the form of key word searches. Notable in the list of accessible textbooks is
Griffiths et al., Modern Genetic Analysis (MGA).
(1) Click PubMed on the top menu.
Let’s try a number of searches to demonstrate the potential of this system. Let’s
start with the words in the title of Chapter 6 of MGA: ‘Genetic Recombination in
Eukaryotes.’
(2) First type recombination in the search window and press Go.
The result is a daunting 120,000-plus research articles that mention the word in title,
abstract, or key words. It is a popular and important area, but this is far too general a
search term to be useful.
(3) How about genetic recombination? Try it. We still get more than 110,000 returns.
We are retrieving articles actually studying the process of recombination but also many
that are of more peripheral interest to us, for instance, simply using recombination as a
method to insert genes into chromosomes.
(4) Try the whole title genetic recombination in eukaryotes.
Now the output is cut to fewer than one thousand. That is certainly far
fewer publications than are actually concerned with recombination in eukaryotes. The
problem is probably with the term eukaryotes. The key words of a publication might
include a phylogenetic group, but not one as broad as all eukaryotes.
(5) Try genetic recombination in mammals.
We get almost 50,000 entries. Many of these clearly do not have recombination as the
primary subject of the paper.
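The same comparison of hit counts can be sketched outside the browser. The example below is a minimal sketch, assuming NCBI's E-utilities ESearch service is used instead of the PubMed web page; the function names esearch_url and hit_count are made up for this illustration, and the counts returned today will differ from the numbers quoted in this tutorial.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed"):
    """Build an ESearch URL; retmax=0 asks for the hit count without any IDs."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def hit_count(term, db="pubmed"):
    """Return the number of records matching term (needs network access)."""
    with urlopen(esearch_url(term, db), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


# The four searches from steps 2-5; only the URLs are printed here,
# so this part runs without a network connection.
for term in [
    "recombination",
    "genetic recombination",
    "genetic recombination in eukaryotes",
    "genetic recombination in mammals",
]:
    print(esearch_url(term))
```

Calling hit_count("genetic recombination in mammals") would then return the number that the web page shows at the top of the results list.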
Let’s have a look at how our terms might be related in the formal subject headings within
the National Library of Medicine. These Medical Subject Headings (MeSH) are used to
classify publications.
(6) Click on MeSH Browser under PubMed Services on the left panel.
(7) At the new screen, type recombination in the search window and press Go.
Note the definition of the term and the display of various subheadings. This is more like
the list of topics relevant to recombination that you would learn in a genetics course.
Having a look at these terms sometimes helps narrow your search field by presenting you
with some alternative (and narrower) terms. If we searched using the MeSH term
Recombination, Genetic, we would get more than 100,000 returns. When we look at the
subheadings covered we can see why this might be so.
(8) For example, transfection is included. Click on it.
The definition shows that it includes all papers that use transfection of DNA or plasmids
into cells for experimental purposes. Notice that it is included in the techniques tree as
well as the recombination tree. Many of the papers would be using it solely as a
technique and the publication would not be about the process of recombination itself.
(9) Click Recombination, Genetic, in the lower tree to go back to our original search.
(10) Now let’s try another term by clicking Crossing Over (Genetics).
This is narrower in scope and focuses on the chromosomal recombination process. Notice
that it is listed under Cytogenetics in one tree, and under Recombination, Genetic in the
other.
(11) Press the ‘Add’ button to open a PubMed search window for this term. Press
PubMed Search.
This yields a few thousand entries and a glance at the titles shows us that most are
directly concerned with chromosomal recombination. We could also focus our search by
excluding some of the subheadings in the Recombination tree.
(12) Open the MeSH browser again (from the left-hand menu) and
search Genetic Recombination.
(13) Click Add this term to the search.
(14) Now click Gene Transfer, Horizontal in the tree, and then toggle the drop-down
menu to the right of Add to NOT, then click Add.
(15) Next press PubMed Search. This returns an extensive list of papers dealing with
various aspects of chromosomal recombination.
(16) Now click the Preview/Index link in the bar under the search text box.
This feature shows you a list of the searches you have run and the number of results for
each. We could add more terms at this point.
(17) Toggle the drop-down menu at the bottom of the page to Author and type Hartwell
LH in the text box. Press ‘AND’, then ‘Preview’.
A new result appears in your list of searches with only a few entries. If you click on this
link for the number of results, you will see a subset of Hartwell papers dealing with
recombination processes.
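Steps 12 through 17 amount to composing a Boolean query: a MeSH term, minus an excluded MeSH term, combined with an author. The sketch below shows how such a query string might be assembled by hand, assuming the PubMed field tags [MeSH Terms] and [Author]; the helper names mesh, author and build_query are hypothetical and are not part of PubMed.

```python
def mesh(term):
    """Tag a term as a MeSH heading, quoted so the comma stays inside it."""
    return f'"{term}"[MeSH Terms]'


def author(name):
    """Tag a name (last name plus initials) as an author."""
    return f"{name}[Author]"


def build_query(include, exclude=(), authors=()):
    """Join the wanted terms with AND, then subtract each excluded term with NOT."""
    parts = [mesh(t) for t in include] + [author(a) for a in authors]
    if not parts:
        raise ValueError("at least one MeSH term or author is required")
    query = " AND ".join(parts)
    for term in exclude:
        # Parentheses make the order of evaluation explicit.
        query = f"({query}) NOT {mesh(term)}"
    return query


print(build_query(
    include=["Recombination, Genetic"],
    exclude=["Gene Transfer, Horizontal"],
    authors=["Hartwell LH"],
))
```

The printed string can be pasted into the PubMed search window as a single search.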
There is a lesson here. It is relatively easy to define a narrowly focused search. For
instance, we could search recombination AND Hartwell LH in PubMed and get a dozen
or so journal references. It is often difficult, however, to define a broad topic that does not
include a substantial amount of material that is irrelevant (at least to you). If the search is
narrowed to keep the amount of irrelevant material low, then it will almost invariably
exclude material that you would want included in your list. This means that you must use
a somewhat broader search and then make a judicious choice of material to keep.
(18) Let’s do a new PubMed search for phylogeography AND vertebrates. Over 500
entries are returned.
(19) Move down the list and choose the paper “Phylogeography of a stream-dwelling
frog (Pseudacris cadaverina) in southern California.”
(20) You can view the related articles in the box to the right.
(21) For your selected article, check the little box to the left of the reference. You could
then send the citation to yourself via email.
(22) Open the Entrez database.
(23) Type “Pseudacris cadaverina” into the Entrez search box and press Go. The power of Entrez
should become immediately evident. The species is cross-listed to several different
databases. For example:
(a) there are three articles in PubMed (one of which is that above in #19)
(b) there are 73 nucleotide sequences in GenBank
(c) there are 67 protein sequences
(d) there is one listing in the taxonomy database (which you would expect since
there is only one species with this specific name).
(e) if we were to delete “cadaverina” we would still get one entry for taxonomy,
but all species in the genus Pseudacris would be listed.
(f) If you click on taxonomy for Pseudacris cadaverina you would recover the
entire classification for this species.
(g) there are 13 entries for PopSet. Here you will see the entire dataset for the
paper in #19 (and some similar studies that include the same species).
(24) In the PopSet database click on the first entry.
(25) Click on the hyperlinked GI#13386615.
(26) Scroll down a bit to the heading “The sequences in the study are”. There you will
see each nucleotide sequence associated with this study. The hyperlinked numbers to the
left are called GenBank accession numbers. Click on one.
(27) Here you will see everything you need and want to know about this sequence: the
authors, the publication, the taxonomy, the genes, the protein sequence, and the
nucleotide sequence.
(28) Go back to where the sequences are listed under the heading “The sequences in the
study are”. Click here and then press “generate alignment”.
(29) You have just generated a multiple sequence alignment for the entire data set. There
are a total of 61 sequences, with the alignment for each 1098 characters long.
(30) At the top, under viewing options, select “show variations only” and re-do the
“generate alignment”. The first sequence is used as a reference, and wherever there is a
deletion in another sequence you will see a gap “-” inserted. If the nucleotide residue is
identical to the reference sequence you will see a dot. If there was a mutation (relative to
the reference sequence) or a different nucleotide present you will see the letter code for
that particular nucleotide. (see below)
So in sequence 20, column (or base #) 68, there is a “T” instead of the “C” present in most
other sequences. This is called a substitution. What kind? Transition or transversion?
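The “show variations only” display and the transition/transversion distinction can both be expressed in a few lines of code. What follows is a minimal sketch, not the code NCBI uses: the two sequences are made up, the function names are hypothetical, and it assumes a gap is always shown as “-” even where the reference also has a gap. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); a transversion swaps one class for the other.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variations_only(reference, sequence):
    """Render one aligned sequence relative to the reference.

    Identical residues become ".", gaps stay "-" (an assumption of this
    sketch), and any other difference is shown as the letter found.
    """
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have the same length")
    shown = []
    for ref_base, base in zip(reference.upper(), sequence.upper()):
        if base == "-" or base != ref_base:
            shown.append(base)
        else:
            shown.append(".")
    return "".join(shown)


def classify_substitution(ref_base, new_base):
    """Return "transition" or "transversion" for a single-base change."""
    ref, new = ref_base.upper(), new_base.upper()
    for base in (ref, new):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if ref == new:
        raise ValueError("the two bases are identical, so nothing was substituted")
    same_class = (ref in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


# Made-up sequences, not taken from the frog dataset.
reference = "ACGTACGT"
other = "ACGTGC-T"
print(variations_only(reference, other))  # ....G.-.
print(classify_substitution("A", "G"))    # transition
print(classify_substitution("A", "C"))    # transversion
```

Work out the answer for sequence 20 by hand first, then check it with classify_substitution.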
(31) At the top of the page, select “fasta” under the Display menu (near the top on the left).
Copy the alignment into a Word document and save it. We will use this file later.
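Before the saved alignment is used later, it is worth checking that nothing was lost in the copy. The sketch below is one way to do that, assuming the alignment has also been saved as plain text in FASTA format (a “>” header line followed by sequence lines); the function names and the two sample records are made up for illustration. For the frog dataset the check should report 61 sequences of 1098 characters each, the figures given in step 29.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_shape(records):
    """Return (number of sequences, alignment length) or raise if ragged."""
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) != 1:
        raise ValueError("no sequences found, or sequences differ in length")
    return len(records), lengths.pop()


# Two made-up records; the first sequence is wrapped over two lines.
sample = ">seq1 made-up example\nACGT-A\nCG\n>seq2 made-up example\nACGTTACG\n"
print(alignment_shape(parse_fasta(sample)))  # (2, 8)
```

To check your own file, read its contents into a string and pass that string to parse_fasta in place of sample.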
(32) Now let’s examine the links to books from the main PubMed page. Click on the
Books link at the upper right.
You could open any of the books by clicking on the book cover. From the book’s table
of contents you can also open the chapters directly. You will see various terms
underlined in the text. Click on one of these. A glossary appears to provide you with a
precise definition.
There is nothing to turn in for this tutorial. Just make sure you save the alignment of the
frog dataset and we will use it later. You should also obtain the paper associated with this
dataset. CSUB subscribes to Molecular Phylogenetics and Evolution via Academic Search
Elite. Save the PDF for this paper.