Searching the literature

Basic Skills in Scientific Research and Publication
Lecture 3: Searching the Literature
14 May 2013
Ganesha Associates
Types of scientific output
• Abstracts
• Primary journal articles
– peer-reviewed interpretations of original research
• Reviews
• Book chapters, monographs
• Conference proceedings
• Lectures, seminars
• Sequences, data sets
• Patents, other forms of intellectual property
• Blogs, tweets…
Usage of output differs
Copyright: Ganesha Associates 2012
Some sources of scientific content
• Google
• PubMed/Medline (NLM)
• Scopus (Elsevier)
• Web of Science (Thomson Reuters)
• Google Scholar
• PubMed Central, Europe PubMed Central
• SciELO, Biblioteca Virtual em Saúde
• ScienceDirect, Ovid, SpringerLink, Wiley Online Library, BioMed Central, Public Library of Science, SwetsWise…
• CAPES Portal de Periódicos
Each source is different
• Free
– Google, Google Scholar, PubMed Central
• Subscription
– Scopus, ScienceDirect
• Abstracts and citations only
– PubMed, Web of Science
• Full text, single publisher
– SpringerLink
• Full text, many publishers
– PubMed Central, SwetsWise Online Content
Classify sources of content
[Grid to fill in: rows “Abstract only” and “Full text”; columns “Free access” and “Subscription”]
You can get access if…
• The journal is subscribed to by CAPES
• You have a personal subscription
• The journal is of the ‘Open Access’ type
– Note: some journals make their content ‘Open Access’ only after six months or longer, and some contain a mixture of OA and non-OA articles. See http://europepmc.org/journalList for more information.
• Journals in the ‘red’ categories are available anywhere.
• Most journals subscribed to by CAPES will be available from
more than one source.
• CAPES journals are only available from computers within the
University network unless you have remote access privileges.
So which sources should I use?
• No single source contains all of the articles
relevant to your research
• Google has the broadest coverage, but not all
of the documents you find will be peer-reviewed articles
• Scopus, WoS and PubMed give you the best
balance between quality and quantity, and, in
theory, should link to all the content
subscribed to by CAPES, plus OA content.
So usually you will visit several sources to
find the information you are looking for
[Diagram: a “?” linked to the sources you may need to visit: SciELO, CAPES Portal, Web of Science, Scopus, national literature, OA publishers (BMC or PLoS), ScienceDirect, SpringerLink, PubMed, Google, HighWire, and other databases, e.g. NCBI]
Components of a bibliographic database
• Content such as abstracts and full-text articles
[or a pointer to where these may be found]
• Metadata [data about data]
• Index
• Search engine
• Ranking/relevance algorithm
• Plus many additional features
Content (Basic PDF)
Content (HTML)
Content (Page source)
Content (metadata)
Sources of article metadata
• Journal name, publisher, ISSN
• Date of publication, volume and page numbers
• Digital object identifier [DOI]
• Article title
• Authors' names
• Address, affiliation, contact details
• Article section identifiers
• Sources of funding
• Semantic tagging, e.g. protein name
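To make the list above concrete, here is a minimal sketch of one article's metadata held as a labelled record. The class and field names are illustrative assumptions, not a standard schema; the values come from the Dai et al. citation that appears among the example queries in this lecture. The DOI is not given there, so it is left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleMetadata:
    """Illustrative record; field names are assumptions, not a standard schema."""
    title: str
    authors: List[str]
    journal: str
    year: int
    volume: str
    issue: str
    pages: str
    doi: Optional[str] = None  # not given in the citation, so left empty
    funding: List[str] = field(default_factory=list)


record = ArticleMetadata(
    title="Improved capsule endoscopy after bowel preparation",
    authors=["Dai N", "Gubler C", "Hengstler P", "Meyenberger C", "Bauerfeind P"],
    journal="Gastrointest Endosc",
    year=2005,
    volume="61",
    issue="1",
    pages="28-31",
)

# Because each element is labelled, a program can address it directly.
print(record.authors[0])
print(f"{record.journal} {record.year};{record.volume}({record.issue}):{record.pages}")
```

Once the parts of a citation are labelled like this, a database can index and search each part separately.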
The basis of search: Indexing
• The purpose of an index is to optimize speed and performance
in finding relevant documents for a search query.
• Without an index, the search engine would have to scan every
document in the corpus, which would require considerable time
and computing power.
• Metadata helps the indexing algorithm to separate different
classes of terms into their own indexes, so a search can be
restricted to just the authors' names, for example.
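A minimal sketch of the idea, assuming a tiny hypothetical corpus (the three records below are invented for illustration). The index maps each (field, word) pair to the documents containing it, so a lookup does not need to scan every document and can be limited to one metadata field.

```python
from collections import defaultdict

# Hypothetical three-document corpus, each with two metadata fields.
corpus = {
    1: {"authors": "Bousfield", "title": "BRCA1 and breast cancer"},
    2: {"authors": "Smith", "title": "Side effects of aspirin"},
    3: {"authors": "Jones", "title": "Aspirin and cancer"},
}


def build_index(docs):
    """Map (field, word) -> set of ids of the documents containing that word."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field_name, text in fields.items():
            for word in text.lower().split():
                index[(field_name, word)].add(doc_id)
    return index


def search(index, word, field_name):
    """Look the word up in one field's index; no document is re-read."""
    return sorted(index.get((field_name, word.lower()), set()))


index = build_index(corpus)
print(search(index, "aspirin", "title"))
print(search(index, "Bousfield", "authors"))
print(search(index, "Bousfield", "title"))
```

Real search engines add many refinements (tokenisation, stemming, compression), but the lookup-instead-of-scan principle is the same.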
Search: how the result list is ranked
• Date of publication
• Relevance
– Frequency with which search terms occur in the
document
– Proximity of search terms
• Google’s PageRank algorithm uses “link
popularity”: a document is ranked higher if
there are more links to it
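The two relevance signals above (term frequency and proximity) can be sketched as a toy scoring function. This is an illustration only, not the formula used by any real search engine; the weighting and the three example documents are assumptions.

```python
from itertools import product


def min_span(words, terms):
    """Smallest window (in words) that contains every term, or None if one is missing."""
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if any(not p for p in positions):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def score(text, query):
    """Toy relevance: term frequency plus a bonus when the terms sit close together."""
    words = text.lower().split()
    terms = query.lower().split()
    frequency = sum(words.count(t) for t in terms)
    span = min_span(words, terms)
    proximity = 0.0 if span is None else len(terms) / span
    return frequency + proximity


# Hypothetical documents, invented for illustration.
docs = {
    "a": "latrunculin causes cell arrest in fm3a cells",
    "b": "fm3a cells were grown overnight in fresh medium before latrunculin was added",
    "c": "actin staining protocol",
}
query = "latrunculin fm3a"
ranked = sorted(docs, key=lambda d: score(docs[d], query), reverse=True)
print(ranked)
```

Documents "a" and "b" contain both terms once, so frequency alone cannot separate them; the proximity bonus puts "a" first because its terms are closer together.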
The question behind the query
• Search engines think in terms of words, but users
think in terms of sentences!
– How do you spell Bousfield?
– What do we know about BRCA1?
– Given these symptoms, what is the most likely
diagnosis?
– What are the side effects of aspirin?
– Has this chemical structure been synthesized before?
• “Cancer causes X” vs. “Y causes cancer”
What real queries look like - Google
• pharmacogenomics and disorders
• bacteria growth casein media effect
• waal pseudomonas
• TRPM2 PCR mouse
• Chitinases in carnivorous plants
• glycerophosphoinositol 4-phosphate
• Dai N, Gubler C, Hengstler P, Meyenberger C, Bauerfeind P. Improved capsule endoscopy after bowel preparation. Gastrointest Endosc 2005;61(1) 28-31.
Query changes people actually make
• Query series 1
– latrunculin
– latrunculin fm3a cell arrest
– latrunculin fm3a arrest
– latrunculin fm3a
– latrunculin FM3A
• Query series 2
– cytokinin signalling in arabidopsis
– "cytokinin signalling in arabidopsis"
– cytokinin delta
– spindly arabidopsis
• Results
– Remember to look beyond the first page. Compare the results of
Query series 1 in PubMed and in Google (adding the term PubMed to the Google query)
Improving search accuracy
• Wild card characters
– "a * saved is a * earned"
• Operators
– jaguar speed -car
– Pandas -site:wikipedia.org
– “ribosome”
• Synonyms
– MeSH terms
• Boolean terms
– AND, OR, NOT
• Faceted search
– GO terms
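Boolean terms behave like set operations on the lists of matching documents. The sketch below assumes a tiny hypothetical corpus, built around the “jaguar speed -car” example above; real engines apply the same logic to their indexes.

```python
# Hypothetical documents, invented for illustration.
docs = {
    1: "jaguar top speed in the wild",
    2: "jaguar car speed test",
    3: "panda habitat and diet",
}


def postings(term):
    """Ids of the documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.lower().split()}


jaguar_and_speed = postings("jaguar") & postings("speed")  # AND: both terms present
jaguar_or_panda = postings("jaguar") | postings("panda")   # OR: either term present
speed_not_car = jaguar_and_speed - postings("car")         # NOT: exclude a term

print(sorted(jaguar_and_speed))
print(sorted(jaguar_or_panda))
print(sorted(speed_not_car))
```

AND narrows a result list, OR widens it, and NOT (written as a minus sign in Google) removes unwanted senses of a word.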
Anatomy of a query - PubMed
• What you type: invasive fungal infections in young children
• How PubMed translates it: invasive[All Fields] AND ("mycoses"[MeSH Terms] OR "mycoses"[All Fields] OR ("fungal"[All Fields] AND "infections"[All Fields]) OR "fungal infections"[All Fields]) AND ("Young Child"[Journal] OR ("young"[All Fields] AND "children"[All Fields]) OR "young children"[All Fields])
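The same query can also be sent to PubMed programmatically through the NCBI E-utilities ESearch service. The sketch below only builds the request URL and makes no network call. The helper name `esearch_url` and the parameter values are illustrative assumptions; check the E-utilities documentation before relying on the shape of the response, which at the time of writing is expected to report how the query was translated.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build (but do not send) an ESearch request for a free-text PubMed query."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url("invasive fungal infections in young children")
print(url)
```

Pasting the printed URL into a browser is a quick way to compare the query you typed with the query the database actually ran.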
So…
• Using the same search terms will produce
different results in different databases
because:
– Content will be different
– Preparation of search terms will be different, e.g.
only PubMed uses MeSH terms
– Indexing process, implementation of stemming,
removal of stop words will be different
– Ranking algorithms will be different
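A toy illustration of why term preparation matters: the same text yields different index terms depending on whether stop words are removed and words are stemmed. The stop-word list and the suffix rules below are deliberately crude assumptions, not the rules of any real database.

```python
# Assumed stop-word list, for illustration only.
STOP_WORDS = {"in", "the", "of", "and", "a"}


def crude_stem(word):
    """Toy suffix stripper; real systems use proper stemmers (e.g. Porter's)."""
    for suffix in ("ions", "ion", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_terms(text, remove_stop_words, stem):
    """Return the terms one hypothetical database would index or search for."""
    words = text.lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    return words


query = "invasive fungal infections in young children"
plain = index_terms(query, remove_stop_words=False, stem=False)
processed = index_terms(query, remove_stop_words=True, stem=True)
print(plain)
print(processed)
```

A database that stems would match "infection", "infections" and "infect" alike, while one that does not would treat them as different words, so the two return different result lists for the same query.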
Quick tour
Break
Other types of database
• Some databases contain mainly text, but others contain image, sequence
or structural data
• The technologies required to search and retrieve these different data
types are very different.
• There is a growing amount of information in publicly available databases.
• For example, in 2013 the Nucleic Acids Research online Molecular Biology Database Collection listed 1512 databases.
• The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.
Linking different data types is a challenge
[Diagram: data types and the resources that hold them, shown as a linked network. Labels: Disease, Protein, Enzyme, Known Gene, Sequence, Sequence Cluster, Metabolite, SNP, Pathway, Affy Fragment, Gene Expression Warehouse, OMIM, ExPASy SwissProt, ExPASy Enzyme, PDB, LocusLink, MGD, SPAD, NCBI dbSNP, GenBank, NMR, KEGG, UniGene]
Databases available at NCBI
Other ways to search – BLAST, PubChem, UCSC Genome Browser
By sequence – BLAST:
>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGT
GTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGT
GGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTG
GGGCTGGGGGGGGGCG
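The sequence above is in FASTA format: a header line starting with ">" followed by lines of sequence. As a minimal sketch of how a program reads such a record before passing it to a tool like BLAST, the code below parses the header and a short excerpt (the first 20 bases only) and computes GC content. The function names are illustrative assumptions.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record starts with a '>' header line")
    return lines[0][1:], "".join(lines[1:]).upper()


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Excerpt only: the slide's header plus the first 20 bases of its sequence.
fasta = """>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGA
AGCGAGCAAG
"""

header, seq = parse_fasta(fasta)
print(header)
print(len(seq), gc_content(seq))
```

Sequence search compares strings of bases rather than words, which is why it needs different tools from text search.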
By structure – PubChem:
Example of BLAST search results
PubChem Compound record
Learning points
• Google is a good place to start
• Learn to use several information resources
• Modify your search terms during the
course of a search session
• Understand how the results are ranked
and don’t just look on the first page