Basic Skills in Scientific Research and Publishing
Lecture 3: Searching the Literature
14 May 2013, Ganesha Associates

Types of scientific output
• Abstracts
• Primary journal articles – peer-reviewed interpretations of original research
• Reviews
• Book chapters, monographs
• Conference proceedings
• Lectures, seminars
• Sequences, data sets
• Patents, other forms of intellectual property
• Blogs, tweets…

Usage of output differs [chart not reproduced; Copyright: Ganesha Associates 2012]

Some sources of scientific content
• Google
• PubMed/Medline (NLM)
• Scopus (Elsevier)
• Web of Science (Thomson Reuters)
• Google Scholar
• PubMed Central, Europe PubMed Central
• SciELO, Biblioteca Virtual em Saúde
• ScienceDirect, Ovid, SpringerLink, Wiley Online Library, BioMed Central, Public Library of Science, SwetsWise…
• CAPES Portal de Periódicos

Each source is different
• Free – Google, Google Scholar, PubMed Central
• Subscription – Scopus, ScienceDirect
• Abstracts and citations only – PubMed, Web of Science
• Full text, single publisher – SpringerLink
• Full text, many publishers – PubMed Central, SwetsWise Online Content

Classify sources of content (exercise):
                 Abstract only    Full text
Free access
Subscription

You can get access if…
• The journal is subscribed to by CAPES
• You have a personal subscription
• The journal is of the 'Open Access' (OA) type
  – Note: some journals make their content Open Access only after an embargo of 6 months or longer, and some journals contain a mixture of OA and non-OA articles. See http://europepmc.org/journalList for more information.
• Journals in the 'red' categories are available anywhere.
• Most journals subscribed to by CAPES will be available from more than one source.
• CAPES journals are only available from computers within the University network, unless you have remote access privileges.

So which sources should I use?
• No single source contains all of the articles relevant to your research.
• Google has the broadest coverage, but not all of the documents you find will be peer-reviewed articles.
• Scopus, Web of Science (WoS) and PubMed give you the best balance between quality and quantity and, in theory, should link to all the content subscribed to by CAPES, plus OA content.

So, in practice, you will usually visit several sources to find the information you are looking for:
[Diagram: SciELO, CAPES Portal, Web of Science, Scopus, national literature, OA (BMC or PLoS), ScienceDirect, SpringerLink, PubMed, Google, HighWire, other databases (e.g. NCBI)]

Components of a bibliographic database
• Content such as abstracts and full-text articles [or a pointer to where these may be found]
• Metadata [data about data]
• Index
• Search engine
• Ranking/relevance algorithm
• Plus many additional features

Content: examples shown as a basic PDF, as HTML, as HTML page source and as metadata [screenshots not reproduced]

Sources of article metadata
• Journal name, publisher, ISSN
• Date of publication, volume and page numbers
• Digital Object Identifier [DOI]
• Article title
• Authors' names
• Address, affiliation, contact details
• Article section identifiers
• Sources of funding
• Semantic tagging, e.g. protein names

The basis of search: indexing
• The purpose of an index is to optimize speed and performance in finding the documents relevant to a search query.
• Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.
• Metadata lets the indexing algorithm build separate indexes for different classes of terms, so that a search can be restricted to, for example, just the authors' names.

Search: how the result list is ranked
• Date of publication
• Relevance
  – Frequency with which the search terms occur in the document
  – Proximity of the search terms to one another
• Google's PageRank algorithm uses "link popularity": a document is ranked higher if more pages link to it, particularly pages that are themselves highly ranked.

The question behind the query
• Search engines think in terms of words, but users think in terms of sentences!
  – How do you spell Bousfield?
  – What do we know about BRCA1?
  – Given these symptoms, what is the most likely diagnosis?
  – What are the side effects of aspirin?
  – Has this chemical structure been synthesized before?
• "Cancer causes X" vs. "Y causes cancer"

What real queries look like – Google
• pharmacogenomics and disorders
• bacteria growth casein media effect
• waal pseudomonas
• TRPM2 PCR mouse
• Chitinases in carnivorous plants
• glycerophosphoinositol 4-phosphate
• Dai N, Gubler C, Hengstler P, Meyenberger C, Bauerfeind P. Improved capsule endoscopy after bowel preparation. Gastrointest Endosc 2005;61(1) 28-31.

Query changes people actually make
• Query series 1
  – latrunculin
  – latrunculin fm3a cell arrest
  – latrunculin fm3a arrest
  – latrunculin fm3a
  – latrunculin FM3A
• Query series 2
  – cytokinin signalling in arabidopsis
  – "cytokinin signalling in arabidopsis"
  – cytokinin delta spindly arabidopsis
• Results
  – Remember to look beyond the first page.
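The indexing and ranking slides above can be made concrete with a toy example. The sketch below builds an inverted index keyed by (metadata field, term), so a search can be restricted to one field such as the authors' names, and then ranks matching documents by term frequency plus a proximity bonus. The corpus, the field names and the scoring formula (frequency + 1/(1 + gap)) are invented for illustration; no real bibliographic database is claimed to score results this way.

```python
from collections import defaultdict

# Hypothetical mini-corpus: each record has separate metadata fields.
DOCS = {
    1: {"authors": "Silva, Costa", "title": "fungal infections in young children"},
    2: {"authors": "Costa", "title": "invasive fungal infections in adults and fungal burden"},
    3: {"authors": "Pereira", "title": "children with viral infections"},
}


def build_index(docs):
    """Inverted index: (field, term) -> {doc_id: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, record in docs.items():
        for field, text in record.items():
            tokens = text.lower().replace(",", " ").split()
            for pos, token in enumerate(tokens):
                index[(field, token)][doc_id].append(pos)
    return index


def search(index, field, query):
    """Return (doc_id, score) pairs for documents containing every query term."""
    terms = query.lower().split()
    postings = [index.get((field, t), {}) for t in terms]
    if not postings:
        return []
    # Implicit AND: a document must contain every query term in this field.
    candidates = set(postings[0]).intersection(*postings[1:])
    scores = {}
    for doc in candidates:
        # Term frequency: total occurrences of all query terms.
        tf = sum(len(p[doc]) for p in postings)
        # Proximity: smallest distance between each pair of consecutive query terms.
        gaps = [min(abs(a - b) for a in p[doc] for b in q[doc])
                for p, q in zip(postings, postings[1:])]
        scores[doc] = tf + sum(1.0 / (1 + g) for g in gaps)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(search(INDEX, "title", "fungal infections"))
    print(search(INDEX, "authors", "costa"))
```

Document 2 outranks document 1 for "fungal infections" because "fungal" occurs twice in its title, and a title-only term such as "fungal" finds nothing when the search is restricted to the authors field.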
Compare the results of Query series 1 in PubMed and in Google (in Google, add the term PubMed to the query).

Improving search accuracy
• Wildcard characters
  – "a * saved is a * earned"
• Operators
  – jaguar speed -car (exclude pages containing "car")
  – Pandas -site:wikipedia.org (exclude results from wikipedia.org)
  – "ribosome" (match the exact term)
• Synonyms
  – MeSH terms
• Boolean terms
  – AND, OR, NOT
• Faceted search
  – GO (Gene Ontology) terms

Anatomy of a query – PubMed
• Query as typed: invasive fungal infections in young children
• Query as translated by PubMed: invasive[All Fields] AND ("mycoses"[MeSH Terms] OR "mycoses"[All Fields] OR ("fungal"[All Fields] AND "infections"[All Fields]) OR "fungal infections"[All Fields]) AND ("Young Child"[Journal] OR ("young"[All Fields] AND "children"[All Fields]) OR "young children"[All Fields])

So…
• Using the same search terms will produce different results in different databases because:
  – the content is different
  – the search terms are prepared differently, e.g. only PubMed uses MeSH terms
  – the indexing process, including stemming and the removal of stop words, is implemented differently
  – the ranking algorithms are different

Quick tour

Break

Other types of database
• Some databases contain mainly text, but others contain image, sequence or structural data.
• The technologies required to search and retrieve these different data types are very different.
• There is a growing amount of information in publicly available databases.
• For example, in 2013 the Nucleic Acids Research online Molecular Biology Database Collection listed 1512 databases.
• The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.
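The PubMed query translation above combines Boolean operators with synonym expansion: each concept is ORed with its synonyms, and the concepts are then ANDed together. The sketch below imitates that pattern on a tiny invented corpus. The SYNONYMS table is a hypothetical stand-in for a controlled vocabulary such as MeSH, and the real PubMed translation is richer (it also ORs the individual words, as the example shows).

```python
# Hypothetical synonym table standing in for a controlled vocabulary such as MeSH.
SYNONYMS = {
    "fungal infections": ["mycoses", "fungal infections"],
}

# Invented document titles.
DOCS = {
    1: "mycoses in paediatric oncology patients",
    2: "fungal infections in young children",
    3: "bacterial infections in young children",
    4: "invasive mycoses in adults",
}


def matches(text, phrase):
    """True if the phrase occurs as a whole-word sequence in the text."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def expand(concept):
    """Return the concept plus any synonyms (these are ORed together)."""
    return SYNONYMS.get(concept, [concept])


def boolean_search(all_of, none_of=()):
    """AND across concepts, OR within each concept's synonyms, NOT for exclusions."""
    hits = []
    for doc_id, text in DOCS.items():
        has_all = all(any(matches(text, s) for s in expand(c)) for c in all_of)
        has_excluded = any(matches(text, x) for x in none_of)
        if has_all and not has_excluded:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    print(boolean_search(["fungal infections"]))
    print(boolean_search(["fungal infections", "children"]))
    print(boolean_search(["fungal infections"], none_of=["adults"]))
```

Without the synonym table, documents 1 and 4 would be missed, because their titles say "mycoses" rather than "fungal infections"; this is the benefit that controlled-vocabulary mapping provides.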
Linking different data types is a challenge
[Diagram linking: Gene Expression Warehouse, OMIM, Disease, ExPASy SwissProt, PDB, ExPASy Enzyme, Protein, Enzyme, LocusLink, Affy Fragment, Known Gene, MGD, Sequence, Metabolite, SNP, SPAD, Sequence Cluster, NCBI dbSNP, GenBank, NMR, Pathway, KEGG, UniGene]

Databases available at NCBI [screenshot not reproduced]

Other ways to search – BLAST, PubChem, UCSC Genome Browser
By sequence – BLAST:
>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGT
GTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGT
GGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTG
GGGCTGGGGGGGGGCG
By structure – PubChem [screenshot not reproduced]

Example of BLAST search results [screenshot not reproduced]

PubChem Compound record [screenshot not reproduced]

Learning points
• Google is a good place to start.
• Learn to use several information resources.
• Modify your search terms during the course of a search session.
• Understand how the results are ranked, and don't just look at the first page.
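Searching by sequence works very differently from searching text. BLAST starts by finding short, exact "word" matches (seeds) between the query and each database sequence, and only then extends and scores them. The sketch below shows just that seeding step, using the first line of the DinoDNA sequence above as the database entry. The word length k = 11 is an illustrative choice, and the extension, scoring and statistics of real BLAST are deliberately left out.

```python
# First line of the DinoDNA sequence, used here as a one-entry "database".
SUBJECT = "GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGT"


def kmer_index(seq, k):
    """Map each length-k word to the list of positions where it starts in seq."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def seed_hits(query, subject, k=11):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = kmer_index(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def diagonals(hits):
    """Group seeds by diagonal (subject_pos - query_pos); a shared diagonal
    suggests one ungapped alignment worth extending."""
    groups = {}
    for i, j in hits:
        groups.setdefault(j - i, []).append((i, j))
    return groups


if __name__ == "__main__":
    query = "TTTTCATCAGATACAG"  # invented query: 4 unrelated bases + 12 bases of SUBJECT
    hits = seed_hits(query, SUBJECT)
    print(hits)
    print(diagonals(hits))
```

Both seeds lie on the same diagonal, which is the signal a BLAST-like tool would use to decide where to try extending an alignment.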