Basic Skills in Scientific Research and Publication (Competências Básicas de Investigação Científica e de Publicação)
Lecture 2: Hypotheses and Search
August 2014
13/08/2013 Ganesha Associates

Experimental vs. Observational studies
Observational studies
• No modification of experimental variables
• Useful to discover trends and associations
• Cannot directly be used to infer causality
Experimental studies
• Compare responses to different treatments
• Designed to avoid misleading results, e.g. by randomisation
• Can be used to infer cause and effect

9 September 2013 Ganesha Associates CC BY 3.0

Experimental and observational types of research
• The scientific process involves making models of how things work
• These evolving models are described in the scientific literature
• Sometimes the models are wrong, often they are incomplete
• Scientific progress is driven by the communication and publication of the results of new research, and the reinterpretation of older work
• The tool which makes all of this possible is the hypothesis

Main learning points
• Student projects fall into three categories
  – No hypothesis, i.e. observational
  – Weak hypothesis
  – Strong hypothesis
• The work will be published in a
  – National journal
  – Low impact factor journal
  – High impact factor journal
• Starting with a strong hypothesis improves your chances of getting published in a good journal

What is a strong hypothesis?
• A strong hypothesis is based on a series of premises: things that are already known with some certainty
• Each premise must be supported by references back to the (international) primary literature
• So a strong hypothesis will be backed by references to recent papers in high quality journals

Coin-tossing - an example
• I wonder how many heads or tails I will get if I toss this coin 100 times
  – No model
• The frequency distribution of heads and tails will be approximated by a binomial distribution with n=100 and p=0.5
  – Simple model, based on symmetry
• A detailed analysis of the dynamics reveals that the probability of a head is 0.51
  – Complex model, based on asymmetry, aerodynamics, etc.

Coin-tossing – impact on CV
1. None, or possibly negative
2. R. A. Fisher and others did perform this experiment in the early days of biological statistics, before the advent of computers, as a proof that the binomial distribution tends towards a normal one at high values of n. Interestingly, they all found that the probability of a head, p, was usually slightly higher than 0.5, but this difference was ignored.
3. Persi Diaconis, Susan Holmes and Richard Montgomery (Stanford, 2004) published a paper on the ‘Dynamical bias in the coin toss’, showing from the physics of the toss that the outcome is not exactly 50:50: a coin is slightly more likely (about 0.51) to land the same way up as it started.

Coin tossing - relevance
• Children with unilateral hearing loss (UHL) have been found to have lower language scores, and increased rates of speech therapy, grade failures, or needing Individualized Education Plans. The objective of this study was to determine whether language skills and educational performance improved or worsened over time in a cohort of children with UHL.
• To determine factors associated with physical therapy or occupational therapy evaluation and speech or swallow therapy evaluation in hospitalized children with traumatic brain injury; to describe when during the hospital stay the initial therapy evaluations typically occur; and to quantify any between-hospital variation in therapy evaluation.
• Articulation disorders in young children are due to defects occurring at a certain stage in sensory and motor development. Some children with functional articulation disorders may also have sensory integration dysfunction (SID). We hypothesized that speech therapy would be less efficacious in children with SID than in those without SID.
• The present study provides data that support the hypothesis that children who stutter (CWS) and typically developing children differ on both composite temperament factors and temperament scales. The findings were interpreted within existing frameworks of temperament development, as well as with regard to previous studies of temperament in CWS.

Case study: Hummingbird territorial behaviour

Hummingbird territorial behaviour
• Most hummingbird species demonstrate strong territorial behaviour
• If a bluffing charge attack does not work, the resident bird may engage the trespasser in a brief but intense physical battle
• So why do hummingbirds defend territories?
• H0: Hummingbirds are randomly distributed in space and time.

Hummingbird territorial behaviour
• H1: If territory = F(energy), then behaviour should be seasonal but not species-dependent
• H2: If territory = F(mating), then behaviour should be species- and sex-dependent
• H3: If…
• H4: If…

Territorial behaviour: status 1971
• Time, Energy, and Territoriality of the Anna Hummingbird (Calypte anna). Science 173 (1971) 818-821.
• When territory quality decreases, defenders may switch to less expensive forms of defense because the energy savings outweigh the loss of resources
• Augmented territorial defense during the breeding season is made possible by increased feeding efficiency due to the availability at this time of very nectar-rich flowers.
• Individuals with large territories are more successful reproductively.

Hummingbird territoriality since
• Hovering performance of hummingbirds in hyperoxic gas mixtures. J Exp Biol. 2001 Jun;204(Pt 11):2021-7.
• Adipose energy stores, physical work, and the metabolic syndrome: lessons from hummingbirds. Nutr J. 2005 Dec 13;4:36.
• Neural specialization for hovering in hummingbirds: hypertrophy of the pretectal nucleus Lentiformis mesencephali. J Comp Neurol. 2007 Jan 10;500(2):211-21.
• Three-dimensional kinematics of hummingbird flight. J Exp Biol. 2007 Jul;210(Pt 13):2368-82.

Hypothesis lecture learning points
• Hypotheses can be weak (observational) or strong (mechanism-based)
• For example, a hypothesis which predicts that a tossed coin will end up ‘heads’ 50% of the time is much weaker than one that can predict the exact sequence of ‘heads’ and ‘tails’
• So hypothesis ‘quality’ is important
• A quick test for quality?
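
As a rough illustration of the coin-tossing example, the sketch below compares the simple model (p = 0.5) with the more detailed one (p = 0.51) for n = 100 tosses, the values quoted in the lecture. The function names are invented for this illustration, and only the Python standard library is used.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses with P(head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_and_sd(n, p):
    """Expected number of heads and its standard deviation under the binomial model."""
    return n * p, sqrt(n * p * (1 - p))


n = 100  # number of tosses, as in the lecture example
for p in (0.5, 0.51):  # simple symmetric model vs. the more detailed model
    mean, sd = mean_and_sd(n, p)
    print(f"p={p}: expected heads={mean:.1f}, sd={sd:.2f}, "
          f"P(exactly 50 heads)={binomial_pmf(50, n, p):.4f}")
```

Under these assumptions the two models differ by only one expected head, while the standard deviation is about five heads, so a single run of 100 tosses cannot separate them. Telling the hypotheses apart needs either many more tosses or an argument based on mechanism.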

Hypothesis lecture learning points
• Good hypotheses build directly onto previous work
• So they need to become technically more sophisticated over time, moving from the general to the particular
• A given problem can be associated with a number of very different hypotheses, so your experiments should include tests to exclude these alternative explanations

Search

Some sources of scientific content
• Google
• PubMed/Medline (NLM)
• Scopus (Elsevier)
• Web of Science (Thomson Reuters)
• Google Scholar
• PubMed Central, PubMed Central Europe
• SciELO, Biblioteca Virtual em Saude
• Science Direct, Ovid, SpringerLink, Wiley Online Library, BiomedCentral, Public Library of Science, SWETSwise…
• CAPES Portal de Periódicos

Each source is different
• Free – Google, Google Scholar, PubMed Central
• Subscription – Scopus, ScienceDirect
• Abstracts and citations only – PubMed, Web of Science
• Full text, single publisher – SpringerLink
• Full text, many publishers – PubMed Central, SwetsWise Online Content

Classify sources of content
[Figure: grid classifying sources as abstract only or full text, and as free access or subscription]

You can get access if…
• The journal is subscribed to by CAPES
• You have a personal subscription
• The journal is of the ‘Open Access’ type
  – Note: some journals only make their content ‘Open Access’ after 6 months or longer. Some journals contain a mixture of OA and non-OA articles. See http://europepmc.org/journalList for more info.
• Journals in the ‘red’ categories are available anywhere.
• Most journals subscribed to by CAPES will be available from more than one source.
• CAPES journals are only available from computers within the University network unless you have remote access privileges.

So which sources should I use?
• No single source contains all of the articles relevant to your research
• Google has the broadest coverage, but not all of the documents you find will be peer-reviewed articles
• Scopus, WoS and PubMed give you the best balance between quality and quantity and, in theory, should link to all the content subscribed to by CAPES, plus OA content.

Components of a bibliographic database
• Content such as abstracts and full-text articles [or a pointer to where these may be found]
• Metadata [data about data]
• Index
• Search engine
• Ranking/relevance algorithm
• Plus many additional features

Content (Basic PDF)

Content (HTML)

The basis of search: Indexing
• The purpose of an index is to optimize speed and performance in finding relevant documents for a search query.
• Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.
• Metadata helps the indexing algorithm to select different classes of terminology from which to make an index, so a search can be carried out on just the authors' names, for example

Search: how the result list is ranked
• Date of publication
• Relevance
  – Frequency with which search terms occur in the document
  – Proximity of search terms
• Google’s PageRank algorithm also uses “link popularity”: a document is ranked higher if there are more links to it

The question behind the query
• Search engines think in terms of words, but users think in terms of sentences and specific problems!
  – How do you spell Bousfield?
  – What do we know about BRCA1?
  – Given these symptoms, what is the most likely diagnosis?
  – What are the side effects of aspirin?
  – Has this chemical structure been synthesized before?
• “Cancer causes X” vs. “Y causes cancer”

What real queries look like - Google
• pharmacogenomics and disorders
• bacteria growth casein media effect waal pseudomonas
• TRPM2 PCR mouse
• Chitinases in carnivorous plants
• glycerophosphoinositol 4-phosphate
• Dai N, Gubler C, Hengstler P, Meyenberger C, Bauerfeind P. Improved capsule endoscopy after bowel preparation. Gastrointest Endosc 2005;61(1) 28-31.

Query changes people actually make
• Query series 1
  – latrunculin
  – latrunculin fm3a cell arrest
  – latrunculin fm3a arrest
  – latrunculin fm3a
  – latrunculin FM3A
• Query series 2
  – cytokinin signalling in arabidopsis
  – "cytokinin signalling in arabidopsis"
  – cytokinin delta spindly arabidopsis
• Results
  – Remember to look beyond the first page. Compare the results of Query 1 in PubMed and Google (add the term PubMed)

Anatomy of a query - PubMed
• invasive fungal infections in young children
• invasive[All Fields] AND ("mycoses"[MeSH Terms] OR "mycoses"[All Fields] OR ("fungal"[All Fields] AND "infections"[All Fields]) OR "fungal infections"[All Fields]) AND ("Young Child"[Journal] OR ("young"[All Fields] AND "children"[All Fields]) OR "young children"[All Fields])

Boolean terms

Improving search accuracy
• Wild card characters
  – "a * saved is a * earned"
• Operators
  – jaguar speed -car
  – Pandas -site:wikipedia.org
  – “ribosome”
• Synonyms
  – MeSH terms
• Boolean terms
  – AND, OR, NOT
• Faceted search
  – GO terms

So…
• Using the same search terms will produce different results in different databases because:
  – Content will be different
  – Preparation of search terms will be different, e.g. only PubMed uses MeSH terms
  – Indexing process, implementation of stemming, removal of stop words will be different
  – Ranking algorithms will be different

Quick tour

Learning points
• Google, PubMed, Scopus and WoS are good places from which to start building a hypothesis
• Learn to use several information resources because they are all different!
• Modify your search terms during the course of a search session
• Understand how the results are ranked and don’t just look on the first page
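
The indexing, Boolean and ranking ideas from the Search part of the lecture can be sketched in a few lines. This is a toy model under stated assumptions, not how PubMed, Scopus or Google actually work: the three-document corpus, the stop-word list and the function names are all made up for illustration, and ranking uses raw term frequency only.

```python
from collections import defaultdict

# Hypothetical three-document corpus; the titles are invented for illustration.
DOCS = {
    1: "invasive fungal infections in young children",
    2: "fungal infections in adult patients",
    3: "hearing loss in young children",
}

# Assumed stop-word list; real databases each use their own.
STOP_WORDS = {"in", "the", "of", "and"}


def tokenize(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(docs):
    """Inverted index: map each term to {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, all_of=(), none_of=()):
    """Boolean AND over all_of, NOT over none_of, ranked by term frequency."""
    if not all_of:
        return []
    # AND: a document must appear in the posting list of every required term.
    hits = set.intersection(*(set(index.get(t, {})) for t in all_of))
    # NOT: remove documents containing any excluded term.
    for term in none_of:
        hits -= set(index.get(term, {}))
    # Rank by total occurrences of the query terms, ties broken by doc id.
    return sorted(hits, key=lambda d: (-sum(index[t][d] for t in all_of), d))


INDEX = build_index(DOCS)
print(search(INDEX, all_of=["fungal", "infections"]))
print(search(INDEX, all_of=["young", "children"], none_of=["fungal"]))
```

Because the lookup touches only the posting lists of the query terms, no document has to be scanned at query time. Changing the stop-word list, the tokenizer or the scoring rule changes the result list, which is one reason the same query behaves differently in different databases.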