Towards Evidence-Based Discovery
Informatics Tools for Synthesis
Guest Speaker: Tim Cary

Catherine Blake
School of Information and Library Science
University of North Carolina at Chapel Hill
http://www.ils.unc.edu/~cablake
cablake@email.unc.edu

Systematic Review Process
– Formulate the problem
– Locate and select studies
– Assess quality of studies
– Collect data
– Analyze and present results
– Interpret results
– Improve and update review
28 months from initial idea to publication
Increased demand due to evidence-based medicine

Manual Synthesis
Guesswork guided by scientifically trained intuition. Rescher (1978)
[Figure: workflow diagram. Labels: MEDLINE, Embase, Hypothesis Projection, Retrieval, Select, Context Information, Corpus, Extraction, Extract, Facts, Collaboration, Iteration, Verification, Verify, Analysis, Analyze]

Cochrane - RevMan
• Review Manager (RevMan) is the software used for preparing and maintaining Cochrane reviews.
• You can use RevMan for protocols and full reviews. It is most useful when you have formulated the question for the review, and allows you to prepare the text, build the tables showing the characteristics of studies and the comparisons in the review, and add study data. It can perform meta-analyses and present the results graphically.
• Source: http://www.cc-ims.net/RevMan

Cochrane - GRADEpro
• GRADEpro (GRADEprofiler) is the software used to create Summary of Findings (SoF) tables in Cochrane systematic reviews. It can retrieve data of the systematic review and meta-analyses from a Review Manager 5 file, combine these data with user-entered data, and then export a Summary of Findings table ready for import into Review Manager 5. It performs many of the calculations necessary to present the key results of systematic reviews in a table format and guides users through the process of grading the quality of the evidence using the GRADE approach.
• Source: http://www.cc-ims.net/gradepro

Reporting Guidelines
• CONSORT - reporting of RCTs
• PRISMA (formerly QUOROM) - preferred reporting items for systematic reviews and meta-analyses
• STROBE - reporting of observational studies in epidemiology
• EQUATOR Network - collection of reporting guidelines
• Source: http://www.cochrane.org/index_authors_researchers.htm

Selection Step
• Typical information retrieval framing
– Input: MEDLINE
– Output: Articles included in previous studies
– Goal: identify weighting schemes that identify only articles included in a traditional analysis
• Examples
– Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing Workload in Systematic Review Preparation Using Automated Citation Classification. JAMIA 2006;13(2):206-219.
– Demner-Fushman D, Seckman C, Fisher C, Hauser S, Clayton J, Thoma G. Prototype System To Support Evidence-based Practice. AMIA Annu Symp Proc. November 2008:151-5.

Context Information
• Study Information
– e.g. date, location, ...
• Population Information
– e.g. gender, age, ...
• Risk Factor or Intervention
– e.g. duration of exposure, confounders
• Disease
– e.g. stage, confounders
[Side labels: Loosely coupled to review focus; Tightly coupled to review focus]

Collaborative Information Synthesis
[Figure: workflow diagram. Labels: MEDLINE, Embase, Hypothesis Projection, Retrieval, External Data, Context Information, Corpus, Extraction, Facts, Collaboration, Iteration, Verification, Analysis]
Key: Estimate Missing Information

1 What are people with Breast Cancer exposed to?
Studies with Breast Cancer patients
Facts for each study
• number of patients
• age of patients
• geographic location
• risk-factor exposure
…
2 What are people in a similar population exposed to?
Database of risk factors
BRFSS Codebook
• question asked
• age, gender
• % responses
3 Are these rates significantly different?
T. Tengs & N. D.
Osgood (2001) “The link between smoking and Impotence: Two Decades of Evidence”, Preventive Medicine, 32:447-52

More than Automated Meta-Analysis
• Traditional analysis
– same study design
– medicine = RCT
– epidemiology = cohort
• Information Synthesis
– any study that includes required information
– augment missing information
[Figure: two panels, Systematic Review and Information Synthesis. Key: External database, Entire study, Main topic, Secondary Information]

[Figure: diagram. Labels: Natural Language Processing, Core, Genomics, News, Chemistry, DocSouth, Human-assisted Discovery and Synthesis, Education, Discovery Science, Evidence-based Practice, Human Discovery and Synthesis, Breast Cancer, Heterogeneous Literature, Synthesis and Discovery Work Practices]

METIS Information Extractor
• Semantic Grammar
• Features: words, numbers, and semantic types in the Unified Medical Language System (UMLS)
{term;’age’} {term:’of’} {number;10<n2<110}{term;’to’}{number;10<n2<110}
The age of breast cancer subjects ranged between 20 to 64 years old.
{semantic type: neoplastic process, or disease}
• Information extracted:
– risk factor exposure (tobacco and alcohol)
– age (min, max, mean)
– number of subjects with medical condition
– gender
– start and end dates
– geographical location

METIS Info Extractor – Evaluation
• Diverse text corpus
– epidemiology, surgery, biology, ...
– cohort studies, case-control trials, ...
• Evaluation
– Metrics (precision, recall)
– Annotators (developer, domain expert, expert annotator, novice)
– Primary topic (breast cancer, impotence)
– Secondary information (tobacco and alcohol consumption)

METIS Info Extractor – Recall
[Figure: recall (0.0 to 1.0) by rank (1 to 5). Series: Development, Domain Expert, Expert Annotator, Novice Annotator]

METIS Info Extractor – Precision
[Figure: precision (0.0 to 1.0) by rank (1 to 5). Series: Development, Domain Expert, Expert Annotator, Novice Annotator]

METIS Verifier
[Figure: Converted Article; Electronic version of article; Verify information extracted]

METIS Analyzer
• Meta-Analysis
– Developed for agricultural application
– Requires empirical studies with a quantitative outcome
– Unit of study is an article - not a person
– Result – a unitless metric called an effect size
• Two common meta-analysis techniques
– Fixed-effects model
– Random-effects model
Evaluation: Compared generated effect size with examples in textbooks and published articles
Result: Same effect size

Synthetic Estimate Evaluation
[Figure: two panels, Tobacco Consumption and Alcohol Consumption. Control rate (0 to 1) by article identifier (1 to 4, plus average). Series: Actual, Estimated]
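
The two meta-analysis techniques named on the METIS Analyzer slide can be sketched as follows. This is a minimal illustration, not the METIS implementation: it assumes inverse-variance weighting for the fixed-effects model and the DerSimonian-Laird estimator of between-study variance for the random-effects model, which are common choices but are not stated in the slides. The function names and the sample effect sizes are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Pool per-article effect sizes with inverse-variance weights.

    Returns (pooled effect, standard error of the pooled effect).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects(effects, variances):
    """Pool effect sizes under a random-effects model.

    Uses the DerSimonian-Laird estimate of between-study variance
    (an assumption made for this sketch).
    Returns (pooled effect, standard error, tau squared).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fixed, _ = fixed_effect(effects, variances)

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each article by its within-study plus between-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and variances, one pair per article.
    effects = [0.30, 0.10, 0.55, 0.20]
    variances = [0.04, 0.02, 0.09, 0.05]

    fe, fe_se = fixed_effect(effects, variances)
    re, re_se, tau2 = random_effects(effects, variances)
    print(f"fixed effects:  {fe:.3f} (SE {fe_se:.3f})")
    print(f"random effects: {re:.3f} (SE {re_se:.3f}, tau^2 {tau2:.3f})")
```

The unit of analysis is the article, as the slide notes: each article contributes one effect size and one variance. When the articles agree closely, the between-study variance is estimated as zero and both models give the same pooled effect.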