BDK12-2 Information Retrieval: Content
William Hersh, MD
Department of Medical Informatics & Clinical Epidemiology
Oregon Health & Science University

A classification of knowledge-based content
• Bibliographic
  – By definition rich in metadata
• Full-text
  – Everything online
• Annotated
  – Non-text or structured text annotated with text
• Aggregations
  – Bringing together all of the above
• These categories are admittedly fuzzy, and a growing number of resources have more than one type

Bibliographic content
• Bibliographic databases
  – The old (e.g., MEDLINE) have been revitalized with new features
  – New ones (e.g., National Guideline Clearinghouse) have emerged
• Web catalogs
  – Share many characteristics of traditional bibliographic databases
• Really Simple Syndication/Rich Site Summary (RSS)
  – “Feeds” provide information about new content

Bibliographic databases
• Contain metadata about (mostly) journal articles and other resources typically found in libraries
• Produced by
  – U.S. government: most are produced by the National Library of Medicine (NLM, www.nlm.nih.gov)
    • e.g., MEDLINE, genomics information, etc.
  – Commercial publishers, e.g.,
    • EMBASE – part of the larger SciVal
    • CINAHL – Cumulative Index to Nursing and Allied Health Literature
    • ACM Guide to Computing Literature – computer science and related areas

MEDLINE
• References to the biomedical journal literature
  – The original medical IR application: the system for searching MEDLINE launched in 1971, with literature maintained in the MEDLARS system dating back to 1966
• Name derives from MEDLARS On-Line (MEDLINE)
• Free to the world since 1997 via PubMed
  – http://pubmed.gov
• Now with links to the full text of articles and other resources
• Statistics
  – http://www.nlm.nih.gov/bsd/bsd_key.html
  – Over 22 million references to peer-reviewed literature
  – Over 5,000 journals, mostly English language
  – About 750,000 new references added yearly

National Guideline Clearinghouse
• Produced by the Agency for Healthcare Research and Quality (AHRQ)
  – www.guideline.gov
• Contains detailed information about guidelines
  – Including the degree to which they are evidence-based
  – Interface allows comparison of elements in the database across multiple guidelines
• Links to guidelines that are freely available on the Web, and to their producers when guidelines are proprietary

Web catalogs
• Generally aim to provide quality-filtered Web sites for specific audiences
  – The distinction between catalogs and sites is blurry
• Some are aimed at clinicians
  – HON Select – http://www.hon.ch/HONselect/
  – Turning Research into Practice (TRIP) – www.tripdatabase.com
• Others are aimed at patients/consumers
  – Healthfinder – www.healthfinder.gov

RSS
• RSS “feeds” provide short summaries, typically of news, journal articles, or other recent postings on Web sites
• Users receive RSS feeds through an RSS aggregator, which can typically be configured for the desired site(s) and to filter based on content
  – Aggregators work as standalone applications, in Web browsers, in email clients, etc.
• Two versions (1.0, 2.0), but both basically provide
  – Title: name of the item
  – Link: URL of the full page
  – Description: brief description of the page

Full-text content
• Contains complete text as well as tables, figures, images, etc.
• If there is a corresponding print version, the two are usually identical
• Includes
  – Periodicals
  – Books
  – Web sites, which may include either of the above

Full-text primary literature
• Almost all biomedical journals are available electronically
  – Many, including the British Medical Journal, Journal of the American Medical Association, New England Journal of Medicine, etc., are published by HighWire Press (www.highwire.org), which adds value to the original publisher’s content
  – Others are published by leading commercial scientific publishers, e.g., Elsevier, Kluwer, Springer, etc.
  – A growing number are available via the open-access model, e.g., BioMed Central (BMC), Public Library of Science (PLoS)
  – Another source of full-text papers is PubMed Central (PMC; http://pubmedcentral.gov)

Books
• Textbooks
  – Most well-known clinical textbooks are now available electronically
    • e.g., Harrison’s Principles of Internal Medicine
  – Most are bundled into large collections by publishers
    • e.g., Access Medicine (McGraw-Hill), Elsevier, Kluwer
  – NLM has developed a books site as part of Entrez
    • http://www.ncbi.nlm.nih.gov/books
• Compendia of drugs, diseases, evidence, etc.
• Handbooks, which are very popular with clinicians
• Increasingly published on mobile devices

Value added for electronic books
• Multimedia, e.g., skin lesions, shuffling gait of Parkinson’s disease, etc.
• Bundling of multiple books
• Can be updated in between “editions”
• Linkage to other information, e.g., to references, self-assessments, updates, other resources, etc.
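The RSS slide above says each feed item carries a title, a link, and a description, and that aggregators can filter items by content. As a minimal sketch, assuming an RSS 2.0 feed without XML namespaces (RSS 1.0 is RDF-based and namespaced, so its items would need namespace-aware lookups), the Python below uses only the standard library's `xml.etree.ElementTree` to pull out those three elements and apply a keyword filter. The feed text, its example.org URLs, and the helper names `parse_rss2` and `filter_items` are hypothetical, chosen for illustration; a real aggregator would also fetch feeds over HTTP and cope with malformed XML.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal: New Articles</title>
    <link>https://journal.example.org/</link>
    <description>Hypothetical table-of-contents feed</description>
    <item>
      <title>Statins in older adults</title>
      <link>https://journal.example.org/articles/1</link>
      <description>A randomized trial of statin therapy.</description>
    </item>
    <item>
      <title>Imaging in low back pain</title>
      <link>https://journal.example.org/articles/2</link>
      <description>When is MRI warranted?</description>
    </item>
  </channel>
</rss>"""


@dataclass
class FeedItem:
    title: str        # name of the item
    link: str         # URL of the full page
    description: str  # brief description of the page


def parse_rss2(xml_text: str) -> list[FeedItem]:
    """Return title, link, and description of every <item> in an un-namespaced RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    # iter() finds <item> elements anywhere below <rss>; the channel's own
    # <title>/<link>/<description> are skipped because they are not inside an <item>.
    for item in root.iter("item"):
        items.append(
            FeedItem(
                # findtext() gives None for a missing child, "" for an empty one.
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                description=(item.findtext("description") or "").strip(),
            )
        )
    return items


def filter_items(items: list[FeedItem], keyword: str) -> list[FeedItem]:
    """Keep items whose title or description contains keyword (case-insensitive)."""
    kw = keyword.lower()
    return [i for i in items if kw in i.title.lower() or kw in i.description.lower()]


if __name__ == "__main__":
    for item in filter_items(parse_rss2(SAMPLE_FEED), "statin"):
        print(item.title, "->", item.link)
```

The filter is plain case-insensitive substring matching; an aggregator could substitute any content-based rule at that point without changing the parsing step.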
Web sites
• Defined more narrowly here as coherent collections of information on the Web
• Usually take advantage of Web features, such as linking and multimedia
• Increasingly integrated with other resources and available on different platforms (e.g., integrated into electronic health records [EHRs], on smartphones, etc.)

Some notable full-text content on Web sites
• Government agencies
  – National Cancer Institute
    • www.cancer.gov
  – Centers for Disease Control and Prevention (CDC): travel and infection information
    • http://www.cdc.gov/DiseasesConditions
    • http://www.cdc.gov/travel/
  – Other NIH institutes, e.g., National Heart, Lung, and Blood Institute (NHLBI)
    • www.nhlbi.nih.gov

Full-text Web sites (cont.)
• Physician-oriented medical news and overviews, e.g.,
  – Medscape – www.medscape.com
  – PEPID – www.pepid.com
  – Many professional societies provide such content to members, e.g., http://www.acponline.org/clinical_information/
• Patient/consumer-oriented, e.g.,
  – Intelihealth – www.intelihealth.com
  – NetWellness – www.netwellness.com
  – WebMD – www.webmd.com
• Many mobile apps provide health information, e.g.,
  – iTriage – www.itriagehealth.com

Other interesting types of Web content
• Wikipedia – www.wikipedia.org
  – Encyclopedia with free access and distributed authorship
  – Some concerns about manipulation (McHenry, 2004), but
    • Comparable to Encyclopedia Britannica? (Giles, 2005; rebuttal: Anonymous, 2006)
    • Health information quality is reasonably good (Nicholson, 2006)
    • Content is retrieved prominently in most Web searches (Laurent, 2009)
    • Efforts are under way to improve the quality of medical content (Heilman, 2013)
• Body of knowledge
  – Software Engineering Body of Knowledge (SWEBOK, www.swebok.org) organizes knowledge of the field
• Social media/Web 2.0 and beyond (Lee, 2011)

Annotated
• Non-text or structured text annotated with text
• Includes
  – Image collections
  – Citation databases
  – Evidence-based medicine databases
  – Clinical decision support
  – Genomics databases
  – Other databases

Image collections
• Most prominent in the “visual” medical specialties, such as radiology, pathology, and dermatology
• Well-known collections include
  – Visible Human – http://www.nlm.nih.gov/research/visible/visible_human.html
  – Lieberman’s eRadiology – http://eradiology.bidmc.harvard.edu
  – WebPath – http://library.med.utah.edu/WebPath/webpath.html
  – More pathology: PEIR – www.peir.net
  – DermIS – www.dermis.net
  – More dermatology, also a decision-support system: VisualDx – www.visualdx.com
• Many have associated text, which assists with indexing and retrieval

Citation databases
• Science Citation Index and Social Sciences Citation Index
  – Databases of journal articles and the other journal articles that cite them
  – Now part of a package called Web of Science, which itself is part of a larger product, Web of Knowledge (Thomson Reuters)
    • http://wokinfo.com
• Scopus – http://www.elsevier.com/onlinetools/scopus
• Google Scholar – http://scholar.google.com

Evidence-based medicine databases
• Cochrane Database of Systematic Reviews – http://www.cochrane.org
  – Collection of systematic reviews, kept updated
• Evidence “formularies”
  – Clinical Evidence (BMJ) – http://clinicalevidence.bmj.com/x/index.html
  – JAMAevidence – http://jamaevidence.com
• UpToDate – www.uptodate.com
  – Clinically oriented overviews of medicine
• Essential Evidence Plus (formerly InfoPOEMS, “Patient-oriented evidence that matters”) – www.essentialevidenceplus.com
• PubMed Health – https://www.ncbi.nlm.nih.gov/pubmedhealth/
  – Systematic reviews and summaries of systematic reviews

Clinical decision support (CDS)
• Content used in CDS systems, usually part of EHRs
  – Order sets (usually “evidence-based”)
  – CDS rules
  – Health/disease management templates
• Growing and evolving commercial market for such tools, especially as EHR adoption increases; leaders include
  – Zynx – www.zynxhealth.com
  – Thomson Reuters Cortellis – http://cortellis.thomsonreuters.com
  – EHR vendors themselves and their partners

Genomics databases
• The National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov; NCBI, 2015) collection links:
  – Literature references – MEDLINE
  – Textbook of genetic diseases – Online Mendelian Inheritance in Man
  – Sequence databases – GenBank
  – Structure databases – Molecular Modeling Database
  – Genomes – catalog of genes
  – Maps – locations of genes on chromosomes
• More in bioinformatics unit…

Other databases
• Cases (BMC, from Journal of Medical Case Reports and others) – www.casesdatabase.com
• ClinicalTrials.gov – www.clinicaltrials.gov
  – Originally a database of clinical trials funded by NIH
  – Now used as a registry for clinical trials, with results reporting for some (DeAngelis, 2005; Laine, 2007; Zarin, 2013; Zarin, 2015)
• NIH RePORTER – http://projectreporter.nih.gov/reporter.cfm
  – Database of all research grants funded by NIH
  – Replaced the CRISP database

Aggregations – integrating many resources
• Clinical: a growing tendency of publishers to aggregate resources into comprehensive products
  – Merck Medicus – www.merckmedicus.com
    • Collection of many resources available to any licensed US physician
  – ACP Smart Medicine – http://smartmedicine.acponline.org
    • Bundle of resources
      – Evidence compendium, formerly called Physician’s Information and Education Resource (PIER)
      – Journals: Annals of Internal Medicine, ACP Journal Club
      – Clinical guidelines

Other aggregations
• Biomedical research: model organism databases, e.g., Mouse Genome Informatics – www.informatics.jax.org
  – Combines genomics and related data, a bibliographic database, gene references, etc.
• Consumer: MedlinePlus – http://medlineplus.gov
  – Integrates a variety of licensed resources and public Web sites
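Returning to the citation databases described earlier: the core idea behind a citation index such as Science Citation Index is inverting each article's reference list, so that for any article one can look up the later articles that cite it. The Python below is a minimal sketch of only that inversion, on a hypothetical four-article corpus; the article IDs and the function names `build_cited_by`, `citation_counts`, and `most_cited` are illustrative assumptions, not the method used by Web of Science, Scopus, or Google Scholar, which also face problems such as matching free-text reference strings to the right records.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy corpus: each article ID maps to the IDs of the articles it cites.
REFERENCES = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def build_cited_by(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert reference lists into a cited-by index: cited ID -> sorted list of citing IDs."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)  # a set ignores a reference listed twice
    return {cited: sorted(citers) for cited, citers in cited_by.items()}


def citation_counts(cited_by: dict[str, list[str]]) -> dict[str, int]:
    """Number of distinct citing articles for each cited article."""
    return {cited: len(citers) for cited, citers in cited_by.items()}


def most_cited(cited_by: dict[str, list[str]], n: int = 3) -> list[tuple[str, int]]:
    """Top-n (article, count) pairs, highest count first, ties broken by article ID."""
    counts = citation_counts(cited_by)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    index = build_cited_by(REFERENCES)
    print(index)              # {'C': ['A', 'B'], 'D': ['A', 'C']}
    print(most_cited(index))  # [('C', 2), ('D', 2)]
```

Articles that are never cited (A and B in this toy corpus) simply do not appear in the cited-by index; a citation database answers "who cites this?" by a single lookup in such an inverted structure rather than by scanning every reference list.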