INLS890 Evidence-Based Discovery
Spring, 2009
Catherine Blake, Ph.D.

Today
• Introductions
• Administration
• Course Structure
• Learning Objectives
• Assessment
• Motivation

Introductions
• Dr Catherine Blake
  – Email: cablake@email.unc.edu
  – Office: 214A Manning Hall
• Lecture Time
  – 214 Manning Hall
  – Thurs 5:00-7:30pm
• Office Time
  – Email: anytime
  – By Appointment: Tues and Wed

Operational Details
• Web Page
  – http://www.ils.unc.edu/~cablake/INLS890-EBD
  – Username = ebd Password = spr2009
• Email
  – Fastest response time
  – Please email from your UNC account
  – Start the subject with INLS890
• University Honor Code is in effect

Course Objectives
• This course combines theoretical models from discovery science with a survey of informatics tools that support discovery.
• The seminar will showcase the discovery process via a lecture series comprising both discipline and policy champions, and thus reveal the synergy between synthesis and discovery and the need for interdisciplinary collaboration.

Theory Theme
• Kuhn
  – Normal versus Revolutionary Science
  – Abnormalities
• Chalmers
  – Observation
  – Falsification
• Information Quality
  – Meta-analysis
  – Information quality

Informatics Theme
• Language tools
  – Information Extraction
  – Document Summarization - Text Mining - Entailment
• Social Networking
  – Bibliometrics - Visualizations
• Workflow
  – myExperiment
• Domain specific software
  – Crystallography - BLAST

Practice Theme
• Synthesis
  – Timothy S. Carey, MD, MPH
    Sarah Graham Kenan Professor of Medicine
    Director, Cecil G Sheps Center for Health Services Research
  – Ila Cote, PhD, DABT
    Acting Division Director
    US Environmental Protection Agency, National Center for Environmental Assessment
• Discovery
  – Paul Jones
    Clinical Associate Professor, School of Information and Library Science
    Director of ibiblio.org
  – Michael T Crimmins, PhD
    Mary Ann Smith Distinguished Professor of Chemistry, UNC, and Department Chair, Department of Chemistry
  – Rudy L Juliano, PhD
    Boshamer Distinguished Professor of Pharmacology
    Principal Investigator, Carolina Center of Cancer Nanotechnology Excellence

Practice Theme
• Discovery
  – Robert C Millikan, DVM, PhD
    Barbara Sorenson Hulka Distinguished Professor
    Department of Epidemiology, School of Public Health
  – Jan F. Prins, PhD
    Professor of Computer Science and Chairman, Department of Computer Science
  – Alexander Tropsha, Ph.D.
    Professor and Chair
    Director, Laboratory for Molecular Modeling
  – Suzanne West, PhD
    Research Associate Professor, Department of Epidemiology
    Acting Director, UNC-GSK Center of Excellence in Pharmacoepidemiology and Public Health
• To be confirmed
  – Humanities Scholar
  – Steven W. Matson, Ph.D.
    Professor and Chair, Department of Biology

Typical Class Structure
• Before class (All): Post expert questions
• First Hour
  – Presentation by domain expert
  – Anointed domain expert: engage the presenter!
• Second Hour
  – Anointed informatics expert: present technologies
  – Discuss the intersection between theory, practice and informatics
• Last 30 mins
  – Anointed domain expert: introduce next expert

Assignments
• Informatics Review
  – What domain specific tools are used in your discipline?
  – What generic tools exist for your discipline?
    • Information extraction
    • Text mining tool kits
  – Post results to the wiki

Assignments
• Engage the presenter
  – Introduce the presenter the week before
  – Read their materials ahead of time
  – Find out what else they do
  – Give us any context you can about the person
  – What are the key journals in the field?

Assignment
• Gap analysis
  – What informatics tools work in your discipline?
  – What gaps exist between the academic work being done by these researchers and the informatics tools that are currently available?

Assignments
• Scientific practice in your domain
  – Conduct interviews
  – Transcribe the interviews
  – Summarize your findings
• Group activities
  – Create wiki
  – Review questions
  – Submit IRB
  – Keep track of references

Dissemination
• Dissemination
  – How are we going to get this to people in the field?
    • Health Science Library
    • Paper in their conference
    • Face to face visits
    • … what other mechanisms

Assignments
• Class participation
  – Read the assigned readings
  – Participate in class discussion
  – Contribute to the wiki

Assessment
• Class Participation: 20%
  – Attendance and contributions to discussion
• Informatics Review: 20%
• Introducing and Engaging your speaker: 20%
• Gap Analysis
  – Data collection activities: 10%
  – Final report: 20%
• Class contributions: 10%

• Questions, Issues, Comments?

Why are you here?

Motivation
• Massive increase in electronic text
  – MEDLINE
    • Abstracts from more than 5,000 journals
    • Current: more than 17 million citations
    • Growth: ~12,000 new citations every week
  – Chemistry: more than 110,000 articles in 2002 alone
• Consequences
  – Hundreds of thousands of relevant articles
  – Implicit connections between literature go unnoticed
• Shift focus from Retrieval to Synthesis
Source: MEDLINE factsheet http://www.nlm.nih.gov/pubs/factsheets/medline.html
Source: Calculated from ISI’s 418 highest ranked chemistry articles

Information Overload
“One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world”
- Barnaby Rich, 1613

Existing Text Mining
• Clustering
• Categorization
• Association Rules
[Screenshots: IBM Intelligent Miner for Text (Clustering); SAS Text Miner (Association Rules)]

Example Pattern: Decision Tree
For each person P: P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent

Kohonen Maps
• Articles represented as vectors
• Assign n random articles
• Assign remaining
  articles to closest cluster
[Figure: map of NCI-funded research, 1995-present. Snowy peaks indicate highly funded research.]
Blake, C. and Tengs, T. (2001) “The Nation’s Breast Cancer Research Portfolio: A view from 30,000 feet”, Avon Symposium, UC Irvine.

Knowledge Discovery in Literature
[Diagram: Source Literature C (Migraine) connects to Target Literature A (Magnesium) through intermediate B terms: Platelet Activity, Calcium Channel Blockers, Serotonin, ...]
Swanson, D.R. (1988) “Migraine and magnesium: eleven neglected connections”, Perspect. Biol. Med., 31: 526-57.
Blake, C. & Pratt, W. (2002). A Semantic Approach to Identify Candidate Treatments from Existing Medical Literature. In AAAI Symposium on Knowledge-based Approaches, Stanford, CA.
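
The A-B-C idea on the Knowledge Discovery in Literature slide can be illustrated with a small sketch. It is not Swanson's procedure or the Blake and Pratt system; it is a minimal set-intersection toy that assumes each article has already been reduced to a set of terms. The function names and the `TOY_ARTICLES` list are hypothetical and made up for illustration, reusing only the terms named on the slide.

```python
from collections import defaultdict


def cooccurrence_index(articles):
    """Map each term to the set of other terms it shares an article with."""
    index = defaultdict(set)
    for terms in articles:
        for term in terms:
            index[term].update(t for t in terms if t != term)
    return index


def linking_terms(articles, source, target):
    """Return B terms that co-occur with both source (C) and target (A).

    If source and target already appear together in some article, the
    connection is not hidden, so an empty set is returned.
    """
    index = cooccurrence_index(articles)
    if target in index[source]:
        return set()
    return index[source] & index[target]


# Hypothetical toy corpus: each article is reduced to a set of terms.
# These are illustrative stand-ins, not real MEDLINE records.
TOY_ARTICLES = [
    {"migraine", "serotonin"},
    {"migraine", "platelet activity"},
    {"migraine", "calcium channel blockers"},
    {"migraine", "aura"},
    {"magnesium", "serotonin"},
    {"magnesium", "platelet activity"},
    {"magnesium", "calcium channel blockers"},
    {"magnesium", "bone density"},
]

bridges = linking_terms(TOY_ARTICLES, "migraine", "magnesium")
print(sorted(bridges))
```

In this toy corpus no single article mentions both migraine and magnesium, so the link only surfaces through the shared B terms. A realistic system would also need term extraction, synonym handling, and some ranking of candidate B terms, none of which is attempted here.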