Evidence-Based Discovery

INLS890
Evidence-Based Discovery
Spring, 2009
Catherine Blake, Ph.D.
1
Today
• Introductions
• Administration
• Course Structure
• Learning Objectives
• Assessment
• Motivation
2
Introductions
• Dr Catherine Blake
– Email - cablake@email.unc.edu
– Office - 214A Manning Hall
• Lecture Time
– 214 Manning Hall
– Thurs 5:00-7:30pm
• Office Time
– Email – anytime
– By Appointment – Tues and Wed
3
Operational Details
• Web Page
– http://www.ils.unc.edu/~cablake/INLS890-EBD
– Username = ebd Password = spr2009
• Email
– Fastest response time
– Please email from your UNC account
– Start the subject with INLS890
• University Honor Code is in effect
4
Course Objectives
• This course combines theoretical models from
discovery science with a survey of informatics
tools that support discovery.
• The seminar will showcase the discovery
process via a lecture series comprising both
discipline and policy champions, and thus
reveal the synergy between synthesis and
discovery and the need for interdisciplinary
collaboration.
5
Theory Theme
• Kuhn
– Normal versus Revolutionary Science
– Anomalies
• Chalmers
– Observation
– Falsification
• Information Quality
– Meta-analysis
– Information quality
6
Informatics Theme
• Language tools
– Information Extraction
– Document Summarization
– Text Mining
– Entailment
• Social Networking
– Bibliometrics
– Visualizations
• Workflow
– myExperiment
• Domain specific software
– Crystallography
– BLAST
7
Practice Theme
• Synthesis
– Timothy S. Carey, MD, MPH, Sarah Graham Kenan Professor of
Medicine; Director, Cecil G. Sheps Center for Health Services Research
– Ila Cote, PhD, DABT, Acting Division Director, US Environmental
Protection Agency, National Center for Environmental Assessment
• Discovery
– Paul Jones, Clinical Associate Professor, School of Information and
Library Science; Director of ibiblio.org
– Michael T. Crimmins, PhD, Mary Ann Smith Distinguished Professor
of Chemistry, UNC, and Department Chair, Department of Chemistry
– Rudy L. Juliano, PhD, Boshamer Distinguished Professor of
Pharmacology; Principal Investigator, Carolina Center of Cancer
Nanotechnology Excellence
8
Practice Theme
• Discovery
– Robert C. Millikan, DVM, PhD, Barbara Sorenson Hulka Distinguished
Professor, Department of Epidemiology, School of Public Health
– Jan F. Prins, PhD, Professor of Computer Science and Chairman,
Department of Computer Science
– Alexander Tropsha, PhD, Professor and Chair; Director, Laboratory
for Molecular Modeling
– Suzanne West, PhD, Research Associate Professor, Department of
Epidemiology; Acting Director, UNC-GSK Center of Excellence in
Pharmacoepidemiology and Public Health
• To be confirmed
– Humanities Scholar
– Steven W. Matson, PhD, Professor and Chair, Department of Biology
9
Typical Class Structure
• Before class (All): Post expert questions
• First Hour
– Presentation by domain expert
– Anointed domain expert – engage the presenter!
• Second Hour
– Anointed informatics expert - present technologies
– Discuss the intersection between theory, practice
and informatics
• Last 30 mins
– Anointed domain expert – introduce next expert
10
Assignments
• Informatics Review
– What domain specific tools are used in
your discipline?
– What generic tools exist for your discipline?
• Information extraction
• Text mining tool kits
– Post results to the wiki
11
Assignments
• Engage the presenter
– Introduce the presenter the week before
– Read their materials ahead of time
– Find out what else they do
– Give us any context you can about the
person
– What are the key journals in the field?
12
Assignment
• Gap analysis
– What informatics tools work in your
discipline?
– What gaps exist between the academic
work being done by these researchers and
the informatics tools that are currently
available?
13
Assignments
• Scientific practice in your domain
– Conduct Interviews
– Transcribe the interviews
– Summarize your findings
• Group activities
– Create wiki
– Review questions
– Submit IRB
– Keep track of references
14
Dissemination
• Dissemination
– How are we going to get this to people in
the field?
• Health Science Library
• Paper in their conference
• Face to face visits
• … what other mechanisms?
15
Assignments
• Class participation
– Read the assigned readings
– Participate in class discussion
– Contribute to the wiki
16
Assessment
• Class Participation 20%
– Attendance and contributions to discussion
• Informatics Review 20%
• Introducing and Engaging your speaker 20%
• Gap Analysis
– Data collection activities 10%
– Final report 20%
• Class contributions 10%
17
• Questions, Issues, Comments?
18
Why are you here?
19
Motivation
• Massive increase in electronic text
– MEDLINE
• Abstracts from more than 5,000 journals
• Current: more than 17 million citations
• Growth: ~12,000 new citations every week
– Chemistry – more than 110,000 articles in 2002
alone
• Consequences
– Hundreds of thousands of relevant articles
– Implicit connections between literature go
unnoticed
Shift focus from Retrieval to Synthesis
Source: MEDLINE factsheet http://www.nlm.nih.gov/pubs/factsheets/medline.html
Source: Calculated from ISI’s 418 highest ranked chemistry articles
20
Information Overload
“One of the diseases of this age is the
multiplicity of books; they doth so
overcharge the world that it is not able
to digest the abundance of idle matter
that is every day hatched and brought
forth into the world”
- Barnaby Rich, 1613
21
Existing Text Mining
• Clustering
• Categorization
• Association Rules
IBM Intelligent Miner for Text (Clustering)
SAS Text Miner (Association Rules)
22
Example Pattern: Decision Tree
∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
23
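The rule on the decision-tree slide can be read as a predicate over person records. The following is a minimal Python sketch, assuming a hypothetical `Person` record with `degree`, `income`, and `credit` fields (these names and the sample data are illustrative, not from the slides). It checks whether the implication holds for every person in a small made-up sample.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record type; field names mirror the rule on the slide.
    degree: str
    income: float
    credit: str


def antecedent(p: Person) -> bool:
    """Left-hand side of the rule: masters degree and income above 75,000."""
    return p.degree == "masters" and p.income > 75_000


def rule_holds(people: list[Person]) -> bool:
    """True if every person matching the antecedent has excellent credit."""
    return all(p.credit == "excellent" for p in people if antecedent(p))


# Made-up sample data for illustration only.
sample = [
    Person("masters", 90_000, "excellent"),
    Person("bachelors", 80_000, "good"),
    Person("masters", 60_000, "fair"),
]
print(rule_holds(sample))
```

Only the first record satisfies the antecedent, so the other two cannot violate the rule whatever their credit rating is.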
Kohonen Maps
• Articles represented as
vectors
• Assign n random
articles
• Assign remaining
articles to closest
cluster
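The three steps above describe a simple seed-and-assign clustering pass rather than a full Kohonen (self-organizing map) training loop. Below is a minimal sketch of just those steps, assuming articles are already encoded as numeric vectors and that closeness means Euclidean distance (the distance measure, the function name `seed_and_assign`, and the toy vectors are assumptions, not from the slides).

```python
import math
import random


def seed_and_assign(vectors, n, seed=0):
    """Pick n random articles as cluster seeds, then assign every
    remaining article to the cluster of its closest seed."""
    rng = random.Random(seed)
    seed_ids = rng.sample(range(len(vectors)), n)
    # Each seed article starts its own cluster.
    clusters = {s: [s] for s in seed_ids}
    for i, vec in enumerate(vectors):
        if i in clusters:
            continue  # seeds are already placed
        closest = min(seed_ids, key=lambda s: math.dist(vec, vectors[s]))
        clusters[closest].append(i)
    return clusters


# Toy term-count vectors (made up for illustration).
articles = [(5, 0, 1), (4, 1, 0), (0, 6, 5), (1, 5, 6), (5, 1, 1)]
for seed_article, members in seed_and_assign(articles, n=2).items():
    print(seed_article, sorted(members))
```

A real self-organizing map would also update the cluster prototypes iteratively and arrange them on a grid; this sketch stops after the single assignment pass listed on the slide.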
Snowy peaks indicate highly funded research
NCI-funded research 1995-present
Blake, C. and Tengs, T. (2001) “The Nation’s Breast Cancer Research Portfolio: A view from 30,000 feet”, Avon Symposium, UC Irvine.
24
Knowledge Discovery in Literature
Source Literature C: Migraine
B: Platelet Activity
B: Calcium Channel Blockers
B: Serotonin
B: ...
Target Literature A: Magnesium
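The diagram follows a three-part pattern: start from a source literature C (migraine), collect the intermediate B-terms mentioned there, and look for target terms A that share those B-terms but never appear with C directly. The sketch below runs that pattern over a made-up set of term co-occurrences. The toy `docs` data and the function name `candidate_targets` are illustrative assumptions, and the code is not the method of either paper cited on this slide.

```python
from collections import defaultdict


def candidate_targets(docs, source):
    """Return {A: shared B-terms} for terms A that never co-occur with
    `source` but share at least one intermediate B-term with it."""
    cooccur = defaultdict(set)
    for terms in docs:
        for t in terms:
            cooccur[t] |= set(terms) - {t}
    b_terms = cooccur[source]
    candidates = defaultdict(set)
    for b in b_terms:
        for a in cooccur[b]:
            # Skip the source itself and anything directly linked to it.
            if a != source and a not in b_terms:
                candidates[a].add(b)
    return dict(candidates)


# Each set stands for the terms found in one (made-up) article.
docs = [
    {"migraine", "serotonin"},
    {"migraine", "platelet activity"},
    {"migraine", "calcium channel blockers"},
    {"magnesium", "serotonin"},
    {"magnesium", "platelet activity"},
    {"magnesium", "calcium channel blockers"},
]
print(candidate_targets(docs, "migraine"))
```

In this toy data, magnesium is the only candidate because it shares all three B-terms with migraine while never appearing in the same article.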
Swanson, DR (1988) “Migraine and magnesium: eleven neglected connections”, Perspect. Biol. Med., 31: 526-57.
Blake, C. & Pratt, W. (2002). A Semantic Approach to Identify Candidate Treatments from Existing Medical Literature. In AAAI Symposium on Knowledge-based Approaches, Stanford, CA.
25