What is Information Retrieval?

Information Retrieval and its
Application in Biomedicine
Sept 4 Introduction
Hong Yu (1, 2), PhD
Susan McRoy (1), PhD
(1) Department of Computer Science
(2) Department of Health Sciences
University of Wisconsin-Milwaukee
What is Information Retrieval?

The field concerned with the acquisition,
organization, and searching of knowledge-based
information. (Hersh, 2003)
Speed Up Communication
Information






World Wide Web
Company documentation
Drug Descriptions
Medical Records
Books
Anything in the form of text, image, video,
or sound that can be digitized
Information in Biomedicine




Literature (over 17 million publications)
WWW
Electronic medical records
Genomics data
– DNA sequences, etc.

Knowledge representation
– Gene Ontology

Company databases
– Micromedex drug database
IR in Biomedicine







Index Medicus (Billings 1879)
MEDLARS (NLM 1966)
SAPHIRE (Hersh 1990)
PubMed (NLM 1996)
Arrowsmith (Smalheiser 1998)
BioText (Hearst 2003)
BioMedQA (Yu 2006)
Electronic and Open Publishing



Internet and Web have a profound impact on the
publishing of knowledge-based information
Most of the literature can be made available electronically
Open-access
– The Bethesda Statement on Open Access Publishing
(http://www.earlham.edu/~peters/fos/bethesda.htm)
(April 11, 2003)
– The Berlin Declaration on Open Access to Knowledge in
the Sciences and Humanities
(http://www.zim.mpg.de/openaccessberlin/berlindeclaration.html). (2003)
– PubMed Central (NLM 2004)
Quality of Information

A lack of quality control
– Anyone can publish online
– A wealth of studies have concluded that
healthcare information on the Web is of
poor quality

Readability
– Hard to read
Information Needs and Seeking

Unrecognized needs
– Clinicians unaware of information needs or knowledge
deficit

Recognized needs
– Clinicians aware of needs but may or may not pursue
them

Pursued needs
– Information seeking occurs but may or may not be
successful

Satisfied needs
– Information seeking successful
Evidence-Based Medicine
What You Will Learn

IR algorithms
– Indexing
– Query and Retrieval
– Evaluation
– Text Classification
– XML retrieval
– Web retrieval
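As a first taste of the indexing, retrieval, and evaluation topics listed above, here is a minimal sketch rather than a reference implementation. It assumes whitespace tokenization, Boolean AND queries, and a hand-made relevance judgment; the three toy documents and all function names are invented for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy collection (hypothetical data, not from any real corpus).
docs = {
    1: "aspirin reduces fever",
    2: "aspirin and heart attack prevention",
    3: "gene expression in heart tissue",
}
index = build_index(docs)
results = search_and(index, "aspirin heart")

# Suppose a human judge marked documents 2 and 3 as relevant to the query.
precision, recall = precision_recall(results, {2, 3})
```

Here the query retrieves only document 2, so every retrieved document is relevant, but one relevant document is missed. Real systems add ranking, stemming, and stop-word handling on top of this skeleton.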
What You Will Learn (Cont.)

Open-Source IR tools
– What open-source IR tools are available?
    Indexing/retrieval
    Part-of-speech and syntactic parsing
    Semantic parsing
    Discourse relations
    Machine-learning classifiers
– How to use the tools
What You Will Learn (Cont.)

State-of-the-art IR systems
– Baruch 1965 [BLIMP http://blimp.cs.queensu.ca/index.html]
– SAPHIRE (Hersh 1990): Retrieval
– MedLEE (Friedman 1994): Extraction
– PubMed (NLM 1997)
– Arrowsmith systems (Smalheiser 1998): Hidden relation discovery tool
– GENIES (Friedman 2001): Extraction
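The idea behind hidden relation discovery can be sketched with a toy co-occurrence model: if term A and term C never appear in the same article but both co-occur with some intermediate term B, then B suggests a possible A-C link worth investigating. The sketch below is an illustration under stated assumptions, not how Arrowsmith is implemented. Each article is reduced to a hand-made set of terms, and the function name and data are hypothetical.

```python
def hidden_links(doc_terms, a_term, c_term):
    """Return intermediate terms that co-occur with both a_term and c_term.

    doc_terms is a list of sets, one set of terms per article.
    """
    a_docs = [terms for terms in doc_terms if a_term in terms]
    c_docs = [terms for terms in doc_terms if c_term in terms]
    # Every term seen alongside A (or C), excluding A and C themselves.
    a_neighbors = set().union(*a_docs) - {a_term, c_term}
    c_neighbors = set().union(*c_docs) - {a_term, c_term}
    return a_neighbors & c_neighbors


# Toy articles (hypothetical term sets chosen for illustration).
articles = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"raynaud syndrome", "blood viscosity"},
    {"raynaud syndrome", "vasoconstriction"},
]
links = hidden_links(articles, "fish oil", "raynaud syndrome")
```

In this toy collection the only shared intermediate term is "blood viscosity". A practical tool would also filter out uninformative terms and rank the candidates.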
BioNLP Systems

BioText (Hearst 2003 http://biotext.berkeley.edu/)
– Retrieval+Categorization

GeneWays (Rzhetsky 2004 http://geneways.genomecenter.columbia.edu/)
– Extraction+Visualization

TextPresso (Muller 2004 http://www.textpresso.org/)
– Retrieval+Extraction

iHOP (Hoffman and Valencia 2005 http://www.ihop-net.org/UniPub/iHOP/)
– Retrieval

BioMedQA (Yu 2006 http://monkey.ims.uwm.edu/MedQA)
– Question Answering
Advanced NLP applications
Beyond text: Image and Video




Image classification
– Finding concepts in captions and annotations
– Machine learning on textual & visual features
– Determining salient features in text and image
separately and merging the results
Extracting text from images
– Understanding and correcting OCR (handwriting,
equations)
– Finding text in images
Finding document text related to illustrations
Video retrieval
Beyond Extraction: Experimental Tools
Resources




Annotated collections (GENIA, Medstract,
Yapex …)
Ontologies, tools, knowledge bases …
Publications, Conferences, Evaluations …
Centres and web portals
What We Provide

Textbook
– Christopher D. Manning, Prabhakar Raghavan,
and Hinrich Schütze. Introduction to Information
Retrieval. Cambridge University Press, 2007


http://www-csli.stanford.edu/~schuetze/informationretrieval-book.html
Office hours:
– Tuesdays, 3-4 pm EMS 710 and by appointment
– Hong Yu, 414-229-3344
– Susan McRoy, 414-229-6695
What We Expect

Undergraduate:
– 30% Homework, 35% Midterm exam, 35%
Final exam or project

Graduate:
– 20% Midterm exam, 40% Homework, 40%
Project: The project may be done individually or
in a team of 2-3 people. The final project will
include a software system, a 2-3 page written
project report, and an oral presentation. The
report should describe the problem, the
approach, and evaluation and should cite related
work where appropriate.