16 Jun 2004

LINC Catalog Research
Dr. Kan Min-Yen
Dr. Danny C.C. Poo
Outline
Introduction
NUS Query Logs
Results so far
Current Research
Current Grants
Research Needs
Ranganathan’s Five Laws
Books are for use.
For every reader, his or her book.
For every book, its reader.
Save the time of the reader.
Library is a growing organism.
Address these issues through optimizing catalog access
Who are we?
  - Dr Kan Min-Yen
  - Roopak Selvanathan, Programmer
  - Kalpana Kumar, UROP student
  - Ng Meichan, HYP student
  - Tan Siru, HYP student
  - Qiu Long, PhD student
  - Dr Danny Poo
  - Jeffry Komarjaya, HYP student
Query Logs
Thanks to you
Used to learn about query styles
About 300 days' worth of simple queries
Queries:
  - Average length: 2.8 words
  - Each query repeated 2.1 times on average
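The two statistics above could be computed from a list of raw query strings roughly as follows. This is a minimal sketch, not the group's actual tooling: the function name `query_log_stats` and the sample queries are hypothetical, and it assumes that queries differing only in case or spacing count as the same query.

```python
from collections import Counter

def query_log_stats(queries):
    """Return (average words per query, average repetitions per distinct query)."""
    # Assumption: normalize case and whitespace so trivially different
    # forms of the same query are counted together.
    normalized = [" ".join(q.lower().split()) for q in queries if q.strip()]
    if not normalized:
        return 0.0, 0.0
    avg_length = sum(len(q.split()) for q in normalized) / len(normalized)
    counts = Counter(normalized)
    # Total queries divided by distinct queries gives the mean repeat count.
    avg_repeats = len(normalized) / len(counts)
    return avg_length, avg_repeats

# Hypothetical sample, not taken from the NUS logs.
sample = ["air pollution", "Air Pollution", "precast concrete structures", "java"]
avg_length, avg_repeats = query_log_stats(sample)
```

Whether repeated queries should be merged before averaging the length is a design choice; the sketch averages over every logged query, repeats included.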
Innopac Properties
No spelling correction
Weak query expansion
No capability to track sessions
Case insensitive
Stopwords also searched for (e.g. ‘the’)
Advanced queries rarely used
Sorting could be improved
Past Milestones
June 2003 – June 2004
Framework for LINC Research
Need:
  - Automated way to send queries to LINC
  - Tracking of sessions and transactions by user
  - Way to distinguish research queries from those sent by real users
Solution:
  - Build a mirror system at SoC that forwards queries to LINC while tracking them
Mirror (http://linc.comp.nus.edu.sg)
Allows:
  - Automated sending of simple queries
  - Tracking of user sessions by IP address and time
  - Distinguishing, in the LINC logs, research queries from those sent by real users
Command line and Web invocation
Programmer: Roopak Selvanathan
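Session tracking by IP address and time, as listed above, can be illustrated with a short sketch. It is not the mirror's actual code: the record layout `(ip, timestamp, query)`, the function name `group_sessions`, and the 30-minute inactivity timeout are all assumptions, since the slides do not say how a session boundary is decided.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed timeout; the slides give no value

def group_sessions(records, gap=SESSION_GAP):
    """Group (ip, timestamp, query) records into per-IP sessions.

    A new session starts when the same IP has been idle for longer than `gap`.
    """
    sessions = []      # all sessions, each a list of records
    open_session = {}  # ip -> that IP's most recent session
    for ip, ts, query in sorted(records, key=lambda r: r[1]):
        current = open_session.get(ip)
        if current is not None and ts - current[-1][1] <= gap:
            current.append((ip, ts, query))
        else:
            current = [(ip, ts, query)]
            sessions.append(current)
            open_session[ip] = current
    return sessions

# Hypothetical log records.
log = [
    ("10.0.0.1", datetime(2004, 6, 16, 9, 0), "air pollution"),
    ("10.0.0.2", datetime(2004, 6, 16, 9, 5), "tax avoidance"),
    ("10.0.0.1", datetime(2004, 6, 16, 9, 10), "pollution of air"),
    ("10.0.0.1", datetime(2004, 6, 16, 11, 0), "concrete"),
]
sessions = group_sessions(log)
```

Keying on IP address alone merges users who share an address, so sessions found this way are approximate.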
LCSH-based query expansion
  - Find relevant books with the same subject headings as the initial retrieval set
  - ~30% improvement over original search
  - Student: Jeffry Komarjaya
  - To be presented at ECDL 2004
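The idea of expanding a result set through shared subject headings can be sketched as below. This is an illustration under stated assumptions, not the system to be presented at ECDL 2004: the in-memory `catalog` mapping, the function name `expand_by_subject`, and the ranking by number of shared headings are all hypothetical.

```python
from collections import Counter

def expand_by_subject(initial_ids, catalog):
    """Return book ids outside `initial_ids` that share a subject heading with them.

    `catalog` maps book id -> set of subject headings (hypothetical structure).
    Results are ordered by how many headings they share with the initial set.
    """
    initial = set(initial_ids)
    headings = set()
    for book_id in initial:
        headings |= catalog.get(book_id, set())
    overlap = Counter()
    for book_id, subjects in catalog.items():
        if book_id in initial:
            continue
        shared = len(subjects & headings)
        if shared:
            overlap[book_id] = shared
    # Sort by overlap (descending), then by id, so the order is deterministic.
    return sorted(overlap, key=lambda b: (-overlap[b], b))

# Hypothetical toy catalog.
catalog = {
    "b1": {"Air -- Pollution", "Environmental policy"},
    "b2": {"Air -- Pollution"},
    "b3": {"Environmental policy", "Air -- Pollution"},
    "b4": {"Concrete construction"},
}
expanded = expand_by_subject(["b1"], catalog)
```

Here `b3` ranks ahead of `b2` because it shares two headings with the initial result rather than one, and `b4` is left out because it shares none.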
Author spelling correction
Spelling correction
  - Uses a dictionary and an author name list retrieved from LINC
  - Corrects words with a single-letter mistake in a non-initial position
Weakness: corrections not ranked
Student: Qiu Long / Kalpana Kumar
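A correction scheme of this kind might look like the following sketch. It assumes that "one letter mistake" means a single substitution, insertion, or deletion, and that the first letter is always typed correctly; the function names and the sample vocabulary are hypothetical. Like the system described above, it returns candidates without ranking them (alphabetical order is used only to make the output deterministic).

```python
def one_edit_apart(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra letter in b at position i

def suggest(word, vocabulary):
    """Return vocabulary entries one edit from `word` that keep its first letter."""
    word = word.lower()
    return sorted(
        v for v in vocabulary
        if v and word and v[0] == word[0] and one_edit_apart(word, v)
    )

# Hypothetical lower-case author name list.
authors = {"ranganathan", "tolkien", "tolstoy", "dickens"}
substitution = suggest("Tolstoi", authors)
deletion = suggest("dikens", authors)
wrong_initial = suggest("rickens", authors)
```

A transposition such as "tolkein" counts as two edits under this definition, so it would not be corrected.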
Questions so far?
Current Research Projects
June 2004 – January 2005
Morphological Query Expansion
Suggest alternative forms of the query using different morphology
  - bacterial foraging → foraging bacteria
  - international tax avoidance → avoiding international taxes
Look for classes of words where morphological expansion is productive
Student: Tan Siru
Phrase structure expansion
Improve precision using phrasal knowledge
  - air pollution → pollution of air
  - precast concrete structures → (precast concrete) structures
Use mutual information to determine significant collocations
Will work together with the morphological expansion unit
Student: Ng Meichan
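One common way to score collocations is pointwise mutual information (PMI) over adjacent word pairs; the sketch below uses it as a stand-in, since the slides do not specify which mutual information measure the project uses. The function name `bigram_pmi` and the sample queries are hypothetical.

```python
import math
from collections import Counter

def bigram_pmi(queries):
    """Pointwise mutual information (base 2) for adjacent word pairs in queries."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        words = q.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_words = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        # PMI compares the observed pair probability with what independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Hypothetical queries, not taken from the NUS logs.
queries = [
    "precast concrete structures",
    "precast concrete beams",
    "steel structures",
    "precast concrete",
]
scores = bigram_pmi(queries)
```

On this toy data "precast concrete" scores higher than "concrete structures", which is consistent with the bracketing (precast concrete) structures. PMI tends to overrate pairs seen only once or twice, so a minimum count threshold is usually applied on real logs.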
Subject spelling correction
Build upon the current system to do subject-based spelling correction
  - Add ranking of corrections using likelihood of mistake
  - Suggest repair of catalog entries with misspellings
Student: Kalpana Kumar
Current Grants and Future Targets
Corpus-Based Query Expansion
Internal SoC project, with emphasis on using the query logs
Two-year project, now completing its first year
Milestones left to pursue:
  - Integration of various component systems
  - Simple to advanced query inference
  - Integration with LINC if feasible
ICITI Research
Interdepartmental research grant for equipment
Computer equipment to allow SoC to collaborate more fully and in sync with Libraries
Funds for separate, dedicated development and deployment machines and storage
Feedback from you
Exchanging our needs with yours
Needs
Continued query logs
  - Advanced, author, etc. query logs
Catalog data
  - Book records and DB
What would Libraries like to see?
Any questions?
References:
Mirror system:
  - http://linc.comp.nus.edu.sg
Group documentation:
  - http://wwwappn.comp.nus.edu.sg/~rpnlpir/twiki/bin/view.cgi/Query/WebHome