LINC Catalog Research
Dr. Kan Min-Yen
Dr. Danny C.C. Poo

Outline
Introduction
NUS Query Logs
Results so far
Current Research
Current Grants
Research Needs

16 June 2004 LINC Catalog Research

Ranganathan’s Five Laws
Books are for use.
For every reader, his or her book.
For every book, its reader.
Save the time of the reader.
The library is a growing organism.
Address these issues by optimizing catalog access.

Who are we?
Dr Kan Min-Yen
  Roopak Selvanathan, Programmer
  Kalpana Kumar, UROP student
  Ng Meichan, HYP student
  Tan Siru, HYP student
  Qiu Long, PhD student
Dr Danny Poo
  Jeffry Komarjaya, HYP student

Query Logs
Provided thanks to you
Used to learn about query styles
About 300 days' worth of simple queries
Queries:
  Average length: 2.8 words
  Average number of times a query is repeated: 2.1

Innopac Properties
No spelling correction
Weak query expansion
No capability to track sessions
Case insensitive
Stopwords also searched for (e.g. ‘the’)
Advanced queries rarely used
Sorting could be improved

Past Milestones: June 2003 – June 2004

Framework for LINC Research
Need:
  Automated way to send queries to LINC
  Tracking of sessions and transactions by user
  Distinguishing queries sent for research from those sent by real users
Solution: Build a mirror system at SoC that sends queries to LINC and tracks them

Mirror (http://linc.comp.nus.edu.sg)
Allows:
  Automated sending of simple queries
  Tracking of user sessions by IP address and time
  Distinguishing, in the LINC logs, queries sent for research from those sent by real users
  Command-line and Web invocation
Programmer: Roopak Selvanathan

LCSH-based query expansion
Finds relevant books that share subject headings (Library of Congress Subject Headings) with the initial retrieval set
~30% improvement over the original search
Student: Jeffry Komarjaya
To be presented at ECDL 2004

Author spelling correction
Uses a dictionary and an author name list retrieved from LINC
Corrects words with a single mistake in a non-initial letter
Weakness: corrections are not ranked
Student: Qiu Long / Kalpana Kumar

Questions so far?
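The author spelling correction slide describes the corrector only at a high level. The Python sketch below illustrates one possible reading of "a single mistake in a non-initial letter." It assumes that a mistake is one substitution, insertion, or deletion after the first letter, and that names are single lowercase words. The names `non_initial_edits` and `correct`, along with the sample vocabulary, are hypothetical; this is not the LINC or Innopac implementation. Candidates come back in alphabetical order, which reflects the stated weakness that corrections are not ranked.

```python
import string


def non_initial_edits(word):
    """Return all strings one edit away from `word`, never touching its first letter.

    Assumption (hypothetical): a "non-initial letter mistake" is a single
    substitution, insertion, or deletion at any position after the first.
    """
    letters = string.ascii_lowercase
    word = word.lower()
    edits = set()
    for i in range(1, len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            edits.add(head + tail[1:])  # delete the letter at position i
        for c in letters:
            if tail:
                edits.add(head + c + tail[1:])  # substitute the letter at position i
            edits.add(head + c + tail)  # insert c just after position i - 1
    edits.discard(word)  # a substitution with the same letter reproduces the word
    return edits


def correct(word, vocabulary):
    """Return entries of `vocabulary` (a set) one non-initial edit away from `word`.

    `vocabulary` stands in for a merged dictionary plus author name list.
    Known words are returned as-is; otherwise candidates are sorted
    alphabetically, i.e. not ranked by likelihood.
    """
    w = word.lower()
    if w in vocabulary:
        return [w]
    return sorted(non_initial_edits(w) & vocabulary)


if __name__ == "__main__":
    vocab = {"knuth", "kant", "kent", "dijkstra"}  # hypothetical sample names
    print(correct("kint", vocab))     # ['kant', 'kent']
    print(correct("dikstra", vocab))  # ['dijkstra']
    print(correct("rnuth", vocab))    # [] (first letter is assumed correct)
```

Because the first letter is never edited, the candidate set stays small. However, a typo in the first letter (as in "rnuth") cannot be fixed under this assumption.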
Current Research Projects: June 2004 – January 2005

Morphological Query Expansion
Suggest alternative forms of a query using different morphology
  bacterial foraging → foraging bacteria
  international tax avoidance → avoiding international taxes
Look for classes of words where morphological expansion is productive
Student: Tan Siru

Phrase structure expansion
Improve precision using phrasal knowledge
  air pollution → pollution of air
  precast concrete structures → (precast concrete) structures
Use mutual information to determine significant collocations
Will work together with the morphological expansion unit
Student: Ng Meichan

Subject spelling correction
Build upon the current system to do subject-based spelling correction
Rank corrections by the likelihood of each mistake
Suggest repairs for catalog entries with misspellings
Student: Kalpana Kumar

Current Grants and Future Targets

Corpus-Based Query Expansion
Internal SoC project, with emphasis on using the query logs
2-year project, now completing its first year
Milestones left to pursue:
  Integration of the various component systems
  Simple-to-advanced query inference
  Integration with LINC, if feasible

ICITI Research
Interdepartmental research grant for equipment
Computer equipment to allow SoC to collaborate more fully and in sync with the Libraries
Funds for separate, dedicated development and deployment machines and storage

Feedback from you
Exchanging our needs with yours
Our needs:
  Continued query logs
  Advanced, author, etc. query logs
  Catalog data: book records and DB
What would the Libraries like to see?

Any questions?

References:
Mirror system: http://linc.comp.nus.edu.sg
Group documentation: http://wwwappn.comp.nus.edu.sg/~rpnlpir/twiki/bin/view.cgi/Query/WebHome
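The phrase structure expansion slide says only that mutual information will be used to find significant collocations. A common way to apply mutual information to collocations is pointwise mutual information (PMI) over adjacent word pairs, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ). The Python sketch below assumes that choice. The function name `pmi_collocations`, the `min_count` cutoff, and the sample queries are all illustrative rather than taken from the project. Bigrams are counted only within a single query, never across query boundaries.

```python
import math
from collections import Counter


def pmi_collocations(queries, min_count=2):
    """Rank adjacent word pairs in `queries` by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with P estimated from unigram
    and bigram relative frequencies. Pairs seen fewer than `min_count` times
    (a hypothetical cutoff) are skipped, since PMI overrates rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for q in queries:
        words = q.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # pairs within one query only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest PMI first: the strongest candidate collocations.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = [  # hypothetical query log entries
        "precast concrete structures",
        "precast concrete beams",
        "concrete structures",
        "steel structures",
        "precast concrete",
    ]
    for pair, score in pmi_collocations(sample):
        print(pair, round(score, 3))
```

On this sample, ("precast", "concrete") scores above ("concrete", "structures"). That ordering matches the slide's bracketing of "(precast concrete) structures." Real query logs would need a larger `min_count` and some care with stopwords such as "of."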