CLiMB: Computational Linguistics for Metadata Building
Judith L. Klavans

CLiMB: Interdisciplinary Research Project at Columbia University
Funded by Mellon Foundation 2002-2004
• Center for Research on Information Access (CRIA)
• Libraries
• Computer Science Department
CLiMB - Columbia University

Guide
• Goals of the project
  – general terms
  – CLiMB as an exploratory research project
      Hypothesis formation and generation
      Precedes hypothesis testing
• What we aim to achieve today
  – User needs and formative evaluation
• Next steps

Problems in Image Access
Cataloging digital images
Traditional approach: manual expertise
    labor intensive
    expensive
Can automated techniques help?

Can we harvest image descriptors?

CLiMB Technical Contribution
CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image:

September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes.
Edward R. Bosley. Greene & Greene. London : Phaidon, 2000. p.
127

Overall Goals
• Research: Development of richer retrieval through increased numbers of descriptors
• Practice: Development of suite of CLiMB tools
• Resources: Vocabulary list which can be used by other visual resource professionals
The essence of CLiMB:
• Use scholars themselves as “catalogers” by utilizing scholarly publications
• Enhance existing descriptive metadata

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

CLiMB Collections
• Greene & Greene Architectural Drawings, Avery Architectural and Fine Arts Library
• Chinese Paper Gods, C.V. Starr East Asian Library
• Photographs from the Archives, American Institute of Indian Studies

Greene & Greene Architectural Records and Papers Collection
Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries

Charles Sumner Greene (1868-1957)
Henry Mather Greene (1870-1954)

NYDA.1960.001.00023
All Saints Episcopal Church (Pasadena, Calif.). Alterations 1902-1903

Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.] Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.
Subjects: Houses Alterations Architecture--Designs and plans--United States. Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no.
NYDA.1960.001.03224. [AVERYimage]. Electric lighting -floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] -floor plan, part plan of basement.

Greene & Greene Bibliography
• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974]
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G. Dahlstrom, 1974]

Chinese Paper Gods
Anne S. Goodrich Collection
C.V. Starr East Asian Library, Columbia University

Pan-hu chih-shen
God of tigers

Chinese Paper Gods Catalog Record
Title: Chuang gong chuang mu [graphic].
Published: [193-]
Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.
In: Anne S. Goodrich Collection.
Location: Columbia University, C.V. Starr East Asian Library (CJK) EAX GAC 1 no. 16
Subjects: Gods, Chinese, in art. Folk art--China.
Genre Or Form: Woodcuts--Chinese.
Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.
Record ID: NYCP02-F20

Chinese Paper Gods Bibliography
• Day, Clarence Burton. Chinese peasant cults : being a study of Chinese paper gods. Taipei : Ch'eng Wen Pub. Co., 1974.
• Goodrich, Anne Swann. Peking paper gods : a look at home worship. Nettetal : Steyler Verlag, 1991.
• Laing, Ellen Johnston. Art and aesthetics in Chinese popular prints: selections from the Muban Foundation collection.
  Ann Arbor, MI : Center for Chinese Studies, University of Michigan, c2002

Chinese gods: selection from LC Authority File
HEADING: Nezha (Chinese deity)
Used For/See From:
    Daluoxian (Chinese deity)
    Jinhuan Yuanshuai (Chinese deity)
    Jinkang Yuanshuai (Chinese deity)
    Li Nezha (Chinese deity)
    Luoche Taizi (Chinese deity)
    Ne Zha (Chinese deity)
    Nezhataizi (Chinese deity)
    No-cha (Chinese deity)
    Nuozha (Chinese deity)
    Tailuoxian (Chinese deity)
    Taizi Yuanshuai (Chinese deity)
    Taiziyeh (Chinese deity)
    Yühuang Taizi (Chinese deity)
    Zhongtan Yuanshuai (Chinese deity)
Search Also Under: Gods, Chinese

CLiMB: 2 year timetable
• YEAR 1 (Summer 02 to Summer 03)
  – Evaluating existing computational tools
  – Developing additional software as needed
  – Selecting and building (scanning, converting) needed candidate texts
  – Evaluating initial results
  – Selecting initial descriptive metadata for loading into end-user system

CLiMB: 2 year timetable
• YEAR 2 (Summer 03 to Summer 04)
  – Use feedback to refine metadata generation & filtering
  – Prepare additional collections for testing
  – Incorporate data into useful platforms
  – Seek external partners for using CLiMB toolset

Guide
• Goals of the project
  – general terms
• What we aim to achieve today
  – User needs and formative evaluation
• Next steps

User Input and Feedback
• Formative Evaluation
  – What are we testing?
  – Are we testing in the right way?
  – What are we doing right? Wrong?
• User Evaluation
  – Testing with various types of users
  – Different types of information
  – And for different purposes

Your Background
• Art Librarian – cataloging
• Computer Scientist – metadata
• Art Librarian – reference
• Image resource specialist
Does your background affect how you respond to CLiMB output evaluation?
Does your background provide new perspectives on how to use our results?

Questions to Keep in Mind
• What are we doing right so far?
• What have we overlooked?
• Which other groups might we talk to?
• What are your opinions on testing?
• What terms do we need to define?
• How can the evaluation tasks best be formulated?

Cataloger’s Interface
• Collect ideas on ways CLiMB output can be embedded into an interface for catalogers
• What uses would you make of this data in a more general way (supposing you are not a cataloger)?

A Hands-on Meeting
Interactive Format
• 9:30 – 10:00 This introduction with goals
• 10:00 – 10:45 Individual work with the data
• 11:00 – 11:15 Break
• 11:15 – 11:45 (approx) Small group discussion
• 11:45 – 1:00 Large group feedback and conclusion
• 1:00 Lunch

Thank you! Any questions?
www.columbia.edu/cu/cria/climb