CLiMB: Computational Linguistics for Metadata Building

Judith L. Klavans
CLiMB: Interdisciplinary Research Project at Columbia University
Funded by Mellon Foundation 2002-2004
• Center for Research on Information Access (CRIA)
• Libraries
• Computer Science Department
CLiMB - Columbia University
Guide
• Goals of the project – general terms
– CLiMB as an exploratory research project
  • Hypothesis formation and generation
  • Precedes hypothesis testing
• What we aim to achieve today
– User needs and formative evaluation
• Next steps
Problems in Image Access
• Cataloging digital images
• Traditional approach: manual expertise
• Labor intensive
• Expensive
• Can automated techniques help?
Can we harvest image descriptors?
CLiMB Technical Contribution
CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image:
September 14, 1908, the basis of the Greenes' final
design had been worked out. It featured a radically
informal, V-shaped plan (that maintained the original
angled porch) and interior volumes of various heights,
all under a constantly changing roofline that echoed
the rise and fall of the mountains behind it. The
chimneys and foundation would be constructed of the
sandstone boulders that comprised the local geology,
and the exterior of the house would be sheathed in
stained split-redwood shakes. —Edward R. Bosley.
Greene & Greene. London : Phaidon, 2000. p. 127
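The slides do not describe how CLiMB performs this extraction. As a rough illustration only, the Python sketch below pulls runs of capitalized words out of a passage as crude proper-noun candidates. The function name `candidate_proper_nouns`, the stopword list, and the regular expression are all assumptions made for this example; a real system would more likely rely on part-of-speech tagging or noun-phrase chunking.

```python
import re

# Hypothetical stopword list: sentence-initial words that are capitalized
# but are not proper nouns. Illustrative only, far from complete.
STOPWORDS = {"The", "It", "A", "An", "In", "On", "All"}

# A run of capitalized words, optionally joined by " & " (as in "Greene & Greene").
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:(?: & | )[A-Z][a-z]+)*")


def candidate_proper_nouns(text):
    """Return capitalized word runs in order of first appearance."""
    found = []
    for match in CAPITALIZED_RUN.finditer(text):
        phrase = match.group(0)
        # Skip lone stopwords and phrases that were already collected.
        if phrase not in STOPWORDS and phrase not in found:
            found.append(phrase)
    return found


# Sample abridged from the passage quoted on the slide.
sample = (
    "The chimneys and foundation would be constructed of the sandstone "
    "boulders that comprised the local geology. Edward R. Bosley. "
    "Greene & Greene. London : Phaidon, 2000."
)
print(candidate_proper_nouns(sample))
```

On this sample the pattern keeps "Greene & Greene" together but splits "Edward R. Bosley" and drops the initial, which hints at why simple pattern matching is not enough and linguistic tools are needed.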
Overall Goals
• Research: Development of richer retrieval through increased numbers of descriptors
• Practice: Development of suite of CLiMB tools
• Resources: Vocabulary list which can be used by other visual resource professionals
The essence of CLiMB:
• Use scholars themselves as “catalogers” by utilizing
scholarly publications
• Enhance existing descriptive metadata
Squeezing Metadata out of
Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation
CLiMB Collections
• Greene & Greene Architectural Drawings,
Avery Architectural and Fine Arts Library
• Chinese Paper Gods,
C.V. Starr East Asian Library
• Photographs from the Archives,
American Institute of Indian Studies
Greene & Greene Architectural Records and
Papers Collection
Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries
Charles Sumner Greene (1868-1957)
Henry Mather Greene (1870-1954)
NYDA.1960.001.00023
All Saints Episcopal Church (Pasadena, Calif.). Alterations
1902-1903
Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.]
Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957.
Greene, Henry Mather, 1870-1954.
Subjects: Houses
Alterations
Architecture--Designs and plans--United States.
Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] -floor plan, part plan of basement.
Greene & Greene Bibliography
• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential
style. Fort Worth [Tex.] : Amon Carter Museum of Western Art,
[1974]
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt
Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy.
Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco :
Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G.
Dahlstrom, 1974]
Chinese Paper Gods
Anne S. Goodrich Collection
C.V. Starr East Asian Library,
Columbia University
Pan-hu chih-shen
God of tigers
Chinese Paper Gods Catalog Record
Title: Chuang gong chuang mu [graphic].
Published: [193-]
Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.
In: Anne S. Goodrich Collection.
Location: Columbia University, C.V. Starr East Asian Library (CJK)
EAX GAC 1 no. 16
Subjects: Gods, Chinese, in art.
Folk art--China.
Genre Or Form: Woodcuts--Chinese.
Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.
Record ID: NYCP02-F20
Chinese Paper Gods Bibliography
• Day, Clarence Burton. Chinese peasant cults : being
a study of Chinese paper gods. Taipei : Ch'eng Wen
Pub. Co., 1974.
• Goodrich, Anne Swann. Peking paper gods : a look at
home worship. Nettetal : Steyler Verlag, 1991.
• Laing, Ellen Johnston. Art and aesthetics in Chinese
popular prints: selections from the Muban Foundation
collection. Ann Arbor, MI : Center for Chinese
Studies, University of Michigan, c2002
Chinese gods: selection from LC
Authority File
HEADING: Nezha (Chinese deity)
Used For/See From:
Daluoxian (Chinese deity)
Jinhuan Yuanshuai (Chinese deity)
Jinkang Yuanshuai (Chinese deity)
Li Nezha (Chinese deity)
Luoche Taizi (Chinese deity)
Ne Zha (Chinese deity)
Nezhataizi (Chinese deity)
No-cha (Chinese deity)
Nuozha (Chinese deity)
Tailuoxian (Chinese deity)
Taizi Yuanshuai (Chinese deity)
Taiziyeh (Chinese deity)
Yühuang Taizi (Chinese deity)
Zhongtan Yuanshuai (Chinese deity)
Search Also Under: Gods, Chinese
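An authority record like this one can be read as a lookup from many variant names to a single heading. The Python sketch below illustrates that idea under two assumptions: that the first entry, Nezha (Chinese deity), is the authorized heading, and that variants can be matched after lowercasing and stripping spaces and punctuation. The helper names `normalize`, `build_variant_index`, and `resolve` are hypothetical and do not correspond to any Library of Congress or CLiMB interface.

```python
def normalize(name):
    """Lowercase a name and drop spaces, hyphens, and other punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_variant_index(heading, names):
    """Map each normalized name form to the authorized heading."""
    return {normalize(name): heading for name in names}


# The heading's own name plus a few of the "Used For/See From" forms
# from the slide (qualifier "(Chinese deity)" omitted from the keys).
NAME_FORMS = ["Nezha", "Li Nezha", "Ne Zha", "No-cha", "Nuozha", "Taizi Yuanshuai"]
index = build_variant_index("Nezha (Chinese deity)", NAME_FORMS)


def resolve(name):
    """Return the authorized heading for a name, or None if it is unknown."""
    return index.get(normalize(name))


print(resolve("no-cha"))
print(resolve("Guanyin"))
```

Descriptors harvested from text could be passed through a lookup of this kind so that different romanizations of the same deity collapse to one searchable term.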
CLiMB: 2-year timetable
• YEAR 1 (Summer 02 to Summer 03)
– Evaluating existing computational tools
– Developing additional software as needed
– Selecting and building (scanning, converting)
needed candidate texts
– Evaluating initial results
– Selecting initial descriptive metadata for loading
into end-user system
CLiMB: 2-year timetable
• YEAR 2 (Summer 03 to Summer 04)
– Use feedback to refine metadata
generation & filtering
– Prepare additional collections for testing
– Incorporate data into useful platforms
– Seek external partners for using CLiMB
toolset
Guide
• Goals of the project – general terms
• What we aim to achieve today
– User needs and formative evaluation
• Next steps
User Input and Feedback
• Formative Evaluation
– What are we testing?
– Are we testing in the right way?
– What are we doing right? Wrong?
• User Evaluation
– Testing with various types of users
– Different types of information
– And for different purposes
Your Background
• Art Librarian – cataloging
• Computer Scientist – metadata
• Art Librarian – reference
• Image resource specialist
Does your background affect how you respond to
CLiMB output evaluation?
Does your background provide new perspectives
on how to use our results?
Questions to Keep in Mind
• What are we doing right so far?
• What have we overlooked?
• Which other groups might we talk to?
• What are your opinions on testing?
• What terms do we need to define?
• How can the evaluation tasks best be formulated?
Cataloger’s Interface
• Collect ideas on ways CLiMB output
can be embedded into an interface for
catalogers
• What more general uses would you make
of this data (supposing you are
not a cataloger)?
A Hands-on Meeting
Interactive Format
• 9:30 – 10:00 This introduction with goals
• 10:00 – 10:45 Individual work with the data
• 11:00 – 11:15 Break
• 11:15 – 11:45 (approx) Small group discussion
• 11:45 – 1:00 Large group feedback and conclusion
• 1:00 Lunch
Thank you!
Any questions?
www.columbia.edu/cu/cria/climb