
Project CLiMB
Computational Linguistics for
Metadata Building
Columbia University
Funded by the Andrew W. Mellon Foundation
2002-2004
Using Computational Linguistic Techniques
to Harvest Image Descriptors
1
Photograph courtesy of the Council of Industrial Design's Design Archive.
2
3
CLiMB:
Interdisciplinary Research at
Columbia University
• Libraries
• Computer Science Department
• Center for Research on Information Access (CRIA)
4
CLiMB Project Members
Judith Klavans, PI
Stephen Davis
Angela Giral
Patricia Renfro
Bob Wolven
Roberta Blitz
Rebecca Passonneau
Veronika Horvath
David Elson
5
Problems in Image Access
Traditional approach:
labor intensive
expensive
6
Project CLiMB
Help image catalogers
provide subject access?
Harvest image descriptors
from existing literature?
7
Can we harvest image descriptors?
8
CLiMB Technical Contribution
CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image:
By September 14, 1908, the basis of the Greenes' final design had
been worked out. It featured a radically informal, V-shaped plan
(that maintained the original angled porch) and interior volumes of
various heights, all under a constantly changing roofline that
echoed the rise and fall of the mountains behind it. The chimneys
and foundation would be constructed of the sandstone boulders
that comprised the local geology, and the exterior of the house
would be sheathed in stained split-redwood shakes.
— Edward R. Bosley. Greene & Greene. London: Phaidon, 2000. p.127.
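The slides do not say which language-processing tools CLiMB uses to find proper nouns and terms. The sketch below is one simple way to produce candidates from a passage like the Bosley excerpt, assuming two heuristics: proper nouns are runs of capitalized words, and terms are multi-word runs between stopwords, numbers, and punctuation. The stopword list and the function names are hypothetical stand-ins for a real part-of-speech tagger and phrase chunker, and every result is only a candidate for a cataloger to review.

```python
import re

# Hypothetical stopword list; it only approximates what a part-of-speech
# tagger and noun-phrase chunker would do in a real system.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "it", "its", "that", "by",
             "had", "been", "was", "would", "be", "all", "under", "behind",
             "comprised", "with", "from", "to"}


def proper_noun_candidates(text):
    """Runs of capitalized words (allowing '&' between them), with leading
    stopwords such as a sentence-initial 'By' or 'The' stripped."""
    found = []
    for match in re.finditer(r"[A-Z][\w']*(?:\s+&?\s*[A-Z][\w']*)*", text):
        words = match.group().split()
        while words and words[0].lower() in STOPWORDS:
            words.pop(0)
        phrase = " ".join(words)
        # Skip empty results and single letters such as the "V" of "V-shaped".
        if len(phrase) > 1 and phrase not in found:
            found.append(phrase)
    return found


def term_candidates(text, min_words=2):
    """Runs of at least min_words tokens delimited by stopwords, numbers,
    or punctuation."""
    terms, current = [], []
    for token in re.findall(r"[\w'-]+|[^\w\s]", text):
        if token.lower() in STOPWORDS or token.isdigit() or not token[0].isalnum():
            if len(current) >= min_words:
                terms.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if len(current) >= min_words:
        terms.append(" ".join(current))
    return terms


sentence = ("The chimneys and foundation would be constructed of the sandstone "
            "boulders that comprised the local geology, and the exterior of the "
            "house would be sheathed in stained split-redwood shakes.")
print(term_candidates(sentence))
# ['sandstone boulders', 'local geology', 'stained split-redwood shakes']
```

Capitalization alone also admits sentence-initial content words (for example "Drawings" in "Drawings by Greene & Greene survive."), which is one reason human review stays in the loop.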
9
CLiMB Overall Goals
The essence of CLiMB:
• Use scholars themselves as “catalogers” by
employing scholarly publications
• Enhance existing descriptive metadata
The CLiMB project:
• Research: Development of richer retrieval
through increased numbers of descriptors
• Practice: Development of CLiMB ToolKit
10
Squeezing Metadata out of
Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB ToolKit
11
Greene & Greene Architectural
Records and Papers Collection
Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries
12
NYDA.1960.001.00023
All Saints Episcopal Church (Pasadena, Calif.). Alterations
1902-1903
13
Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.]
Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors:
Greene, Charles Sumner, 1868-1957.
Greene, Henry Mather, 1870-1954.
Subjects:
Houses
Alterations
Architecture--Designs and plans--United States.
Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] floor plan, part plan of basement.
14
Greene & Greene Bibliography
(associated texts)
• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974].
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide. [Pasadena, Calif. : G. Dahlstrom, 1974]
15
16
Squeezing Metadata out of
Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB ToolKit
17
Target Object Identification (TOI)
• “Authority” list
• Varies from collection to collection
– Greene & Greene: project names
– North Carolina Museum of Art: creator/title
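Target object identification amounts to locating entries from a collection's authority list in the associated text. The sketch below assumes exact, case-insensitive, whole-phrase matching; the function name and the apostrophe normalization are illustrative choices, not a description of CLiMB's own matcher. The normalization matters for texts like the North Carolina Museum of Art handbook entry, which spells the artist's name with both a straight and a curly apostrophe.

```python
import re


def find_target_objects(text, toi_list):
    """Return (toi, start, end) for each whole-phrase, case-insensitive match
    of an authority-list entry in the text, ordered by position."""
    # Normalize curly apostrophes so "O’Keeffe" and "O'Keeffe" match alike.
    # This is a one-character swap, so offsets still index the original text.
    normalized = text.replace("\u2019", "'")
    hits = []
    for toi in toi_list:
        # Lookarounds instead of \b so entries ending in ")" or "." still match.
        pattern = r"(?<!\w)" + re.escape(toi.replace("\u2019", "'")) + r"(?!\w)"
        for m in re.finditer(pattern, normalized, flags=re.IGNORECASE):
            hits.append((toi, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])


text = ("Georgia O'Keeffe would often pass through the village of Cebolla. "
        "For O\u2019Keeffe, it symbolized human endurance.")
for toi, start, end in find_target_objects(text, ["O'Keeffe", "Cebolla Church"]):
    print(toi, start, end, text[start:end])
```

Here the hypothetical entry "Cebolla Church" finds nothing, because the text says "village of Cebolla" and "Church of Santo Niño" instead; exact matching against a per-collection list is only a starting point.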
18
19
North Carolina Museum of Art
Museum Catalog
(Associated Text)
Images
(Catalog Records)
North Carolina Museum of Art: Handbook of the Collections. Ed. Rebecca Martin Nagy.
Raleigh, NC: North Carolina Museum of Art, Hudson Hills Press, 1998.
20
Georgia O'Keeffe (American, 1887-1986)
Cebolla Church, 1945
Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.)
Purchased with funds from the North Carolina Art Society (Robert F.
Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1
North Carolina Museum of Art
<http://ncartmuseum.org/collections/highlights/20thcentury/20th/1910-1950/038_lrg.shtml>
21
MARC format
100  O’Keeffe, Georgia, ‡d 1887-1986.
245  Cebolla church ‡h [slide] / ‡c Georgia O’Keeffe.
260  ‡c 2003
300  1 slide : ‡b col.
500  Object date: 1945.
500  Oil on canvas.
500  20 x 36 in.
535  North Carolina Museum of Art ‡b Raleigh, N.C.
650  Painting, American ‡y 20th century.
650  Women artist ‡z United States
650  Church buildings in art.
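In the record above, each MARC tag (100, 245, 650, and so on) is paired with field content in which a delimiter symbol plus a one-letter code introduces each subfield. The sketch below parses such display lines, assuming the form "TAG content", accepting either ‡ or ≠ as the delimiter, and treating text before the first delimiter as subfield a. These are assumptions about this display form only; real MARC records are better read with a dedicated MARC library (for example pymarc).

```python
import re

# Accept either symbol as the subfield delimiter: U+2021 or U+2260.
DELIMITER = re.compile(r"\s*[\u2021\u2260]\s*")


def parse_field(line):
    """Split a display line 'TAG content' into (tag, [(code, value), ...]).

    Text before the first delimiter is assumed to be subfield 'a'.
    """
    tag, _, content = line.partition(" ")
    parts = DELIMITER.split(content)
    subfields = []
    if parts[0].strip():
        subfields.append(("a", parts[0].strip()))
    for part in parts[1:]:
        if part:
            # First character is the subfield code; the rest is its value.
            subfields.append((part[0], part[1:].strip()))
    return tag, subfields


def subject_headings(lines):
    """Join the subfields of each 650 field into one 'heading -- subdivision' string."""
    headings = []
    for line in lines:
        tag, subfields = parse_field(line)
        if tag == "650":
            headings.append(" -- ".join(value for _, value in subfields))
    return headings


record = ["100 O’Keeffe, Georgia, ‡d 1887-1986.",
          "245 Cebolla church ‡h [slide] / ‡c Georgia O’Keeffe.",
          "650 Painting, American ‡y 20th century.",
          "650 Church buildings in art."]
print(subject_headings(record))
# ['Painting, American -- 20th century.', 'Church buildings in art.']
```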
22
Cebolla Church, 1945
Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.)
Purchased with funds from the North Carolina Art Society (Robert F.
Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1
Driving through the New Mexican highlands near her home, Georgia
O'Keeffe would often pass through the village of Cebolla with its rude
adobe Church of Santo Niño. The artist was moved by the poignancy of
the little building: its sagging, sun-bleached walls and rusted tin roof
seemed so typical of the difficult life of the people.
When O'Keeffe came to paint the church she addressed it directly,
emphasizing its isolation and stark simplicity. Literally formed out of the
earth, the building affirms the permanence and the hard, defiant patience
of the people. For O’Keeffe, it symbolized human endurance and
aspiration. "I have always thought it one of my very good pictures", she
wrote, "though its message is not as pleasant as many others".
And the question remains: What is that in the window?
23
MARC format with CLiMB subject terms
100    O’Keeffe, Georgia, ‡d 1887-1986.
245    Cebolla church ‡h [slide] / ‡c Georgia O’Keeffe.
260    ‡c 2003
300    1 slide : ‡b col.
500    Object date: 1945.
500    Oil on canvas.
500    20 x 36 in.
535    North Carolina Museum of Art ‡b Raleigh, N.C.
650    Painting, American ‡y 20th century.
650    Women artist ‡z United States
650    Church buildings in art.
CLiMB  New Mexican highlands
CLiMB  village of Cebolla
CLiMB  adobe Church of Santo Niño
CLiMB  sagging, sun-bleached walls
CLiMB  rusted tin roof
CLiMB  isolation
CLiMB  human endurance
CLiMB  window
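The project's research goal is richer retrieval through more descriptors. A toy inverted index shows the effect on the record above: a query for "tin roof" misses the record when only the existing 650 subject fields are indexed, and finds it once the eight CLiMB terms are added. The word-level tokenization, the boolean AND search, and using the accession number 72.18.1 as the record id are all illustrative assumptions, not how any particular catalog indexes its records.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens (Unicode-aware, so 'Niño' stays one token)."""
    return re.findall(r"\w+", text.lower())


def build_index(records):
    """Inverted index: word -> set of record ids whose descriptors contain it."""
    index = defaultdict(set)
    for record_id, descriptors in records.items():
        for descriptor in descriptors:
            for word in tokenize(descriptor):
                index[word].add(record_id)
    return index


def search(index, query):
    """Ids of records whose descriptors contain every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Subject fields already in the record, and the terms CLiMB harvested.
catalog_subjects = ["Painting, American -- 20th century.",
                    "Women artist -- United States",
                    "Church buildings in art."]
climb_terms = ["New Mexican highlands", "village of Cebolla",
               "adobe Church of Santo Niño", "sagging, sun-bleached walls",
               "rusted tin roof", "isolation", "human endurance", "window"]

before = build_index({"72.18.1": catalog_subjects})
after = build_index({"72.18.1": catalog_subjects + climb_terms})
print(search(before, "tin roof"))  # set()
print(search(after, "tin roof"))   # {'72.18.1'}
```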
24
Squeezing Metadata out of
Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB ToolKit
25
The CLiMB ToolKit
• Software prototype
• For large image collections
• Semi-automated metadata
– Subject access terms
– Human intervention at all steps
• Iterative development cycle
26
27
The CLiMB ToolKit
A Graphical User Interface (GUI)
• Web Browser
• Help Menus
• Projects
28
CLiMB TOOLKIT:
Process Flow
1. Load Text
2. Load TOI List
3. Analyze Text
4. Select Subject Access Terms
5. Review
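The five-step process flow can be sketched as a small pipeline. Everything in it is hypothetical: the Project class, the function names, the stopword list, the one-sentence context window around each TOI mention, and the approve callback that stands in for the cataloger at step 4. The slides do not describe how the ToolKit implements any of these steps; the point is only that each step hands its output to the next and that a person decides which terms survive.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stopword list used to split sentences into candidate phrases.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "it", "its", "her",
             "his", "she", "he", "was", "were", "had", "has", "would", "be",
             "been", "that", "to", "by", "with", "for", "from"}


@dataclass
class Project:
    """Hypothetical state for one ToolKit project."""
    text: str = ""
    toi_list: list = field(default_factory=list)
    candidates: dict = field(default_factory=dict)
    selected: dict = field(default_factory=dict)


def candidate_phrases(sentence):
    """Runs of tokens delimited by stopwords, numbers, or punctuation."""
    phrases, current = [], []
    for token in re.findall(r"[\w'-]+|[^\w\s]", sentence):
        if token.lower() in STOPWORDS or token.isdigit() or not token[0].isalnum():
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


def load_text(project, text):
    """Step 1: store the associated text, normalizing curly apostrophes."""
    project.text = text.replace("\u2019", "'")


def load_toi_list(project, tois):
    """Step 2: store the collection's authority list of target objects."""
    project.toi_list = list(tois)


def analyze_text(project, window=1):
    """Step 3: collect candidate phrases from sentences near each TOI mention."""
    sentences = re.split(r"(?<=[.!?])\s+", project.text)
    for toi in project.toi_list:
        hits = [i for i, s in enumerate(sentences) if toi.lower() in s.lower()]
        nearby = sorted({j for i in hits
                         for j in range(max(0, i - window),
                                        min(len(sentences), i + window + 1))})
        terms = []
        for j in nearby:
            for term in candidate_phrases(sentences[j]):
                if term.lower() != toi.lower() and term not in terms:
                    terms.append(term)
        project.candidates[toi] = terms


def select_terms(project, approve):
    """Step 4: keep only the candidates the cataloger approves."""
    for toi, terms in project.candidates.items():
        project.selected[toi] = [t for t in terms if approve(toi, t)]


def review(project):
    """Step 5: return the approved subject access terms, grouped by TOI."""
    return {toi: terms for toi, terms in project.selected.items() if terms}


project = Project()
load_text(project, "Near her home was the village of Cebolla. "
                   "Its adobe church had a rusted tin roof. She painted it in 1945.")
load_toi_list(project, ["Cebolla"])
analyze_text(project)
# Stand-in for the human reviewer: accept multi-word phrases only.
select_terms(project, lambda toi, term: len(term.split()) > 1)
print(review(project))  # {'Cebolla': ['adobe church', 'rusted tin roof']}
```

Keeping selection as a callback, rather than a fixed filter, mirrors the deck's "human intervention at all steps": the automated steps only propose terms.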
29
CLiMB DocViewer
http://www1.cs.columbia.edu/~delson/cni/
30
Thank you!
Any further questions?
www.columbia.edu/cu/cria/climb
31