Implementing FRBR on Large
Databases
Thomas Hickey
Diane Vizine-Goetz
OCLC Research
What is FRBR
• IFLA study group report: Functional
Requirements for Bibliographic Records
• Bibliographic model independent of
cataloging rules
• Clusters bibliographic items into a four-level
structure
– Work
– Expression
– Manifestation
– Item
CNI 2002 Fall Task Force
Control of Entities in FRBR
• Entities: Work, Expression, Manifestation, Item, Person, Corporate Body, Concept, Object, Event, Place
• Surrogates: Uniform titles, Citations, Names, Subjects
Why FRBR?
• Potential to improve:
– Cataloging
– Discovery
– Delivery
• By
– Bringing versions of works together
– Showing relationships of various kinds
– Enabling users to navigate to level of interest
Research on FRBR & WorldCat
• Subsets
– By library, region
– Example/problem sets
• Shakespeare, the Bible
• Humphry Clinker
• 1,000 random works
– By genre
• Dissertations
• Fiction
• Whole file, 47 million bibliographic records
Our Approach
• Concentrating on work-level
– Problems with expression-level clusters
• Efficient, maintainable, understandable
• Few, if any, false matches with correct
cataloging
– Err on the side of missed matches
– Some accommodation of frequent variants
• Compare with manually clustered records
The Algorithm
• A key is generated for each record
• Extract author, title
– Look up in NACO authority file
– Added entry information as needed
• Form a key from bibliographic record
– Author, title, added entry information
– These can be sorted, compared
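The steps above can be sketched roughly as follows. This is a minimal illustration, not the OCLC implementation: `normalize` is a much-simplified stand-in for NACO normalization, the `AUTHORITY` dictionary is a hypothetical substitute for a NACO authority file lookup, the `nonfiling` parameter assumes the caller knows how many leading characters (such as an initial article) to skip, and the `author/title` key layout is inferred from the keys shown on later slides. Added entry information is left out.

```python
import re
import unicodedata

# Hypothetical stand-in for a NACO authority file lookup: maps a
# normalized variant heading to the normalized established heading.
AUTHORITY = {
    "clemens, samuel langhorne\\1835 1910": "twain, mark\\1835 1910",
}

def normalize(text):
    """Simplified NACO-style normalization (an assumption, not the full rules)."""
    # Strip diacritics by decomposing and dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes so "Alice's" becomes "alices".
    lowered = no_marks.lower().replace("'", "").replace("\u2019", "")
    # Turn every other non-alphanumeric character except commas into a space.
    spaced = re.sub(r"[^a-z0-9,]+", " ", lowered)
    # Collapse whitespace and trim stray commas and spaces at the ends.
    return " ".join(spaced.split()).strip(" ,")

def make_key(author_subfields, title, nonfiling=0):
    """Build an author/title key; subfields are joined with a backslash."""
    title_part = normalize(title[nonfiling:])
    if not author_subfields:
        return title_part  # title-only key, as in "arabian nights"
    author = "\\".join(normalize(s) for s in author_subfields)
    author = AUTHORITY.get(author, author)  # prefer the established form
    return f"{author}/{title_part}"

key_a = make_key(["Twain, Mark,", "1835-1910."],
                 "The adventures of Huckleberry Finn", nonfiling=4)
key_b = make_key(["Clemens, Samuel Langhorne,", "1835-1910"],
                 "Adventures of Huckleberry Finn.")
print(key_a)
print(key_a == key_b)
```

Because the keys are plain strings, records can be sorted on them and adjacent equal keys collected into a cluster.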
Problems
• Many (17%) records do not have
– Author main-entry
– Uniform title
• In general these cannot be matched
– Look at added entries
– Information at the expression and manifestation
levels
– Handled separately
– 180,000 clusters involving ~400,000 records
Top 10 WorldCat Clusters
# Recs  Author/Title Key
8,383   bible\n t
8,055   bible
6,174   bible\authorized
4,033   bible\o t\psalms
3,964   haggadah
3,477   great britain/treaties etc
2,402   bible\o t
2,248   koran
2,153   arabian nights
Top 10 from a Public Library
# Recs  Author/Title Key
89      bible\authorized
85      mother goose
84      chopin, frederic\1810 1849/piano music
81      schulz, charles m/peanuts
63      davis, jim/garfield
61      moore, clement clarke\1779 1863/night before christmas
60      mozart, wolfgang amadeus\1756 1791/instrumental music
58      bach, johann sebastian\1685 1750/cantatas
57      beethoven, ludwig van\1770 1827/sonatas
56      twain, mark\1835 1910/adventures of huckleberry finn
Results
• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm estimate: ~1.3 manifestations/work
• 25,844 clusters have 20 or more
records
• 401,659 clusters have 5 or more
records
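Figures of this kind can be derived by grouping records on their keys and counting cluster sizes. The sketch below shows one way to do that on a tiny made-up sample; `cluster_stats` and its `thresholds` parameter are illustrative names, and treating "records per cluster" as a proxy for manifestations per work is an assumption.

```python
from collections import Counter

def cluster_stats(keys, thresholds=(5, 20)):
    """Summarize clusters formed by grouping records on identical keys."""
    sizes = Counter(keys)              # key -> number of records in its cluster
    n_records = sum(sizes.values())
    n_clusters = len(sizes)
    return {
        "records": n_records,
        "clusters": n_clusters,
        "records_per_cluster": n_records / n_clusters if n_clusters else 0.0,
        # For each threshold t, the number of clusters with at least t records.
        "at_least": {t: sum(1 for s in sizes.values() if s >= t)
                     for t in thresholds},
    }

# Made-up sample: nine records falling into three clusters.
sample = ["bible"] * 6 + ["koran"] * 2 + ["mother goose"]
stats = cluster_stats(sample, thresholds=(2, 5))
print(stats)
```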
Preliminary Plans
• Build structures for FRBR into new
catalog
• Expose FRBR clustering for searching
• Make visible in cataloging
– As consensus on implementation is
developed
– As cataloging rules accommodate FRBR
Spin-offs
• NACO normalization code
– Testbed
– Server
• Authority work
– ePrints UK
• FRBR in other projects
– FictionFinder
– NDLTD union catalog
Fiction Subset
• 2,665,662 WorldCat records
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records
Top 10 clusters for fiction
# Recs  Author/Title Key
1,296   defoe, daniel\1661 1731/robinson crusoe
1,248   carroll, lewis\1832 1898/alices adventures in wonderland
971     cervantes saavedra, miguel de\1547 1616/don quixote
828     stevenson, robert louis\1850 1894/treasure island
689     twain, mark\1835 1910/adventures of huckleberry finn
624     twain, mark\1835 1910/adventures of tom sawyer
618     swift, jonathan\1667 1745/gullivers travels
600     andersen, h c\hans christian\1805 1875/tales
581     stowe, harriet beecher\1811 1896/uncle toms cabin
570     arabian nights
FictionFinder
• Employs work clusters in a prototype system
for searching and browsing bibliographic
records for fiction
• Indexes records at the work level and
organizes displays by work and expression
(primarily language)
• Includes records for textual items; additional
modes of expression (moving image, sound)
to be added later
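A display organized by work and then by expression could be built with a two-level grouping, as in the minimal sketch below. It is not FictionFinder's code: the record fields `work_key`, `language`, and `title` are hypothetical, the sample records are made up for illustration, and language is used as the only expression-level attribute because the slide names it as the primary one.

```python
from collections import defaultdict

def group_by_work_and_language(records):
    """Return {work_key: {language: [titles]}} for a list of record dicts."""
    works = defaultdict(lambda: defaultdict(list))
    for rec in records:
        works[rec["work_key"]][rec["language"]].append(rec["title"])
    # Convert nested defaultdicts to plain dicts for predictable display.
    return {work: dict(langs) for work, langs in works.items()}

JP = "crichton, michael\\1942/jurassic park"
CONGO = "crichton, michael\\1942/congo"

# Made-up sample records (illustrative only).
records = [
    {"work_key": JP, "language": "eng", "title": "Jurassic Park"},
    {"work_key": JP, "language": "spa", "title": "Parque Jurásico"},
    {"work_key": JP, "language": "eng", "title": "Jurassic Park : a novel"},
    {"work_key": CONGO, "language": "eng", "title": "Congo"},
]

display = group_by_work_and_language(records)
for work, langs in display.items():
    print(work)
    for lang, titles in langs.items():
        print(f"  {lang}: {len(titles)} record(s)")
```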
395 records for author “crichton,
michael\1942” clustered into 17 entries
# Recs  Title Key
23      airframe
40      andromeda strain
5       binary
11      case of need
44      congo
26      disclosure
5       disclosure a novel
16      eaters of the dead
7       eaters of the dead the manuscript of ibn fadlan relating his experiences with the northmen in a d 922
27      great train robbery
47      jurassic park
25      lost world
37      rising sun
31      sphere
7       sphere a novel
19      terminal man
25      timeline
395     (total)
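The listing shows missed matches of the kind the approach accepts: "disclosure" and "disclosure a novel" end up as separate entries because their title keys differ. One possible accommodation of such a frequent variant is sketched below. Whether the actual algorithm does anything like this is not stated; `GENERIC_SUFFIXES` and `strip_generic_suffix` are hypothetical, and a rule like this trades missed matches for some risk of false ones.

```python
# Hypothetical list of frequent generic title endings.
GENERIC_SUFFIXES = (" a novel",)

def strip_generic_suffix(title_key):
    """Drop a trailing generic phrase such as ' a novel' from a title key."""
    for suffix in GENERIC_SUFFIXES:
        if title_key.endswith(suffix) and len(title_key) > len(suffix):
            return title_key[: -len(suffix)]
    return title_key

# Record counts copied from the listing above.
counts = {
    "disclosure": 26,
    "disclosure a novel": 5,
    "sphere": 31,
    "sphere a novel": 7,
    "timeline": 25,
}

merged = {}
for title, n in counts.items():
    base = strip_generic_suffix(title)
    merged[base] = merged.get(base, 0) + n

print(merged)
```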
Typical Results Set Display
Typical Work-level Display
Benefits
• Aggregated displays for works and
expressions
• Enhancement of (fiction) records at
work level
– with elements from records within the
work cluster (e.g., summaries, genre
terms, subject headings, class numbers)
– with external data (e.g., literary prizes,
prequels/sequels, evaluative content)
Challenges
• Identifying appropriate bibliographic data for
systematically grouping or differentiating
works and expressions
– Works
• Genre (graphic novel vs. novel)
• Genre + mode of expression (audio book vs. radio play)
• Degree of modification (abridgement of a juvenile work vs.
an adaptation for young children)
– Expressions
• Translators, illustrators, editors
Next Steps
• FRBR algorithm
– Explore applications
– Refine algorithm as needed
• FictionFinder
– Add records for sound and image
– Conduct user studies
Links
• Functional Requirements for Bibliographic Records Final Report
– http://www.ifla.org/VII/s13/frbr/frbr.htm
• Experiments with the IFLA Functional Requirements
for Bibliographic Records (FRBR)
– http://www.dlib.org/dlib/september02/hickey/09hickey.html
• OCLC Research Activities and IFLA's Functional
Requirements for Bibliographic Records
– http://www.oclc.org/research/projects/frbr/index.shtm
• Implementing FRBR on Large Databases
– http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm