05_vocabulary_worksheet

Class 10 - Subject analysis and Classification
Exercise Overview
In our prior classes this semester we have focused on the process of creating and manipulating
metadata-rich documents and representations of those documents. As part of this work we have run
into subject headings and call numbers and explored types of authority control. For the next few
weeks we will explore categorization and classification and will become familiar with the tools that
enable us to apply existing classification systems and create new knowledge organization systems.
Today we will explore two different classification systems and try our hand at using them to classify
some resources.
Instructions:
Online students: Work individually to complete the worksheet. When asked to ‘discuss as a
group’, consider your response and continue completing the worksheet.
In-Class students: Working in groups of 3-4, complete the worksheet. Appoint one person to read
the text and questions on the worksheet, one person to record the group answers on the worksheet
and one person who is responsible for reporting back to the entire class. All members of the group
should participate in team exploration and discussion.
For everyone:
1. Because this worksheet involves technical exercises, each person should complete the
technical portions. As your group works through the technical elements of the worksheet keep
talking and helping each other.
2. Wait for your group members to catch up or help them over rough spots so that you can
discuss the key questions together.
Metadata Standards and Web Services
Erik Mitchell
Page 1
Suggested readings
1. Mitchell, E. (2015). Chapter 5 in Metadata Standards and Web Services in Libraries, Archives, and Museums.
Libraries Unlimited. Santa Barbara, CA.
Overview
This week we are exploring the process of subject analysis and classification, particularly in relation
to subject analysis for bibliographic representation. Chowdhury (2009) defines indexing as the
“Assignment of identifiers to text items” and Subject Indexing as “Conceptual analysis of the subject
of documents.” For this worksheet we will read/skim chapter 2 in Lancaster’s work Indexing and
Abstracting in Theory and Practice. In this exercise we will explore these two concepts as they are
applied in manual contexts. Later in this course we will explore automatic classification and systems
that are built to enable retrieval. Let’s begin exploring these activities by understanding some of the
key concepts in subject analysis.
Step 1:
Complete the following table of concepts by identifying a definition for each concept.
Table 1 Classification vocabulary
Aboutness
Exhaustivity
Specificity
Conceptual analysis
Translation
Controlled Vocabulary
These concepts work together to form the foundation of how we talk about the scope and content of
our indexing process. For example, exhaustivity and specificity are two concepts that work together
in balance to help us understand how to manage recall. Semantic and syntactic analysis address two
different types of meaning encoded in documents, one from content meaning (semantics) and the
other from content structure (syntax). Subject analysis and classification focus on the application
of these concepts during the analysis of a resource. Lancaster suggests two principal steps (i.e.
conceptual analysis and translation) that form the foundation of subject analysis and classification. In
doing this Lancaster mentions a number of questions that need to be asked when performing topical
analysis on a document.
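The balance between exhaustivity and recall can be made concrete with a small calculation. The sketch below is illustrative only: the three documents, their index terms, and the relevance judgments are invented, and precision (the usual counterpart to recall) is included as an assumption about how the trade-off would be measured.

```python
# Toy illustration: how indexing exhaustivity can affect recall and precision.
# All documents, terms, and relevance judgments below are invented.

def retrieve(index, term):
    """Return the set of document ids indexed with the given term."""
    return {doc for doc, terms in index.items() if term in terms}

def recall_precision(retrieved, relevant):
    """Recall = hits / relevant documents; precision = hits / retrieved documents."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Selective indexing: only the main topic of each document is recorded.
selective = {
    "doc1": {"statistics"},
    "doc2": {"python"},
    "doc3": {"cataloging"},
}
# Exhaustive indexing: main topics plus minor mentions are recorded.
exhaustive = {
    "doc1": {"statistics", "python"},
    "doc2": {"python", "statistics"},
    "doc3": {"cataloging", "statistics"},
}
# Assumed judgment: doc1 and doc2 are useful to someone asking about statistics.
relevant = {"doc1", "doc2"}

selective_scores = recall_precision(retrieve(selective, "statistics"), relevant)
exhaustive_scores = recall_precision(retrieve(exhaustive, "statistics"), relevant)
print("selective (recall, precision):", selective_scores)
print("exhaustive (recall, precision):", exhaustive_scores)
```

In this toy collection the exhaustive index finds every relevant document but also returns one that only mentions the topic in passing, which is the kind of balance the indexer has to manage.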
Review Lancaster’s chapter and answer the key questions
Key Questions
Question 1. What questions does Lancaster recommend asking about a resource during the
indexing process?
Question 2. What role does Lancaster indicate that the “community” plays in helping create good
indexes?
Question 3. What are the three types of controlled vocabularies that Lancaster mentions?
Question 4. How are subject heading lists and thesauri related? How are they different?
Take a moment to review Lancaster’s Figure 5 on page 22. Notice how each controlled vocabulary
handles terminology slightly differently.
Let’s turn to Kwasnik’s article The Role of Classification in Knowledge Representation and Discovery.
In her article, Kwasnik mentions four types of classification structures. For each structure fill out the
table below.
Table 2: Map of classification types
Classification type | Common uses | Limitations | Examples
Hierarchies | | |
Trees | | |
Paradigms | | |
Faceted Classifications | | |
Folksonomies (not in article) | | |
Kwasnik’s article dates from before the emergence of folksonomies. If you are not familiar with
folksonomies take a moment to look the term up and fill out the entry in the table.
With these types of classification in mind spend a few moments exploring the Library of Congress
classification system at http://www.loc.gov/catdir/cpso/lcco/ and answer the following questions.
Key Questions
Question 5. Where would Kwasnik place the LC Classification?
There are a number of classification systems including the Library of Congress system, the Colon
Classification system, the Universal Decimal Classification, the Bliss Bibliographic Classification system and
the Dewey Decimal system. Each of these systems focuses on identifying the “aboutness” of a
document and coding that aboutness into a classification number. Let’s begin by understanding the
difference between three types of systems. Chowdhury (2009) takes an alternative approach to
describing classification systems, focusing on three types: enumerative, faceted and analytico-synthetic.
Enumerative: Subjects are pre-defined and listed in a hierarchical notation. Application of the
classification system involves finding the appropriate class in the classification system and
applying the class without modification.
Analytico-synthetic: Analytico-synthetic systems are hierarchical, but rather than relying on a
completely pre-defined hierarchy they allow the cataloger to add refining concepts to a
classification, such as geographic, temporal and topical refinements. In addition, an
analytico-synthetic system allows the classifier to build a classification number using a
combination of hierarchical and refining concepts.
Faceted: Faceted systems are non-hierarchical and involve the combination of multiple
categorization areas (or facets) to create a classification. One of the most popular faceted
classification systems is Ranganathan’s Colon Classification. Ranganathan’s system
features five facets: Personality, Matter, Energy, Space and Time (PMEST).
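The difference between looking up a pre-defined class and building a notation from facets can be sketched in a few lines. This is a simplified illustration, not an implementation of any published scheme: the schedule entries, class codes and facet values are invented, and the connecting symbols follow the PMEST punctuation commonly cited for Colon Classification, which should be checked against the published schedules.

```python
# Sketch: enumerative lookup versus faceted synthesis.
# Class codes and facet values are invented for illustration. The connecting
# symbols are an assumption based on the punctuation commonly cited for PMEST.

ENUMERATIVE_SCHEDULE = {
    "statistics": "S1",
    "statistics, data processing": "S1.5",
}

FACET_CONNECTORS = [
    ("personality", ","),
    ("matter", ";"),
    ("energy", ":"),
    ("space", "."),
    ("time", "'"),
]

def classify_enumerative(subject):
    """Enumerative: find the pre-defined class and apply it without modification."""
    return ENUMERATIVE_SCHEDULE[subject]

def classify_faceted(main_class, facets):
    """Faceted: build a notation by combining whichever facets apply, in PMEST order."""
    notation = main_class
    for name, connector in FACET_CONNECTORS:
        if name in facets:
            notation += connector + facets[name]
    return notation

enumerated = classify_enumerative("statistics, data processing")
synthesized = classify_faceted("B", {"personality": "28", "energy": "5", "time": "N11"})
print(enumerated)
print(synthesized)
```

The enumerative function can only return classes that already exist in the schedule, while the faceted function can express combinations nobody listed in advance. An analytico-synthetic system sits between the two.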
Table 3 Features of Classification systems
System feature | Classification type
Subjects and classes are listed in a pre-defined notation | Enumerative
System is “Strictly hierarchical”, “pre-defined” |
Rules for classification have no pre-defined classes but define an approach to classification |
Classification process focuses on identifying unrelated aspects of a document (personality, matter, energy, space, time) |
Mixes pre-defined hierarchy and refining facet features |
Uses classification schedules to build a classification number |
These three types of classification systems (Enumerative, Faceted, and Analytico-synthetic) are the
most common traditional systems. In addition, there are social classification systems known as
folksonomies that rely on the aggregation of tags assigned by users of information resources.
Folksonomies are often represented in Tag Clouds, a visual representation of tags with emphasis
based on tag occurrence.
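As a rough sketch of how a tag cloud is derived, the code below aggregates tags from several users and maps each tag's count onto a font size. The users, their tags, the point sizes and the linear scaling are all assumptions made for illustration; real tagging sites may weight tags differently.

```python
from collections import Counter

# Invented tag assignments from three hypothetical users.
user_tags = {
    "user_a": ["statistics", "python", "programming"],
    "user_b": ["Statistics", "data analysis", "python"],
    "user_c": ["statistics", "textbook"],
}

def aggregate_tags(assignments):
    """Count how many users applied each tag, ignoring case and stray spaces."""
    counts = Counter()
    for tags in assignments.values():
        # A set ensures each user contributes at most once per tag.
        counts.update({tag.strip().lower() for tag in tags})
    return counts

def cloud_sizes(counts, min_pt=10, max_pt=32):
    """Map tag counts linearly onto font sizes for a tag cloud."""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_pt + (n - low) * (max_pt - min_pt) / span)
        for tag, n in counts.items()
    }

tag_counts = aggregate_tags(user_tags)
sizes = cloud_sizes(tag_counts)
for tag, n in tag_counts.most_common():
    print(f"{tag}: used by {n} user(s), shown at {sizes[tag]}pt")
```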
Generally speaking, the Library of Congress Classification and the Dewey Decimal Classification are
considered Analytico-Synthetic because they blend a hierarchical subject analysis (Enumerative) with
refining classification schedules (quasi-faceted). For example, the LCC system allows you to assign
geographic and time facet refinements to a subject classification and the Dewey system features 10
main classes that are hierarchically subdivided to create a classification.
Step 2:
Let’s try our hand at applying subject analysis and a classification system to a resource. As
our resource we will use Think Stats by Allen Downey. Think Stats is an online book made
available under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
The book is available at http://greenteapress.com/thinkstats/html/index.html. Use it in each
of the classification exercises below.
Step 3:
For each system apply the following process for subject analysis:
a. Analyze the resource for content
b. Identify keywords and key concepts
c. Group concepts and consider what the primary ‘aboutness’ is.
d. Explore your classification system and see how your keywords match
e. Identify a primary topic area and order sub-concepts hierarchically
f. Consult the class schedule and produce a chain of subject links
g. Translate the subject headings to the appropriate notation scheme
Step 4:
Classify the resource using the Association for Computing Machinery system (Enumerative)
a. Browse the ACM classification system at http://www.acm.org/about/class/1998.
Question 6. What is your top level ACM heading?
Question 7. What is the full ACM classification?
*note – The ACM system does not focus on developing a classification number that is unique to
each item.
Step 5:
Identify subject headings for the resource as the first step in using the Library of Congress Classification
(Analytico-synthetic)
a. Let’s begin by identifying the authorized headings for these subjects:
i. Go to http://authorities.loc.gov
ii. Click on “search authorities”, make sure you have “Subject Authority Headings”
selected, and conduct searches to find headings.
iii. As you browse results, pay attention to Type of heading (we want LC Subject
Headings) and See Also and Scope Notes.
iv. Pick three to four entries that are “Authorized headings.” Pick the heading that
the book is generally ‘about’ and use that to find a classification number.
Question 8. What Headings did you select?
1. Heading 1:
2. Heading 2:
3. Heading 3:
4. Heading 4:
Step 6:
Let’s add these subject headings to our MARC and Dublin Core representations of our
book.
a. Launch your VCL and using your favorite editor add them to the MARC record using the
MARC cataloging template as your guide (http://www.loc.gov/marc/bibliographic).
b. Using the Dublin Core guide select an appropriate field and add the authority to your DC
record.
c. Although to date we have only been working with simple Dublin Core it would be useful if
we could indicate the classification scheme that we used to classify our resource.
d. In order to indicate the appropriate type of vocabulary we will use a special attribute
called xsi:type to which we will assign the dcterms vocabulary encoding value.
e. xsi:type is an attribute defined in the XML Schema specification that allows an element to
explicitly declare the type of the content it contains. Review the code below, particularly the
subject elements, to understand the use of xsi:type.
f. To find the list of values that can be assigned to xsi:type, visit the Dublin Core metadata
registry (http://dcmi.kc.tsukuba.ac.jp/dcregistry/) and retrieve the Vocabulary Encoding
Schemes.
g. Once you have created your xml document be sure to check its well-formedness and
validity!
<?xml version="1.0"?>
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Organization of Information Course Example</dc:title>
  <dc:subject xsi:type="dcterms:DDC">062</dc:subject>
  <dc:subject xsi:type="dcterms:UDC">061(410)</dc:subject>
  <dc:description>This is an example record</dc:description>
  <dc:description xml:lang="fr">Cette classe est magnifique!</dc:description>
  <dc:publisher>University of Maryland</dc:publisher>
  <dc:identifier xsi:type="dcterms:URI">http://erikmitchell.info/lbsc670_fall2011</dc:identifier>
  <dcterms:isPartOf xsi:type="dcterms:URI">http://erikmitchell.info</dcterms:isPartOf>
</qualifieddc>
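One way to check well-formedness and to see how xsi:type travels with each subject is a short script. The sketch below uses only Python's standard library and a shortened copy of the record above; the function names are hypothetical. It checks well-formedness only. Validating against the qualified Dublin Core schema would need a schema-aware tool, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {namespace-uri}name form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Shortened copy of the example record, for illustration.
RECORD = """<?xml version="1.0"?>
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Organization of Information Course Example</dc:title>
  <dc:subject xsi:type="dcterms:DDC">062</dc:subject>
  <dc:subject xsi:type="dcterms:UDC">061(410)</dc:subject>
</qualifieddc>"""

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def subject_schemes(xml_text):
    """List (encoding scheme, value) pairs for every dc:subject element."""
    root = ET.fromstring(xml_text)
    return [
        (element.get(XSI_TYPE), (element.text or "").strip())
        for element in root.findall("dc:subject", DC_NS)
    ]

print(is_well_formed(RECORD))
for scheme, value in subject_schemes(RECORD):
    print(scheme, value)
```

Removing a closing tag from the record (for example the closing title tag) makes is_well_formed return False, which is a quick way to confirm the check is doing something.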
Step 7:
While we are working on authorities, let’s use our authority list to find a valid value for our
author entry.
a. Return to http://authorities.loc.gov and click on search authorities.
b. Make sure you have Name authorities selected and search for the author’s name.
c. Find the appropriate heading, using the LC cataloging guide.
Question 9. What is the Authorized heading for our author? How could you tell?
Step 8:
Using the same process as step 6, add/update the creator value in your DC and MARC
records for author (Note, DC does not have a dcterms type for LC Name Headings).
Step 9:
We are now going to use these headings to select a classification
a. Log in to Classification Web (http://classificationweb.net/)
i. Username and password are available in Blackboard under course documents
for this class
ii. Click Log On and enter the username and password
iii. Let’s begin by clicking on “Browse LC Subject Headings”
iv. Complete a few searches using the headings you found above. When you have found
the appropriate heading click on the Classification number range to drill down
further into the classification
Figure 1 Example of Classification
v. Question 10. What are some potential classification numbers for our resource?
1. Potential heading 1:
2. Potential heading 2:
vi. Take note of potential classification numbers, paying attention to specificity in the
headings and consider if that is the proper place for this resource. It can help to
look for other similar books to help decide (hint – search your library catalog).
Question 11. What is the best classification number for this resource?
vii. Select a specific class area and begin the process of cuttering
1. Cuttering is the process of adding an author-based refinement to your
classification number for uniqueness. Cuttering involves adding alphanumeric text after the
classification number to position the book properly in context in the stacks.
2. For complete documentation on cuttering
3. First Letter (author last name)
4. Number (See cuttering sheet)
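The letter-plus-number step above can be sketched as a small function. The digit tables below are an assumption: they follow the LC Cutter Table as it is commonly reproduced and should be verified against the cuttering sheet. The sketch produces only the first digit and skips the expansion digits a cataloger would add to keep names in shelf order. The function name cutter is hypothetical.

```python
# Sketch of the first Cutter digit for an author's last name.
# Tables are reproduced from the commonly published LC Cutter Table and are an
# assumption here; check them against the cuttering sheet before relying on them.

AFTER_VOWEL = [("b", 2), ("d", 3), ("l", 4), ("n", 5), ("p", 6), ("r", 7), ("s", 8), ("u", 9)]
AFTER_S = [("a", 2), ("e", 4), ("h", 5), ("m", 6), ("t", 7), ("u", 8), ("w", 9)]
AFTER_QU = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("t", 8), ("y", 9)]
AFTER_CONSONANT = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("u", 8), ("y", 9)]

def _digit(letter, table):
    """Pick the digit of the last table entry at or before the letter."""
    digit = table[0][1]
    for start, value in table:
        if letter >= start:
            digit = value
    return digit

def cutter(last_name):
    """Return the initial letter plus one Cutter digit, e.g. a letter and a number."""
    name = "".join(ch for ch in last_name.lower() if ch.isalpha())
    if not name:
        raise ValueError("name must contain at least one letter")
    first, rest = name[0], name[1:]
    if not rest:
        return first.upper()
    if first in "aeiou":
        digit = _digit(rest[0], AFTER_VOWEL)
    elif first == "s":
        # "Sch" has its own entry in the table.
        digit = 3 if rest.startswith("ch") else _digit(rest[0], AFTER_S)
    elif first == "q" and rest[0] == "u":
        # For "Qu" the table keys on the letter after the u.
        digit = _digit(rest[1], AFTER_QU) if len(rest) > 1 else AFTER_QU[0][1]
    else:
        digit = _digit(rest[0], AFTER_CONSONANT)
    return f"{first.upper()}{digit}"

print(cutter("Downey"))
```

For Downey the sketch yields D6, which matches the author portion of the call number shown in Question 12.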
Question 12. What is your final Call number (including cutter): QA 276.45 .P5 D6
Step 10: Repeat the steps for adding your call number to your MARC and DC records. (Note, DC
does have a dcterms type for Library of Congress classification)
Key Questions
Question 13. What call number did you create for this record?
Question 14. What subject headings did you assign?
Question 15. What Dublin Core vocabulary scheme did you select for the Library of Congress
classification?
Step 11: Classify the resource using folksonomies.
a. Begin by looking through the resource and picking out the words that you would use to
describe the resource. Write these words down
b. Let’s see how other sites have assigned tags to this resource. Visit each site below and
search for the book “Think Stats.” Each site handles tags a bit differently. Look for
single words or groups of words that describe the book (e.g. Goodreads calls tags
‘popular shelves’). Take note of a few tags from each site and write them in the table
below.
Table 4 Tags for Think Stats
http://www.librarything.com
http://goodreads.com
http://www.shelfari.com/
Step 12: Before we finish up let’s try our hand at a form of automatic classification using tag clouds.
Return to your Think Stats book and pull up the index in the HTML version
(http://greenteapress.com/thinkstats/html/thinkstats011.html).
a. Visit a tag cloud generation site (http://wordle.net or http://tagcrowd.com)
b. Copy the index entries from Think Stats and paste them into the tag creator.
c. Analyze the tag cloud that gets generated and modify the available settings.
d. Evaluate the resulting tags and record the top 5:
i. Tag 1:
ii. Tag 2:
iii. Tag 3:
iv. Tag 4:
v. Tag 5:
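If a tag cloud site is unavailable, the same kind of ranking can be approximated locally. The sketch below counts word frequencies in pasted index text. The sample entries, the stopword list and the function name top_tags are invented for illustration, so paste the real index entries in place of the sample to get meaningful results.

```python
import re
from collections import Counter

# Small, assumed stopword list; extend it as needed.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "for", "to", "see"}

def top_tags(text, n=5):
    """Return the n most frequent words, ignoring numbers, stopwords and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)

# Invented sample; replace with the index entries copied from the book.
sample_index = """
distribution, 12
distribution, normal, 40
normal distribution, 41
variance, 15
variance of the distribution, 16
mean, 14
"""

for word, count in top_tags(sample_index, 3):
    print(word, count)
```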
Key questions
Question 16. What were the most representative tags from the folksonomies? How did their
content compare to the other classification systems?
Question 17. Rank each classification system that we have worked with (i.e. ACM, LCSH
and Folksonomies) with regard to how specific the classification is. Re-rank the systems in
terms of ‘exhaustivity.’ Which system is most exhaustive and how does that compare to the
system that is most specific?
One reason that we create classification systems is to enable browsing for physical resources.
Today we classified an e-book – a resource that can only be found and viewed using online
systems. Look around at some library catalogs and find some e-books.
Question 18. Do e-books have call numbers? How exhaustive or specific are the subject
headings?
Summary
This week we have explored a number of different classification systems and controlled vocabulary
platforms, and have explored how to include subject headings and classification numbers in our
representations. We explored the process of cuttering in LC and familiarized ourselves with common
LC classification tools. Next week we will consider how classification structures and controlled
vocabularies change in a Linked Data environment and learn more about how library metadata is
changing in response to these systems.