Class 10 - Subject analysis and Classification Exercise

Overview

In our prior classes this semester we have focused on the process of creating and manipulating metadata-rich documents and representations of those documents. As part of this work we have run into subject headings and call numbers and explored types of authority control. For the next few weeks we will explore categorization and classification and will become familiar with the tools that enable us to apply existing classification systems and create new knowledge organization systems. Today we will explore two different classification systems and try our hand at using them to classify some resources.

Instructions:

Online students: Work individually to complete the worksheet. When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.

In-Class students: Working in groups of 3-4, complete the worksheet. Appoint one person to read the text and questions on the worksheet, one person to record the group answers on the worksheet, and one person to report back to the entire class. All members of the group should participate in team exploration and discussion.

For everyone:
1. Because this worksheet involves technical exercises, each person should complete the technical portions. As your group works through the technical elements of the worksheet, keep talking and helping each other.
2. Wait for your group members to catch up, or help them over rough spots, so that you can discuss the key questions together.

Metadata Standards and Web Services Erik Mitchell Page 1

Suggested readings
1. Mitchell, E. (2015). Chapter 5 in Metadata Standards and Web Services in Libraries, Archives, and Museums. Libraries Unlimited. Santa Barbara, CA.

Overview

This week we are exploring the process of subject analysis and classification, particularly in relation to subject analysis for bibliographic representation.
Chowdhury (2009) defines indexing as the “Assignment of identifiers to text items” and Subject Indexing as “Conceptual analysis of the subject of documents.” For this worksheet we will read/skim chapter 2 in Lancaster’s work Indexing and Abstracting in Theory and Practice. In this exercise we will explore these two concepts as they are applied in manual contexts. Later in this course we will explore automatic classification and systems that are built to enable retrieval. Let’s begin exploring these activities by understanding some of the key concepts in subject analysis.

Step 1: Complete the following table of concepts by identifying a definition for each concept.

Table 1 Classification vocabulary
Concept | Definition
Aboutness |
Exhaustivity |
Specificity |
Conceptual analysis |
Translation |
Controlled Vocabulary |

These concepts work together to form the foundation of how we talk about the scope and content of our indexing process. For example, exhaustivity and specificity work together in balance to help us understand how to manage recall and precision. Semantic and syntactic analysis address two different types of meaning encoded in documents, one from content meaning (semantics) and the other from content structure (syntax). Subject analysis and classification focus on the application of these concepts during the analysis of a resource. Lancaster suggests two principal steps (conceptual analysis and translation) that form the foundation of subject analysis and classification. In doing this, Lancaster mentions a number of questions that need to be asked when performing topical analysis on a document. Review Lancaster’s chapter and answer the key questions.

Key Questions
Question 1. What questions does Lancaster recommend asking about a resource during the indexing process?
Question 2. What role does Lancaster indicate that the “community” plays in helping create good indexes?
Question 3.
What are the three types of controlled vocabularies that Lancaster mentions?
Question 4. How are subject heading lists and thesauri related? How are they different?

Take a moment to review Lancaster’s Figure 5 on page 22. Notice how each controlled vocabulary handles terminology slightly differently.

Let’s turn to Kwasnik’s article The Role of Classification in Knowledge Representation and Discovery. In her article, Kwasnik mentions four types of classification structures. For each structure, fill out the table below.

Table 2: Map of classification types
Classification type | Common uses | Limitations | Examples
Hierarchies | | |
Trees | | |
Paradigms | | |
Faceted Classifications | | |
Folksonomies (not in article) | | |

Kwasnik’s article dates from before the emergence of folksonomies. If you are not familiar with folksonomies, take a moment to look the term up and fill out the entry in the table.

With these types of classification in mind, spend a few moments exploring the Library of Congress classification system at http://www.loc.gov/catdir/cpso/lcco/ and answer the following question.

Key Questions
Question 5. Where would Kwasnik place the LC Classification?

There are a number of classification systems, including the Library of Congress Classification, the Colon Classification, the Universal Decimal Classification, the Bliss Bibliographic Classification, and the Dewey Decimal Classification. Each of these systems focuses on identifying the “aboutness” of a document and encoding that aboutness in a classification number. Let’s begin by understanding the difference between three types of systems. Chowdhury (2009) takes an alternative approach to describing classification systems, focusing on three types: enumerative, faceted, and analytico-synthetic.

Enumerative: Subjects are pre-defined and listed in a hierarchical notation.
Application of the classification system involves finding the appropriate class in the classification system and applying the class without modification.

Analytico-synthetic: Analytico-synthetic systems are hierarchical, but rather than relying on a completely pre-defined hierarchy they allow the cataloger to add refining concepts to a classification, such as geographic, temporal, and topical refinements. In addition, an analytico-synthetic system allows the classifier to build a classification number by combining hierarchical and refining concepts.

Faceted: Faceted systems are non-hierarchical and involve the combination of multiple categorization areas (or facets) to create a classification. One of the most popular faceted classification systems is Ranganathan’s Colon Classification. Ranganathan’s system features five facets: Personality, Matter, Energy, Space, and Time (PMEST).

Table 3 Features of Classification systems
System feature | Classification type
Subjects and classes are listed in a pre-defined notation | Enumerative
System is “strictly hierarchical” and “pre-defined” |
Rules for classification have no pre-defined classes but define an approach to classification |
Classification process focuses on identifying unrelated aspects of a document (personality, matter, energy, space, time) |
Mixes pre-defined hierarchy and refining facet features |
Uses classification schedules to build a classification number |

These three types of classification systems (enumerative, faceted, and analytico-synthetic) are the most common traditional systems. In addition, there are social classification systems known as folksonomies that rely on the aggregation of tags assigned by users of information resources. Folksonomies are often represented in tag clouds, a visual representation of tags in which emphasis is based on how often each tag occurs.
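To make the idea of "emphasis based on tag occurrence" concrete, here is a minimal Python sketch of how a tag cloud might turn tag counts into display sizes. The function name tag_weights, the font-size range, and the sample tags are all assumptions made for illustration; real tag cloud tools may scale sizes differently.

```python
from collections import Counter


def tag_weights(tags, min_size=10, max_size=40):
    """Map each tag to a font size scaled linearly by how often it occurs.

    The size range (10 to 40) is an arbitrary choice for this sketch.
    """
    # Normalize case and whitespace so "Python" and "python" count together.
    counts = Counter(tag.strip().lower() for tag in tags if tag.strip())
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag, n in counts.items():
        # If every tag occurs equally often, all tags get the smallest size.
        scale = (n - low) / span if span else 0.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical tags, as if several users had tagged the same book.
sample_tags = ["statistics", "python", "statistics", "programming",
               "Python", "statistics", "probability"]
weights = tag_weights(sample_tags)
for tag, size in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {size}")
```

The most frequently assigned tag receives the largest size, which is the same emphasis a tag cloud shows visually.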
Generally speaking, the Library of Congress Classification and the Dewey Decimal Classification are considered analytico-synthetic because they blend a hierarchical subject analysis (enumerative) with refining classification schedules (quasi-faceted). For example, the LCC system allows you to assign geographic and time facet refinements to a subject classification, and the Dewey system features 10 main classes that are hierarchically subdivided to create a classification.

Step 2: Let’s try our hand at applying subject analysis and a classification system to a resource. As our resource we will use Think Stats by Allen Downey. Think Stats is an online book made available under a Creative Commons Attribution-Noncommercial 3.0 Unported License. The book is available at http://greenteapress.com/thinkstats/html/index.html. Use it in each of the classification exercises below.

Step 3: For each system, apply the following process for subject analysis:
a. Analyze the resource for content.
b. Identify keywords and key concepts.
c. Group concepts and consider what the primary ‘aboutness’ is.
d. Explore your classification system and see how your keywords match.
e. Identify a primary topic area and order sub-concepts hierarchically.
f. Consult the class schedule and produce a chain of subject links.
g. Translate the subject headings to the appropriate notation scheme.

Step 4: Classify the resource using the Association for Computing Machinery (ACM) classification system (Enumerative)
a. Browse the ACM classification system at http://www.acm.org/about/class/1998.
Question 6. What is your top level ACM heading?
Question 7. What is the full ACM classification?
*Note: The ACM system does not focus on developing a classification number that is unique to each item.

Step 5: Identify subject headings for the resource using the Library of Congress Classification (Analytico-synthetic)
a.
Let’s begin by identifying the authorized headings for these subjects:
   i. Go to http://authorities.loc.gov
   ii. Click on “search authorities”, make sure you have “Subject Authority Headings” selected, and conduct searches to find headings.
   iii. As you browse results, pay attention to the Type of heading (we want LC Subject Headings) and to See Also and Scope Notes.
   iv. Pick three to four entries that are “Authorized headings.” Pick the heading that the book is generally ‘about’ and use that to find a classification number.

Question 8. What headings did you select?
1. Heading 1:
2. Heading 2:
3. Heading 3:
4. Heading 4:

Step 6: Let’s add these subject headings to our MARC and Dublin Core representations of our book.
a. Launch your VCL and, using your favorite editor, add them to the MARC record using the MARC cataloging template as your guide (http://www.loc.gov/marc/bibliographic).
b. Using the Dublin Core guide, select an appropriate field and add the authority to your DC record.
c. Although to date we have only been working with simple Dublin Core, it would be useful if we could indicate the classification scheme that we used to classify our resource.
d. In order to indicate the appropriate type of vocabulary, we will use a special attribute called xsi:type, to which we will assign the dcterms vocabulary encoding scheme value.
e. xsi:type is an attribute from the XML Schema specification that allows an element to explicitly declare the type of the content located in the field. Review the code below, particularly the subject elements, to understand the use of xsi:type.
f. To find the list of values that can be assigned to xsi:type, visit the Dublin Core metadata registry (http://dcmi.kc.tsukuba.ac.jp/dcregistry/) and retrieve the Vocabulary Encoding Schemes.
g. Once you have created your XML document, be sure to check its well-formedness and validity!

    <?xml version="1.0"?>
    <!-- The root element is in no namespace, so the schema is referenced with noNamespaceSchemaLocation -->
    <qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"
                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:dcterms="http://purl.org/dc/terms/">
      <dc:title>Organization of Information Course Example</dc:title>
      <!-- xsi:type names the vocabulary encoding scheme that each subject value comes from -->
      <dc:subject xsi:type="dcterms:DDC">062</dc:subject>
      <dc:subject xsi:type="dcterms:UDC">061(410)</dc:subject>
      <dc:description>This is an example record</dc:description>
      <!-- xml:lang marks the language of a value -->
      <dc:description xml:lang="fr">Cette classe est magnifique!</dc:description>
      <dc:publisher>University of Maryland</dc:publisher>
      <dc:identifier xsi:type="dcterms:URI">http://erikmitchell.info/lbsc670_fall2011</dc:identifier>
      <dcterms:isPartOf xsi:type="dcterms:URI">http://erikmitchell.info</dcterms:isPartOf>
    </qualifieddc>

Step 7: While we are working on authorities, let’s use our authority list to find a valid value for our author entry.
a. Return to http://authorities.loc.gov and click on “search authorities”.
b. Make sure you have name authorities selected and search for the author’s name.
c. Find the appropriate heading, using the LC cataloging guide.
Question 9. What is the authorized heading for our author? How could you tell?

Step 8: Using the same process as Step 6, add/update the creator value in your DC and MARC records for the author. (Note: DC does not have a dcterms type for LC Name Headings.)

Step 9: We are now going to use these headings to select a classification.
a. Log in to Classification Web (http://classificationweb.net/)
   i. The username and password are available in Blackboard under course documents for this class.
   ii. Click Log On and enter the username and password.
   iii. Let’s begin by clicking on “Browse LC Subject Headings”.
   iv. Complete a few searches using the headings you found above.
When you find the appropriate heading, click on the classification number range to drill down further into the classification.

Figure 1 Example of Classification

   v. Question 10. What are some potential classification numbers for our resource?
      1. Potential heading 1:
      2. Potential heading 2:
   vi. Take note of potential classification numbers, paying attention to specificity in the headings, and consider whether that is the proper place for this resource. It can help to look for other similar books to help decide (hint: search your library catalog).
      Question 11. What is the best classification number for this resource?
   vii. Select a specific class area and begin the process of cuttering.
      1. Cuttering is the process of adding an author-based refinement to your classification number for uniqueness. Cuttering involves adding alphanumeric text after the classification number to position the book properly in context in the stacks.
      2. For complete documentation on cuttering
      3. First Letter (author last name)
      4. Number (See cuttering sheet)
      Question 12. What is your final call number (including cutter): QA 276.45 .P5 D6

Step 10: Repeat the steps for adding your call number to your MARC and DC records. (Note: DC does have a dcterms type for Library of Congress classification.)

Key Questions
Question 13. What call number did you create for this record?
Question 14. What subject headings did you assign?
Question 15. What Dublin Core vocabulary scheme did you select for the Library of Congress classification?

Step 11: Classify the resource using folksonomies.
a. Begin by looking through the resource and picking out the words that you would use to describe the resource. Write these words down.
b. Let’s see how other sites have assigned tags to this resource. Visit each site below and search for the book “Think Stats.” Each site handles tags a bit differently.
Look for single words or groups of words that describe the book (e.g. Goodreads calls tags ‘popular shelves’). Take note of a few tags from each site and write them in the table below.

Table 4 Tags for Think Stats
http://www.librarything.com | http://goodreads.com | http://www.shelfari.com/
 | |

Step 12: Before we finish up, let’s try our hand at a form of automatic classification using tag clouds. Return to your Think Stats book and pull up the index in the HTML version (http://greenteapress.com/thinkstats/html/thinkstats011.html).
a. Visit a tag cloud generation site (http://wordle.net or http://tagcrowd.com).
b. Copy the index entries from Think Stats and paste them into the tag cloud generator.
c. Analyze the tag cloud that gets generated and modify the available settings.
d. Evaluate the resulting tags and record the top 5:
   i. Tag 1:
   ii. Tag 2:
   iii. Tag 3:
   iv. Tag 4:
   v. Tag 5:

Key Questions
Question 16. What were the most representative tags from the folksonomies? How did their content compare to the other classification systems?
Question 17. Rank each classification system that we have worked with (e.g. ACM, LCSH and Folksonomies) with regard to how specific the classification is. Re-rank the systems in terms of ‘exhaustivity.’ Which system is most exhaustive, and how does that compare to the system that is most specific?

One reason that we create classification systems is to enable browsing for physical resources. Today we classified an e-book, a resource that can only be found and viewed using online systems. Look around at some library catalogs and find some e-books.

Question 18. Do e-books have call numbers? How exhaustive or specific are the subject headings?

Summary

This week we have explored a number of different classification systems and controlled vocabulary platforms, and have explored how to include subject headings and classification numbers in our representations.
We explored the process of cuttering in LC and familiarized ourselves with common LC classification tools. Next week we will consider how classification structures and controlled vocabularies change in a Linked Data environment and learn more about how library metadata is changing in response to these systems.
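Step 6g asks you to check that your Qualified Dublin Core record is well-formed. As one possible way to do that outside an XML editor, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree parser. The function names is_well_formed and subject_schemes and the shortened sample record are assumptions made for illustration. Note that this checks well-formedness only; it does not validate the record against the qualifieddc.xsd schema.

```python
import xml.etree.ElementTree as ET

# Fully qualified names: ElementTree writes a namespaced name as {namespace-uri}local-name.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
DC_SUBJECT = "{http://purl.org/dc/elements/1.1/}subject"

# A shortened copy of the worksheet's example record (illustration only).
record = """<?xml version="1.0"?>
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Organization of Information Course Example</dc:title>
  <dc:subject xsi:type="dcterms:DDC">062</dc:subject>
  <dc:subject xsi:type="dcterms:UDC">061(410)</dc:subject>
</qualifieddc>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def subject_schemes(xml_text):
    """List (encoding scheme, value) pairs for every dc:subject element."""
    root = ET.fromstring(xml_text)
    return [(el.get(XSI_TYPE), (el.text or "").strip())
            for el in root.iter(DC_SUBJECT)]


print(is_well_formed(record))
for scheme, value in subject_schemes(record):
    print(scheme, value)
```

A record with a missing closing tag or an undeclared namespace prefix makes is_well_formed return False, which is a quick signal to recheck your edits before moving on to schema validation.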