Testing a Vocabulary Standard Against Cataloguing Practice in Canadian Museums: Demonstrating the Validity of the Art & Architecture Thesaurus as a Vocabulary Source or Search Tool for the Humanities National Database of the Canadian Heritage Information Network

Prepared by Heather Dunn
Canadian Heritage Information Network
October 1995

The two sections, “Introduction and Methodology” and “Object Name/Object Type Match Results”, are taken largely from a University of Toronto Museum Studies Master's Thesis, Testing a Vocabulary Standard Against Object Naming Practice in Canadian Museums, by Heather (Coates) Dunn. The English version of the paper was edited by Merridy Bradley.

This document is available in French under the title, Mise à l'épreuve d'un vocabulaire normalisé au regard des habitudes de catalogage dans les musées canadiens : Validité du Art & Architecture Thesaurus en tant que source de vocabulaire et outil de recherche pour le Répertoire national des sciences humaines du Réseau canadien d'information sur le patrimoine.

Table of Contents

1. Introduction and Methodology
2. CHIN OBJECT NAME/OBJECT TYPE vs. Entire AAT
   2.1 OBJECT NAME/OBJECT TYPE Match Results
       2.1.1 Mendel Art Gallery OB/OT Match Results
             a. Non-frequency-weighted Match Results (Unique Term Match)
             b. Frequency-weighted Match Results
             c. Commonly Used OBJECT NAME/OBJECT TYPE Combination Match Results
       2.1.2 New Brunswick Museum OB/OT Match Results
             a. Non-frequency-weighted Match Results (Unique Term Match)
             b. Frequency-weighted Match Results
             c. Commonly Used OT/OB Combination Match Results
       2.1.3 UBC Museum of Anthropology OB/OT Match Results
             a. Non-frequency-weighted Match Results (Unique Term Match)
             b. Frequency-weighted Match Results
   2.2 Mismatches in Sample OB/OT Data: Contextual Errors
3. CHIN MATERIAL Field vs. AAT Materials Facet
   3.1 MATERIAL Field Match Results
       3.1.1 Non-frequency-weighted Match Results
       3.1.2 Frequency-weighted Match Results
       3.1.3 Ewing Vocabulary Match Results (Non-frequency-weighted)
       3.1.4 Commonly Used MATERIAL Terms Match Results
4. CHIN TECHNIQUE Field vs. AAT Processes/Techniques Hierarchy
   4.1 TECHNIQUE Field Match Results
       4.1.1 Non-frequency-weighted Match Results
       4.1.2 Frequency-weighted Match Results
       4.1.3 Ewing Vocabulary Match Results (Non-frequency-weighted)
       4.1.4 Commonly Used TECHNIQUE Terms Match Results
5. CHIN CULTURE and SCHOOL/STYLE Fields vs. AAT Styles and Periods Facet
   5.1 CULTURE Field Match Results
       5.1.1 Non-frequency-weighted Match Results
       5.1.2 Frequency-weighted Match Results
       5.1.3 CHIN CULTURE Thesaurus Match Results (Non-frequency-weighted)
       5.1.4 Commonly Used CULTURE Terms Match Results
   5.2 SCHOOL/STYLE Field Match Results
       5.2.1 Non-frequency-weighted Match Results
       5.2.2 Frequency-weighted Match Results
       5.2.3 Commonly Used SCHOOL/STYLE Terms Match Results
6. Summary of Match Results
7. Conclusions
8. Recommendations
9. Bibliography and References
10. Appendix - Definitions
    10.1 Terminology
    10.2 Occurrence
    10.3 Types of Match Statistical Methods
    10.4 Match Results

1. Introduction and Methodology

The intent of this paper is to determine how closely the contents of several fields (OBJECT NAME, OBJECT TYPE, MATERIAL, TECHNIQUE, CULTURE, and SCHOOL/STYLE) in the Humanities National Inventory of the Canadian Heritage Information Network (CHIN) correspond with the terminology of the Art & Architecture Thesaurus (AAT), and to examine the implications of this correspondence. This study will determine the feasibility of using the AAT as a vocabulary source or a search tool for users of the CHIN Humanities National Inventory. It may also provide the AAT with candidate terms, which could broaden its scope and user base and make it a better source for Canadian museums.

To date, there have been few comprehensive vocabulary sources and no real thesauri for use by the museum community. The AAT fills this need; it contains a comprehensive vocabulary (including the Nomenclature terms in a thesaural structure) that can be used for describing as well as naming objects.

The AAT could be used by many of the CHIN contributors to assist them with data entry. The AAT is available in an electronic version called the Authority Reference Tool (ART) that is designed to work with computer collections applications. CHIN has recently re-evaluated the need to manage institutional databases on its Ottawa mainframe.
With advances in affordable, powerful systems and software tailored for collections management, it is now more practical for museums to handle their collections management databases locally. Many of the new collections applications may have the ability to use the AAT as an online interactive thesaurus that would provide both a controlled vocabulary for data entry and a tool for "intelligent" retrieval. Vendors of museum collections applications are incorporating the AAT into their software to make it easily accessible to their users.

The AAT could be used as a tool for searching the National Inventories, which will remain centralized at CHIN. Because the data in the National Inventories are contributed by individual museums that may not follow a common vocabulary standard, the vocabulary is very diverse. Problems currently encountered in a search of the National Inventories would be addressed with use of the AAT's hierarchical, synonymous, and associative relationships through the following:

1) use of the AAT's synonymous relationships (Use For terms) to find all variations of a term, including regional terminology differences and synonyms. Regardless of whether the variation was a preferred term in the AAT, it would point the user to the appropriate records.

2) use of the AAT's hierarchical relationships (Broader and Narrower terms) to perform successful broad searches, which are presently impossible. For example, if the AAT were used as a search tool, a search for wood would also find occurrences of pine, elm, spruce, etc. At present, the searcher must know and search for all possible types of wood in order to find all the appropriate records.

3) use of the AAT's associative relationships (Related Terms) to point out differences between similar terms, find related terms to help in the search, and to supply appropriate and precise terminology for a particular search.
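
The three uses listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how the AAT or any CHIN search tool is implemented: the `Concept` structure, the function names, and the toy entries (including the "pine wood" variant and the "woodworking" related term) are assumptions made for the example. Only the wood / pine, elm, spruce hierarchy comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept: a preferred term and its relationships."""
    preferred: str
    use_for: set = field(default_factory=set)   # synonyms and lead-in variants
    narrower: set = field(default_factory=set)  # preferred terms of narrower concepts
    related: set = field(default_factory=set)   # associative links, offered as suggestions


# Toy entries for illustration only; they are not taken from the AAT.
TOY_THESAURUS = [
    Concept("wood", narrower={"pine", "elm", "spruce"}, related={"woodworking"}),
    Concept("pine", use_for={"pine wood"}),
    Concept("elm"),
    Concept("spruce"),
]


def build_index(concepts):
    """Map every preferred term and every Use For variant to its concept."""
    index = {}
    for concept in concepts:
        index[concept.preferred] = concept
        for variant in concept.use_for:
            index[variant] = concept
    return index


def expand_query(term, index):
    """Return every term a search for `term` should look for.

    Synonymous links contribute the preferred term and its variants;
    hierarchical links contribute all narrower concepts, recursively.
    """
    start = index.get(term)
    if start is None:
        return {term}  # term unknown to the thesaurus: search it as typed
    terms, seen, stack = set(), set(), [start]
    while stack:
        concept = stack.pop()
        if concept.preferred in seen:
            continue
        seen.add(concept.preferred)
        terms.add(concept.preferred)
        terms |= concept.use_for
        stack.extend(index[name] for name in concept.narrower if name in index)
    return terms


def suggest_related(term, index):
    """Return Related Terms to show the searcher; they are not searched automatically."""
    concept = index.get(term)
    return set(concept.related) if concept else set()


index = build_index(TOY_THESAURUS)
print(sorted(expand_query("wood", index)))
print(sorted(expand_query("pine wood", index)))
```

With this toy data, a query for wood expands to wood, pine, pine wood, elm, and spruce, while a query on the variant pine wood resolves to the pine concept and its variants. Related Terms are returned separately so that they can be offered to the searcher rather than silently added to the query.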

Given that the CHIN National Inventory is far too large to be completely standardized in vocabulary, it becomes essential to focus on developing tools that will make order out of the diverse vocabularies used by contributing museums and will allow access to the contents regardless of variations. This issue of access, despite differences in vocabulary and level of user expertise, will become increasingly significant as CHIN data become available via the Internet, where they will be encountered by people from all over the world with widely varying levels of museum expertise.

Another challenge that faces CHIN is that it must make English and French collections data accessible to both English and French researchers. Because the AAT is currently available only in English, CHIN needs to be able to map the AAT vocabulary to the French language equivalents of terms most commonly used by Canadian museums. The mapping would be done by:

1) locating the most commonly used CHIN terms in the AAT hierarchies.

2) entering the French translations of these terms into the AAT as language equivalents.

3) creating a search assistant tool (bilingual thesaurus) to retrieve data in the National Inventories, regardless of whether the query was in French or in English.

The AAT would become an even more essential front-end search tool for CHIN if it could be used to access both English and French records, and respond to queries in both languages.

The CHIN Humanities National Inventory fields studied were OBJECT NAME (OB), OBJECT TYPE (OT), MATERIAL (MA), TECHNIQUE (MT), SCHOOL/STYLE (SA), and CULTURE (CU). Due to the large number of records, the study of the OBJECT NAME, OBJECT TYPE, TECHNIQUE, and MATERIAL fields was done on a sample of the Humanities National Inventory. The OBJECT NAME/OBJECT TYPE sample consisted of the data from three institutions. The MATERIAL and TECHNIQUE terms used in the study were random samples of the Humanities National Inventory.
As the number of records including SCHOOL/STYLE and CULTURE was small, the entire Humanities National Inventory was studied for these two fields.

The methodology for the OBJECT NAME/OBJECT TYPE Match differed slightly from the technique used to Match the AAT with the other CHIN fields. Because OBJECT TYPE is meant to be a qualifier to OBJECT NAME, it was possible to prepend OBJECT TYPE to OBJECT NAME prior to the Match, making one "term" to Match with the AAT terms. For example, a record with OB=chair and OT=rocking would become rocking chair prior to being matched against the AAT. The other CHIN fields (MATERIAL, TECHNIQUE, CULTURE, and SCHOOL/STYLE) were not combined prior to the Match. The OBJECT NAME/OBJECT TYPE data were matched against the entire AAT (not just the Objects facet); the other CHIN fields were only matched against the AAT hierarchy that held the majority of their terms (CHIN MATERIAL field matched against the AAT Materials facet, etc.). Also, the OBJECT NAME/OBJECT TYPE Match was computer-assisted, making it possible to generate more information about the Match; the other CHIN fields were matched manually against the AAT.

Corrections for some of the structural differences between CHIN and AAT terminology were made prior to the Match. For instance, punctuation in the CHIN data was disregarded in order to make it more similar in structure to the AAT terminology. Although CHIN object names are usually singular (e.g., OB=chair) and the AAT main preferred terms are plural, the CHIN data could usually Match the AAT Alternate Term, which is in the singular form. Therefore, it was not necessary to change CHIN data from singular to plural to Match the AAT.

The Match between CHIN and AAT terms resulted in four mutually exclusive categories of data: Exact Phrase Match, Exact Word Match, Partial Match, or Non-Match (see Appendix - Definitions).

2. CHIN OBJECT NAME/OBJECT TYPE vs. Entire AAT

Although it would have been best to base the study on all of the OBJECT NAME and OBJECT TYPE data in the Humanities National Inventory, this was impossible. The Humanities National Inventory contains approximately 2 500 000 records, and most of these records have entries in both the OB and OT fields. Each of these entries would have to be matched with the 106 519 terms in the AAT. Differing recording practices across CHIN museums would have made it difficult to program the computer to anticipate all variations of the CHIN data and Match them with the AAT accordingly. Therefore, in order for the test to be accurate (and not biased by variations of syntax, etc.), and for the results to be meaningful, the Match was performed in smaller, internally consistent sections, each representing one institution's data and each independently matched with the AAT.

Another advantage of basing the study on a sample of the Humanities National Inventory was that in the findings and conclusions of the study, one could distinguish between the Match results and implications for each institution studied, and between the different disciplines represented by the museums.

Although the Humanities National Inventory also includes data from French institutions, the sample was selected from among museums with only English records. The AAT is in English only, and could not be matched against French data.

Criteria for selecting a representative sample from the Humanities National Inventory were as follows:

1) the sample was to contain only English data;

2) the sample was to have a high level of internal consistency within the OBJECT NAME and OBJECT TYPE fields. The data did not have to conform to the Humanities Data Dictionary, but had to be internally consistent in terms of format and structure;

3) the sample was to be of a manageable size, yet large enough to be an acceptable statistical sample;

4) the sample was to be representative of broad collections in the different disciplines found in the Humanities National Inventory: history, art, and ethnology;

5) the sample was to be representative of different regions of Canada, and

6) the sample was to be representative of different sizes of museum collections.

In order to select the sample, a study of internal consistency in the OBJECT NAME (OB) and OBJECT TYPE (OT) fields of museums with general history, ethnology, and art collections was undertaken. One institution from each of the three disciplines was selected for its internal data consistency and its general collections. The rules that governed data entry for each institution (as became apparent in the study of their OB and OT fields) were noted for future use in the programming of the computer Match.

Three institutions, one from each of the disciplines of history, art, and ethnology, and from widely separated geographical regions, were selected from the Humanities National Inventory. The Mendel Art Gallery, located in Saskatoon, Saskatchewan, was the representative of art collections and of prairie institutions. The New Brunswick Museum, located in Saint John, New Brunswick, was chosen as representative of history museums and of the east coast institutions. The University of British Columbia Museum of Anthropology, located in Vancouver, British Columbia, was chosen to represent ethnology collections and west coast institutions.

The sample OBJECT NAME/OBJECT TYPE data represented 105 165 of 2 452 810 records, or approximately 4.3% of the Humanities National Inventory. This sample is an adequate indicator of the makeup of the data and is representative of the whole.
Because the sample was, in part, chosen for its internal consistency of format and structure, there is a possibility that the terminology usage within the sample may be more consistent than the norm. However, the level of consistency in format and structure of terms should not have affected the level of vocabulary control.

The contents of the sample institutions' OBJECT NAME and OBJECT TYPE fields were sent to the Getty Art History Information Program (AHIP) to run a computer Match of the AAT vocabulary against CHIN data. A series of separate matches, one for each of the three sample institutional data sets, was run against all 33 hierarchies of the AAT, and computer-generated reports were produced.

It was necessary to combine the terms from the two CHIN fields (OBJECT TYPE and OBJECT NAME) in a meaningful way prior to the Match. In general, the OBJECT TYPE (OT) term was prepended to the OBJECT NAME (OB) term in order to make up a meaningful composite term in natural word order. This is because the OT field is meant as a qualifier to the OB field. For example,

OB=chair
OT=rocking
Composite term to be matched with the AAT=rocking chair

Although the reports from the AAT computer program on the OBJECT NAME/OBJECT TYPE Match were useful to some extent, some problems caused difficulty in calculating the Match percentages. Because the frequency of term usage within the CHIN databases was a significant factor in determining the overall success of the AAT/CHIN Match, it was also important to generate weighted Match statistics based on the number of occurrences of the CHIN OBJECT NAME/OBJECT TYPE phrases; this was something that the computer-generated Match had not been able to do. For example, a term such as photograph, which occurs frequently in the CHIN databases, would be counted as a Match each time that it occurred, thus giving it more weight in the Match than a term which occurred infrequently.
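
The difference between the two kinds of statistics can be illustrated with a short Python sketch. It is not the program used by AHIP or CHIN: the function names are hypothetical, the `CODING` dictionary stands in for the manual assignment of match categories, and the records and category codes are invented for the example.

```python
from collections import Counter


def composite_term(ob, ot=""):
    """Prepend the OBJECT TYPE qualifier to the OBJECT NAME (natural word order)."""
    return f"{ot} {ob}".strip()


def match_rates(records, coding):
    """Compute match percentages per category, by unique term and by frequency.

    records: iterable of (OB, OT) pairs, one per catalogue record.
    coding:  dict mapping each composite term to its manually assigned category.
    """
    frequency = Counter(composite_term(ob, ot) for ob, ot in records)
    unique_counts = Counter()
    weighted_counts = Counter()
    for term, occurrences in frequency.items():
        category = coding[term]
        unique_counts[category] += 1             # each distinct term counts once
        weighted_counts[category] += occurrences  # each record counts once

    def as_percent(counts):
        total = sum(counts.values())
        return {cat: round(100 * n / total, 1) for cat, n in counts.items()}

    return as_percent(unique_counts), as_percent(weighted_counts)


# Hypothetical records and category codes, for illustration only.
RECORDS = [("photograph", "")] * 6 + [("chair", "rocking")] * 3 + [("negative", "b/w")]
CODING = {
    "photograph": "Exact Phrase Match",
    "rocking chair": "Exact Phrase Match",
    "b/w negative": "Partial Match",
}

unique_rates, weighted_rates = match_rates(RECORDS, CODING)
print(unique_rates)
print(weighted_rates)
```

In this invented data set, two of the three unique terms are Exact Phrase Matches (66.7%), but because those two terms account for nine of the ten records, the frequency-weighted Exact Phrase rate is 90%.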
CHIN terms that are not found in the AAT may have more authority as candidate terms if they occur frequently in the National Inventory. Also, terms that occur only once or twice in the data are often spelling or typographical errors.

To rectify the problems with the AAT reports and to generate frequency-weighted Match statistics, frequency lists for each of the unique OBJECT NAME/OBJECT TYPE combinations in each of the three databases were produced at CHIN, and these were manually coded as belonging to one of four mutually exclusive categories: Exact Phrase Match, Exact Word Match, Partial Match, or Non-Match (see Appendix - Definitions). From this analysis, both unique occurrence and frequency-weighted statistics could be produced on the OBJECT NAME/OBJECT TYPE terms in these four categories, and on combinations thereof.

AHIP's reports on the computer Match were manually checked at CHIN. The CHIN terms listed by the computer as AAT matches were studied to determine whether they were true matches, or whether the computer Match was incorrect because of a mismatch in context. Incorrectly and questionably matched terms were noted. The CHIN terms listed by the computer as non-matches were also studied to determine whether they would have matched with slight alteration (e.g., spacing, spelling, etc.), or whether they were potential lead-in or preferred vocabulary to be added to the AAT.

2.1 OBJECT NAME/OBJECT TYPE Match Results

The results reported here are of two sorts, from two different sources, as follows:

1) The Match rate statistics for the entire institutional database, for unique occurrences and frequency-weighted matches.

2) The Match rate statistics for the most commonly used terms, for unique occurrences and frequency-weighted matches.

2.1.1 Mendel Art Gallery OB/OT Match Results

a. Non-frequency-weighted Match Results (Unique Term Match)

There were 101 unique OBJECT NAME/OBJECT TYPE combinations of the Mendel Art Gallery that were matched against the AAT.
The results of this Match are shown in Table 1.

Table 1
AAT vs. Mendel Art Gallery OB/OT Data: Non-frequency-weighted Match Results

Match Categories        Number of Matches        Percentage of Total
Exact Phrase Matches    30 / 101 (notes a, b)    29.7%
Exact Word Matches      57 / 101                 56.4%
Partial Matches         13 / 101                 12.9%
Non-matches              1 / 101                 1%
Total Exact Matches     87 / 101                 86.1%

Note a. Of the 30 terms which matched as phrases, the term designation of the AAT term was as follows:

23 / 30 (76.7%) matched (a)   - Alternate Terms (singular form)
 2 / 30 (6.7%)  matched (m)   - main terms (plural form)
 3 / 30 (10%)   matched (u,w) - Use For terms (lead-in term)
 2 / 30 (6.7%)  matched (b)   - Alternative British Equivalents

The CHIN term matched the AAT Alternate Term most often because the CHIN term is usually in singular form.

Note b. Of the 30 CHIN terms which matched as phrases, most appeared in the Objects Facet:

27 / 30 (90%)   found in Objects Facet
 1 / 30 (3.3%)  found in Materials Facet (stained glass)
 1 / 30 (3.3%)  found in Physical Attributes Facet (crest)
 1 / 30 (3.3%)  found in Activities Facet (embroidery)

b. Frequency-weighted Match Results

When term frequency was taken into account, 4826 Mendel Art Gallery entries of terms were matched against the AAT. Table 2 shows the results of this Match.

Table 2
AAT vs. Mendel Art Gallery OB/OT Data: Frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    2105 / 4826          43.6%
Exact Word Matches      1199 / 4826          24.8%
Partial Matches         1501 / 4826          31.1%
Non-matches                1 / 4826          0.02%
Total Exact Matches     3304 / 4826          68.4%

The Exact Match rate for the Mendel Art Gallery OBJECT NAME/OBJECT TYPE data was quite low when the frequency of terms was taken into account. This is because three of the most commonly used terms in the Mendel Art Gallery data ("b/w negative", 790; "b/w photograph", 474; "b/w print", 214) did not Match exactly because of an abbreviation of the term "black and white".
When the Mendel Art Gallery's "b/w" was expanded to "black-and-white", as in the AAT, the match percentages increased significantly (Table 3):

Table 3
AAT vs. Mendel Art Gallery OB/OT Data: Frequency-weighted Match Results - "b/w" Expanded

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    3583 / 4826          74.2%
Exact Word Matches      1199 / 4826          24.8%
Partial Matches           23 / 4826          0.5%
Non-matches                1 / 4826          0.02%
Total Exact Matches     4782 / 4826          99%

c. Commonly Used OBJECT NAME/OBJECT TYPE Combination Match Results

The 11 most commonly used terms (occurring 100 times or more) in the Mendel Art Gallery OBJECT NAME/OBJECT TYPE data are listed in Table 4. These terms, when their frequency was taken into account, totalled 4144 entries and made up 85.8% of the Mendel Art Gallery data. Note that none of the Mendel Art Gallery commonly used OB/OT terms were Non-matches.

Table 4
AAT vs. Mendel Art Gallery OT/OB Combination Phrases Used 100 Times or More

Match    Freq    Term
-         790    b/w negative
-         474    b/w photograph
-         214    b/w print
          417    colour print
          459    drawing
          211    intaglio print
          608    painting
          113    planographic print
          186    relief print
O         211    stencil print
O         461    watercolour painting

Total    4144 by frequency
           11 by unique occurrence

Key to Match codes
(no code) = Exact Phrase Match with the AAT
O = Exact Word Match with the AAT
- = Partial Match with the AAT (matching word is underlined)
X = no Match

A match against the AAT was made on a sample of 11 commonly used, unique OBJECT NAME/OBJECT TYPE terms from the Mendel Art Gallery. The results of the match are shown in Table 5.

Table 5
AAT vs. Mendel Art Gallery Commonly Used OB/OT Terms: Non-frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    6 / 11               54.5%
Exact Word Matches      2 / 11               18.1%
Partial Matches         3 / 11               27.2%
Non-matches             0 / 11               0%
Total Exact Matches     8 / 11               72.7%

A match against the AAT was made on 4144 Mendel Art Gallery OBJECT NAME/OBJECT TYPE entries of phrases that occurred commonly (100 times or more). The results are shown in Table 6.

Table 6
AAT vs. Mendel Art Gallery Commonly Used OB/OT Terms: Frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    1994 / 4144          48.1%
Exact Word Matches       672 / 4144          16.2%
Partial Matches         1478 / 4144          35.6%
Non-matches                0 / 4144          0%
Total Exact Matches     2666 / 4144          64.3%

2.1.2 New Brunswick Museum OB/OT Match Results

a. Non-frequency-weighted Match Results (Unique Term Match)

There were 5975 unique OBJECT NAME/OBJECT TYPE combinations of the New Brunswick Museum that were matched against the AAT. The results of this match are shown in Table 7.

Table 7
AAT vs. New Brunswick Museum OB/OT Data: Non-frequency-weighted Match Results

Match Categories        Number of Matches           Percentage of Total
Exact Phrase Matches    1624 / 5975 (notes a, b)    27.2%
Exact Word Matches      2699 / 5975                 45.2%
Partial Matches         1299 / 5975                 21.7%
Non-matches              353 / 5975                 5.9%
Total Exact Matches     4323 / 5975                 72.4%

Note a. Of the 1624 terms which matched as phrases, the term designation of the AAT term was as follows:

1243 / 1624 (76.5%) matched (a)   - Alternate Terms (singular form)
 264 / 1624 (16.3%) matched (m)   - main terms (plural form)
  91 / 1624 (5.6%)  matched (u,v) - Use For terms (lead-in term)
  20 / 1624 (1.2%)  matched (b)   - Alternative British Equivalent
   6 / 1624 (0.4%)  matched (k)   - British Equivalents

The CHIN term matched the AAT Alternate Term most often because the CHIN term is usually in singular form.

Note b. Of the 1624 terms which matched as phrases, most matched the AAT's Objects Facet.

1518 / 1624 (93.4%) found in Objects Facet
  80 / 1624 (4.9%)  found in Materials Facet
   5 / 1624 (0.3%)  found in Physical Attributes Facet
  15 / 1624 (0.9%)  found in Activities Facet
   4 / 1624 (0.2%)  found in Styles and Periods Facet
   1 / 1624 (0.1%)  found in Agents Facet
   1 / 1624 (0.1%)  found in Associated Concepts Facet

b. Frequency-weighted Match Results

When term frequency was taken into account, 68 413 entries for New Brunswick Museum OBJECT NAME/OBJECT TYPE terms were matched against the AAT. Table 8 shows the results of this match.

Table 8
AAT vs. New Brunswick Museum OB/OT Data: Frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    55678 / 68413        81.4%
Exact Word Matches       8441 / 68413        12.3%
Partial Matches          2986 / 68413        4.4%
Non-matches              1308 / 68413        1.9%
Total Exact Matches     64119 / 68413        93.7%

c. Commonly Used OT/OB Combination Match Results

The 38 most commonly used OBJECT NAME/OBJECT TYPE terms (occurring 200 times or more) in the New Brunswick Museum data are listed in Table 9. These terms, when their frequency was taken into account, totalled 39 928 entries, and made up 58.3% of the 68 413 entries in the New Brunswick Museum data. Note that none of these commonly used New Brunswick Museum OB/OT terms were Partial Matches.

Table 9
AAT vs. New Brunswick Museum OT/OB Combo Phrases Used 200 Times or More

Match     Freq    Term
           368    architectural drawing
           472    badge
           282    basket
           275    bowl
           364    box
           309    case
          2601    coin
           466    cover
           218    cup
           270    currency
           217    dish
           219    doll
           537    drawing
           273    dress
           659    intaglio print
          4673    lantern slide
          1465    map
O          357    marine photograph
           262    mask
           899    medal
           794    painting
         10805    photograph
X          372    picture
           250    pipe
           212    plane
           373    planographic print
           391    plate
          3127    postcard
          3613    print
           383    relief print
          1097    reproduction
           474    saucer
           443    sculpture
          1072    stereograph
           310    teacup
           408    teaspoon
           253    vase
O          365    watercolour painting

Total    39 928 by frequency
             38 by unique occurrence

Key to Match codes
(no code) = Exact Phrase Match with the AAT
O = Exact Word Match with the AAT
- = Partial Match with the AAT (matching word is underlined)
X = no Match

Statistics for these 38 commonly used New Brunswick Museum OBJECT NAME/OBJECT TYPE terms were calculated for both frequency-weighted and unique occurrences.

A match was made between the AAT and the 38 OBJECT NAME/OBJECT TYPE combination phrases that occurred 200 times or more in the New Brunswick Museum data (non-frequency-weighted). The results are shown in Table 10.

Table 10
AAT vs. New Brunswick Museum Commonly Used OB/OT Terms: Non-frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    35 / 38              92.1%
Exact Word Matches       2 / 38              5.3%
Partial Matches          0 / 38              0%
Non-matches              1 / 38              2.6%
Total Exact Matches     37 / 38              97.4%

A match was made between the AAT and the 39 928 entries of New Brunswick Museum OBJECT NAME/OBJECT TYPE phrases that occurred most commonly (200 times or more). The results are shown in Table 11.

Table 11
AAT vs.
New Brunswick Museum Commonly Used OB/OT Terms: Frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    38834 / 39928        97.2%
Exact Word Matches        722 / 39928        1.8%
Partial Matches             0 / 39928        0%
Non-matches               372 / 39928        0.9%
Total Exact Matches     39556 / 39928        99.0%

2.1.3 UBC Museum of Anthropology OB/OT Match Results

a. Non-frequency-weighted Match Results (Unique Term Match)

There were 1915 unique OBJECT NAME/OBJECT TYPE combinations of the UBC Museum of Anthropology that were matched against the AAT. Table 12 shows the results of this match.

Table 12
AAT vs. UBC Museum of Anthropology OB/OT Data: Non-frequency-weighted Match Results

Match Categories        Number of Matches          Percentage of Total
Exact Phrase Matches     728 / 1915 (notes a, b)   38.0%
Exact Word Matches       644 / 1915                33.6%
Partial Matches          411 / 1915                21.5%
Non-matches              133 / 1915                6.9%
Total Exact Matches     1372 / 1915                71.6%

Note a. Of the 728 terms which matched as phrases, the term designation of the AAT term was as follows:

495 / 728 (67.9%) matched (a)     - Alternate Terms (singular form)
181 / 728 (24.9%) matched (m)     - main terms (plural form)
 44 / 728 (6.0%)  matched (u,v,w) - Use For terms (lead-in term)
  6 / 728 (0.8%)  matched (b)     - Alternative British Equivalent
  2 / 728 (0.3%)  matched (k)     - British Equivalent

The CHIN term usually matched an AAT Alternate Term, as the CHIN terms are in singular form.

Note b. Of the 728 terms which matched as phrases, most were found in the AAT's Objects Facet.

664 / 728 (91.2%) found in Objects Facet
 49 / 728 (6.7%)  found in Materials Facet
  2 / 728 (0.3%)  found in Physical Attributes Facet
  7 / 728 (1%)    found in Activities Facet
  3 / 728 (0.4%)  found in Styles and Periods Facet
  1 / 728 (0.1%)  found in Agents Facet
  2 / 728 (0.3%)  found in Associated Concepts Facet

b. Frequency-weighted Match Results

When term frequency was taken into account, 31 926 entries of UBC Museum of Anthropology OBJECT NAME/OBJECT TYPE terms were matched against the AAT. Table 13 shows the results of this match.

Table 13
AAT vs. UBC Museum of Anthropology OB/OT Data: Frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    27416 / 31926        85.9%
Exact Word Matches       2396 / 31926        7.5%
Partial Matches          1303 / 31926        4.1%
Non-matches               811 / 31926        2.5%
Total Exact Matches     29812 / 31926        93.4%

c. Commonly Used OT/OB Combination Match Results

The 60 most commonly used terms (occurring 100 times or more) in the UBC Museum of Anthropology OBJECT NAME/OBJECT TYPE data are listed in Table 14. These terms, when their frequency was taken into account, totalled 20 821 entries, and made up 65.2% of the 31 926 entries in the UBC Museum of Anthropology OBJECT NAME/OBJECT TYPE data field. Note that none of these commonly used UBC Museum of Anthropology OB/OT terms matched the AAT as Exact Word Matches.

Table 14
AAT vs. UBC Museum of Anthropology OT/OB Combination Phrases Used 100 Times or More

Match    Freq.   Name
          265    arrow
          318    bag
          110    band
         1760    basket
          197    belt
          168    blade
          154    bottle
          505    bowl
          230    box
          168    brooch
          158    calendar
          208    carving
          202    club
         2895    coin
          228    container
          105    cup
          357    dish
          414    doll
          260    drawing
          177    fan
         1894    figure
          276    hat
          175    headdress
          110    hook
          120    jacket
          238    jar
          113    jug
          221    knife
          125    label
          110    ladle
          919    mask
          209    mat
          120    maul
          316    necklace
          426    ornament
          101    paddle
          278    painting
          116    panel
          943    paper
          125    pipe
          110    plaque
          263    plate
          417    point
          116    pot
          722    print
          238    puppet
          203    rattle
          150    robe
          134    sash
          321    sculpture
          108    sherd
          208    skirt
          191    spear
          334    spoon
          168    stencil
-         128    stonecut print
          647    textile
          175    tool
          183    vase
          191    whistle

Total    20 821 by frequency
             60 by unique occurrence

Key to Match codes
(no code) = Exact Phrase Match with the AAT
- = Partial Match with the AAT (matching word is underlined)

Statistics for these 60 commonly used UBC Museum of Anthropology OBJECT NAME/OBJECT TYPE terms were generated for both frequency-weighted and unique occurrences.

A match was made between the AAT and the 60 unique OBJECT NAME/OBJECT TYPE combination phrases that occurred 100 times or more in the UBC Museum of Anthropology data. The results of the match are shown in Table 15.

Table 15
AAT vs. UBC Museum of Anthropology Commonly Used OB/OT Terms: Non-frequency-weighted Match Results

Match Categories        Number of Matches    Percentage of Total
Exact Phrase Matches    59 / 60              98.3%
Exact Word Matches       0 / 60              0%
Partial Matches          1 / 60              1.7%
Non-matches              0 / 60              0%
Total Exact Matches     59 / 60              98.3%

A match was made between the AAT and the 20 821 entries of commonly occurring (100 times or more) OBJECT NAME/OBJECT TYPE phrases from the UBC Museum of Anthropology data. The results of the match are shown in Table 16.

Table 16
AAT vs.
UBC Museum of Anthropology Commonly Used OB/OT Terms: Frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches     20693 / 20821        99.4%
Exact Word Matches           0 / 20821         0%
Partial Matches            128 / 20821         0.6%
Non-matches                  0 / 20821         0%
Total Exact Matches      20693 / 20821        99.4%

2.2 Mismatches in Sample OB/OT Data: Contextual Errors

The terms that the computer had identified as matches with the AAT were manually checked for mismatches in context. Mismatches or questionable matches in each museum's data were marked as such on the "Source Match Summary" printouts. The results were as follows:

1) for the New Brunswick Museum, approximately 5% (about 449 of the 8979 terms) were context mismatches, and a further 3% (about 289 of the 8979 terms) were questionable in context.

2) for the UBC Museum of Anthropology, approximately 6.5% (about 163 of the 2499 terms) were context mismatches, and a further 2.4% (about 59 of the 2499 terms) were questionable in context.

3) for the Mendel Art Gallery, approximately 2.9% (about 5 of the 171 terms) were context mismatches.

These fairly high percentages of computer mismatches are misleading. Many terms in the AAT have more than one meaning, e.g., tables (documents) and tables (support furniture), but the computer program matched the CHIN term with the first AAT term it hit and could not determine context. Therefore, although the context of the match the computer made might have been incorrect, there was usually a homonym in the AAT that did match in context. Many of the terms that the computer "mismatched" are therefore actually present, in their correct context, in the AAT. In most cases, the correct AAT homonym was in the same facet, and had the same term designation, as the mismatched homonym. Also, the mismatches tended to happen repeatedly on the same few terms, as shown in Table 17.
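The first-hit behaviour described above can be illustrated with a minimal Python sketch. It is not the program used in the study: the tiny index, the function names, and the ordering of the entries are assumptions made for illustration, and only the example terms and their qualifiers are drawn from this report.

```python
from collections import defaultdict

# Hypothetical miniature index of AAT entries: (term, qualifier, facet code).
# The terms and qualifiers follow examples in this report; the entry order
# is an assumption chosen to reproduce a context mismatch.
AAT_ENTRIES = [
    ("table", "document", "V"),
    ("table", "support furniture", "V"),
    ("mask", "transparency", "V"),
    ("mask", "costume", "V"),
]


def build_index(entries):
    """Group homonyms under their shared term, preserving entry order."""
    index = defaultdict(list)
    for term, qualifier, facet in entries:
        index[term].append((qualifier, facet))
    return index


def first_hit(index, chin_term):
    """Return only the first homonym found, ignoring context."""
    hits = index.get(chin_term.lower(), [])
    return hits[0] if hits else None


def all_homonyms(index, chin_term):
    """Return every homonym so that a cataloguer can choose by context."""
    return list(index.get(chin_term.lower(), []))


index = build_index(AAT_ENTRIES)
print(first_hit(index, "table"))     # first entry only: the document sense
print(all_homonyms(index, "table"))  # both senses, including support furniture
```

Under these assumptions, a furniture record named "table" is matched to the document sense by the first-hit lookup, even though the correct homonym is present in the same facet. Listing all homonyms exposes the correct one for manual selection.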
The few AAT terms (such as disc, carrier, guard, head, kit, and gun) that lacked a context suited to the usual CHIN usage were genuine computer mismatches.

The CHIN terms that the computer program could not match with the AAT were also studied to determine their makeup. Terms did not match for a variety of reasons, including misspellings (accessory), Canadian spellings (adze), missed spaces (linoblock), French terms (aiguillette), and vagueness (object, unknown). Most of the terms, however, were genuine candidates for addition to the AAT, either as main, alternate, or lead-in vocabulary. Some of the terms that were not found in the AAT were very commonly used by the CHIN museums. Some of the most promising candidate terms, those which were used over five times across the three databases, are shown in Table 18.

There were also many infrequently used terms (occurring fewer than five times across the three databases) that are promising candidates for inclusion in the AAT. These include "cigarette", "cultivator", "funnel", and "harrow".

Table 17 Computer Mismatches: CHIN OB/OT Term and Correct/Incorrect AAT Contexts
a CHIN term b AAT Match c (usually incorrect in context) miniature miniature (painting) V,a knight king AAT Match (correct context) miniature <Attributes and Properties> D,m Knight (chessman) V,a Knight (landholder) H,a King (chessman) V,a King (person) H,a queen Queen (chessman) V,a Queen (person) H,a cutter cutter (sleigh) V,a cutter (sailing vessel) V,a <cutter: woodworking tool> V,g <cutter: metalworking tool> V,g shell shell (boat) V,a shell <Material> M,m shell (ammunition) V,a button button (info. artifact) V,a gun Gun <Styles & Periods> F,m kit kit (fiddle) V,a table table (document) V,a stretcher stretcher (conveyance) V,a skull skull (helmet component) V,a head head (weapon component) V,a paddle paddle (ball game equ.) V,a hammer hammer (sports equip.)
V,a button (fastener) V,a table (support furniture) V,a stretcher (framing and mounting equip.) V,a skull (skeleton component) M,m paddle (watercraft equipment) V,a hammer (tool) V,a hammer (firearm component) V,a stick stick (hockey equipment) V,a stick <object genres by form> V,a guard guard (weapon component) V,a set set (architect. element) V,a key key (text) V,a key (hardware) V,a representation representation (gov't function) K,u <Associated Concepts><Forms of Expression> B,m tie tie (concrete fastener) V,a ties (neckties) V,u set (group) V,a tie (fastener) V,a carrier Carrier <Styles and Periods> F,m mat mat (framing & mounting equip.) V,a mat (floor covering) V,a mat (furniture covering) V,a mask mask (transparency) V,a disc disc (foot component) V,a mask (costume) V,a board board (binding component) V,a board (flat object) V,a ────────────────────────────────────────────────────────── 34 a CHIN terms that are most frequently mismatched by the computer program b the AAT term that the computer matched the CHIN term with, including the AAT term's facet c the correct-context homonym that is found elsewhere in the AAT, including the AAT term's facet and designation (see b above). 
(V=Object, D=Physical Attributes, F=Styles and Periods, K=Activities, M=Materials, H=Agents, B=Associated Concepts) and designation (a=alternate, m=main, u=use for, g=guide) 35 Table 18 Commonly Used CHIN OB/OT Terms Which Do Not Appear in the AAT CHIN term Frequency in the three sample databases abrader 57 adze 44 arrowpoint 15 bailer 11 bellows (in two databases) 8 (in two databases bird 10 biface 22 bobbin 65 (in two databases) booklet 22 (in two databases) bundle 8 churinga 7 garment 66 gavel 6 haversack 8 headring 8 human 10 ikon 68 last 7 macehead 8 needlecase 8 part 8 picture 13 pounder 35 6 (in two databases) 92 rosary 7 sabretache 6 sampler shellcase shredder (in two databases) 372 pills ring (in two databases) 19 measure quirt (in two databases) (in two databases) 108 (in two databases) 9 11 sickle 6 sinker 35 (in two databases) (in two databases) 36 stirrup(s) 11 syringe 7 telescope 18 toggle 46 tumpline 12 unguentarium 10 washboard whip whorl(s) (in two databases) (in two databases) (in two databases) 7 22 (in two databases) 10 (in two databases) 37 3. CHIN MATERIAL Field vs. AAT Materials Facet The AAT's Materials facet contains terms which cover "a broad range of types of physical matter, including both natural and synthetic substances and ranging from raw materials to material products" (AHIP, 1994, p.74). As such, it is a possible source of terminology for the CHIN MATERIAL (MA) field of the Humanities National Inventory. A thesaurus has already been produced for the MATERIAL field in the 1993 CHIN publication, Standards for the Use of the Material (MA), Technique (MT), and Related Fields on CHIN Humanities Databases (Ewing, 1993). 
In order to determine the feasibility of using the AAT as a vocabulary source for the MATERIAL field, and to see how closely the AAT vocabulary corresponds with that presented by Ewing (which was primarily derived from the AAT), a sample of the contents of the MATERIAL field of the Humanities National Inventory as well as the vocabulary from Ewing were matched against a sample of vocabulary from the Materials facet of the AAT.

The analysis was performed on a sample of the CHIN MATERIAL field contents because of the large number of records (2 123 061) in the National Inventory. The sample was selected from a printout by analysing 10 pages, skipping 10, etc., until approximately 11% of the terms (by frequency) were analysed.

The match was accomplished by manually correlating an alphabetic listing of AAT Descriptors, Former Terms, UK Equivalents, UK Alternatives, Alternate Terms, and Use For terms from the Materials facet with:

1) the vocabulary for MATERIALS (Ewing, 1993), and
2) a sample of the alphabetical frequency list of the MATERIAL field data.

Match results were calculated in two ways:

1) based on the number of unique occurrences of terms (e.g., if wax appeared in the database 591 times, and matched the AAT, it counted as one match). This type of match was done for both the data from the National Inventory MATERIAL field and the vocabulary from Ewing.

2) based on frequency of the CHIN terms (e.g., terms were frequency-weighted so that if wax appeared in the database 591 times, and matched the AAT, it counted as 591 matches). This type of match was done for only the data from the National Inventory MATERIAL field.

3.1 MATERIAL Field Match Results

3.1.1 Non-frequency-weighted Match Results

A match against the AAT was made on a sample of 3934 unique terms in the CHIN MATERIAL field (i.e., 4710 terms minus 776 French terms). The match results are shown in Table 19.

Table 19 AAT Materials vs.
Sample of CHIN MATERIAL Field Data: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches      215 / 3934           5.5%
Exact Word Matches        906 / 3934          23.0%
Partial Matches          2172 / 3934          55.2%
Non-matches               641 / 3934          16.3%
Total Exact Matches      1186 / 3934          28.5%

There were a few CHIN entries that were spelling or typographical errors, but that were still recognisable as AAT terms. When the spelling or typing errors were corrected, the MATERIAL non-frequency-weighted match rates were as follows:

Exact Phrase Matches      6.9%
Exact Word Matches       23.3%
Partial Matches          55.1%
Non-matches              14.8%
Total Exact Matches      30.2%

3.1.2 Frequency-weighted Match Results

A match against the AAT was made on a sample of 165 781 entries from the MATERIAL field (i.e., 225 459 entries minus 59 678 French entries), including each occurrence of each term. The match results are shown in Table 20.

Table 20 AAT Materials vs. Sample of CHIN MATERIAL Field Data: Frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches     126365 / 165781      76.2%
Exact Word Matches        12299 / 165781       7.4%
Partial Matches           22506 / 165781      13.6%
Non-matches                4611 / 165781       2.8%
Total Exact Matches      138834 / 165781      83.6%

There were a few CHIN entries that were spelling or typographical errors, but that were still recognisable as AAT terms. When the spelling or typing errors were corrected, the MATERIAL frequency-weighted match rates were as follows:

Exact Phrase Matches     76.3%
Exact Word Matches        7.4%
Partial Matches          13.6%
Non-matches               2.7%
Total Exact Matches      83.7%

3.1.3 Ewing Vocabulary Match Results (Non-frequency-weighted)

Based on the unique occurrence of terms, there were 399 unique vocabulary terms from Ewing (1993) matched against the AAT. The match results are shown in Table 21.

Table 21 AAT Materials vs.
Ewing Vocabulary: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches      83 / 399            20.8%
Exact Word Matches       160 / 399            40.1%
Partial Matches          139 / 399            34.8%
Non-matches               17 / 399             4.2%
Total Exact Matches      243 / 399            60.9%

3.1.4 Commonly Used MATERIAL Terms Match Results

The following analysis of commonly used MATERIAL terms was based on all of the MATERIAL field data (not just the sample).

The 349 most commonly used MATERIAL terms (occurring 500 times or more) in the CHIN Humanities National Inventory are shown in Table 22. These terms, when their frequency was taken into account, totalled 1 820 879 entries, and made up 85.7% of the 2 123 061 entries in the Humanities National Inventory MA field. Terms which are exclusively French are marked with an "F" along the left margin. Some words are likely French words in the context of this database, but are counted as English because they are sometimes used in English as well. Some of the terms in this list matched terms in other hierarchies of the AAT, though they did not match in the Materials facet; the hierarchy that they were found in is marked on the list within parentheses.

Table 22 AAT Materials vs. CHIN MATERIAL Terms that Occurred 500 Times or More
Lang. Match Freq. MATERIAL F X 3531 ACIER F X 3547 ADHESIF O 2779 ALBUMEN PHOTOGRAPHIC PAPER 726 ALUMINIUM 2850 ALUMINUM 5206 ANTLER X 4072 ARGENT 1146 ARGILLITE F F 594 BAKELITE 1499 BALL CLAY 1385 BAMBOO 1407 BARK O 1390 BARK, BIRCH O 1839 BARK, CEDAR O 518 1996 BASALT X 2205 BEAD X 4702 BEADS X 902 BIRCHBARK - 578 BIRD RIVER RHYOLITE ("River" - Settlem./Landscapes) BLACK CHERT ("Black" in Colors) - 1002 BARK, CHERRY (in Objects facet) (in Objects facet) F X 30895 BOIS F X 817 BOIS, PIN 61958 BONE - 600 BONE, BIRD - 6909 BONE, LAND-MAMMAL 46 F - 511 BONE, SEA-MAMMAL 18529 BRASS 1728 BROMIDE ?
7390 BRONZE 3181 BUCKSKIN 613 BURLAP 844 CANE 4123 CANVAS X 3727 CAOUTCHOUC F 30994 CARDBOARD X 7911 CARTON 1261 CAST IRON - 855 CATHEAD CHERT 531 CATLINITE 723 CEDAR O 1365 CEDAR BARK O 812 CEDAR ROOT 2475 CELLULOID 18049 CERAMIC 554 CERAMIC ? - 693 CERAMIC, CHINA - 590 CERAMIC, CREAMWARE - 2561 ("Sea" in Settlem.\Landscapes) ("Creamware" - Object Genres) CERAMIC, EARTHENWARE ("Earthenware"-Obj. Genres) O 528 CERAMIC, IRONSTONE O 2705 CERAMIC, PORCELAIN - 840 CERAMIC, STONEWARE - 682 CERAMIC: EARTHENWARE ("Earthenware"-Obj. Genres) - 542 CERAMIC:EARTHENWARE ("Earthenware" -Obj. Genres) F X 4438 CERAMIQUE F X 1202 CERAMIQUE, PORCELAINE 1942 CHALCEDONY 853 CHARCOAL 47 ("Stoneware" - Object Genres) F F F O 999 15656 CHERT - 6196 CHERT, RAMAH X 1755 CHINA X 518 CIRE 9395 CLAY 9609 CLOTH X 949 COLLE 812 COMPOSITION 7531 COPPER 2221 COPPER ALLOY 1458 CORD 2529 CORK X 3820 COTON 43107 COTTON 1130 COTTON ? O O 556 1421 CHERRY BARK COTTON CLOTH COTTON THREAD X 590 F X 3338 CUIR F X 2722 CUIVRE 946 DIORITE X 537 DRAP F F F F CRAYON (in Tools and Equipment) 5081 DYE X 27435 EARTHENWARE 1561 EGGSHELL X 1331 ELASTIC X 1623 EMAIL 4216 ENAMEL X 12515 ENCRE 598 X 1663 (in Object Genres) (in Attributes and Properties) ENGOBE ETAIN 48 11798 FABRIC O 594 FABRIC, COTTON X 1821 FAUNAL 2723 FEATHER X 2652 FEATHERS 3283 FELT F X 7121 FER F X 1082 FER FORGE 3804 FERROUS METAL X 703 15407 - 551 FIBRE SYNTHETIQUE O 506 FIBRE, CANVAS F F F FEUTRE FIBRE O 1167 FIBRE, CLOTH - 2379 FIBRE, COTON 11909 FIBRE, COTTON 623 FIBRE, COTTON O 511 FIBRE, FLAX (LINEN) F - F - 819 FIBRE, LIN O 695 FIBRE, LINEN O 980 FIBRE, NYLON O F (in Components and in Tools and Equipment) 1135 1128 FIBRE, LAINE FIBRE, POLYESTER O 948 FIBRE, RAYON O 3586 FIBRE, SILK - 1562 FIBRE, SOIE O 6044 FIBRE, WOOL 1093 FIBRE: COTTON O 638 FIBRE: LINEN O 966 FIBRE: SILK 2443 FUR X 1582 GILT (in Processes & Techniques) 49 F X GLACURE 80352 GLASS 1269 GLASS ? 
- 2685 GLASS BEADS - 2648 GLASS, CLEAR 905 ("Beads" in Objects facet) GLASS, OPAQUE - 1366 GLASS, TRANSLUCENT 3568 GLAZE X 791 - 1023 GLAZED PORCELAIN ("Glazed" - Processes & Techn.) X 1099 GLAZED STONEWARE 2215 GLUE 1870 GOLD ("Translucent"-Attrib.\Propert.) GLAZED EARTHENWARE ("Earthenware"-Obj.Genres) 608 2389 GRANITE 1734 GRASS - 2686 GREY CHERT 559 GREYWACKE F 3646 ("Stoneware" - Object Genres) GOURD 1431 HAIR 5566 HARDWOOD 553 ("Grey" in Colors) HEMP 3214 HIDE 1352 HORN 636 HORSEHAIR - 680 HUDSON BAY LOWLAND CHERT ("Lowland"-Styl/Per) 11996 INK 20788 IRON 710 IRON ? 731 IRONSTONE X 866 IVOIRE 7016 775 IVORY JADE 50 523 JASPER 2868 KAOLIN - 5738 KNIFE RIVER FLINT ("River"-Sett./Landsc.,"Knife"-Obj.) 2486 LACE 749 LACQUER F X 1976 LAINE F X 5119 LAITON 2407 LEAD 20026 LEATHER 671 LIMESTONE X 520 LIN 8490 LINEN X 9888 LITHIC 574 153006 METAL F - 2789 METAL, ACIER F O F F 670 MAHOGANY METAL, ALUMINIUM O 1363 METAL, ALUMINUM - 2714 METAL, ARGENT 719 METAL, BASE O 18330 METAL, BRASS O 5267 METAL, BRONZE O 5057 METAL, COPPER F - 3796 METAL, CUIVRE F - 6619 METAL, FER F - 930 METAL, FONTE - 655 METAL, GILT ("Gilt" - Processes and Techniques) O 821 METAL, GOLD O 13531 METAL, IRON F - 2515 METAL, LAITON O 1864 METAL, LEAD O 1420 METAL, NICKEL 51 O 514 O 6685 O O F 11219 2901 774 1114 METAL, PEWTER METAL, SILVER METAL, STEEL METAL, TIN METAL, TOLE METAL, WHITE O 868 METAL, ZINC - 500 METAL: COPPER ALLOYS O 827 METAL: LEAD ALLOY O 1423 METAL:BRONZE O 594 METAL:SILVER - 12203 METALLIC SALTS (SILVER) ("Metallic" - Styles/Periods) - 529 METALLIC THREAD ("Metallic" - Styles/Periods) O 9066 MINERAL, CHERT - 2880 MINERAL, MINERAL, RAMAH O 3769 MINERAL, QUARTZ - 759 MINERAL, QUARTZ VARIETY O 675 MINERAL, QUARTZITE - 20266 MINERAL, RAMAH CHERT O 1861 MINERAL,QUARTZ O 1159 MINERAL,QUARTZ CRYSTAL X 1838 MISCELLANEOUS 685 MIXED MEDIA 966 MOTHER-OF-PEARL - 805 MOTTLED CHERT ("Mottled" in Processes & Techniques) 1903 613 1326 NICKEL 2522 NYLON 1004 OAK 689 1221 NEPHRITE NET OBSIDIAN 
OCHRE 52 F X 527 OR 14110 PAINT 125290 O 880 F X 29552 F X 2576 PAPIER JOURNAL F - 1038 PAPIER LUSTRE F X 1258 PEAU F X 997 F X 6953 PEINTURE X 1571 PENCIL F F PAPER PAPER, CARDBOARD PAPIER PEAU, CUIR (in Tools and Equipment) X 507 PENCIL CRAYON - 622 PETRIFIED WOOD 613 PEWTER 6791 X 994 PHOTOGRAPHIC PAPER PIERRE 4071 PIGMENT X 921 PIN 907 PINE X 939 PLANT (in Settlements and Landscapes) 1778 PLASTER 22361 PLASTIC F X 4869 PLASTIQUE F X 779 F X 1908 PLATRE F X 1430 PLOMB 1201 POLYESTER 8422 PORCELAIN X 638 PORCELAINE X 655 POTTERY - 2298 F 16348 (both in Tools and Equipment) PLASTIQUE, VINYLE QTZ. CRYSTAL QUARTZ 53 (in Object Genres) O 1897 QUARTZ CRYSTAL 11268 QUARTZITE X 597 QUILLS 2244 RAWHIDE 2728 RAYON 655 O 1033 766 REED RESIN RESIN, RUBBER 1266 RHYOLITE 2074 RIBBON 660 ROOT O 554 ROOT, SPRUCE 786 ROPE 8510 RUBBER 2891 SATIN 1027 SEALSKIN - 677 SEED BEADS - 796 SELKIRK CHERT - 745 SEMI-PORCELAIN 4277 ("Beads" in Objects facet) SHELL - 653 SHELL, DENTALIUM O 620 SHELL: MOTHER OF PEARL 631 SHELLAC 22055 881 1336 SILTSTONE 11179 SILVER 992 SILVER PLATE X 649 SILVERPLATE 5297 SINEW 6542 SKIN O 6351 SKIN, LEATHER SILK SILK ? 54 O F 958 SKIN: LEATHER 1757 SLATE 1087 SOAPSTONE 1706 SOFTWOOD X 1470 SOIE 1149 SOIL O 742 SPRUCE ROOT 827 STAINLESS STEEL 1913 STEATITE 16231 STEEL 887 STEEL ? 1090 26816 O 3262 O 879 STERLING SILVER STONE STONE, CHERT STONE, JADE, NEPHRITE - 2772 STONE, METAMORPHIC, QUARTZITE - 7566 STONE, METAMORPHIC, SLATE O 616 STONE, QUARTZITE O 2536 STONE, RHYOLITE O 1631 STONE, SCHIST - 517 STONE, SEDIMENTARY, SILTSTONE - 1555 STONE, SILICEOUS, CHERT O 3964 STONE, SLATE O 2991 STONE, SOAPSTONE - 6694 STONE, VOLCANIC, BASALT - 2053 STONE, VOLCANIC, OBSIDIAN - 3809 STONE,SEDIMENTARY,SANDSTONE - 3294 STONE,SILICEOUS,CHALCEDONY - 10804 STONE,SILICEOUS,CHERT - 52240 STONE,VOLCANIC,BASALT X 3822 STONEWARE 1566 STRAW 55 (in Object Genres) F F 2901 STRING X 576 STROUD - 6755 SWAN RIVER CHERT X 3921 SYNTHETIC ("River" in Settlem./Landsc.) 
("Synthetic" - Attributes and Propert.) - 595 SYNTHETIC, PLASTIC ("Synthetic"-Attributes and Prop.) - 732 SYNTHETIC: PLASTIC ("Synthetic"-Attributes and Prop.) 507 TAPE X 1035 TEINTURE 550 3526 TEXTILE 4986 THREAD 6852 TIN X 722 4454 TOOTH X 2901 UNKNOWN 539 TERRA COTTA TISSU VARNISH 3188 VELVET F X 2029 VERNIS F X 12848 VERRE 1004 VINYL 1658 WAX 3897 WHITE METAL 614 3225 WIRE 106666 WOOD O 717 O 1004 WOOD, CEDAR O 1351 WOOD, MAHOGANY WICKER WOOD, BIRCH O 724 WOOD, MAPLE O 999 WOOD, OAK O 2076 WOOD, PINE O 1129 WOOD, RED CEDAR 56 O 760 18013 WOOL 918 YARN Total 1 820 879 Total 349 WOOD, WALNUT by frequency by unique occurrence
───────────────────────────────────────
Key to Match codes
(blank) = Exact Phrase Match with the AAT (phrase matches as a whole)
O       = Exact Word Match with the AAT (single words match separately)
-       = Partial Match with the AAT (matching word underlined)
X       = Non-Match with the AAT Materials facet

After the French terms were removed from the match, statistics were calculated for the commonly used English terms of the MA field. Again, the statistics for commonly used terms are based on the entire MA field (not only the sample).

A match against the AAT was made on a sample of 290 unique terms that were used 500 times or more in the MATERIAL field (i.e., 349 terms minus 59 French terms). The match results are shown in Table 23.

Table 23 AAT Materials vs. CHIN MATERIAL Terms Used 500 Times or More: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches     152 / 290            52.4%
Exact Word Matches        67 / 290            23.1%
Partial Matches           47 / 290            16.2%
Non-matches               24 / 290             8.3%
Total Exact Matches      219 / 290            75.5%

The above statistics are based on a match of CHIN MATERIAL terms with terms from the Materials facet of the AAT.
If these CHIN terms had been matched against the entire AAT, the results would have been as follows:

Exact Phrase Matches     56.6%
Exact Word Matches       30.0%
Partial Matches          10.3%
Non-matches               3.1%
Total Exact Matches      86.6%

A match against the AAT was made on a sample of 1 611 498 entries of terms that were used 500 times or more in the MATERIAL field (i.e., 1 820 879 entries minus 209 381 French entries), including each occurrence of each term. The match results are shown in Table 24.

Table 24 AAT Materials vs. CHIN MATERIAL Terms Used 500 Times or More: Frequency-weighted Match Results

Match Categories         Number of Matches      Percentage of Total
Exact Phrase Matches     1194314 / 1611498      74.1%
Exact Word Matches        158296 / 1611498       9.9%
Partial Matches           183159 / 1611498      11.4%
Non-matches                74729 / 1611498       4.6%
Total Exact Matches      1353610 / 1611498      83.9%

The above statistics are based on the match of the CHIN MATERIAL terms with terms from the Materials facet of the AAT. When these CHIN terms were matched against the entire AAT, the results were as follows:

Exact Phrase Matches     77.3%
Exact Word Matches       11.5%
Partial Matches           9.9%
Non-matches               1.3%
Total Exact Matches      88.8%

4. CHIN TECHNIQUE Field vs. AAT Processes/Techniques Hierarchy

The AAT's Processes and Techniques hierarchy contains "descriptors for actions and methods performed physically on or with materials and objects and for processes occurring in materials and objects" (AHIP, 1994, Vol.2, p.42). As such, it corresponds directly to, and is a possible source of terminology for, the CHIN TECHNIQUE (MT) field of the Humanities National Inventory, which "indicates the processes, methods, or techniques used to manufacture the item" (CHIN, 1993, p. 106). A thesaurus has already been produced for the TECHNIQUE field in Standards for the Use of the Material (MA), Technique (MT), and Related Fields on CHIN Humanities Databases (Ewing, 1993).
In order to determine the feasibility of using the AAT as a vocabulary source for the TECHNIQUE field, and to see how closely the AAT vocabulary corresponds with that presented by Ewing (which was primarily derived from the AAT), a sample of the contents of the TECHNIQUE field of the Humanities National Inventory as well as the vocabulary from Ewing were matched with the entire vocabulary from the Processes and Techniques hierarchy of the AAT.

The analysis was performed on a sample of the CHIN TECHNIQUE field contents, because of the large number of records (845 570). The sample was selected by analysing 20 pages, skipping 20 pages, etc., until approximately 35% of the terms (by frequency) were analysed.

The match was accomplished by manually correlating an alphabetic listing of AAT Descriptors, Former Terms, UK Equivalents, UK Alternatives, Alternate Terms, and Use For terms from the Processes and Techniques hierarchy with:

1) the vocabulary for TECHNIQUE from Ewing (1993), and
2) a sample of the alphabetical frequency list from the TECHNIQUE field.

Match results were calculated in two ways:

1) based on the number of unique occurrences of terms (e.g., if carved appeared in the database 591 times, and matched the AAT, it counted as one match). This type of match was done for both the data from the National Inventory TECHNIQUE field and the vocabulary from Ewing.

2) based on frequency of the CHIN terms (e.g., terms are frequency-weighted so that if carved appeared in the database 591 times, and matched the AAT, it counted as 591 matches). This type of match was done for only the data from the National Inventory TECHNIQUE field.

4.1 TECHNIQUE Field Match Results

4.1.1 Non-frequency-weighted Match Results

A match against the AAT was made on a sample of 3970 unique terms in the CHIN TECHNIQUE field (i.e., 4303 terms minus 333 French terms). The match results are shown in Table 25.

Table 25 AAT Processes & Techniques vs.
CHIN TECHNIQUE Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches      195 / 3970           4.9%
Exact Word Matches        199 / 3970           5.0%
Partial Matches          1990 / 3970          50.1%
Non-matches              1586 / 3970          40.0%
Total Exact Matches       394 / 3970           9.9%

A very small number of CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. When the spelling or typing errors were corrected, the TECHNIQUE non-frequency-weighted match rates were as follows:

Exact Phrase Matches      6.0%
Exact Word Matches        5.1%
Partial Matches          50.1%
Non-matches              38.8%
Total Exact Matches      11.1%

4.1.2 Frequency-weighted Match Results

A match against the AAT was made on a sample of 242 040 entries from the TECHNIQUE field (i.e., 286 229 entries minus 44 189 French entries), including each occurrence of each term. The match results are shown in Table 26.

Table 26 AAT Processes & Techniques vs. CHIN TECHNIQUE Terms: Frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches     192663 / 242040      79.6%
Exact Word Matches         1385 / 242040       0.6%
Partial Matches            9830 / 242040       4.1%
Non-matches               38162 / 242040      15.7%
Total Exact Matches      194048 / 242040      80.2%

A very small number of CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. When the spelling or typing errors were corrected, the TECHNIQUE frequency-weighted match rates did not change significantly.

4.1.3 Ewing Vocabulary Match Results (Non-frequency-weighted)

Based on the unique occurrence of terms, there were 147 unique TECHNIQUE terms from Ewing (1993); these were matched against the AAT. The match results are shown in Table 27.

Table 27 AAT Processes and Techniques vs.
Ewing Vocabulary TECHNIQUE Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Phrase Matches     84 / 147             57.1%
Exact Word Matches        1 / 147              0.7%
Partial Matches          31 / 147             21.1%
Non-matches              31 / 147             21.1%
Total Exact Matches      85 / 147             57.8%

Of the TECHNIQUE vocabulary terms from Ewing, the 16 that were counted as Partial or Non-matches would have been Exact Matches if they had been in a different grammatical form. Because the AAT's Processes and Techniques terms are often in the gerund form, adjectival terms such as Netted and Gilded, Oil in the Ewing vocabulary did not match the AAT's Netting and Gilding, Oil. When allowances were made for grammatical form, there was an increase in the Total Exact Matches to 62%.

4.1.4 Commonly Used TECHNIQUE Terms Match Results

The 96 most commonly used terms (occurring 1000 times or more) in the CHIN TECHNIQUE (MT) field are listed in Table 28. These terms, when their frequency was taken into account, totalled 513 530 entries, and made up 60.7% of the 845 570 entries in the Humanities National Inventory TECHNIQUE field. Exclusively French terms were marked with an "F" along the left margin. Some words, such as assemble and impression, are likely French words in the context of this database, but were counted as English because they are sometimes used in English as well. Some of the terms in this list matched terms in other hierarchies of the AAT, although they did not match in the Processes and Techniques hierarchy; the hierarchy in which they were found is marked on the list within parentheses.

Table 28 AAT Processes & Techniques vs. Commonly Used CHIN TECHNIQUE Terms
Lang. F F Match Freq.
TECHNIQUE Term 7405 ABRADED - 1749 ABRADED, ENTIRE X 10856 ARTISANAL X 9521 ASSEMBLE 2997 BEADED 1188 BENT X 1495 BLADE-CORE ("Blade" - Tools and Equipment) X 1890 BLOWN, MOULD X 1828 BOUND 1202 BRAIDED 1516 BROCADED 22103 CARVED 12010 CAST 6220 CHIPPED X 1486 CISELE 68 ("Mould" - Tools and Equipment) F F F F F F F F X 1521 CLOUE 1825 COILED X 2052 COLLE X 1708 COMMERCIAL (in Information Forms) X 1037 CONTACT PRINT (in Objects facet) X 1093 CONTINUOUS (in Attributes and Properties) X 3559 COUSU 1564 CROCHETED X 1015 CUIT 13276 CUT X 6601 DECOUPE X 8190 DESSINE X 1007 DISCONTINUOUS X 1009 DOMESTIQUE 7673 DRAWN X 1059 DRILL 3413 DRILLED 3281 DYED X 6240 EAU-FORTE 9045 EMBROIDERED 1552 ENAMELLED 1746 ENGRAVED 2957 ETCHING X 2095 EXPANDING 27564 FIRED 74973 FLAKED (in Attributes and Properties) (in Objects facet) - 1160 FLAKED,BIFACIALLY X 2428 FORGE 2319 FORGED 5835 GLAZED 1990 GLUED 69 F X 1256 GRAVE F - 1338 GRAVURE SUR 15043 GROUND 1142 HAMMERED - 1055 HAND ASSEMBLED - 2020 HAND SEWN X 2459 HAND-WROUGHT ("Wrought" in Attributes Properties) F X 6089 HANDMADE (in Attributes and Properties) - 1540 HANDSEWN - 4111 HANDWRITTEN X 1960 HOLLOW-CANE X 2252 HOMEMADE X 1080 IMBRICATED X 1006 IMPRESSION X 6069 IMPRIME 2242 INCISED (in Attributes and Properties) F X 12624 F X 1456 INDUSTRIELLE 1600 INTAGLIO 1001 KNITTED X 1660 LACE 1530 LAYERED X 4145 LITHOGRAPH (in Objects facet) X 2757 LITHOGRAPHIE 1227 LITHOGRAPHY - 3646 MACHINE SEWN X 4075 MANUFACTURE X 1661 MANUFACTURED X 2053 MARTELE 1941 MARVERED F F INDUSTRIEL (in Materials facet) 70 ("Machine" in Objects facet) and F X 3725 MELTON 30222 MOLDED 8639 MOULDED X 7434 MOULE X 5900 N 1600 NAILED X F F 1723 PAINTED PECKED X 11362 1119 PHOTOENGRAVING X 1398 PHOTOGRAPHIE 1036 PHOTOGRAVURE 12938 PEINT PHOTOLITHOGRAPHY X 3500 PHOTOMECHANICAL 1761 PIERCED 4108 POLISHED 6149 PRESSED F 12012 11090 PRINTED 1224 PROJECTION X 1374 RELIE 1075 RELIEF 5220 RETOUCHED X 2630 ROLLED 71 ─────────────────────────────────────── Key to Match 
Codes
(blank) = Exact Match with the AAT Processes and Techniques hierarchy
-       = Partial Match with the AAT (matching word underlined)
X       = Non-Match with the AAT Processes and Techniques hierarchy

After the French terms were removed from the match, statistics were calculated for the commonly used terms of the MT field.

A match against the AAT was made on a sample of 75 unique terms that were used more than 999 times in the TECHNIQUE field (i.e., 96 terms minus 21 French terms). The match results are shown in Table 29.

Table 29 AAT Processes and Techniques vs. Commonly Used CHIN TECHNIQUE Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Matches            45 / 75              60.0%
Partial Matches           7 / 75               9.3%
Non-matches              23 / 75              30.7%
Total Exact Matches      45 / 75              60.0%

The above statistics were based on the match of the CHIN TECHNIQUE terms with only the Processes and Techniques hierarchy of the AAT. When these terms were matched against the entire AAT, the results were as follows:

Exact Matches            73.3%
Partial Matches          12.0%
Non-matches              14.7%

A match against the AAT was made on a sample of 430 627 entries of terms that were used more than 999 times in the TECHNIQUE field (i.e., 513 530 entries minus 82 903 French entries), including each occurrence of each term. The match results are shown in Table 30.

Table 30 AAT Processes and Techniques vs. Commonly Used CHIN TECHNIQUE Terms: Frequency-weighted Match Results

Match Categories         Number of Matches    Percentage of Total
Exact Matches            337573 / 430627      78.3%
Partial Matches           15281 / 430627       3.5%
Non-matches               77773 / 430627      18.0%
Total Exact Matches      337573 / 430627      78.3%

The above statistics were based on the match of the CHIN TECHNIQUE terms with only the Processes and Techniques hierarchy of the AAT. When these terms were matched against the entire AAT, the results were as follows:

Exact Matches            83.9%
Partial Matches           4.1%
Non-matches              12.0%

5.
CHIN CULTURE and SCHOOL/STYLE Fields vs. AAT Styles and Periods Facet

The AAT's Styles and Periods facet contains "the names of art and architecture styles, historical periods, and art movements. Names of peoples, cultures, individuals, and sites are included only if they designate distinct styles or periods" (AHIP, 1994, Vol. 1, p. 336). As such, it is a possible source of terminology for several CHIN fields, such as SCHOOL/STYLE (SA), PERIOD DESIGNATION (PER), and CULTURE (CU). The SCHOOL/STYLE field, which contains information about school and/or stylistic associations, and the CULTURE field, which is used to identify culture based on social/geographic origin, are both fields in the National Inventory. This study does not deal with the PERIOD DESIGNATION field data, although the PER field could be matched against the AAT at a later date.

A thesaurus has already been produced for the CULTURE field in Standards and Terminology for the Recording of CULTURE in the Humanities Data Dictionary (Jewett, 1985). However, no guidelines have been produced for the vocabulary of the SCHOOL/STYLE field.

In order to determine the feasibility of using the AAT as a vocabulary source for the SCHOOL/STYLE and CULTURE fields, the contents of these two CHIN fields, as well as the CHIN thesaurus for CULTURE terms (Jewett, 1985), were matched with the entire vocabulary of the Styles and Periods facet of the AAT. This was accomplished by manually correlating an alphabetic listing of AAT Descriptors, Former Terms, UK Equivalents, UK Alternatives, Alternate Terms, and Use For terms from the Styles and Periods facet with:

1) alphabetical frequency lists of the CHIN CULTURE and SCHOOL/STYLE fields, and
2) the thesaurus for culture terms from Jewett (1985).

Match results were calculated in two ways:

1) based on the number of unique occurrences of terms (e.g., if Norwegian appeared in the database 591 times and matched the AAT, it counted as one match).
2) based on the frequency of the CHIN terms (e.g., terms were frequency-weighted so that if Norwegian appeared in the database 591 times and matched the AAT, it counted as 591 matches).

For the SCHOOL/STYLE field, the match categories were calculated using the Match Results Definitions (see Appendix), except that no differentiation was made between the Exact Word Match and the Exact Phrase Match because word matches occurred so infrequently. Matches were categorized as Exact Matches (which included both Exact Word Matches and Exact Phrase Matches), Partial Matches, and Non-matches. Again, French terms were not included in the match.

In general, if a CHIN term exactly matched an AAT term, it was counted as a match; if it did not match an AAT term, it was counted as a Non-match. Spelling or typing errors were an exception to this rule: if a word was obviously misspelled and its correct form was evident, it was counted as a match if its correct form matched the AAT.

Outside of these general rules, there were some interesting problems with the CULTURE match. Sometimes terms matched the AAT in meaning, but the terms differed grammatically. For example, Guatemala was in the CHIN databases, but the AAT term was Guatemalan. In these cases, Guatemala was counted as a Non-match.

Another example was the CHIN entry Greenland Inuit, contrasted with the AAT vocabulary, Greenland Eskimo. The meaning is the same, but the terminology is not, so the CHIN phrase Greenland Inuit does not match the AAT phrase Greenland Eskimo. The AAT does not have the word Greenland as a stand-alone term, but it does have Inuit. Therefore, the CHIN term Greenland Inuit was counted as a Partial Match because one of the two words, Inuit, matched.

Another problem arose when the CHIN term had a qualifier and the AAT term did not, or vice versa. For example, a CHIN entry such as Abenaki (St Francis) was considered to be a Partial Match with the AAT term Abenaki, because one of the words matched.
However, the CHIN term Massim counted as a Non-match when paired with the AAT's term, Massim Area, because the CHIN term Massim did not exist as a stand-alone term in the AAT.

5.1 CULTURE Field Match Results

5.1.1 Non-frequency-weighted Match Results

A match against the AAT was made on a sample of 4787 unique terms in the CULTURE field (i.e., 5297 terms minus 510 French terms). The match results are shown in Table 31.

Table 31
AAT Styles and Periods vs. CHIN CULTURE Data: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     1208 / 4787           25.2%
Exact Word Matches        280 / 4787            5.8%
Partial Matches           981 / 4787           20.5%
Non-matches              2318 / 4787           48.4%
Total Exact Matches      1488 / 4787           31.0%

A few CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. For example, the CHIN entry Candian was a typographical error that would match the AAT term Canadian if corrected. Also, some of the CHIN terms were in a different grammatical form than AAT terminology; for example, CHIN's entry, Belgium, would match the AAT's term, Belgian, if the form were changed. When the spelling or typing errors were corrected and allowances were made for differences in grammatical form, the CULTURE non-frequency-weighted match rates were as follows:

Exact Phrase Matches     27.0%
Exact Word Matches        5.8%
Partial Matches          27.0%
Non-matches              40.2%
Total Exact Matches      32.8%

5.1.2 Frequency-weighted Match Results

A match against the AAT was made on a sample of 826 829 entries from the CHIN CULTURE field (i.e., 902 419 entries minus 75 590 French entries), including each occurrence of each term. The results of this match are shown in Table 32.

Table 32
AAT Styles and Periods vs.
CHIN CULTURE Data: Frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     475007 / 826829       57.5%
Exact Word Matches        16825 / 826829        2.0%
Partial Matches          114365 / 826829       13.8%
Non-matches              220632 / 826829       26.7%
Total Exact Matches      491832 / 826829       59.5%

A few CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. Also, some of the CHIN terms were in a different grammatical form than AAT terminology. When the spelling or typing errors were corrected and allowances were made for differences in grammatical form, the CULTURE frequency-weighted match rates were as follows:

Exact Phrase Matches     57.5%
Exact Word Matches        2.0%
Partial Matches          14.9%
Non-matches              25.5%
Total Exact Matches      59.5%

5.1.3 CHIN CULTURE Thesaurus Match Results (Non-frequency-weighted)

The CHIN Culture Thesaurus (Jewett, 1985) contained 299 terms, 268 of which were preferred terms. When the non-preferred vocabulary was included in the match, the results were as shown in Table 33.

Table 33
AAT Styles and Periods vs. 1985 CHIN Culture Thesaurus Preferred/Non-preferred Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     183 / 299             61.2%
Exact Word Matches        10 / 299              3.3%
Partial Matches           29 / 299              9.7%
Non-matches               77 / 299             25.7%
Total Exact Matches      193 / 299             64.5%

When the non-preferred vocabulary from the CHIN Culture Thesaurus was removed from the match, 268 terms remained. The results of the AAT match are shown in Table 34.

Table 34
AAT Styles and Periods vs.
1985 CHIN Culture Thesaurus Preferred Terms Only: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     170 / 268             63.4%
Exact Word Matches         8 / 268              3.0%
Partial Matches           25 / 268              9.3%
Non-matches               65 / 268             24.2%
Total Exact Matches      178 / 268             66.4%

Note the similarity between the percentages of the CHIN Culture Thesaurus (Jewett, 1985) and the non-frequency-weighted results from the CHIN CULTURE field. This similarity may be due to the fact that the Culture Thesaurus was derived from terms in the CHIN databases.

5.1.4 Commonly Used CULTURE Terms Match Results

The 89 most commonly used terms (occurring 1000 times or more) in the CHIN CULTURE field are listed in Table 35. These terms, when their frequency was taken into account, totalled 766 249 entries, and made up 84.9% of the 902 419 entries (including French terms) in the Humanities National Inventory CULTURE field. Terms that are exclusively French are marked with an “F” in the Language column. Some of the terms in this list matched terms in other hierarchies of the AAT, though they did not match in the Styles and Periods facet; the hierarchy in which they were found is marked on the list within parentheses.

Table 35
AAT Styles and Periods vs. CHIN Commonly Used CULTURE Terms

Lang.  Match    Freq.  CULTURE
                 2057  AFRICAN
       -         1232  ALASKAN ESKIMO
F      X         1762  AMERICAIN
F      X         1881  AMERICAINE
                 7121  AMERICAN
F      X         1092  ANGLAIS
F      X         1830  ANGLAISE
                 2100  ARCHAIC
                 4892  ASIAN
                 9135  ATHAPASKAN
                 2157  BLACKFOOT
                 1346  BLACKFOOT ?
                 2674  BLOOD
                 1331  BLOOD ?
F      X         2429  BRITANNIQUE
                12075  BRITISH
       -         1098  BRITISH ISLES ("Isles" - Settlements & Landscapes)
                42662  CANADIAN
                 7459  CANADIAN ?
F      X         9299  CANADIEN
F      X        21148  CANADIENNE
       -         2676  CENTRAL ESKIMO ("Central" - Attributes & Prop.)
                12829  CHINESE
                64268  COAST SALISH
                 2356  COPPER ESKIMO
                 4421  CREE
                23842  DORSET
       -         5196  DORSET, MIDDLE
       -         2104  EARLY DORSET
                 3882  EGYPTIAN
                10771  ENGLISH
                 1190  ENGLISH ?
       O         6828  ENGLISH-CANADIAN
                 1769  ESKIMO
                 3692  EUROPEAN
F      X         2154  EUROPEEN
       X         1752  EXPORT
       X         1009  FOLK CULTURE ("Culture" - Associated Concepts)
F      X         2305  FRANCAIS
F      X         4604  FRANCAISE
                 4596  FRENCH
       O         4417  FRENCH-CANADIAN
                 3744  GERMAN
                 1322  GREEK
       X         3724  GROSWATER
                 6181  HAIDA
       X         1647  HUTTERITE
                 5907  INDIAN
       -        61082  INTERIOR SALISH ("Interior" - Attributes & Prop.)
       -         1026  INTERM INDIAN
               105945  INUIT
                 1527  IROQUOIS
       X         1063  ISLAMIC
                 1929  ITALIAN
F      X         1923  ITALIEN
                 9346  JAPANESE
F      X         4214  JAPONAISE
       X         8305  KOOTENAY
                 1076  KOREAN
                 4503  KWAKWAKA'WAKW
                       (in Associated Concepts)
                 4168  LABRADOR ESKIMO
       X        17675  LATE WOODLAND
       -        11225  MARITIME ARCHAIC
       X         1650  MI'KMAQ
       X         2667  MIDDLE WOODLAND
       -         2205  NEO-ESKIMO
                 1361  NOOTKA
                 5775  NORTH AMERICAN
       X         8097  NORTHERN WAKASHAN
       X         4175  NORTHWEST COAST
       X         8105  NUU-CHAH-NULTH
                 3478  OJIBWA
       -         4680  PALEO-ESKIMO
       -         4933  PALEOESKIMO
       -         1261  PEIGAN, SOUTH
                 2126  PLAINS CREE
       X         1531  PLATEAU GENERAL ("Plateau" - Settlem.\Landsc.)
       X        51420  POSTCONTACT
                 4314  PRE-DORSET
       X        68140  PRECONTACT
F      X         2111  QUEBECOISE
                 1500  ROMAN
                 2389  RUSSIAN
                 1091  SCOTTISH
                 2700  THULE
                 3739  TSIMSHIAN
       X         4463  UKRAINIAN
       X         5636  UNKNOWN
       X         5729  Y

Total: 766 249 by frequency; 89 by unique occurrence

───────────────────────────────────────
Key to Match Codes
(no code) = Exact Phrase Match (words match as an entire phrase)
O         = Exact Word Match (words match separately)
-         = Partial Match (matching word underlined)
X         = Non-Match with the AAT Styles and Periods facet

After the French terms were removed from the match, statistics were calculated for the commonly used terms of the CULTURE field. A match against the AAT was made on a sample of 76 unique terms that were used 1000 times or more in the CULTURE field (i.e., 89 terms minus 13 French terms). Match results are shown in Table 36.

Table 36
AAT Styles and Periods vs.
Commonly Used CHIN CULTURE Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     44 / 76               57.8%
Exact Word Matches        2 / 76                2.6%
Partial Matches          12 / 76               15.8%
Non-matches              18 / 76               23.8%
Total Exact Matches      46 / 76               60.4%

The above statistics are based on the match of the CHIN CULTURE terms with only the Styles and Periods facet of the AAT. When these terms were matched against the entire AAT, the results were as follows:

Exact Phrase Matches     59.2%
Exact Word Matches        6.6%
Partial Matches          14.5%
Non-matches              19.7%
Total Exact Matches      65.8%

A match against the AAT was made on a sample of 709 497 entries occurring 1000 times or more in the CULTURE field (i.e., 766 249 entries minus 56 752 French entries), including each occurrence of each term. Match results are shown in Table 37.

Table 37
AAT Styles and Periods vs. Commonly Used CHIN CULTURE Terms: Frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     402746 / 709497       56.7%
Exact Word Matches        11245 / 709497        1.6%
Partial Matches           98718 / 709497       13.9%
Non-matches              196788 / 709497       27.7%
Total Exact Matches      413991 / 709497       58.3%

The above statistics are based on the match of the CHIN CULTURE terms with only the Styles and Periods facet of the AAT. When these terms were matched against the entire AAT, the results were as follows:

Exact Phrase Matches     56.9%
Exact Word Matches       10.7%
Partial Matches           4.6%
Non-matches              27.2%
Total Exact Matches      67.6%

5.2 SCHOOL/STYLE Field Match Results

5.2.1 Non-frequency-weighted Match Results

A match against the AAT was made on a sample of 1578 unique terms in the SCHOOL/STYLE field (i.e., 1807 terms minus 229 French terms). The match results are shown in Table 38.

Table 38
AAT Styles and Periods vs.
CHIN SCHOOL/STYLE Field Data: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Matches            343 / 1578            21.7%
Partial Matches          332 / 1578            21.0%
Non-matches              903 / 1578            57.2%
Total Exact Matches      343 / 1578            21.7%

A few CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. For example, the CHIN entry, absract, is a typographical error, but would match the AAT term, Abstract, if corrected. Also, some of the CHIN terms were in a different grammatical form than AAT terminology; for example, CHIN's entry, Belgium, would match the AAT's term, Belgian, if the form were changed. When the spelling or typing errors were corrected and allowances were made for the differences in grammatical form, the SCHOOL/STYLE non-frequency-weighted match rates were as follows:

Exact Matches            22.2%
Partial Matches          24.5%
Non-matches              53.3%

5.2.2 Frequency-weighted Match Results

A match against the AAT was made on a sample of 82 240 entries from the SCHOOL/STYLE field (i.e., 88 445 entries minus 6205 French entries), including each occurrence of each term. The match results are shown in Table 39.

Table 39
AAT Styles and Periods vs. CHIN SCHOOL/STYLE Field Data: Frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Matches            74633 / 82240         90.7%
Partial Matches           2324 / 82240          2.8%
Non-matches               5283 / 82240          6.4%
Total Exact Matches      74633 / 82240         90.7%

A few CHIN entries contained spelling or typographical errors but were still recognisable as AAT terms. Also, some of the CHIN terms were in a different grammatical form than AAT terminology.
When the spelling or typing errors were corrected and allowance was made for differences in grammatical form, the SCHOOL/STYLE frequency-weighted match rates were as follows:

Exact Matches            90.7%
Partial Matches           2.8%
Non-matches               6.4%

5.2.3 Commonly Used SCHOOL/STYLE Terms Match Results

The 40 most commonly used terms (occurring 100 times or more) in the CHIN SCHOOL/STYLE field are listed in Table 40. These terms, when their frequency was taken into account, totalled 79 685 entries, and made up 90% of the 88 445 entries (including French terms) in the Humanities National Inventory SCHOOL/STYLE field. Terms that are exclusively French are marked with an “F” in the Language column. Many of the terms, though not included in the Styles and Periods facet of the AAT, are found elsewhere in the AAT; the AAT hierarchy in which these terms were found is marked on the list within parentheses.

Table 40
AAT Styles and Periods vs. Commonly Used CHIN SCHOOL/STYLE Terms

Lang.  Match    Freq.  SCHOOL/STYLE
                  592  ABSTRACT
                10095  AMERICAN
                  201  AMERICAN ?
                  161  ART DECO
                  274  ART NOUVEAU
                  147  AUSTRIAN
                 8818  BRITISH
       -          577  BRITISH (ENGLAND)
       -          373  BRITISH (SCOTLAND)
                  196  BRITISH ?
                31923  CANADIAN
                  275  CANADIAN ?
F      X         4201  CANADIENNE
                  490  CHINESE
                  131  CLASSICAL
       X          122  DELFTWARE (in Object Genres)
                 1199  DUTCH
       X          166  EXPORT WARE (in Object Genres)
       X          298  FAIENCE (in Object Genres)
                  405  FLEMISH
       X          138  FOLK ART (SASKATCHEWAN) (in Associated Concepts)
F      X          394  FRANCAISE
                 9112  FRENCH
                 2322  GERMAN
       X          194  GROUP OF SEVEN
                  216  INDIAN
                  865  INUIT
                 1676  ITALIAN
F      X          198  ITALIENNE
                  887  JAPANESE
       X          210  MINGEI
                  855  RUSSIAN
       X          152  SOSAKU HANGA
                  567  SPANISH
                  518  SWISS
                  142  TIBETAN
                  110  UKIYO-E
       -          118  UKIYO-E SCHOOL ("School" in Associated Concepts)
       -          119  UKIYO-E STYLE ("Style" in Associated Concepts)
       X          248  UNKNOWN

Total: 79 685 by frequency; 40 by unique occurrence

───────────────────────────────────────
Key to Match Codes
(no code) = Exact Phrase Match (words match as an entire phrase) with the AAT Styles and Periods facet
-         = Partial Match (matching word underlined)
X         = Non-Match with the AAT Styles and Periods facet

After the French terms were removed from the match, statistics were calculated for the commonly used terms of the SCHOOL/STYLE field. A match against the AAT was made on a sample of 37 terms that were used 100 times or more in the SCHOOL/STYLE field (i.e., 40 terms minus 3 French terms). The match results are shown in Table 41.

Table 41
AAT Styles and Periods vs. Commonly Used SCHOOL/STYLE Terms: Non-frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     25 / 37               67.5%
Exact Word Matches        0 / 37                0.0%
Partial Matches           4 / 37               10.8%
Non-matches               8 / 37               21.6%
Total Exact Matches      25 / 37               67.5%

The above statistics are based on the match of the CHIN SCHOOL/STYLE terms with only the Styles and Periods facet of the AAT.
If these terms had been matched against the entire AAT, the results would have been as follows:

Exact Phrase Matches     73.0%
Exact Word Matches        5.4%
Partial Matches          10.8%
Non-matches              10.8%
Total Exact Matches      78.4%

A match against the AAT was made on a sample of 74 892 entries occurring 100 times or more in the SCHOOL/STYLE field (i.e., 79 685 entries minus 4793 French entries), including each occurrence of each term. The match results are shown in Table 42.

Table 42
AAT Styles and Periods vs. Commonly Used SCHOOL/STYLE Terms: Frequency-weighted Match Results

Match Categories         Number of Matches     Percentage of Total
Exact Phrase Matches     72177 / 74892         96.3%
Exact Word Matches           0 / 74892          0.0%
Partial Matches           1187 / 74892          1.6%
Non-matches               1528 / 74892          2.0%
Total Exact Matches      72177 / 74892         96.3%

The above statistics are based on the match of the CHIN SCHOOL/STYLE terms with only the Styles and Periods facet of the AAT. When these terms were matched against the entire AAT, the results were as follows:

Exact Phrase Matches     97.0%
Exact Word Matches        0.3%
Partial Matches           1.7%
Non-matches               1.0%
Total Exact Matches      97.3%

6. Summary of Match Results

The frequency-weighted Exact Match rates between the AAT vocabulary and the OBJECT NAME/OBJECT TYPE vocabulary of the three sample museums were as follows:

1) Mendel Art Gallery
- 68.4% match of all Mendel Art Gallery data with the AAT.
- 64.3% match of the commonly used Mendel Art Gallery terms (these commonly used terms occurred 100 times or more and made up 85.8% of the Mendel Art Gallery data).
Note: These rates are low because the three most commonly used terms were abbreviated in the sample data, and therefore did not match the AAT. When these three terms were expanded, the frequency-weighted match rate for the Mendel Art Gallery data was 99.0%.

2) New Brunswick Museum
- 93.7% match of all New Brunswick Museum data with the AAT.
- 99.0% match of the commonly used New Brunswick Museum terms (these commonly used terms occurred 200 times or more and made up 58.3% of the New Brunswick Museum data).

3) UBC Museum of Anthropology
- 93.4% match of all UBC Museum of Anthropology data with the AAT.
- 99.4% match of the commonly used UBC Museum of Anthropology terms (these commonly used terms occurred 100 times or more and made up 65.2% of the UBC Museum of Anthropology data).

The data from these three institutions made up a statistical sample of the data in the Humanities National Inventory, and the sample was representative of Canadian history, art, and ethnology institutions. Therefore, there is good reason to believe that the object name data of the three sample institutions are representative of the English object names in the Humanities National Inventory.

The frequency-weighted Exact Match rates between the AAT vocabulary and the SCHOOL/STYLE, CULTURE, TECHNIQUE, and MATERIAL fields of the CHIN Humanities National Inventory were as follows:

1) SCHOOL/STYLE Field
- 90.7% match of all Humanities National Inventory SCHOOL/STYLE field English data with the Styles and Periods facet of the AAT.
- 96.3% match of the AAT Styles and Periods facet with the commonly used SCHOOL/STYLE terms (these commonly used terms occurred 100 times or more and made up 90% of the SCHOOL/STYLE field data, including French terms).
- 97.3% match of the entire AAT with the commonly used (100 times or more) SCHOOL/STYLE terms.

2) CULTURE Field
- 59.5% match of all Humanities National Inventory CULTURE field English data with the Styles and Periods facet of the AAT.
- 58.3% match of the Styles and Periods facet with the commonly used CULTURE terms (these commonly used terms occurred 1000 times or more, and made up 84.9% of the CULTURE field data, including French terms).
- 67.6% match of the entire AAT with the commonly used (1000 times or more) CULTURE terms.

3) TECHNIQUE Field
- 80.2% match of an English-only sample of Humanities National Inventory TECHNIQUE field data with the Processes and Techniques hierarchy of the AAT.
- 78.3% match of the AAT Processes and Techniques hierarchy with the commonly used TECHNIQUE terms (these commonly used terms occurred 1000 times or more, and made up 60.7% of the TECHNIQUE field data, including French terms).
- 83.9% match of the entire AAT with the commonly used (1000 times or more) TECHNIQUE terms.

4) MATERIAL Field
- 83.6% match of an English-only sample of Humanities National Inventory MATERIAL field data with the Materials facet of the AAT.
- 83.9% match of the Materials facet of the AAT with the commonly used MATERIAL terms (these commonly used terms occurred 500 times or more, and made up 85.7% of the MATERIAL field data, including French terms).
- 88.8% match of the entire AAT with the commonly used (500 times or more) MATERIAL terms.

7. Conclusions

The high match rate between the terms used in the CHIN Humanities National Inventory and the AAT terminology means that the AAT is a good vocabulary source for Canadian museums in the fields of history, fine art, or ethnology where data are recorded in English. The AAT is a comprehensive, well-designed, and continuously funded project being maintained and developed through a candidate term process, and is freely available to CHIN contributors as a reference database (http://www.chin.gc.ca). CHIN's new mission, to broker effective access to heritage information, will be well served by tools such as the AAT, which can provide a way to put into order or to make broadly accessible the diversity of collections information that is held by heritage institutions.
The AAT has already shown itself to be a viable option for vocabulary control within the museum community, and it will become increasingly important to the heritage community as the Internet provides more opportunity for CHIN's contributors to become individually visible and collectively present to a diverse and continually changing audience. The UK Equivalents and UK Alternatives address, in many cases, the context and regional usage in Canadian English. Additional terms in Canadian English and French terms will become available to Canadian users in the AAT Multilingual Thesaurus, which is currently under development.

8. Recommendations

1) Canadian museums that record their data in English should use the AAT vocabulary for data entry to standardize recording and maximize retrieval success on their local systems.

2) CHIN should use the AAT as a front-end retrieval tool for the Humanities National Inventory. This will enable users to take advantage of the thesaural relationships of the AAT, and will not require editing of data already in the Humanities National Inventory in order to achieve successful retrieval. Current CHIN terms would not necessarily have to match a preferred term in the AAT. As long as the CHIN term existed somewhere in the AAT (even as a Use For or Alternate Term), the terminological relationships defined in the AAT would enable it to be retrieved in a search. Users could name their objects at the most specific level possible, and still be able to use the hierarchical relationships within the thesaurus to retrieve the data at the most general or specific level. For example, if an artifact was catalogued as a "Windsor chair", it could be retrieved in a search for "Windsor chair", "chair", "seating furniture", or "furnishings". Also, broad searches could be narrowed easily by following the hierarchies.

3) Canadians should not change their traditional terminology and spelling to suit the AAT.
Terms commonly found in the Canadian museum data that did not match the AAT should be submitted to the AAT as candidate terms. Unmatched terms that appear frequently in CHIN data should be reviewed and submitted first, and more infrequent terms later. Differences in form in Canadian databases (e.g., the techniques and cultures in CHIN, which are adjectival, such as “netted” or “Belgian”, as opposed to the gerunds, such as “netting”, or nouns, such as “Belgium”, in the AAT) should be submitted as Alternate Terms, similar to the singular form of object names.

4) When selecting software for in-house collections management systems, Canadian museums should investigate the capabilities of the software to support the AAT as a data entry or search tool.

5) French equivalents to the commonly used English terms should be submitted to the Multilingual AAT Project.

6) The most commonly occurring English terms should be translated to French (and vice versa) and added to the front-end retrieval tool (thesaurus) in order to enable retrieval of both English and French data, regardless of whether the search is in English or in French. This should be done prior to the availability of the Multilingual AAT.

7) Data in the CHIN field PERIOD DESIGNATION (PER) should be matched against the AAT Styles and Periods facet.

9. Bibliography and References

AHIP. 1994. Art & Architecture Thesaurus. Second Edition. Toni Petersen, Director, Getty Art History Information Program. Oxford University Press, New York.

AHIP. 1994. Guide to Indexing and Cataloguing with the Art & Architecture Thesaurus. Petersen and Barnett, eds., Getty Art History Information Program. Oxford University Press, New York.

ANSI. 1980. American National Standard Guidelines for Thesaurus Structure, Construction, and Use (Z39.19). American National Standards Institute.

ANSI. 1993. Guidelines for the Construction, Format, and Management of Monolingual Thesauri (ANSI/NISO Z39.19-1993). American National Standards Institute.

Blackaby, James R. et al. 1988.
The Revised Nomenclature for Museum Cataloging: A Revised and Expanded Edition of Robert G. Chenhall's System for Classifying Man-Made Objects. American Association for State and Local History, Nashville, TN.

CHIN. 1993. Humanities Data Dictionary of the Canadian Heritage Information Network (Third Revision). Documentation Research Group, Canadian Heritage Information Network, Ottawa.

CHIN. 1994a. Proposal for the Expansion of the Humanities National Inventory. Documentation Research Group, Canadian Heritage Information Network, Ottawa.

CHIN. 1994b. CHIN’s Position on Steven Shubert's Classification in the CHIN Humanities Database. November, 1994. Fellowship Program, Canadian Heritage Information Network, Ottawa.

CHIN. 1995. CHIN's New Directions: Presentation to Collections Management Clients. Presentation by Lyn Elliot Sherwood to CHIN contributors, May 16, 1995. Canadian Heritage Information Network, unpublished.

Chenhall, Robert G. 1978. Nomenclature for Museum Cataloging: A System for Classifying Man-Made Objects. American Association for State and Local History, Nashville, TN.

Delroy, Stephen H. 1994. Object Name and Related Standards. Canadian Heritage Information Network, Ottawa.

Dunn, Heather. 1995. Testing a Vocabulary Standard Against Object Naming Practice in Canadian Museums. Master's Research Paper, Museum Studies Program, University of Toronto.

Etherington, Robin. 1988. The Culture (CU) Field: Entry and Use - A Discussion Paper. Canadian Heritage Information Network, Ottawa.

Ewing, Calum. 1993. Standards for the Use of the Material (MA), Technique (MT) and Related Fields on the CHIN Humanities Databases. Canadian Heritage Information Network, Ottawa.

ISO. 1985. Documentation: Guidelines for the establishment and development of multilingual thesauri (ISO 5964). First Edition. International Organization for Standardization, Geneva.

ISO. 1986. Documentation: Guidelines for the establishment and development of monolingual thesauri (ISO 2788).
Second Edition. International Organization for Standardization, Geneva.

Jewett, Deborah. 1985. Standards and Terminology for the Recording of Culture in the Humanities Data Dictionary. Canadian Heritage Information Network, Ottawa.

Shubert, Steven Blake. 1992. Classification in the CHIN Humanities Databases. Fellowship Program, Canadian Heritage Information Network, Ottawa.

Sullivan, Mary. 1987. Standards for the Fine Arts Object Names. Canadian Heritage Information Network, Ottawa.

10. Appendix - Definitions

10.1 Terminology

Term or Phrase
An entry in a CHIN database sample or the AAT, representing one concept or thing. A term or phrase may consist of one or more words.

Combination
A term made up of two or more parts derived from different data fields in the CHIN database sample. For example, Object Type + Object Name = rocking chair.

10.2 Occurrence

Frequency
The number of times that terms occurred in the CHIN database sample, as defined below.

Unique Term
A term or phrase that occurred only once in a CHIN database sample, most often due to a spelling mistake, but sometimes representing a unique object in the collection or a term of local significance.

Commonly Used Term
A term or phrase that occurred in a CHIN database sample at least a prescribed number of times (e.g., 100 times). Matches against commonly used terms produced results that were not skewed by errors in the data, unusual terms, and unusual objects in the collections. A commonly used term that did not match had more authority as a candidate term for the AAT than did a unique term.

10.3 Types of Match Statistical Methods

Non-frequency-weighted Match
A match of a list of unique terms in the CHIN sample against the AAT, done without frequency weighting. For example, even if a term occurred many times in the CHIN database sample, it would only be counted as one match with the AAT.

Frequency-weighted Match
A manual method of weighting match statistics by the number of occurrences of the term in the CHIN database sample. For example, if a term occurred 20 times in the CHIN database sample, it would be counted as 20 matches with the AAT.

10.4 Match Results

Exact Phrase Match
A phrase from the CHIN database sample that matches an AAT term exactly. For example,
CHIN: colour print
AAT: colour print

Exact Word Match
Each word within the CHIN term matches single terms in the AAT. For example,
CHIN: stencil print
AAT: stencil, print (separate terms)

Partial Match
At least one word within the CHIN term matches single terms in the AAT. For example,
CHIN: stonecut print
AAT: print

Non-Match
No part of the CHIN term matches the AAT.

Total Exact Matches
The total number of CHIN terms that match terms in the AAT exactly. That is, Total Exact Matches = number of Exact Phrase Matches plus the number of Exact Word Matches.
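
The match categories and the two counting methods defined above can be illustrated with a short sketch. The study's matching was done manually; the code below is only an approximation of those rules under stated assumptions. The vocabulary AAT_TERMS is a tiny hypothetical stand-in for the AAT, the function names (words, classify, tally, total_exact) are invented for illustration, and the frequencies for Greenland Inuit and Massim are made up (591 for Norwegian and 6828 for English-Canadian are taken from the text and Table 35). Spelling corrections and grammatical-form allowances are not modelled.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in for the AAT Styles and Periods vocabulary.
AAT_TERMS = {"Norwegian", "English", "Canadian", "Inuit",
             "Greenland Eskimo", "Abenaki", "Massim Area", "Guatemalan"}


def words(term):
    """Split a term into lower-case words, ignoring punctuation and hyphens."""
    return re.findall(r"[a-z0-9']+", term.lower())


def classify(chin_term, aat_terms=AAT_TERMS):
    """Assign a CHIN term to one of the four Match Results categories."""
    vocab = {t.lower() for t in aat_terms}
    if chin_term.lower().strip() in vocab:
        return "exact_phrase"              # whole phrase is an AAT term
    parts = words(chin_term)
    hits = [w in vocab for w in parts]     # each word checked as a single term
    if parts and all(hits):
        return "exact_word"                # every word matches separately
    if any(hits):
        return "partial"                   # at least one word matches
    return "non_match"


def tally(frequencies, aat_terms=AAT_TERMS):
    """Return (non-frequency-weighted, frequency-weighted) category counts."""
    unique, weighted = Counter(), Counter()
    for term, count in frequencies.items():
        category = classify(term, aat_terms)
        unique[category] += 1              # one match per unique term
        weighted[category] += count        # one match per occurrence
    return unique, weighted


def total_exact(counts):
    """Total Exact Matches = Exact Phrase Matches + Exact Word Matches."""
    return counts["exact_phrase"] + counts["exact_word"]


# Illustrative frequency list; the last two counts are hypothetical.
SAMPLE = {"Norwegian": 591, "English-Canadian": 6828,
          "Greenland Inuit": 10, "Massim": 5}

unique, weighted = tally(SAMPLE)
print(total_exact(unique), "of", sum(unique.values()), "unique terms")
print(total_exact(weighted), "of", sum(weighted.values()), "entries")
```

With this sample, a single very frequent term dominates the frequency-weighted rate, which is the same effect that separates, for example, Table 38 from Table 39.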