Taxonomy Strategies LLC
Assorted Slides on Taxonomy & Metadata Governance
Ron Daniel, Jr.
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.

Creating a Governance Structure for the Ongoing Maintenance of the Taxonomy
Taxonomies must change if they are to remain relevant. But what will it cost to make those changes to the taxonomy and to the data that is categorized by it? Organizations need maintenance processes that base taxonomy changes on rational cost/benefit decisions, without becoming mired in endless paperwork. This interactive workshop will present a framework for creating taxonomy governance teams and define what their specific responsibilities should be. Special attention will be given to defining maintainable taxonomies and metadata that meet business needs.

Taxonomy Strategies LLC The business of organized information

Agenda
10:15 Introduction
10:30 Background
10:35 Maintainable Taxonomies
10:45 Maintainable Metadata
10:50 ROI Estimation
11:00 Governance Environment
11:10 Controlled Items
11:30 Team Structures
11:45 Change Process
12:00 Exercises
12:15 Adjourn

Three Problems
Taxonomy development and maintenance is the LEAST of three problems:
• The Taxonomy Problem: How are we going to build and maintain the lists of pre-defined values that can go into some of the metadata elements?
• The Tagging Problem: How are we going to populate metadata elements with complete and consistent values? What can we expect to get from automatic classifiers? What kind of error detection and error correction procedures do we need? What fields do we need?
• The ROI (Return On Investment) Problem: How are we going to use content, metadata, and vocabularies in applications to obtain business benefits? More sales? Lower support costs? Greater productivity? Risk avoidance? How much content? How big an operating budget? How to expose it to users? How much tolerance for poor data quality?
Business goals and cultural factors are major influences on tagging and taxonomy. They must be acknowledged at the start to avoid rework.

There's more to maintaining the Taxonomy than just maintaining the Taxonomy
What must change when the Taxonomy changes?
• The master copy of the taxonomy.
• The data tagged with the taxonomy?
• The user interface that uses the taxonomy?
• Backend system software that uses the taxonomy?
• The training set for automatic classifiers?
• The educational material for users, catalogers, programmers, etc.?
• The information sent to downstream users of the taxonomy?
  – The versions of the taxonomy distributed to others.
  – The list of changes.
  – Announcements for stakeholders?
This is a set of items that might be maintained by the taxonomy team and need to be updated. Few groups will have all of these under maintenance by the taxonomy team.

Metadata and Taxonomy
Metadata Field | Data Type | Example
Title | String | "The Perl Directory"
Creator | String | The Perl Foundation
Identifier | URL | http://www.perl.org/
Date | DateTime | Jan. 12, 2006
Subject | List | Computers : Programming : Languages : Perl
The Subject values come from the Taxonomy. A big simple hierarchy has lots of nodes and is a lot of work to maintain.

DMOZ: A worst-case example of a unified 'subject'
• DMOZ has over 600k categories.
• Most are a combination of common facets – Geography, Organization, Person, Document Type, … (e.g. Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides).
• (BTW – the DMOZ governance model is out of whack.)
Sample category nodes: Business, Biotechnology & Pharmaceuticals, Education & Training, Regional, Europe, Ireland, Business & Economy, Employment, Health & Medical, Reference, Education, Colleges & Universities, North America, United States, Maryland, Reference, Education, K-12, Home Schooling, Unschooling, Chats and Forums, Science, Math, Academic Departments, South America, Colombia, Society, People, Women, Science & Technology, Mathematics, Science, Social Sciences, Linguistics, Translation, Associations, Business, Small Business, Finance, Business, Accounting, Firms, Business, Employment, By Industry, Business, Healthcare, Employment, Columbia Union College, Athletics, Accounting, Directories, Regional
Facet tally for the sample nodes: Competency (discipline) 11; Geography 9; Audience 9; Topic 7; Organization 5; Doc Type 4; Industry 4; Process 4

If you want to get technical here, you can explain that many big hierarchies are pre-coordinated combinations of items that could come from separate facets. This introduces some arbitrary choices (do we list content type first and location second, or …?). It also leads to a lot of repeated substructure, so a change that is conceptually small requires edits in many places.

The power of taxonomy facets
• Categorize in multiple, independent categories.
• Allow combinations of categories to narrow the choice of items.
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
• Easier to maintain.
• Can be easier to navigate.
Example – recipe facets:
Main Ingredients (10): Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type (6): Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines (11): African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods (15): Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
42 values to maintain (10 + 6 + 11 + 15); 9,900 combinations (10 × 6 × 11 × 15).

How do I get a good Taxonomy? – Seven practical rules
1) Incremental, extensible process that identifies and enables users, and engages stakeholders.
2) Quick implementation that provides measurable results as quickly as possible.
3) Not monolithic – has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
7) Improved over time, and maintained.

Some vocabulary construction rules
• Don't just have names, also have identifiers.
  – This will reduce retagging later when names change.
• When tagging content, use the most specific code. Let software handle the hierarchy.
• Bonus: Use URIs for node IDs & publish them on the web (see LINKED DATA in the futures chapter).
• Develop scope notes.
  – Not just a definition; also say what kind of content the node applies to.
• The metadata specification must state the vocabulary for each element.
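The identifier and specificity rules above can be illustrated with a minimal Python sketch. It uses a plain in-memory dictionary in place of a vocabulary management system. The concept IDs (c1 to c4) and the helper names are hypothetical. The labels reuse the Perl Directory subject path from the Metadata and Taxonomy slide. Content is tagged with the most specific concept ID, and software derives the broader concepts. Renaming a label therefore requires no retagging.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Concept:
    label: str                     # display name; may change over time
    broader: Optional[str] = None  # ID of the parent concept (None for a root)


# Hypothetical vocabulary: the IDs are stable, the labels are not.
vocab: Dict[str, Concept] = {
    "c1": Concept("Computers"),
    "c2": Concept("Programming", broader="c1"),
    "c3": Concept("Languages", broader="c2"),
    "c4": Concept("Perl", broader="c3"),
}

# Content is tagged with the most specific concept ID, never with a label.
tags: Dict[str, List[str]] = {"http://www.perl.org/": ["c4"]}


def lineage(concept_id: str) -> List[str]:
    """Return the concept followed by all of its broader concepts."""
    chain: List[str] = []
    current: Optional[str] = concept_id
    while current is not None:
        chain.append(current)
        current = vocab[current].broader
    return chain


def items_under(concept_id: str) -> List[str]:
    """Items tagged with this concept or any narrower one; software rolls up the hierarchy."""
    return [item for item, ids in tags.items()
            if any(concept_id in lineage(i) for i in ids)]


def path_label(concept_id: str) -> str:
    """Build the display path from the current labels, root first."""
    return " : ".join(vocab[c].label for c in reversed(lineage(concept_id)))


print(path_label("c4"))   # Computers : Programming : Languages : Perl
print(items_under("c2"))  # ['http://www.perl.org/']

# Renaming a concept edits the vocabulary only; no tagged item changes.
vocab["c3"].label = "Programming Languages"
print(path_label("c4"))   # Computers : Programming : Programming Languages : Perl
```

In practice the IDs would more likely be URIs, as the Bonus rule suggests. Short strings keep the sketch readable.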
Taxonomy development steps:
• Gather data from multiple sources
  – Talk with users and experts
  – Analyze query logs and content
• Choose and arrange terms
• Test and finalize the first version
• Shift into maintenance mode

What do I do with all these facets?
• Either expose them directly in the user interface (post-coordination),
• Or combine them in a minimal hierarchy (pre-coordination).
• Post-coordination takes software support, which may be fancy or basic.
• How many facets? (See elsewhere.)

Maintainable Metadata
• Design the metadata specification for future changes.
  – Lessons from the Dublin Core
• Provide metadata tagging and storage that will deal with changes.

Dublin Core: A little more complicated over time
Elements (15): Identifier, Title, Creator, Contributor, Publisher, Subject, Description, Coverage, Format, Type, Date, Relation, Source, Rights, Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text

Design Metadata Specification for future changes
• The degree of future change will depend on organization size, sophistication of use, number of repositories, and amount of content.
• Don't over-engineer.
• For all organizations: start with the Dublin Core, with a few additions and deletions for specific needs.
• At large/sophisticated organizations:
  – "Refinements" will be unavoidable in the future. Start with "DatePublished" so that later additions of "DateModified", "DateApproved", "DateVerified", etc. fit in easily.
  – Identify broad "integration metadata" vs. division-specific fields.
  – Coordinate with others to set up a working understanding of a corporate multi-level metadata standard.

Provide metatagging and storage that will deal with changes
• Tag with identifiers, not names.
  – This will reduce retagging later when names change.
  – Not good if people need to view raw tagging, but usually software will be involved to show labels.
• When tagging content, use the most specific concept. Let software handle the hierarchy.
• Metadata is easier to manage if it is stored in a central repository instead of being spread out in the individual files.
  – Exception: when sending files out to other systems (e.g. photo metadata).
  – Warning: 'metadata repositories' are usually a different class of software than what we are discussing.

Fundamentals of taxonomy ROI
• Tagging content using a taxonomy is a cost, not a benefit.
• There is no benefit without exposing the tagged content to users in some way that cuts costs, improves revenues, reduces risk, or achieves some other clear business goal.
• Putting a taxonomy into operation requires UI changes and/or backend system changes, as well as data changes. You need to determine those changes, and their costs, as part of the ROI.

Key Factors in ROI
• Breadth: "How many people will metadata affect?"
• Repeatability: "How many times a day will they use it?"
• Cost/Benefit: "Is this a costly effort with little or no benefits?"

How to estimate costs – Tagging
Consider the complexity of each facet and the ambiguity of the content to estimate the time per value.
Taxonomy Facet | Hier? | Typical CV Size | Time/Value (min) | Avg # Values/Item | $/Min | Cost/Element
Audience | N | 10 | 0.25 | 2 | $0.42 | $0.21
Content Type | N | 20 | 0.25 | 1 | $0.42 | $0.11
Organizational Unit | Y | 50 | 0.5 | 2 | $0.42 | $0.42
Products & Services | Y | 500 | 1.5 | 4 | $0.42 | $2.52
Geographic Region | Y | 100 | 0.5 | 2 | $0.42 | $0.42
Broad Topics | Y | 400 | 2 | 4 | $0.42 | $3.36
TOTALS | | 1,080 | 5 | 15 | | $7.04
Cost/Element = Time/Value × Avg # Values/Item × $/Min.
• Is this field worth the cost?
• Estimated cost of tagging one item: $7.04 (TOTALS row).
This can be reduced with automation, but cannot be eliminated.
Inspired by: Ray Luoma, BAU Solutions

How to estimate costs – Assumptions
Your numbers will vary.
ASSUMPTIONS
Enterprise SW License | $100,000
Maintenance/Support | 15%
SW Implementation | 200%
Legacy Content Items | 100,000
Content Growth Rate | 15%
Tagging/Item | $7.04
Enterprise Taxonomy | $100,000

How to estimate costs – Total cost of ownership (TCO)
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
SW Licenses | $100,000 | | | |
SW Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
Implementation | $200,000 | | | |
App Tech Support | | $30,000 | $30,000 | $30,000 | $30,000
Tagging: Legacy Content | $704,000 | | | |
Tagging: Ongoing | | $105,600 | $121,440 | $139,656 | $160,604
Taxonomy Creation | $100,000 | | | |
Taxonomy Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
TOTAL | $1,103,500 | $165,600 | $181,440 | $199,656 | $220,604

Sample ROI Calculations
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
Costs
Software Licenses/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Implementation/Support | $200,000 | $30,000 | $30,000 | $30,000 | $30,000
Taxonomy Creation/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Legacy/Ongoing Tagging | $703,500 | $105,600 | $121,440 | $139,656 | $160,604
Benefits
Productivity increases | $– | $125,000 | $1,250,000 | $1,250,000 | $1,250,000
Service efficiency gains | $– | $129,600 | $1,296,000 | $1,296,000 | $1,296,000
Yearly Net Benefits | $(1,103,500) | $89,000 | $2,364,560 | $2,346,344 | $2,325,396
Payback period: 1.4 (years until Benefits = Costs)
The ongoing cost of tagging is due to 15% content growth.
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle

Where do the benefits come from?
Common taxonomy ROI scenarios
• Catalog site – ROI based on increased sales through improved:
  – Product findability
  – Product cross-sells and up-sells
  – Customer loyalty
• Call center – ROI based on cutting costs through:
  – Fewer customer calls due to improved website self-service
  – Faster, more accurate CSR responses through better information access
• Compliance – ROI based on:
  – Avoiding penalties for breaching regulations
  – Following required procedures (e.g. medical claims)
• Knowledge worker productivity – ROI based on cutting costs through:
  – Less time searching for things
  – Less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
• Executive mandate
  – No ROI at the start, just someone with a vision and the budget to make it happen

Generic, yet Important, Advice
• It's not about the tools.
• It's not about the taxonomy.
• It's about the business goals and the processes people use to meet those goals.
• Metrics are grossly underused in metadata and search.
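The arithmetic behind the tagging-cost, TCO, and sample ROI slides can be checked with a short Python sketch. It uses the slides' illustrative inputs, and the function names are hypothetical. It also rests on two assumptions of ours:
• The 15% maintenance/support rate applies to the license, implementation, and taxonomy amounts. This reproduces the $15,000, $30,000, and $15,000 yearly rows.
• The payback period is measured from the end of Year 1, interpolating within the break-even year. This is one reading that yields the slide's 1.4.

Because the sketch uses the rounded $7.04 per item, its Year 1 cost is $1,104,000. That is $500 more than the slide's $1,103,500, which corresponds to the unrounded $7.035.

```python
# Illustrative inputs copied from the tagging-cost, assumptions and ROI slides.
# facet -> (minutes per value, average number of values per item)
FACETS = {
    "Audience": (0.25, 2),
    "Content Type": (0.25, 1),
    "Organizational Unit": (0.5, 2),
    "Products & Services": (1.5, 4),
    "Geographic Region": (0.5, 2),
    "Broad Topics": (2.0, 4),
}
DOLLARS_PER_MINUTE = 0.42
PRODUCTIVITY = [0, 125_000, 1_250_000, 1_250_000, 1_250_000]
SERVICE_EFFICIENCY = [0, 129_600, 1_296_000, 1_296_000, 1_296_000]


def tagging_cost_per_item(facets, rate):
    """Sum of Cost/Element = minutes per value x values per item x $/minute."""
    return sum(minutes * count * rate for minutes, count in facets.values())


def yearly_costs(per_item, license_fee=100_000, support_rate=0.15,
                 implementation_rate=2.0, legacy_items=100_000,
                 growth_rate=0.15, taxonomy=100_000, years=5):
    """Year 1 is the up-front build; later years carry support and ongoing tagging."""
    implementation = license_fee * implementation_rate
    costs = [license_fee + implementation + legacy_items * per_item + taxonomy]
    items = legacy_items
    for _ in range(years - 1):
        new_items = items * growth_rate  # content growth that must be tagged
        items += new_items
        costs.append(support_rate * (license_fee + implementation + taxonomy)
                     + new_items * per_item)
    return costs


def payback_years(net):
    """Years after year 1 until the cumulative net benefit reaches zero.

    Year 1 is treated as time zero and the break-even year is interpolated
    linearly; returns None if costs are not recovered within the horizon.
    """
    balance = net[0]
    if balance >= 0:
        return 0.0
    for year, amount in enumerate(net[1:], start=1):
        if balance + amount >= 0:
            return (year - 1) + (-balance) / amount
        balance += amount
    return None


costs = yearly_costs(per_item=7.04)  # the rounded $7.04 from the assumptions slide
benefits = [p + s for p, s in zip(PRODUCTIVITY, SERVICE_EFFICIENCY)]
net = [b - c for b, c in zip(benefits, costs)]

print(round(tagging_cost_per_item(FACETS, DOLLARS_PER_MINUTE), 3))
print([round(c) for c in costs])  # [1104000, 165600, 181440, 199656, 220604]
print([round(n) for n in net])    # [-1104000, 89000, 2364560, 2346344, 2325396]
print(f"Payback: {payback_years(net):.1f} years")  # Payback: 1.4 years
```

Keeping the inputs as parameters makes it easy to substitute your own numbers. As the assumptions slide says, they will vary.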
Taxonomy governance overview
• Taxonomy governance can be viewed as a standards process.
  – It is closely linked to the organizational metadata standard.
  – The taxonomy must evolve, but in a predictable way.
  – Take tips from other standards efforts.
• Team structure, with an appeals process.
  – Taxonomy stewardship is a part-time role at most organizations.
• The team needs to make decisions based on costs and benefits.
• Documentation and educational material on Taxonomy and Metadata
• Announcements
• Comment-handling responsibilities (part of the error-correction process)
• Issue logs
• Release schedule
These practices are in rough order of implementation.

Taxonomy governance environment
1: External vocabularies (e.g. ISO 3166-1, other external sources) change on their own schedule, with some advance notice.
2: The team decides when to update facets within the Taxonomy.
3: The team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets are published to consuming applications.
Diagram components: Vocabulary Management System (CVs, other controlled items, archives); custodians and other internal CVs; change requests & responses; notifications; published facets.
Consuming applications: Web CMS, Intranet Search, ERMS, ERP, Intranet Nav., DAM, …
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.

Controlled Items
The Taxonomy Team will have several items to manage:
• Controlled Vocabularies
• Metadata Standard
• Editorial Rules
• Tagger Training Materials (manual and automatic)
• Charter, Goals, Performance Measures
• Team Processes
• Outreach & ROI
  – Website
  – Communication plan
  – Presentations
  – Announcements
• "Roadmap"
  – Advanced practice; requires a long planning horizon for the organization's IT projects
Even small taxonomy teams should develop many of these items, although not to the same level of formality.

Controlled Vocabularies are not just tabbed lists
Source: NASA Taxonomy Competencies Facet, http://nasataxonomy.jpl.nasa.gov/nascomp/index_tt.htm

Controlled Item: Metadata Specification
Element Name | XML Map | Repeatable | Source | Purpose
General Purpose Metadata
Unique ID | dc:identifier | 1 | System supplied | System identifier to retrieve item.
Owner | dc:creator | ? | System supplied | POC for content maintenance.
Title | dc:title | 1 | User supplied | Text search & results display.
Date | dc:date | 1 | System supplied | Publish, feature, & review content.
Subject Metadata (purpose: search for, browse, group & filter search results)
Organization | x:corp | * | Corp Classif CV
Asset | x:asset | * | Asset CV
Region/Country | dc:coverage | * | Country CV
Basin/Platform/Well | x:well | * | B/P/Well CV
Content Type | dc:type | ? | Content Types CV
Company/Client/Operator/Partner | x:company | * | Company CV
Project | x:project | * | Project CV
Use Metadata
Discipline | dcterms:audience | * | Discipline CV | Target, personalize content.
Retention | x:retention | 1 | System supplied | Remove expired content.
Legend: ? – 1 or more; * – 0 or more

Controlled Item: Editorial Rules
Akin to the "Chicago Manual of Style".
Issues commonly addressed in the rules: Abbreviations; Ampersands; Capitalization; Continuations (More… or Other…); Duplicate Terms; Fidelity to External Source; Hierarchy and Polyhierarchy; Languages and Character Sets; Length Limits; "Other" – Allowed or Forbidden?; Plural vs. Singular Forms; Relation Types and Limits; Scope Notes; Serial Comma; Sources of Terms; Spaces; Synonyms and Acronyms; Translations; Term Order (Alphabetic or …); Term Label Order (Direct vs. Inverted)
What to do when rules conflict – how do people decide which rule is more important?

Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character '&' is preferred to the word 'and' in Term Labels. Example: Use Type: "Manuals & Forms", not "Manuals and Forms".
Special Characters | Retain accented characters in Term Labels. Example: Use "España", not "Espana".
Serial Comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character '&', which IS NOT preceded by a comma. Example: "Education, Learning & Employment", not "Education, Learning, & Employment".
Capitalization | Use title case (where all words except articles are capitalized).
  Example: "Education, Learning & Employment", NOT "Education, learning & employment", NOT "EDUCATION, LEARNING & EMPLOYMENT", NOT "education, learning & employment".
… | …

Controlled Item: Training Materials
Staff will require training on:
• The UI they use to tag the content
• The rules to follow when deciding what codes to apply
• The end-effect of the codes they apply
• The structure of the taxonomy
Tagging examples come from earlier stages in the taxonomy development process.
Indexing rules:
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe what the asset is about and why it is important. Storage is cheap; re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
[Figure: Indexing UI]
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.

Controlled Item: Communications Plan
• Stakeholders: Who are they and what do they need to know?
• Channels: Methods available to send messages to stakeholders. Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
• Messages: Communications to be sent at various stages of the project. The bulk of the plan is here.

Stakeholders | Info. Needed
Project Sponsors | Progress, Issues, Policies
Dept. Reps | Progress, Priorities, …
… | …
Users | Progress, How-Tos
Vendors | RFPs & SOWs

Channel | Description
Demo | Live, or screen capture for download
Presentation | Tailored message for specific audience
Website | Overview info for all, link to files
Memo | Formal notification
… | …

Trigger | Msg. Description | From | To | Channel
Initiation | Project overview | Dept. head | All | Memo
… | … | … | … | …

Controlled Item: Team Charter
The Taxonomy Team is responsible for maintaining:
• The Taxonomy, a multi-faceted classification scheme
• Associated materials, including a website providing:
  – Corporate Metadata Standard
  – Editorial Style Guide
  – Taxonomy Training Materials
  – Team rules and procedures (subject to CIO review)
The team evaluates the costs and benefits of suggested changes.
The Taxonomy Team will:
• Manage the relationship between providers of source vocabularies and consumers of the Taxonomy
• Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
• Promote awareness and use of the Taxonomy

Remaining Controlled Items
• Performance Measures to go along with the Charter?
• Team Processes (see later in this presentation)
• Automatic Classifier Training Materials
• Website
• Presentations and Announcements
• Change Request List (see later in this presentation)
• "Taxonomy Roadmap"
  – Advanced practice; requires a long planning horizon for the organization's IT projects

Exercise 2: Editorial Rules
• Look at the sample taxonomy.
• Think of ways to clean it up and make it 'better':
  – Smaller
  – More professional looking
  – Easier to use
• Write editorial rules for the cleanups. Provide an example with each rule.
Rule Name | Editorial Rule
Capitalization | All terms in lowercase.
  Example: "programming", NOT "Programming".

Exercise 2: Sample Taxonomy
Source: http://del.icio.us/tag/

Exercise 2: Editorial Rules Worksheet
Provide a name for each rule, the rule itself, and an example of the rule in the form "X, not Y".
Rule Name | Editorial Rule
Plurals | Use the plural form of names, not the singular.
Capitalization | All terms, except proper nouns, are lowercase. E.g. "programming", NOT "Programming". E.g. "Schwab", not "schwab".

Organization 1: Taxonomy Governance Team
Organization 1 – Internal portal for a Fortune 50 diversified multinational.
Role | Responsibility
Executive Sponsor | Advocate for the taxonomy team.
Business Lead | Keeps the team on track with larger business objectives. Balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs). Obtains needed resources if those in the team can't accomplish a particular task.
Technical Specialist | Estimates costs of proposed changes in terms of the amount of data to be retagged, additional storage and processing burden, software changes, etc. Helps obtain data from various systems.
Content Specialist | The team's liaison to content creators. Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc. Small-scale metadata QA.
Taxonomy Specialist | Suggests potential taxonomy changes based on analysis of query logs and indexer feedback. Makes edits to the taxonomy and installs them into the system with the aid of the IT specialist.
Content Owner | Reality check on process change suggestions.
Related roles: Taxonomy Strategist, Taxonomist, Information Architect, 2 Communications Specialist*

Organization 2: Vocabulary Policy Committee
Organization 2 – A non-profit international organization.
Goal: improve information management practices to reduce overlap between many similar vocabularies across many systems.
Constraint: Even when the number of vocabularies is reduced, some must still have very close links.
• Business Lead – Chairs the group. Ensures that CVs fit with the organization's larger information management effort. Small-group management experience; information management background.
• Vocabulary Custodians (3) – Responsible for content in a specific CV, typically based on organizational lines. Team lead experience, detail-oriented. Familiar with databases and organization processes.
Other relevant staff:
• IT Steering Group – Oversees the Vocabulary Policy Committee.
• Stakeholders – Managers of systems using the vocabularies, and thus affected by changes. They have a lot of visibility into the process. Their control over CV changes is limited, but they schedule their system's adoption of changes.
Additional roles – available during startup of the team, and on an as-needed basis later:
• Training Representative – Develops the communications plan and training materials.
• Work Practices Representative – Develops processes and monitors adherence.
• IT Representative – Backups, administration of the CV tool. IT administration experience.

Organization 3: Taxonomy Team
Organization 3 – Public catalog site for a Fortune 50 retailer. Data for products is provided by manufacturers.
• Business Lead – Chairs the committee, resolves disputes.
• Marketing Representatives – Provide product marketing expertise. Advocate for product manufacturers. Represent data entry concerns.
• Website Representative – Provides input on search and navigation impacts. Advocate for customers and other website users. Provides search log and click trail analysis.
• Taxonomy Specialist – Maintains the taxonomy and product catalog. Provides data feeds to drive the site.
Likely changes – Fast-Track Process: A fast-track process exists and is likely to be used very often. A Marketing Representative asks the Taxonomy Specialist for a change, and the Taxonomy Specialist gets approval from the Website Representative.
This is a larger team than at many retailers, where a single person is responsible. A single person still makes the changes here, but there is some oversight.

What if I have to do it solo?
Realize:
• It's not totally solo – IT help, Graphics & UI help, Business Goals help, Funding help, Review & QA help…
• You are the general contractor.
• It needs to be part of your objectives.
• Limit the objectives to what can be achieved by you, and by your organization.
Concentrate on:
• Resource allocation (i.e. manage your time)
• Fundamental processes
  – Query log examination
  – Error correction procedure
Cherry-pick from the roles:
• Business Lead – align with organization goals, get needed resources, make cost/benefit decisions, report upstairs
• IT Liaison – work with IT specialists to get software installed, logs gathered, content harvested, etc. Consider the impact of changes on tools and data.
• Taxonomy / Search Specialist – analyze behavior and suggest changes. Implement changes that pass cost/benefit muster.
• Website/User Representative – consider the impact of changes on users and job performance
• Communications!!!
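A solo taxonomist can start query log examination with the basic reports named under UltraSeek Reporting in the query log slide: top queries, queries with no results, and queries with no click-through. This Python sketch assumes the log has already been parsed into hypothetical (query, result count, clicked) records. Each search engine and click-trail package has its own log format, so the parsing step is left out. The sample data is invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    query: str
    results: int   # number of hits the engine returned
    clicked: bool  # whether the user clicked any result


def normalize(query: str) -> str:
    """Fold case and collapse whitespace so trivial variants are counted together."""
    return " ".join(query.lower().split())


def basic_reports(entries: Iterable[LogEntry],
                  top_n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    entries = list(entries)
    top = Counter(normalize(e.query) for e in entries)
    no_results = Counter(normalize(e.query) for e in entries if e.results == 0)
    # Results were shown but nothing was clicked; may hint at poor labels or ranking.
    no_click = Counter(normalize(e.query) for e in entries
                       if e.results > 0 and not e.clicked)
    return {
        "top queries": top.most_common(top_n),
        "queries with no results": no_results.most_common(top_n),
        "queries with no click-through": no_click.most_common(top_n),
    }


# Hypothetical, already-parsed log records
log = [
    LogEntry("Expense report", 12, True),
    LogEntry("expense  report", 12, True),
    LogEntry("expense report", 12, False),
    LogEntry("travel policy", 4, False),
    LogEntry("travel policy", 4, True),
    LogEntry("vacation calendar", 0, False),
]

reports = basic_reports(log)
for name, rows in reports.items():
    print(f"{name}: {rows}")
```

Queries that return nothing are natural candidates for new synonyms or 'missing' concepts. Reviewing them regularly is one way to start the "Measure & Improve" mindset.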
Taxonomy Strategies LLC The business of organized information 47 Exercise 3: Team & Stakeholder Identification Role Applicable/Modify Name(s) Taxonomy Team Members Team Lead Taxonomy Editor(s) Vocabulary Custodian(s) Liaisons with external vocabularies Liaisons with applications using vocabularies User advocate(s) Training / Communications IT / Data & System Maintenance External Stakeholders Team Supervisory Group Representatives of external vocabularies Representatives of consuming applications Representatives of users Other representatives of organization Taxonomy Strategies LLC The business of organized information 48 Agenda 10:15 Introduction 10:30 Background 10:35 Maintainable Taxonomies 10:45 Maintainable Metadata 10:50 ROI Estimation 11:00 Governance Environment 11:10 Controlled Items 11:30 Team Structures 11:45 Change Process 12:00 Exercises 12:15 Adjourn Taxonomy Strategies LLC The business of organized information 49 Taxonomy editing tools Immature industry – no vendors in upper-right quadrant! high Most popular taxonomy editor? MS Excel Ability to Execute This slide is out of date. Don’t know if we want to include this. low All upper-end tools are high functionality and high cost. Widely used, cheap, good reporting, bad IDs Niche Players Taxonomy Strategies LLC Completeness of Vision The business of organized information Visionaries 50 Basic Standard and Custom Fields Standard and Custom Relations Data Typing and Restrictions Consistency Enforcement Flexible Reporting Flexible Importing? Midrange UNICODE Multiple Vocabulary Support Inter-Vocabulary Relations Unique IDs ISO Codes not sufficient Advanced Taxonomy editor functionality requirements Workflow Voting Change Request Mgmt. 
Stylistic rules enforcement Programmability Taxonomy Strategies LLC The business of organized information Term Editing Hierarc hy Browse 51 Taxonomy governance: Where changes come from Firewall Application UI Tagging UI Content Application Logic Tagging Logic Taxonomy Staff notes ‘missing’ concepts I think three sources of Query log change requests is a big analysis concept to communicate to End User readers. Recommendations by Editor 1. Small taxonomy changes (labels, synonyms) 2. Large taxonomy changes (retagging, application changes) 3. New “best bets” content Taxonomy Strategies LLC Tagging Staff Taxonomy Editor Taxonomy Team The business of organized information Team considerations 1. Business goals 2. experience Changes in user experience 3. Retagging cost Requests from other Requests from other parts of NASA parts of the organization 52 Processes Different organizations will need to consider their own change processes. Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it. Change process MUST also consider cost of implementing the change Retagging data Reconfiguring auto-classifier Retraining staff Changes in user expectations Taxonomy Strategies LLC The business of organized information Taxonomy Change Cases Case 1. Renaming a term Case 2. Adding a new leaf term Case 3. Inserting a new term Case 4. Splitting a term Case 5. Deleting a leaf term or subtree Case 6. Deleting a term Case 7. Moving a subtree Case 8. Merging terms Case 9. Adding a CV Case 10. Deleting a CV 53 Taxonomy governance: Taxonomy maintenance workflow Can contrast this process with others that are less formal and/or less like a newsroom.. Couple more are described on next slide. Suggest new name/category Problem? 
Review new name; Problem?; Copy edit new name; Add to enterprise taxonomy (Taxonomy Tool). Each "Problem?" check is a yes/no decision point.

Other change processes
Processes may be diagrammed or written.
Organization X: Change Request Process
Anyone can ask a team member for a change.
Team members are responsible for figuring out the details and bringing the request to the team for a decision.
A pending-changes list holds low-priority/high-cost items.
The change process includes a preview of the change on the site and a data mockup.
Fast-Track Change Process: Anyone can ask the editor, who gets team leader or deputy approval.
Provide an "emergency" change process, because it will be needed. How can emergency changes be requested? Who makes the change and who approves it? Who are the backups for these people when they are out? Who are the escalation points?
The Change Request Process should call out decision criteria, e.g.: cost of retagging; benefit of change; conflict with editorial rules.

Fundamental Processes & Outlooks
Two fundamental processes every organization should implement to maintain its metadata and taxonomies: query log / click trail examination, and error correction (another biggie).
What are the key outlooks a taxonomist should try to instill in their organization? An integrated approach to Taxonomy, Metadata, Search, and UI; a Measure & Improve mindset.

Fundamental process #1: Query log examination
How can we characterize users and what they are looking for?
Query Log & Click Trail Examination
• Only 30-40% of organizations interested in Taxonomy Governance examine query logs.*
• Basic reports provide plenty of real value.
• Greatest value comes from: identifying a person as responsible for search quality; starting a "Measure & Improve" mindset.
• Greatest challenges: getting a person assigned (≥ 10%); getting logs turned back on.
UltraSeek Reporting: Top queries; Queries with no results; Queries with no click-through; Most requested documents; Query trend analysis; Complete server usage summary.
Click Trail Packages: iWebTrack; NetTracker; OptimalIQ; SiteCatalyst; Visitorville; WebTrends.
*Source: Metadata Maturity Model Presentation, Ron Daniel, ESS'05.

Fundamental process #2: Error correction
Errors will happen, and some will be found. What are you going to do about them? Tagging errors, content errors, taxonomy errors, …
Define an error correction process. You already have an error correction process: would you hate to see it on paper?
The process will accommodate questions like: Who looks at it? Is it an error? What are the costs to correct vs. not correct? Does the correction need to be scheduled? etc.
Once a tagging error is corrected, NEVER lose that fact. Manually reviewed pages are vital for training automatic classifiers. This has implications for the metadata specification and review procedures.
Over time, multiple error detection methods will be defined, e.g.
statistical sampling of newly added pages.
Gradually, additional error correction processes may be defined to deal with particular types of errors.

Fundamental Outlooks
Measure & Improve Mindset: Query logs and click trails are the prime example. The next place to instrument: error correction and error detection processes.
Integrated handling of Taxonomy, Metadata, UI, & Search: To be most effective, these must work together. The governance structure must help that happen; a cross-functional team structure is a start.

Actions to define taxonomy governance
Initial vocabularies should be selected for stability as well as utility.
Custodians of shared vocabularies must be identified and educated about the impacts of changes.
A group of custodians and stakeholders must be established.
A (simple) system for sharing the CVs and tracking the update process must be established.

Agenda: 10:15 Introduction; 10:30 Background; 10:35 Maintainable Taxonomies; 10:45 Maintainable Metadata; 10:50 ROI Estimation; 11:00 Governance Environment; 11:10 Controlled Items; 11:30 Team Structures; 11:45 Change Process; 12:00 Exercises; 12:15 Adjourn.

Exercise 4: Self-Diagnosis
1. Does your organization know what it is doing, or wants to be doing, around search & taxonomy yet?
2. Is the cost basis for the taxonomy ROI clear to you?
3. Is the benefits basis for the taxonomy ROI clear to you?
4. Is the cost basis for the taxonomy ROI clear to your CFO?
5. Is the benefits basis for the taxonomy ROI clear to your CFO?
18. Do you have an identified taxonomy "team" with at least one person?
19. Is there at least one person working on taxonomy/metadata/search more than ½ time?
20. Does the team contain members who represent search, UI, and metadata tagging?
21.
Does the organization have any hiring and training criteria for taxonomy, metadata, and search positions?
6. Do you know how content will be tagged?
7. Do you know how tagged content will be displayed to users?
8. Do you know how users will fetch the content?
9. Do users know how they should report errors in the tagging?
10. Do you know what information will be logged for later analysis?
11. Do you know what information has to be reported to management to justify the taxonomy team?
12. Does management expect the taxonomy team to justify its existence?
13. Is your organization planning a tightly focused taxonomy effort?
14. Is your organization planning a credible "Enterprise Taxonomy Strategy"?
15. Does your organization expect its taxonomies to change frequently?
16. Has your organization identified some facets as stable and some facets as volatile?
17. Does your organization have a plan for retagging data when the taxonomy is changed?
22. Does the team maintain Editorial Rules?
23. Does the team maintain a corporate metadata specification?
24. Does the team maintain educational materials?
25. Does the team have a communications plan?
26. Does the team examine query logs?
27. Does the team examine click trails?
28. Does the team have a documented error correction process?
29. Does the organization have a procedure to locate ROT (Redundant, Obsolete, or Trivial content)?
30. Does the organization have any qualitative or quantitative measures of data quality?
31. Do you use a tool other than MS Excel for editing and maintaining the Taxonomy?
32. Were taxonomy, metadata, search, or content management tools purchased with money other than "use it or lose it" funds?
[Note: I think a self-diagnosis quiz like this could be nice to have in the book. Also see the "Metadata Maturity Model" material in the next set of slides.]
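The basic query-log reports named under Fundamental process #1 (top queries, queries with no results, queries with no click-through) need very little machinery, which is one reason there is little excuse to skip them. The Python below is a minimal sketch, not the behavior of UltraSeek or any other product: it assumes a hypothetical log in which each search is recorded as the query string, the number of results returned, and whether the user clicked any result. `LogEntry`, `normalize`, and `basic_reports` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One search, in a hypothetical log format (not any product's real schema)."""
    query: str
    result_count: int
    clicked: bool


def normalize(query: str) -> str:
    # Fold case and collapse whitespace so "Vacation  Policy" and "vacation policy" count together.
    return " ".join(query.lower().split())


def basic_reports(entries: list[LogEntry], top_n: int = 10) -> dict[str, list[tuple[str, int]]]:
    """Return the three basic reports as (query, count) lists, most frequent first."""
    top = Counter(normalize(e.query) for e in entries)
    # Searches that found nothing: candidates for new synonyms, new terms, or new content.
    no_results = Counter(normalize(e.query) for e in entries if e.result_count == 0)
    # Searches that found something the user did not click: candidates for best bets or relabeling.
    no_click = Counter(
        normalize(e.query) for e in entries if e.result_count > 0 and not e.clicked
    )
    return {
        "top_queries": top.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click.most_common(top_n),
    }


# Hypothetical sample log.
log = [
    LogEntry("vacation policy", 12, True),
    LogEntry("Vacation  Policy", 12, False),
    LogEntry("holiday schedule", 0, False),
    LogEntry("holiday schedule", 0, False),
    LogEntry("expense form", 4, False),
]

reports = basic_reports(log)
for name, rows in reports.items():
    print(name, rows)
```

The code is the easy part; as the slides stress, the value comes from having a named person review these lists on a schedule and act on them.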
Taxonomy Strategies LLC
Data Governance Maturity: When the business depends on clear description of fuzzy objects
Presented to San Francisco DAMA, Sept. 10, 2008
Ron Daniel, Jr.
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.

Goals for this talk
Provide you with background on maturity models.
Provide the results of our surveys of Search, Metadata, & Taxonomy practices and discuss interesting findings.
Review the practices in use at stock photo houses, and compare them to methods that may be used in typical information management projects.
Give you the tools to do a simple self-assessment of your organization's metadata maturity.

Agenda: 9:15 Metadata Definitions; 9:30 Maturity Models; 9:45 Metadata Maturity Model (ca. 2006); 10:15 Break; 10:30 Stock Photo Business; 10:40 Data Governance Practices in Stock Photo Agencies; 11:40 Summary; 11:45 Questions; 12:00 Adjourn.

Taxonomy and metadata definitions
Metadata: "Data about data". Different communities have very different assumptions about the types of data being described. I'm from the Information Science community, not the database, statistics, or massive storage communities.
Taxonomy: 1. The classification of organisms in an ordered system that indicates natural relationships. 2. The science, laws, or principles of classification; systematics. 3. Division into ordered groups, categories, or hierarchies.
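The relationship between the two definitions can be made concrete in a few lines. In this minimal Python sketch, a taxonomy is a hierarchy of terms stored as a child-to-parent map, and a metadata record has one free-text field and one controlled field whose values must come from that hierarchy. The Audience terms are taken from the example that follows; `Record`, `path`, `invalid_values`, and the sample title are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# A small taxonomy as a child -> parent map; None marks a top-level term.
# Terms follow the Audience facet in the example that follows.
AUDIENCE: dict[str, Optional[str]] = {
    "Internal": None,
    "Executives": "Internal",
    "Managers": "Internal",
    "External": None,
    "Suppliers": "External",
    "Customers": "External",
    "Partners": "External",
}


def path(term: str, taxonomy: dict[str, Optional[str]]) -> list[str]:
    """Return the term's position in the hierarchy, from the top level down."""
    if term not in taxonomy:
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    parts = []
    current: Optional[str] = term
    while current is not None:
        parts.append(current)
        current = taxonomy[current]
    return parts[::-1]


@dataclass
class Record:
    """Metadata about one document: free text plus a taxonomy-controlled field."""
    title: str  # free text; any string is allowed
    audience: list[str] = field(default_factory=list)  # controlled; values must be AUDIENCE terms

    def invalid_values(self) -> list[str]:
        """List audience values that are not in the taxonomy (candidates for correction)."""
        return [value for value in self.audience if value not in AUDIENCE]


doc = Record(title="Supplier onboarding guide", audience=["Suppliers", "Vendors"])
print(doc.invalid_values())
print(path("Suppliers", AUDIENCE))
```

Flagging "Vendors" instead of silently accepting it is the point of a controlled field: it only helps search and navigation if its values stay within the vocabulary.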
Examples of taxonomy used to populate metadata fields
Metadata fields: Title; Author; Department; Audience; Topic.
Metadata values (facets within the overall taxonomy):
Audience: Internal (Executives, Managers); External (Suppliers, Customers, Partners).
Topics: Employee Services; Compensation; Retirement; Insurance; Further Education; Finance and Budget; Products and Services; Support Services; Infrastructure; Supplies.

Example faceted taxonomy: ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type.
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency.
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries.
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training.
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Non-ABC Brands.
Audience: All; Business; ABC Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier.
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business.
Region/Country: All; Asia-Pacific; Canada; ABC EMEA; Japan; Latin America & Caribbean; United States.

Manually tagged metadata sample
Title: Jupiter's Ring System
URL: http://ringmaster.arc.nasa.gov/jupiter/
Description: Overview of the Jupiter ring
system. Many images, animations and references are included for both the scientist and the public.
Content Types: Web Sites; Animations; Images; Reference Sources
Audiences: Educators; Students
Organizations: Ames Research Center
Missions & Projects: Voyager; Galileo; Cassini; Hubble Space Telescope
Locations: Jupiter
Business Functions: Scientific and Technical Information
Disciplines: Planetary and Lunar Science
Time Period: 1979-1999

Other things sometimes called Taxonomy
Synonym Ring: Connects a series of terms together and treats them as equivalent for search purposes, e.g. (Dog, Canine, Pooch, Mutt), (Cat, Feline, Kitty), …
Authority File: Used to control variant names with a preferred term. Typically used for names of countries, individuals, and organizations, e.g. (IBM, Big Blue, International Business Machines Inc.)
Classification Scheme: A hierarchical arrangement of terms. May or may not follow strict "is-a" hierarchy rules. Usually enumerated, e.g. LC (Library of Congress) or Dewey.
Thesaurus: Expresses the semantic relationships of hierarchy (broader & narrower terms), equivalence (synonyms), and association (related terms). May include definitions.
Ontology: Resembles a faceted taxonomy but uses richer semantic relationships among terms and attributes, and strict specification rules. A model of reality, allowing inferences to be made.

Agenda: 9:15 Metadata Definitions; 9:30 Maturity Models; 9:45 Metadata Maturity Model (ca. 2006); 10:15 Break; 10:30 Stock Photo Business; 10:40 Data Governance Practices in Stock Photo Agencies; 11:40 Summary; 11:45 Questions; 12:00 Adjourn.

Organizational benchmarking
A common goal of organizations is to "benchmark" themselves against other organizations.
Different organizations have: different levels of sophistication in their planning, execution, and follow-up for CMS, Search, Portal, Metadata, and Taxonomy projects; different reasons for pursuing Search, Metadata, and Taxonomy efforts; different cultures.
Benchmarks should be against similar organizations.

Is unnecessary capability harmful?
Tool vendors continue to provide ever-more capable tools with ever-more sophisticated features.
But we live in a world where a significant fraction of public, commercial web pages don't have a <title> tag.
Organizations that can't manage <title> tags stand a very poor chance of putting an entity extractor to use, since an extractor requires ongoing management of the lists of entities to be extracted.
Organizations that can't create and maintain clean metadata can't put a faceted search UI to good use.
Unused capability is poor value for money. Organizations over-spend on tools and under-spend on staff & processes.

Towards better benchmarking…
We wanted a method to: generally identify good and bad practices; help clients identify the things they can do, and the things that stand an excellent chance of failing; predict likely sources of problems in engagements.
We have started to develop a Metadata Maturity Model, inspired by Maturity Models from the software industry.
To keep the model tied to reality, we are conducting surveys to determine the actual state of practice around search, metadata, taxonomy, and supporting business functions such as staffing and project management.

A Tale of Two Software Maturity Models
CMMI (Capability Maturity Model Integration) vs. The Joel Test

CMMI structure
Maturity Models are collections of Practices.
The main differences between Maturity Models concern:
• Descriptivist or Prescriptivist Purpose
• Degree of Categorization of Practices
• Number of Practices (~400 in CMMI)
(Diagram of the CMMI structure. Source: http://chrguibert.free.fr/cmmi)

22 Process Areas, keyed to 5 Maturity Levels…
Process Areas contain Specific and Generic Practices, organized by Goals and Features, and arranged into Levels.
Process Areas cover a broad range of practices beyond simple software development.
CMMI Axioms: Individual processes at higher levels are AT RISK from supporting processes at lower levels. A Maturity Level is not achieved until ALL the Practices in that level are in operation.

CMMI Positives
Independent audits of an organization's level of maturity are a common service.
Level 3 certification is frequently required in bids.
"…compared with an average Level 2 program, Level 3 programs have 3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer latent defects, and Level 5 programs have 16.8 times fewer latent defects."
Michael Diaz and Jeff King, "How CMM Impacts Quality, Productivity, Rework, and the Bottom Line"
"If you find yourself involved in product liability litigation you're going to hear terms like 'prevailing standard of care' and 'what a reasonable member of your profession would have done'. Considering the fact that well over a thousand companies world-wide have achieved level 3 or above, and the body of knowledge about the CMM is readily available, you might have some explaining to do if you claim ignorance."
Linda Zarate, in a review of A Guide to the CMM: Understanding the Capability Maturity Model for Software by Kenneth M.
Dymond.

CMMI Negatives
Complexity and expense: reading and understanding the materials; putting it into action (identifying processes, mapping processes to the model, gathering required data, …); audits are expensive.
CMMI does not scale down well to small shops.
It has been accused of restraint of trade.

At the other extreme, The Joel Test
Developed by Joel Spolsky as a reaction to CMMI complexity.
Positives: quick, easy, and inexpensive to use.
Negatives: doesn't scale up well. Not a good way to assure the quality of nuclear reactor software. Not suitable for scaring away liability lawyers. Not a longer-term improvement plan.
The Joel Test:
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Scoring: 1 point for each "yes". Scores below 10 indicate serious trouble.

What does software development "Maturity" really mean?
A low score on a maturity audit DOES NOT mean that an organization can't develop good software.
It DOES mean that whether the organization will do a good job depends on the specific mix of people assigned to the project.
In other words, maturity sets a floor for how badly an organization is likely to do, not a ceiling on how well it can do.
The probability of failure is a good thing to know before spending a lot of time and money.

Towards a Metadata Maturity Model
Caveats: Maturity is not a goal; it is a characterization of an organization's methods for achieving its core goals. Mature processes impose expenses which must be justified by consequent cost savings, revenue gains, or service improvements.
Nevertheless, Maturity Models are useful as collections of best practices, and as stages in which to try to adopt them.

Basis for initial maturity model
CEN study on commercial adoption of Dublin Core.
Small-scale phone survey of organizations which have world-class search and metadata externally (not necessarily the most mature overall processes, or the best internal search and metadata).
Literature review.
Client experiences.
Structure from software maturity models.

Initial Metadata Maturity Model (ca. May, 2005)
37 Practices, categorized by Area, Level, and Importance. Maturity levels: Basic, Intermediate, Advanced, Bleeding Edge.
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking.
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.;
Reuse ERP; Multiple Repos Comply; Taxonomy Roadmap.
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs.
Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers.
Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures.
Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination.
Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI.
Limiting practices: Highly Abstract Subject Taxos.; Unneeded Capabils.; Tools, then Reqs.; Use it or Lose It Budgets.

Shortcomings of the initial model
We had no idea how it corresponded to actual practice across multiple organizations.
There were some indications that it over-emphasized the sophisticated practices and under-emphasized the beginning practices.
The initial metadata maturity model can be regarded as a hypothesis about how an organization progresses through various practices as it matures. How to test it? Let's ask!
Two surveys to date. The surveys are being run in stages because of the large number of practices. We ask about future, current, and former practices to gather information on progression.

Agenda: 9:15 Metadata Definitions; 9:30 Maturity Models; 9:45 Metadata Maturity Model (ca. 2006); 10:15 Break; 10:30 Stock Photo Business; 10:40 Data Governance Practices in Stock Photo Agencies; 11:40 Summary; 11:45 Questions; 12:00 Adjourn.

Survey 1: Search, Metadata, & Taxonomy Practices
The data in this section comes from a survey conducted in the autumn of 2005.
Participants by Organization Size (chart)
Participants by Job Role (chart)
Participants by Industry (chart)

Search Practices
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages: 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites: 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Spell Checking: 31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
Synonym Searching: 41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score: 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined: 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top: 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document: 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search: 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal: 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)

Metadata Practices
(Callout: these two questions were the only ones with much correlation to organization size.)
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them:
22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An organization-wide metadata standard exists and new systems consider it during development: 37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
The organization-wide metadata standard is based on the Dublin Core: 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
Multiple repositories comply with the metadata standard: 52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with the organizational metadata standard: 48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically: 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources: 57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
Metadata is manually entered into web forms: 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software: 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction: 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)

Taxonomy Practices
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
'Org Chart' Taxonomy (one based primarily on the structure of the organization): 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
'Products' Taxonomy (one based primarily on the products and/or services offered by the organization): 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
'Content Types' Taxonomy (one based primarily on the different types of documents): 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
'Topical' Taxonomy (one based primarily on topics of interest to the site users): 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
'Faceted' Taxonomy (one which uses several of the approaches above): 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor:
75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time: 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel: 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development: 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed: 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)

Survey 2: Business Drivers, Processes, and Staffing
The data in this section comes from a survey conducted in the spring of 2006.
Participants by Job Role (chart)
Participants by Tenure (chart)
Participants by Industry (chart)
Participants by Organization Size (chart)
Business Drivers: Search, Metadata, and Taxonomy (SMT) Applications (chart)
Business Drivers: Desired Benefits (chart)
Other desired benefits:
1. Innovation
2. Core to our business product
3. Clients do all the above [From a consultant]
4. Better navigation to diverse State web sites
5. Increased knowledge sharing across the corporation
6. Interoperability
7. Dynamic web applications
8. Improved user search experience
9. Improve R&D
10. Higher value to members [From a non-profit membership org.]
11. For organization to have better understanding of their content

ROI: Cost Estimation (chart)

Processes
Use of search logs is improving.
Surprisingly sophisticated.
Basic data quality and communications need improvement.
Many solo operators.

Team Structures & Staffing (chart)

Salary Survey
Factor | Value | Comment
Experience | 0.6 | Nice to see it really counts.
Geography | 0.5 | California and the Northeast have the highest salaries.
Co. Size | 0.5 | Not very reliable; big changes from one datapoint.
Education | 0.4 | Many taxonomists have an MLS or above.
Industry | 0.4 | Surprisingly, retail has high salaries for taxonomists.
Role | 0.04 | Taxonomists are paid about like Information Architects.
Time at current job | -0.07 |

Notes from Participants
"There is the constant struggle with individual [magazine] titles to hire trained librarians or data specialists instead of trying to save money by hiring an editor who can build articles AND create and assign metadata. This is a governance issue we have been struggling with since we have no monetary stake in the individual publications. We make recommendations, but have no higher level authority to require titles to hire trained staff for metadata."
"Reporting metrics have become a new area of confusion as we move to portalized pages consisting of objects in portlets, each with their own metadata."
"Key organizational issue is that the "problems" that stem from lack of systematic metadata/taxonomy creation are not "owned" by anyone, and consequently have no budget for their solution."
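Each cell in the Survey 1 practice tables shows a percentage followed by a respondent count. A quick arithmetic check, sketched in Python below, is consistent with each percentage being that count divided by the number of people who answered that particular question, rounded to a whole percent. That denominator varies by row (61 for the search-box question, 59 for the 'Org Chart' taxonomy question). The function name `row_percentages` is illustrative; the counts are copied from the tables.

```python
def row_percentages(counts: dict[str, int]) -> tuple[dict[str, int], int]:
    """Convert one row of answer counts into whole-number percentages of that row's respondents."""
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}, total


# Counts copied from the "Search Box in standard place on all web pages" row.
search_box = {
    "Not current practice": 12,
    "Being developed": 7,
    "In practice": 38,
    "Former practice": 1,
    "NA or Unknown": 3,
}

# Counts copied from the "'Org Chart' Taxonomy" row.
org_chart = {
    "Not current practice": 21,
    "Being developed": 6,
    "In practice": 20,
    "Former practice": 3,
    "NA or Unknown": 9,
}

for label, row in [("Search box", search_box), ("Org chart taxonomy", org_chart)]:
    pcts, respondents = row_percentages(row)
    print(label, respondents, pcts)
```

Both rows reproduce the published percentages, so the percentages can be read directly as shares of the people who answered each question.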
Interim Conclusions

Observations (1)
Practices which a single person or a small group can carry out are more commonly used. This is not surprising, but it is very different from ERP/BPR, and indicates that information management is not being sold to the "C-level" staff.
People need to question how inclusive their "Organizational Metadata Standards" and "Taxonomy Roadmaps" actually are.
We have found Taxonomy Roadmaps to be an advanced practice, due to their dependence on knowing the upcoming IT development schedule.

Observations (2)
Many of the basics are being skipped:
More organizations are doing "Spell Checking" than "Query Log Analysis".
69% have a taxonomy change plan, but only 41% have a plan for revisiting data if the taxonomy changes.
64% have a communications plan, but only 56% have a website.
This seems to be linked to the previous observation: things that are easy for an individual get done before things that need an organizational effort, regardless of their level of "sophistication".

Interim Metadata Maturity Model (ca. May, 2006)
Maturity levels: Basic, Intermediate, Advanced.
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Facet Navigation UI.
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Multiple Repos Comply w/ MD Std.; Reuse ERP Taxos; Taxo Maint. Doc; Taxonomy Roadmap; Highly Abstract Subject Taxos (e.g. "Moods"); Metadata Maint. Doc.
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs.
Staff training and hiring: Librarian or IA Expertise; Search Analyst Role; Cross-Functional Taxonomy Creation; Cross-functional taxonomy maint.;
SME Catalogers; Pre-hire Testing.
Data creation and QA: CM Introduced; ROT-Elimination; Semi-auto tagging; Quality Measures.
Project management: Project Plan; X-Functional Teams; Std. Proj. Methodol.; Multi-Year Plan; Communication Plan; SMT Business Manager, instead of IT Manager; Early Termination.
Executive support and ROI: External Search ROI; SMT in separate silos; Intranet ROI Model; CEO knows Search ROI.
Limiting practices: Tools, then Reqs.; Use it or Lose It Budgets.

Search and Metadata Maturity Quick Quiz
Basic
1) Is there a process in place to examine query logs?
2) Is there a process for adding directories and content to the repository, or do people just do what they want?
3) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
Intermediate
4) Does the search engine index more than 4 repositories around the organization?
5) Does the search engine integrate with the taxonomy to improve searches and organize results?
6) Are there hiring and training practices especially for metadata and taxonomy positions?
7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)?
8) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
Advanced
9) Are there established qualitative and quantitative measures of metadata quality?
10) Can the CEO explain the ROI for search and metadata?

Agenda: 9:15 Metadata Definitions; 9:30 Maturity Models; 9:45 Metadata Maturity Model (ca.
2006) 10:15 Break 10:30 Stock Photo Business 10:40 Data Governance Practices in Stock Photo Agencies 11:40 Summary 11:45 Questions 12:00 Adjourn Taxonomy Strategies LLC The business of organized information 112 Stock Photo Business Advertising, Editorial Content, Corporate Communications, and many other types of content rely on images to convey information and moods. When time and/or budget does not allow a commissioned shoot, stock photo houses can supply images. Fundamental problem for users: How to search for an image that conveys what you want? Fundamental problem for houses: How to describe images so that users can find them? Taxonomy Strategies LLC The business of organized information 113 How would you search for this image? Taxonomy Strategies LLC The business of organized information 114 Tagging by emotions Taxonomy Strategies LLC The business of organized information 115 “silence” Image Rights Criteria Objective criteria Conceptual refinement Taxonomy Strategies LLC The business of organized information 116 Clarification: Finger on Lips Taxonomy Strategies LLC The business of organized information 117 Scrolling through results… This is more of the mood I’m looking for… Taxonomy Strategies LLC The business of organized information 118 More like this Taxonomy Strategies LLC The business of organized information 119 Facets at gettyimages.com Taxonomy Strategies LLC The business of organized information 120 Key Questions Getty Images (and Corbis) have put a lot of effort into their websites for image purchase*. Internal staff at such organizations tell me that their intranets are nowhere near as easy to use. ROI is the reason why. Recall that retail had high salaries for taxonomists, because the ROI for a better shopping site is so clear. The front-ends are dependent on data. How is that data governed? How does that differ from how their intranets are governed? *Licensing, not purchasing, to be pedantic. 
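The stock photo slides show a keyword search (“silence”) being narrowed by facets such as image rights, objective criteria, and conceptual refinements like mood. The sketch below illustrates the basic mechanics of facet navigation: filter the hits by the selected facet values, then count the remaining values per facet so the interface can show how many images sit under each refinement. It is a minimal sketch under stated assumptions: the image records, facet names (`rights`, `mood`, `people`), and the functions `search` and `facet_counts` are hypothetical illustrations, not Getty's or Corbis's actual data model or software.

```python
from collections import Counter

# Hypothetical image records (an assumption for illustration): each image has
# free-text keywords plus values drawn from several controlled facets.
IMAGES = [
    {"id": "img1", "keywords": {"silence", "finger on lips"},
     "facets": {"rights": {"royalty-free"}, "mood": {"calm"}, "people": {"one person"}}},
    {"id": "img2", "keywords": {"silence", "library"},
     "facets": {"rights": {"rights-managed"}, "mood": {"calm"}, "people": {"no people"}}},
    {"id": "img3", "keywords": {"silence", "finger on lips"},
     "facets": {"rights": {"royalty-free"}, "mood": {"secretive"}, "people": {"one person"}}},
    {"id": "img4", "keywords": {"party"},
     "facets": {"rights": {"royalty-free"}, "mood": {"joyful"}, "people": {"group"}}},
]


def search(images, keyword, selected):
    """Return images that carry the keyword and every selected facet value."""
    hits = []
    for img in images:
        if keyword not in img["keywords"]:
            continue
        # For each facet the user has selected, the image must carry that value.
        if all(value in img["facets"].get(facet, set())
               for facet, value in selected.items()):
            hits.append(img)
    return hits


def facet_counts(hits):
    """Count how many hits carry each value of each facet (drives the facet UI)."""
    counts = {}
    for img in hits:
        for facet, values in img["facets"].items():
            counts.setdefault(facet, Counter()).update(values)
    return counts


if __name__ == "__main__":
    hits = search(IMAGES, "silence", {"rights": "royalty-free"})
    print([img["id"] for img in hits])
    for facet, counter in sorted(facet_counts(hits).items()):
        print(facet, dict(counter))
```

Recomputing the counts over the narrowed hits is what lets users see which refinements remain worthwhile. The counts are only as good as the tagging behind them, which is why the governance of that data matters; a production site would compute them inside the search engine rather than in a loop like this.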
Who are the users & what are they looking for?
• Only 30-40% of organizations regularly examine their logs.
• Sophisticated software is available, but don’t wait.
• 80% of the value comes from basic reports.

Query log & click trail examination: Click trail packages
iWebTrack; NetTracker; OptimalIQ; SiteCatalyst; Visitorville; WebTrends
Overkill.

Query log & click trail examination: Query log
Ultraseek reporting:
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Basic query reports provide most of the value, if the organization has a process to review what is going on.

Key Governance Aspects
• Roles and responsibilities: Managers; Reviewers
• Policies: For naming; Required fields
• Procedures: For reviewing and approving metadata placement; For acting on poor metadata application

Recommended Measure and Improve Mindset
Measure – Determine the current situation and what is wrong.
• Too many documents in a category? Too many categories? People complaining about not finding material that is on the site? People asking for materials not on the site? Common searches without results?
Decide – Decide how to change things to fix the problem.
• Change navigation list? Add new categories? Add synonyms to search? Create new content?
Confirm – Before rolling out changes, test them to make sure they will fix the problem.
• Usability tests, card sorts, internal functionality tests, …
Implement – Roll out the changes.
Repeat – Monitor people’s behavior on the site and respond to reported problems.
• Query log examination, click trail examination, Google search result position, stakeholder feedback, user surveys, site analytics, etc.

Taxonomy team: Generic roles
Stakeholder Committee
• Business Lead: Keeps the team on track with larger business objectives. Balances cost/benefit issues to decide appropriate levels of effort. Obtains needed resources if those on the committee can’t accomplish a particular task.
• Content Owners: Reality check on process change suggestions.
• Technical Specialist: Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc. Helps obtain data from various systems.
• Content Specialist: Committee’s liaison to content creators. Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
• Taxonomy Specialist: Suggests potential taxonomy changes based on analysis of query logs and indexer feedback. Makes edits to the taxonomy and installs them into the system with the aid of the IT specialist.

Recommended Reading
• CMMI: http://chrguibert.free.fr/cmmi (The official site is http://www.sei.cmu.edu/cmmi/, but it is not the most comprehensible.)
• Joel Test: http://www.joelonsoftware.com/articles/fog0000000043.html
• EIA Roadmap: http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt
• Enterprise Search Report: http://www.cmswatch.com/EntSearch/

Fun Questions
The animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.
This was created to be as bad a classification as possible. What makes it so bad?
Jorge Luis Borges, “The Analytical Language of John Wilkins”. Works in 3 volumes (in Russian). St. Petersburg: Polaris, 1994. Vol. 2, p. 87.

Taxonomy Strategies LLC Contact Info
Ron Daniel, Jr.
925-368-8371
rdaniel@taxonomystrategies.com
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.