Advanced Semantics and Search
Beyond Tag Clouds and Taxonomies

Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Introduction – 2.0 is really 1.35
Semantic Search - Integrated Design
  – Examples – Good, Bad, Ugly
  – Themes and Conclusions
Integrated Solutions – How to Beat the Crowd
  – People, Technology, Tags, Semantics
Conclusion

KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 12-15
Partners – FAST, Inxight, Siderean, Nstein, etc.
Consulting, Strategy, Knowledge architecture audit
Taxonomies: Enterprise, Marketing, Insurance, etc.
Services:
  – Taxonomy development, consulting, customization
  – Technology Consulting – Search, CMS, Portals, etc.
  – Metadata standards and implementation
  – Knowledge Management: Collaboration, Expertise, e-learning
  – Applied Theory – Faceted taxonomies, complexity theory, natural categories

2.0 – Reality Check - General
Evolution, not Revolution
Tyranny of the majority - worst type of central authority
More Madness of Crowds than Wisdom of Crowds
Enterprise 2.0 – still looking for a problem to solve
  – Social Networking is a small part of business
“Things fall apart; the center cannot hold;
Mere anarchy is loosed upon the world,…
The best lack all conviction, while the worst
Are full of passionate intensity.”
  - The Second Coming – W.B. Yeats

2.0 – Reality Check - Search
Folksonomies don’t compare with taxonomies or ontologies
Serendipity browsing is a small part of search
Fundamental Limits
  – Limited areas of success – popular sites are popular
  – Quality Content – finance, science, etc. – not good candidates
  – No mechanism for improving folksonomies
  – Scale – Too Big (million hits) – Too Little (200 items)
  – Amazon and LibraryThing – Need intrinsic value of tagging – not tagging for better tags
Bad Tags - idiosyncratic or too broad, errors, limited reach
  – Most people can’t tag very well – learned skill

Semantics and Search: An Integrated Approach: Elements
Multiple Knowledge Structures
  – Facet – orthogonal dimension of metadata
  – Taxonomy - Subject matter / aboutness
  – Ontology – Relationships / Facts
      • Subject – Verb - Object
Software - Text analytics, auto-categorization, entity extraction
People – tagging, evaluating tags, fine tune rules and taxonomy
People – Users, social tagging, suggestions
Rich Search Results – context and conversation

Integrated Design – Facets & Semantics
Design Issues - General
What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
What is the right balance of elements?
  – Dominant dimension or equal facets
Full Facets – Multiple intersecting filters
  – 1 or 2 filters (source / type) – No
When to combine search, topics, and facets?
  – Search first and then filter by topics / facet
  – Browse/facet front end with a search box

Integrated Design – Facets & Semantics
Design Issues - General
Good Information Architecture
  – Space wars – summary or full facet display
  – Simplicity vs. research power
  – Source and Type are basics
  – Standard Facets – People, Companies, Place, Industry
  – Interactive interface – sliders, date ranges
Semantics still hardest – summaries, related, rank
Taxonomy – just another facet?
  – Keywords vs. simple taxonomy
Tag Clouds / Clusters – how useful?
Feedback – numbers of stories vs. top stories

Integrated Design – Facets & Semantics
Design Issues - Users
Homogeneity of Audience and Content
Model of the Domain – broad
  – How many facets do you need?
  – More facets and let users decide
  – Allow for customization – can’t define a single set
User Analysis – tasks, labeling, communities
      • Issue – labels that people use to describe their business and labels that they use to find information
Match the structure to domain and task
  – Users can understand different structures

Integrated Solution: Enterprise and eCommerce
Semantics, Technology, People, Policy
Design the right balance for each area
  – Products – facets, Publishing – more software emphasis – for tags
  – Enterprise – more precise targets, high quality content, more direct role for policy
New Relationship of Central and Crowd
  – Not top down or bottom up – Interpenetration of opposites
Variety of Knowledge structures
  – Folksonomies, taxonomies, ontologies, facets

Integrated Solutions: Technology
Text Analytics
  – Taxonomy management, entity extraction, categorization, sentiment
  – Auto-populate variety of metadata – author, title, date, etc.
  – Relevance – best bets to weights and classes of documents
Search – Integrated features, facets and clusters and tag clouds and feedback
Enterprise Content Management
  – Place to add metadata, supported by policy
  – Gather input from authors, tag clouds plus

Integrated Solution: People
Programmers, Librarians, Taxonomists, Metadata specialists
  – Integrate, design, develop rules, monitor activity & quality
Authors, Subject Matter Experts
  – Input into design (important facets), rules, activity meaning
Users – Web 2.0
  – Feedback – quality and usability
  – Suggestions – missing terms, bad categorization & entity
  – Tag Clouds & folksonomy – for social networking features, not for information retrieval

Conclusions
90% of what you hear about Folksonomies (2.0) is hype – again
  – Folksonomies are a great source for first drafts and social research
  – Social Networking is really good – for social networking
Semantic Infrastructure solution (people, policy, technology, semantics) and feedback is best approach
Integrated design is essential – not facets as add on
Semantics is still not there – hardest, but some progress
Text Analytics (Entity extraction and auto-categorization) are essential
Future – new kinds of applications:
  – Text Mining, research tools, sentiment

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
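
Sketch: Full Facets as Multiple Intersecting Filters
The "search first and then filter by topics / facet" pattern from the design slides can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents, the facet names (source, type, industry), and the helper functions search, apply_facets, and facet_counts are hypothetical and do not come from the slides or from any particular search product.

```python
from collections import Counter

# Hypothetical documents: free text plus facet metadata (all values invented).
DOCS = [
    {"id": 1, "text": "taxonomy design for insurance claims",
     "source": "intranet", "type": "report", "industry": "insurance"},
    {"id": 2, "text": "search tuning and taxonomy governance",
     "source": "wiki", "type": "guide", "industry": "finance"},
    {"id": 3, "text": "insurance marketing taxonomy",
     "source": "intranet", "type": "guide", "industry": "insurance"},
    {"id": 4, "text": "portal rollout plan",
     "source": "wiki", "type": "report", "industry": "finance"},
]


def search(docs, query):
    """Naive keyword search: keep documents containing every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower().split() for t in terms)]


def apply_facets(docs, selected):
    """Intersecting filters: a document must match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]


def facet_counts(docs, facets):
    """Count how many documents in the result set carry each facet value."""
    return {f: Counter(d[f] for d in docs) for f in facets}


# Search first, then narrow by two facets at once.
hits = search(DOCS, "taxonomy")
narrowed = apply_facets(hits, {"source": "intranet", "industry": "insurance"})
counts = facet_counts(hits, ["source", "type", "industry"])
```

The counts are computed on the search hits before narrowing, which is one way an interface could show the number of stories behind each facet value; a production system would more likely delegate both filtering and counting to the search engine.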