OECD’S APPROACH TO FACILITATING RESEARCH
Semantic Tagging, Discoverability and Accessibility
Terri Mitton, OECD Publishing
terri.mitton@oecd.org

Strategies for improving findability
• Accessibility
• Discoverability
• Tagging

Accessibility

DATA PORTAL: Home page
• Discover statistics in various formats (search, browse by topic, country)
• Quick access to OECD.Stat and tools for statisticians
• API services (for those who understand such things)
• https://data.oecd.org

DATA PORTAL: Topic page
• Quick access to charts, maps and publications
• Quick access to datasets, ready-made tables

Go from main indicator to more detail…
• Easy to use, understand
• Real-time data in easy-to-use charts
• Definition
• Link to source database
• Links to related indicators and publications

Compare countries: Trends and rankings
• Trends and country ranking for selected indicators

How do we make data easy to find and understand?

Discoverability

From data to discoverable content

Discovery via OECD Search
• Audiences: General public, Researcher
• Search on www.oecd-ilibrary.org: Chapters, Tables, Indicators, Databases, …
• Search on data.oecd.org: Indicators, Databases
• Publications

Is it still enough? No!
• Email alerts: 14%
• Publisher's website: 14%
• Special web search (e.g. Scirus): 10%
• Specialist bibliographic database: 10%
• General web search (e.g. Google): 10%
• Library systems: 10%
• Content aggregator (e.g. Proquest): 9%
• Specialist portal (e.g. Repec): 6%
• Community service (e.g. Mendeley): 6%
• Website managed by key authors in field: 6%
• Author's website: 5%
Source: Gardner and Inger (2012): How readers discover content in scholarly journals

Discovery: usual suspects
• For professionals… and the public
• Intensive work with key industry players
• Now indexing our 170,000 published objects, including datasets, tables…
But we have also learned to let go
• We now encourage anyone to read, share and embed our publications in their websites and blogs for free
• Embedded full books and charts

Semantic tagging
• Tagging
• Searching

Semantic Enrichment…
• A text analysis tool that identifies pertinent information
• Combs through documents and extracts concepts
• To enrich the documents we use ‘skill cartridges’, which contain taxonomies and specific business rules

Example taxonomy
• « disabled students »
• « handicapped students »?
• « disabled children »?

Not only…

Or…

OECD Taxonomy: Different ways of classifying OECD work
1. Topics
2. Geographical areas
3. Document metadata
[Diagram: hierarchical trees of nodes labelled A, B1-B2, C1-C4, D1-D4, E1-E2, F1-F2, G1-G2]

Semantic Enrichment [diagram]
• Inputs: External Sources (RSS); Documents, Publications
• Content and Fragments (XML)
• Semantic Enrichment <<Temis Luxid>>: Annotations, Candidate Terms
• Vocabularies: OECD Subjects (Finance, Fiscal affairs, Fraud; Education, Skills, Attainment)
• Business Logic: Policy (P), Evidence (E), State of Affairs (S), Recommendation (R), Country (C), Theme (T), …
• Linking, Inferencing: Triples <<RDF>> in <<Triple Store>>
• Linked Business Data

OECD Semantic enrichment factory
• Documents, Publications
• Extract « fragments » (Fragment XML)
• Submit « fragments » to Luxid Annotation Web Service
• <<Luxid Annotation Factory>> cartridges: Document Taxonomy; P.E.R.S. (*) Classification; Fragment Classification; …
• Annotations, e.g. <xml>Industry</xml>, <xml>Finance</xml>
• Store annotations as triples
• Components: Annotation Factory, Enrichment, Triple Store, …
(*) Policy/Evidence/Recommendation/State-of-Affairs

OECD.Discover
• Content (XML)
• Annotations
• Linked Data

« Education policy » use cases
• UC-1, UC-2, UC-3, UC-8, UC-4a, UC-4b, UC-5, UC-6

OECD.Discover statistics
• 132 OECD publications
• 321,000 fragments
• 195,800 PERS objects:
  6,300 recommendations
  72,300 evidences
  28,000 policies
  89,200 state of affairs

Integration
• Web services and connectors have been developed to facilitate the integration of the taxonomy and semantic tools with the different OECD applications
• Luxid (Temis) and SharePoint 2010: integration tested and the principle validated
• Luxid (Temis) and OpenText Content Server (OECD.Records): integration tested and in production

Thank you & Questions?
Terri.mitton@oecd.org
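
Illustrative note on the enrichment factory slides: the three steps shown there (extract fragments, annotate them against a taxonomy, store the annotations as triples) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Luxid Annotation Web Service or its API. The names `TAXONOMY`, `Triple`, `annotate` and `hasSubject`, the fragment identifiers and the sample sentences are all hypothetical. Treating « handicapped students » and « disabled children » as variants of « disabled students » is also an assumption; the example taxonomy slide poses both as open questions.

```python
import re
from typing import NamedTuple

# Hypothetical mini-taxonomy: preferred concept -> variant labels.
# Grouping these three phrases under one concept is an assumption
# made for illustration; the slide leaves that question open.
TAXONOMY = {
    "disabled students": [
        "disabled students",
        "handicapped students",
        "disabled children",
    ],
}


class Triple(NamedTuple):
    subject: str    # fragment identifier (hypothetical scheme)
    predicate: str  # hypothetical predicate name
    obj: str        # preferred taxonomy concept


def annotate(fragment_id: str, text: str) -> list[Triple]:
    """Return one triple per taxonomy concept mentioned in the text."""
    triples = []
    for concept, variants in TAXONOMY.items():
        # Whole-phrase, case-insensitive match on any variant label.
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            triples.append(Triple(fragment_id, "hasSubject", concept))
    return triples


# Stand-in for fragments extracted from publications (invented text).
fragments = {
    "frag-1": "Support for handicapped students varies across countries.",
    "frag-2": "Tax revenue rose in most member countries.",
}

# A plain list stands in for the triple store.
store = [t for fid, text in fragments.items() for t in annotate(fid, text)]
for triple in store:
    print(triple)
```

Only `frag-1` yields a triple, and it is tagged with the preferred label rather than the variant found in the text, which is what lets a search for « disabled students » retrieve it. A production cartridge would add business rules and linguistic analysis that this sketch does not attempt.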