Semantic MediaWiki Approach to Metadata
Scott E. Thompson
Manager - Data Architecture
Ontario Teachers’ Pension Plan

Agenda
1. Why?
2. SMW?
3. The PoC
4. The Unexpected
5. Wrap Up

Why?
Mashup of slides I’ve used before…
– What is Semantic MediaWiki?
– Proof of Concept
– The Unexpected
Wrap Up
Questions

pinterest.com/thompland777
    SELECT ?Person
    WHERE {
      ?Person :hasExperience :Semantic Technologies.
      ?Person :hasExperience :Meta Data.
      ?Person :hasExperience :Capital Markets
    }

Ontario Teachers’ Pension Plan
Fixed Income
Public Equities
Private Capital
Real Estate
Infrastructure
Foreign Currency
Commodities
Hedge Funds

The Challenge: Metadata

Current: Low Confidence
Diagram labels: 42?; ETL; IT; Correct Trade; Data Warehouse; Reload; Rerun Report; Reload Data

Future: Nirvana

Business Requirements
Findability of Data
Ownership of Data
Data Quality
Consistent Business Terminology
Added later…
Ownership of Metadata
Metadata Quality

Business Requirements
Value of Meta Data & Meta Data Tool
1. Allows business users and end users to gain the required insight into what the data and reports they are looking at mean.
2. Makes data available and visible to others.
3. Creates a searchable set of information about the firm’s data. This allows data developers and users to search for existing data and avoid data duplication.
4. Provides a platform for sharing and publicizing data. This reduces the workload of developers (interfaces, reports, etc.) and users, and increases efficiency.
5. Quality control, data restrictions and uses can be applied to the entire data set.
6. Metadata documentation transcends people and time. Staff turnover and the balancing of multiple projects can be mitigated with metadata, which provides data permanence and documents institutional knowledge.

MDM?
MDM could stand for Master Data Management or Meta Data Management… coincidence?
“Let’s go get all the key pieces of data and put them in one place, which is really more of an enterprise data warehouse but master data management then says… it’s almost a map… here is what each of those data fields are, here is how you can find them, here is what they mean, here is where they came from.”
Blake Johnson, Consulting Professor, Stanford University
“The Truth and Power of Master Data Management” (Teradata)
http://www.youtube.com/watch?feature=player_embedded&v=p6VHpIlDfu4#!

One Truth?
Pre-Trade: Investment Strategy & Planning; Portfolio Research & Analytics
Post-Trade: Trade & Deal Management; Securities Operations; Collateral & Cash Management; Portfolio Accounting
V = f(trade, market context, model, business context)
Inputs shown repeatedly in the diagram: Trades; Market Context; Model; Business Context
Other diagram labels: Reconciliation; Total Fund Reporting; Market Risk Management; Credit & Counterparty Risk Management; Liquidity Risk Management; Performance; Compliance

What is a Wiki?
Hawaiian for “quick”
Allows large numbers of people to create and edit the same content
Effective for reaching a credible consensus from a large group
Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge

What is the Semantic Web?

MediaWiki (Web 2.0)

Semantic MediaWiki (Web 3.0)

Future Opportunities
Simple search algorithms would suffice to provide a precise answer to the question…

Faceted Search

Graphs (relate/infer)
otpp:Index-Linked Bond subClassOf otpp:Debt
otpp:Fixed-Rate Bond subClassOf otpp:Debt
otpp:Amortizing Index-Linked Bond subClassOf otpp:Index-Linked Bond
otpp:Index-Linked Bond <sameAs> dbpedia:Inflation-Linked Bond

Who Needs Consistency?

Linked Open Data Graph (OLD)

FIBO

Proof of Concept
Build a knowledgebase about:
1. Our structured data (schemas, tables, columns)
2. Our business terminology (business process, products, attributes)
Prove that the technology could:
1. Automatically load technical metadata and relate it with business metadata
2. Customize workflow to collect and govern the manual business input

Data Architecture Ontology
Table IsPartOfA Schema
Schema BelongsToA Schema Group
Table instances: Table1, Table2, View1, View2
Schema instances: ACCT, MREF, MKT, FIQR
Schema Group instances: TOOLKIT, CORE, PRODUCT, FUNCTIONAL, BUAD

Data Management Ontology
Table hasA Quality State
Table hasDataOwner Organizational Group
Table hasDataSteward Organizational Group
Table hasA SLA
Quality State instances: User, Authoritative
Organizational Group instances: Investment Division – Asset Mix & Risk; Finance Division – Data Management
SLA instances: SLA1, SLA2

Workflow

Product Attribute Ontology
Classes: Product Group; Product; Product Attribute; Column; Table; Stored Procedure; Quality Test
Properties: belongsToA; hasAttribute; getsDataFrom; hasA; hasDMQualityTest; CallsA; ReferencesA
Quality Test instances: Missing; Stale; Null Value; Comparative; Tolerance; Changed
Focus on this data entry form: Product Attribute
Metadata to be curated by DM
Metadata to be curated by AM&R

% Sourced from Core Schemas?
    {{#sparql:
    SELECT DISTINCT ?Product ?Product_attribute ?Column ?Schema
    WHERE {
      ?Product property:HasAttribute ?Product_attribute .
      ?Product_attribute property:GetsDataFrom ?Column .
      ?Column MDM:belongsToSchema ?Schema .
    }
    |merge=true|link=all}}

Data Management Indexes

It’s a New Kind of Database!

SMW+ in a nutshell
MediaWiki
Semantic MediaWiki
Web Server
WYSIWYG extension
Enhanced Retrieval Extension
Deployment Framework

“The smartest organizations are not those with the smartest people but those with the quickest access to their collective knowledge” - Rod Collins (wiki-management.com)
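
Sketch: the "% Sourced from Core Schemas?" query as a plain join

The SPARQL on the "% Sourced from Core Schemas?" slide walks three links: Product to Product Attribute, Product Attribute to Column, and Column to Schema. The Python sketch below reproduces that join over an in-memory list of triples and then computes a percentage per product. It is only an illustration under stated assumptions: the product, attribute, and column names are hypothetical, STAGE is an invented non-core schema, and treating MREF and ACCT as the "core" schemas is an assumption, not something the slides state.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object). The predicate
# names follow the slide's query; all subjects and objects are made up.
TRIPLES = [
    ("Bond", "HasAttribute", "Bond.Coupon"),
    ("Bond", "HasAttribute", "Bond.Maturity"),
    ("Swap", "HasAttribute", "Swap.Notional"),
    ("Bond.Coupon", "GetsDataFrom", "MREF.T1.COUPON"),
    ("Bond.Maturity", "GetsDataFrom", "ACCT.T2.MATURITY"),
    ("Swap.Notional", "GetsDataFrom", "STAGE.T3.NOTIONAL"),
    ("MREF.T1.COUPON", "belongsToSchema", "MREF"),
    ("ACCT.T2.MATURITY", "belongsToSchema", "ACCT"),
    ("STAGE.T3.NOTIONAL", "belongsToSchema", "STAGE"),
]

# Assumption: which schemas count as "core" is not given in the slides.
CORE_SCHEMAS = {"MREF", "ACCT"}


def objects(subject, predicate, triples):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def sourcing_rows(triples):
    """Join Product -> Attribute -> Column -> Schema, like the SELECT DISTINCT."""
    rows = set()
    for product, predicate, attribute in triples:
        if predicate != "HasAttribute":
            continue
        for column in objects(attribute, "GetsDataFrom", triples):
            for schema in objects(column, "belongsToSchema", triples):
                rows.add((product, attribute, column, schema))
    return sorted(rows)


def percent_core(rows, core_schemas):
    """Percentage of each product's sourced rows that come from a core schema."""
    flags_by_product = defaultdict(list)
    for product, _attribute, _column, schema in rows:
        flags_by_product[product].append(schema in core_schemas)
    return {
        product: 100.0 * sum(flags) / len(flags)
        for product, flags in flags_by_product.items()
    }


if __name__ == "__main__":
    rows = sourcing_rows(TRIPLES)
    for product, pct in sorted(percent_core(rows, CORE_SCHEMAS).items()):
        print(f"{product}: {pct:.0f}% sourced from core schemas")
```

In a wiki the join would be left to the SPARQL endpoint; the sketch only makes the shape of the result rows and the percentage step explicit.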