Semantic MediaWiki Approach to Metadata

Semantic MediaWiki
Approach to Metadata
Scott E. Thompson
Manager - Data Architecture
Ontario Teachers’ Pension Plan
Agenda
1. Why?
2. SMW?
3. The PoC
4. The Unexpected
5. Wrap Up
Why?
Mashup of slides I’ve used before…
– What is Semantic MediaWiki?
– Proof of Concept
– The Unexpected
Wrap Up
Questions
pinterest.com/thompland777
SELECT ?Person
WHERE { ?Person :hasExperience :Semantic_Technologies .
?Person :hasExperience :Meta_Data .
?Person :hasExperience :Capital_Markets }
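The query on this slide asks for every person who has all three kinds of experience. As a rough illustration of what that pattern match does, here is a minimal Python sketch over a handful of hand-written triples. The people, the triples, and the function name are hypothetical and exist only for this example; a real triple store would evaluate the SPARQL directly.

```python
# Hypothetical triples (subject, predicate, object); illustrative data only.
TRIPLES = {
    ("Alice", "hasExperience", "Semantic_Technologies"),
    ("Alice", "hasExperience", "Meta_Data"),
    ("Alice", "hasExperience", "Capital_Markets"),
    ("Bob", "hasExperience", "Meta_Data"),
    ("Bob", "hasExperience", "Capital_Markets"),
}

REQUIRED = ["Semantic_Technologies", "Meta_Data", "Capital_Markets"]


def people_with_experience(triples, required):
    """Return subjects that have a hasExperience triple for every required topic."""
    experience = {}
    for subject, predicate, obj in triples:
        if predicate == "hasExperience":
            experience.setdefault(subject, set()).add(obj)
    wanted = set(required)
    # A person matches only if the wanted topics are a subset of their topics.
    return sorted(p for p, topics in experience.items() if wanted <= topics)


if __name__ == "__main__":
    print(people_with_experience(TRIPLES, REQUIRED))
```

Each triple pattern in the WHERE clause narrows the set of candidate bindings for ?Person; the subset test above is the same conjunction written out by hand.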
Ontario Teachers’ Pension Plan
Fixed Income
Public Equities
Private Capital
Real Estate
Infrastructure
Foreign Currency
Commodities
Hedge Funds
The Challenge: Metadata
Current: Low Confidence
Diagram labels: 42?; ETL; IT; Correct; Trade; Data Warehouse; Reload; Rerun; Report; Data
Future: Nirvana
Business Requirements
Findability of Data
Ownership of Data
Data Quality
Consistent Business Terminology
Added later…
Ownership of Metadata
Metadata Quality
Business Requirements
Value of Meta Data & Meta Data Tool
1. Allows business users and end users to understand what the data and reports
they are looking at mean
2. Makes data available and visible to others
3. Creates a searchable set of information about the firm’s data. This allows data
developers and users to search for existing data and avoid data duplication.
4. Provides a platform for sharing and publicizing data. This reduces the workload
of developers (interfaces, reports, etc.) and users and increases efficiency.
5. Quality control, data restrictions and uses can be applied to the entire data set.
6. Metadata documentation transcends people and time. Staff turnover and
balancing of multiple projects can be mitigated with metadata, providing data
permanence and the documentation of institutional knowledge.
MDM?
MDM could stand for Master Data Management
or Meta Data Management… coincidence?
“Let’s go get all the key pieces of data and put
them in one place, which is really more of an
enterprise data warehouse but master data
management then says… it’s almost a map…
here is what each of those data fields are,
here is how you can find them, here is what
they mean, here is where they came from.”
Blake Johnson
Consulting Professor
Stanford University
“The Truth and Power of Master Data Management” (Teradata)
http://www.youtube.com/watch?feature=player_embedded&v=p6VHpIlDfu4#!
One Truth?
Pre-Trade: Investment Strategy & Planning; Portfolio Research & Analytics
Post-trade: Trade & Deal Management; Securities Operations; Collateral & Cash Management; Portfolio Accounting
V = f(trade, market context, model, business context)
Repeated across the diagram: Trades; Market Context; Model; Business Context
Reconciliation
Total Fund Reporting; Market Risk Management; Credit & Counterparty Risk Management; Liquidity Risk Management; Performance; Compliance
What is a Wiki?
Hawaiian for “quick”
Allows large numbers of people to
create and edit the same content
Effective for reaching a credible
consensus from a large group
Wikipedia is the world’s largest
collaboratively edited source of
encyclopedic knowledge
What is the Semantic Web?
MediaWiki (Web 2.0)
Semantic MediaWiki (Web 3.0)
Future Opportunities
Simple search algorithms would
suffice to provide a precise answer
to the question…
Faceted Search
Graphs (relate/infer)
otpp:Index-Linked Bond subClassOf otpp:Debt
otpp:Fixed-Rate Bond subClassOf otpp:Debt
otpp:Amortizing Index-Linked Bond subClassOf otpp:Index-Linked Bond
otpp:Index-Linked Bond <sameAs> dbpedia:Inflation-Linked Bond
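To make the "relate/infer" idea concrete, the following Python sketch walks subClassOf links upward and then picks up any sameAs link found along the way. The edge list is an assumed reading of the class names on this slide (which class points to which is not fully recoverable from the diagram), and the function names are hypothetical.

```python
# Assumed reading of the slide's diagram; the exact edges are an assumption.
SUBCLASS_OF = {
    "otpp:Index-Linked Bond": "otpp:Debt",
    "otpp:Fixed-Rate Bond": "otpp:Debt",
    "otpp:Amortizing Index-Linked Bond": "otpp:Index-Linked Bond",
}
SAME_AS = {"otpp:Index-Linked Bond": "dbpedia:Inflation-Linked Bond"}


def superclasses(cls, subclass_of):
    """Follow subClassOf links upward; return all ancestors, nearest first."""
    ancestors = []
    seen = {cls}
    current = subclass_of.get(cls)
    # Stop at the root, or if a cycle would revisit a class.
    while current is not None and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = subclass_of.get(current)
    return ancestors


def external_equivalents(cls, subclass_of, same_as):
    """Collect sameAs targets for the class itself and for each ancestor."""
    chain = [cls] + superclasses(cls, subclass_of)
    return [same_as[c] for c in chain if c in same_as]


if __name__ == "__main__":
    start = "otpp:Amortizing Index-Linked Bond"
    print(superclasses(start, SUBCLASS_OF))
    print(external_equivalents(start, SUBCLASS_OF, SAME_AS))
```

Nothing states directly that an amortizing index-linked bond is a kind of debt or that it relates to the DBpedia concept; both facts fall out of following the links.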
Who Needs Consistency?
Linked Open Data Graph (OLD)
FIBO
Proof of Concept
Build a knowledgebase about:
1. Our structured data (schemas, tables,
columns)
2. Our business terminology (business
process, products, attributes)
Prove that the technology could:
1. Automatically load technical metadata
and relate it with business metadata
2. Customize workflow to collect and
govern the manual business input
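One way the automatic load of technical metadata could look is sketched below: rows from a database catalog extract are turned into wiki page text carrying Semantic MediaWiki inline annotations of the form [[Property::Value]]. This is only a sketch under assumptions. The sample rows, the page naming scheme, and the HasColumn property are hypothetical; IsPartOfA is borrowed from the ontology slides, and the actual loader used in the PoC is not described in the deck.

```python
from collections import defaultdict

# Hypothetical catalog extract: (schema, table, column). Illustrative only.
ROWS = [
    ("ACCT", "Table1", "TRADE_ID"),
    ("ACCT", "Table1", "PRICE"),
    ("MKT", "Table2", "CLOSE_DATE"),
]


def build_pages(rows):
    """Group columns by table and emit one annotated wiki page per table."""
    columns_by_table = defaultdict(list)
    for schema, table, column in rows:
        columns_by_table[(schema, table)].append(column)

    pages = {}
    for (schema, table), columns in columns_by_table.items():
        # [[Property::Value]] is SMW's inline annotation syntax.
        lines = [f"[[IsPartOfA::{schema}]]"]
        # HasColumn is a hypothetical property name chosen for this sketch.
        lines += [f"* [[HasColumn::{column}]]" for column in columns]
        lines.append("[[Category:Table]]")
        pages[f"{schema}.{table}"] = "\n".join(lines)
    return pages


if __name__ == "__main__":
    for title, text in build_pages(ROWS).items():
        print(title)
        print(text)
        print()
```

Business metadata entered by hand on other pages can then point at these generated pages by title, which is what lets technical and business metadata be queried together.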
Data Architecture Ontology
Table IsPartOfA Schema
Schema BelongsToA Schema Group
Table instances: Table1; Table2; View1; View2
Schema instances: ACCT; MREF; MKT; FIQR
Schema Group instances: TOOLKIT; CORE; PRODUCT; FUNCTIONAL; BUAD
Data Management Ontology
Table relationships: hasDataOwner; hasDataSteward; hasA (shown twice)
Organizational Group
Instances: Investment Division – Asset Mix & Risk; Finance Division – Data Management
Quality State
Instances: User; Authoritative
SLA
Instances: SLA1; SLA2
Workflow
Product Attribute Ontology
Entities: Product Group; Product; Product Attribute; Stored Procedure; Table; Column; Quality Test
Product hasAttribute Product Attribute
Product Attribute getsDataFrom Column
Other relationships shown: CallsA; ReferencesA; belongsToA; hasA; hasDMQualityTest
Quality Test instances: Missing; Stale; Null Value; Comparative; Tolerance; Changed
Focus on this data entry form: Product Attribute
Metadata to be curated by DM
Metadata to be curated by AM&R
% Sourced from Core Schemas?
{{#sparql: SELECT DISTINCT ?Product ?Product_attribute ?Column ?Schema
WHERE { ?Product property:HasAttribute ?Product_attribute .
?Product_attribute property:GetsDataFrom ?Column .
?Column MDM:belongsToSchema ?Schema . }
|merge=true|link=all}}
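The query returns one row per product, attribute, column, and schema. The percentage in the slide title could then be derived from those rows, roughly as in this Python sketch. The sample rows and the set of schemas treated as "core" are assumptions made up for the example, and the function name is hypothetical; the deck does not show how the figure was actually computed.

```python
# Hypothetical query results: (product, attribute, column, schema).
ROWS = [
    ("Bond", "Coupon", "ACCT.Table1.COUPON", "ACCT"),
    ("Bond", "Maturity", "FIQR.Table1.MATURITY", "FIQR"),
    ("Swap", "Notional", "MREF.Table2.NOTIONAL", "MREF"),
]

# Assumption: which schemas count as core is invented for this sketch.
CORE_SCHEMAS = {"ACCT", "MREF"}


def percent_from_core(rows, core_schemas):
    """Per product, the share of result rows whose column sits in a core schema."""
    totals = {}
    core = {}
    for product, _attribute, _column, schema in rows:
        totals[product] = totals.get(product, 0) + 1
        if schema in core_schemas:
            core[product] = core.get(product, 0) + 1
    return {p: 100.0 * core.get(p, 0) / totals[p] for p in totals}


if __name__ == "__main__":
    for product, pct in sorted(percent_from_core(ROWS, CORE_SCHEMAS).items()):
        print(f"{product}: {pct:.0f}% sourced from core schemas")
```

The sketch counts result rows, so an attribute that draws on several columns is counted once per column.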
Data Management Indexes
It’s a New Kind of Database!
SMW+ in a nutshell
MediaWiki
Semantic MediaWiki
Web Server
WYSIWYG extension
Enhanced Retrieval Extension
Deployment Framework
“The smartest organizations are not
those with the smartest people but
those with the quickest access to their
collective knowledge”
- Rod Collins (wiki-management.com)