Make it easier for readers to find, use and understand data

OECD’S APPROACH TO FACILITATING RESEARCH
Semantic Tagging, Discoverability and Accessibility
Terri Mitton, OECD Publishing
terri.mitton@oecd.org
Strategies for improving findability
Accessibility
Discoverability
Tagging
Accessibility
DATA PORTAL
Home page
Discover statistics in various formats (search, browse by topic, country)
Quick access to OECD.Stat and tools for statisticians
API services (for those who understand such things)
https://data.oecd.org
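The slide does not describe the API itself. As a hedged sketch, assuming the API follows SDMX RESTful conventions (a data/{dataflow}/{key} path in which dimension codes are separated by dots and alternative codes by plus signs), a query URL could be assembled as below. BASE_URL, DF_EXAMPLE and the codes are placeholders, not real OECD identifiers; the portal's own API documentation gives the actual endpoint and dataflows.

```python
from urllib.parse import urlencode

# Hypothetical base URL: a placeholder, not the OECD's actual endpoint.
BASE_URL = "https://example.org/sdmx"


def build_data_query(dataflow: str, key_parts: list[list[str]], **params: str) -> str:
    """Build an SDMX-style REST data query URL.

    Each inner list holds the codes selected for one dimension: several codes
    are joined with '+', dimensions are separated by '.', and an empty list
    leaves that dimension unconstrained (wildcard).
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{BASE_URL}/data/{dataflow}/{key}"
    if params:
        # Optional filters such as startPeriod/endPeriod become query parameters.
        url += "?" + urlencode(params)
    return url


# Hypothetical dataflow and codes: two countries, any value for the second
# dimension, code "A" for the third, restricted to 2010-2015.
print(build_data_query("DF_EXAMPLE", [["FRA", "DEU"], [], ["A"]],
                       startPeriod="2010", endPeriod="2015"))
# https://example.org/sdmx/data/DF_EXAMPLE/FRA+DEU..A?startPeriod=2010&endPeriod=2015
```

Keeping the key as a list of per-dimension code lists makes the wildcard case (an empty list) explicit instead of relying on hand-typed dots.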
DATA PORTAL
Topic page
Quick access to charts, maps and publications
Quick access to datasets and ready-made tables
Go from main indicator to more detail…
Easy to use and understand
Real-time data in easy-to-use charts
Definition
Link to source database
Links to related indicators and publications
Compare countries
Trends and rankings
Trends and country ranking for selected indicators
How do we make data easy to find and understand?
Discoverability
From data to discoverable content
Discovery via OECD Search
General public
Researcher
Search… www.oecd-ilibrary.org
Chapters, Tables, Indicators, Databases, …
Search… data.oecd.org
Indicators, Databases, Publications
Is it still enough? No!
Special web search (e.g. Scirus): 10%
Specialist bibliographic database: 10%
General web search (e.g. Google): 10%
Library systems: 10%
Specialist portal (e.g. RePEc): 6%
Email alerts: 14%
Content aggregator (e.g. ProQuest): 9%
Publisher's website: 14%
Author's website: 5%
Community service (e.g. Mendeley): 6%
Website managed by key authors in the field: 6%
Source: Gardner and Inger (2012): How readers discover content in scholarly journals
Discovery: the usual suspects
For professionals… and the public
Intensive work with key industry players
Now indexing our 170,000 published objects, including datasets, tables…
But we have also learned to let go: we now encourage anyone to read, share and embed our publications in their websites and blogs, for free
Embedded full books and charts
Semantic tagging
Tagging
Searching
Semantic Enrichment…
• Text analysis tools identify pertinent information
• They comb through documents and extract concepts
• To enrich the documents, we use ‘skill cartridges’, which contain taxonomies and specific business rules.
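The slide does not say how a cartridge is structured internally. As a minimal sketch, assuming a cartridge can be reduced to a term-to-concept taxonomy plus simple rules, concept extraction could look like the following. TAXONOMY, MIN_MENTIONS and extract_concepts are hypothetical names, and a commercial text analysis tool uses far richer linguistic processing than this whole-word matching.

```python
import re
from collections import Counter

# Hypothetical cartridge contents: a tiny taxonomy mapping surface terms to
# concepts, plus one illustrative business rule (minimum number of mentions).
TAXONOMY = {
    "tax evasion": "Fraud",
    "tax fraud": "Fraud",
    "educational attainment": "Attainment",
}
MIN_MENTIONS = 2


def extract_concepts(text: str, taxonomy: dict[str, str] = TAXONOMY,
                     min_mentions: int = MIN_MENTIONS) -> dict[str, int]:
    """Return {concept: mentions} for concepts mentioned at least min_mentions times."""
    lowered = text.lower()
    counts = Counter()
    for term, concept in taxonomy.items():
        # Whole-word, case-insensitive match of each taxonomy term.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        counts[concept] += hits
    # Business rule: discard concepts that are only mentioned in passing.
    return {concept: n for concept, n in counts.items() if n >= min_mentions}


sample = "Tax evasion and tax fraud cost billions. Educational attainment rose."
print(extract_concepts(sample))  # {'Fraud': 2}
```

Separating the taxonomy (what to look for) from the rule (when a match counts) mirrors the slide's split between taxonomies and business rules inside a cartridge.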
Example taxonomy
« disabled students »
« handicapped students »?
« disabled children »?
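One way to model the question the slide raises is a preferred label with alternative labels, in the spirit of SKOS prefLabel/altLabel. The sketch below is hypothetical (CONCEPTS, build_label_index and normalise are illustrative names) and simply resolves every variant to the preferred term.

```python
from typing import Optional

# Hypothetical taxonomy entry: one preferred label and the alternative labels
# that should resolve to it. Whether the variants really belong here is the
# editorial question the slide raises.
CONCEPTS = {
    "disabled students": {"handicapped students", "disabled children"},
}


def build_label_index(concepts: dict[str, set[str]]) -> dict[str, str]:
    """Map every preferred and alternative label (lower-cased) to its preferred label."""
    index = {}
    for preferred, alternatives in concepts.items():
        index[preferred.lower()] = preferred
        for alt in alternatives:
            index[alt.lower()] = preferred
    return index


def normalise(label: str, index: dict[str, str]) -> Optional[str]:
    """Return the preferred label for a term, or None if the taxonomy lacks it."""
    return index.get(label.strip().lower())


index = build_label_index(CONCEPTS)
print(normalise("Handicapped students", index))  # disabled students
print(normalise("special needs", index))         # None
```

Listing « disabled children » as an alternative label makes every document about disabled children match « disabled students », even outside schooling; modelling it as a separate, related concept keeps that distinction.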
Not only…
Or…
OECD Taxonomy
[Figure: three hierarchical classification trees: 1. Topics, 2. Geographical areas, 3. Document metadata]
Different ways of classifying OECD work
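The diagram classifies the same body of work along three independent hierarchies. As a sketch, assuming each hierarchy can be stored as a simple child-to-parent map (the node labels below are hypothetical stand-ins for the slide's generic nodes), a faceted query that also matches more specific tags could work like this:

```python
from typing import Optional

# Hypothetical facets, one per classification scheme on the slide. Each maps a
# node to its parent (None for a root). Labels are illustrative only.
FACETS: dict[str, dict[str, Optional[str]]] = {
    "topics": {"Education": None, "Skills": "Education", "Attainment": "Skills"},
    "geography": {"Europe": None, "France": "Europe", "Germany": "Europe"},
    "doc_metadata": {"Publication": None, "Working paper": "Publication"},
}


def lineage(tree: dict[str, Optional[str]], node: str) -> list[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path


def matches(doc_tags: dict[str, str], query: dict[str, str]) -> bool:
    """True if, in every queried facet, the document is tagged with the query
    node or one of its descendants."""
    return all(
        facet in doc_tags and wanted in lineage(FACETS[facet], doc_tags[facet])
        for facet, wanted in query.items()
    )


doc = {"topics": "Attainment", "geography": "France", "doc_metadata": "Working paper"}
print(matches(doc, {"topics": "Education", "geography": "Europe"}))  # True
print(matches(doc, {"geography": "Germany"}))                        # False
```

Because the facets are independent, a document is tagged once per hierarchy and a query can combine any subset of them.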
[Figure: semantic enrichment architecture. Documents and publications are split into XML fragments, and external sources (RSS) supply XML content. Semantic enrichment with Temis Luxid draws on vocabularies (OECD Subjects, e.g. Finance, Fiscal affairs, Fraud; Education, Skills, Attainment), candidate terms and business logic. The resulting annotations are stored as RDF triples in a triple store and used for linking, inferencing and linked business data.]
Business logic classes: P = Policy, E = Evidence, S = State of affairs, R = Recommendation, C = Country, T = Theme, …
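The architecture stores annotations as RDF triples so they can be linked and used for inferencing. As an in-memory illustration only (a production system would use an RDF triple store and SPARQL), the sketch below keeps triples as Python tuples and infers that a fragment annotated with a narrow subject is also about its broader subjects. The dct:subject and skos:broader predicate names follow Dublin Core and SKOS; the Fraud, Fiscal affairs, Finance chain is an assumption, since the slide only lists these terms together.

```python
from typing import Optional

# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, str]


def infer_subjects(triples: set[Triple]) -> set[Triple]:
    """Propagate dct:subject along skos:broader links until nothing new appears."""
    broader: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "skos:broader":
            broader.setdefault(s, set()).add(o)
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "dct:subject":
                continue
            for parent in broader.get(o, ()):
                new = (s, "dct:subject", parent)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


def query(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


store = {
    ("frag:42", "dct:subject", "Fraud"),          # annotation from enrichment
    ("Fraud", "skos:broader", "Fiscal affairs"),  # hypothetical hierarchy
    ("Fiscal affairs", "skos:broader", "Finance"),
}
for triple in query(infer_subjects(store), s="frag:42"):
    print(triple)
```

The fixpoint loop gives the transitive closure, so a search for Finance also finds fragments tagged only with Fraud.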
OECD Semantic enrichment factory
1. Extract « fragments » from documents and publications (XML)
2. Submit « fragments » to the Luxid Annotation Web Service (Luxid Annotation Factory), which runs three cartridges: Document Classification, Taxonomy Enrichment and P.E.R.S. (*) Fragment Classification
3. Receive annotations, e.g. <xml>Industry</xml>, <xml>Finance</xml>
4. Store the annotations as triples in the triple store
(*) Policy/Evidence/Recommendation/State-of-Affairs
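The factory reads as a pipeline: extract fragments, annotate them, store the annotations as triples. The sketch below mirrors those steps under loud assumptions: fragments are taken to be <p> elements, the Luxid web service is replaced by a hypothetical keyword classifier (PERS_RULES, classify_pers), and pers:class is an invented predicate name. It shows the data flow, not how Luxid actually classifies text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the annotation web service: naive keyword cues that
# assign one P.E.R.S. class per fragment (checked in this order).
PERS_RULES = [
    ("Recommendation", ("should", "recommend")),
    ("Policy", ("reform", "programme", "policy")),
    ("Evidence", ("%", "increased", "decreased")),
]


def extract_fragments(xml_text: str, tag: str = "p") -> list[tuple[str, str]]:
    """Split an XML document into (fragment_id, text) pairs, one per <tag> element."""
    root = ET.fromstring(xml_text)
    return [(f"frag:{i}", "".join(el.itertext()).strip())
            for i, el in enumerate(root.iter(tag), start=1)]


def classify_pers(text: str) -> str:
    """Return the first P.E.R.S. class whose cue appears, else State of affairs."""
    lowered = text.lower()
    for label, cues in PERS_RULES:
        if any(cue in lowered for cue in cues):
            return label
    return "State of affairs"


def enrich(xml_text: str) -> set[tuple[str, str, str]]:
    """Extract fragments, annotate each one, and return the annotations as triples."""
    return {(frag_id, "pers:class", classify_pers(text))
            for frag_id, text in extract_fragments(xml_text)}


doc = ("<doc><p>Countries should invest in teachers.</p>"
       "<p>Spending increased by 5%.</p>"
       "<p>The school system has three levels.</p></doc>")
for triple in sorted(enrich(doc)):
    print(triple)
```

Keeping extraction, classification and storage as separate functions means the stub classifier could be swapped for a call to a real annotation service without touching the other steps.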
OECD.Discover
[Figure: XML content, annotations and linked data]
« Education policy » use cases
[Figure: use cases UC-1, UC-2, UC-3, UC-4a, UC-4b, UC-5, UC-6, UC-8]
OECD.Discover statistics
132 OECD publications
321,000 fragments
195,800 P.E.R.S. objects:
• 6,300 recommendations
• 72,300 pieces of evidence
• 28,000 policies
• 89,200 states of affairs
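The four P.E.R.S. categories add up to the stated total, and the figures also give an average number of fragments per publication. A quick check using only the numbers on the slide:

```python
# Figures from the OECD.Discover statistics slide.
publications = 132
fragments = 321_000
pers_objects = {
    "recommendations": 6_300,
    "evidence": 72_300,
    "policies": 28_000,
    "states of affairs": 89_200,
}

total = sum(pers_objects.values())
print(f"{total:,} P.E.R.S. objects")  # 195,800 P.E.R.S. objects
print(f"{fragments / publications:,.0f} fragments per publication on average")  # 2,432 ...
```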
Integration
Web services and connectors have been developed to integrate the taxonomy and semantic tools with the various OECD applications:
• Luxid (Temis) and SharePoint 2010: integration tested and principle validated
• Luxid (Temis) and OpenText Content Server (OECD.Records): integration tested and in production
Thank you &
Questions?
Terri.mitton@oecd.org