SDMX - Paris21

SDMX
Basics
David Barraclough
OECD SDMX Coordinator
Overview
• What is SDMX?
• Why SDMX?
• SDMX at OECD
• How to start with SDMX?
• Some SDMX concepts
– How is data exchanged
– The main tools
– Content-oriented guidelines
• Future of SDMX
What is SDMX (not)?
Not simply a technical format!
What is SDMX?
• Statistical Data and Metadata eXchange
• Released in 2002: “SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
– BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
What is SDMX?
• Format: XML and EDI (rebranded GESMES)
• SDMX Information model
• Web service standards: APIs
• SDMX Registry standards
• Content-oriented guidelines
Why SDMX? The Business Case
• Reusable, open-source (free) tools save money and time
• Standard codes and naming help improve reuse and save
time
– Reuse of categories
– Less mapping and data processing needed
– Shopping list of concepts when defining structures
• Strongly-typed structures help improve validation and
processing
– Heavy-lifting processing of data messages can be automated
– Text format is human-readable
– Easier to create new tools around the agreed format
Why SDMX? The Business Case
• Standard technical architecture promotes more timely, better
quality data
– Timely because less manual conversion is needed
– Quality because automated processing means less human error
• SDMX Information model
– Provides a common terminology
– Makes tool development much easier
– Information model described later
What’s in it for Data Reporters?
• SDMX Registry helps structure metadata
• SDMX Tools exist and are free
• One dissemination channel instead of
packaging data for multiple consumers
• Can easily disseminate SDMX from existing
data warehouse with SDMX-RI
• Lots of SDMX methodology available and
growing
Why not use…?
Issues with each alternative:
• CSV: not structured, hard to validate; no metadata
• Excel: metadata tied to presentation; proprietary format; licensing; hard to process and automate
• FAME, SAS, STATA: proprietary format; licensing
• GESMES: no information model; proprietary format; few tools or international support
• XML: no context to tags (SDMX adds context to XML)
• XBRL, DDI: not focused on aggregated data exchange
The SDMX XML Data file format:
“Global DSDs”
Domains at various stages of implementation:
• National Accounts
• Balance of Payments
• Foreign Direct Investment (FDI)
In draft:
• Harmonized Trade (IMTS)
• R&D
• Education
Many other “Shared” DSDs.
SDMX at OECD
Harmonized Trade data
• Synchronised from UN database every night
• Only the “delta” (what changed since the last sync) is synchronised into our database. This is required because the trade database is huge
• SDMX standards support querying the delta for a given date
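A delta query of this kind can be sketched as follows. This is a minimal illustration, not the OECD's actual synchronisation code. It assumes the source exposes an SDMX 2.1 RESTful data service, which defines an `updatedAfter` query parameter. The base URL and the dataflow ID `TRADE_FLOW` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def delta_query_url(base_url: str, flow: str, key: str, updated_after: str) -> str:
    """Build a data query asking only for observations changed since a timestamp.

    Assumes an SDMX 2.1 REST data service; `updatedAfter` is the parameter
    that standard defines for retrieving a delta.
    """
    params = urlencode({"updatedAfter": updated_after})
    return f"{base_url}/data/{flow}/{key}?{params}"


# Hypothetical endpoint and dataflow ID, for illustration only.
url = delta_query_url(
    "https://example.org/sdmx/rest",
    "TRADE_FLOW",
    "all",  # "all" selects every series in the dataflow
    "2015-06-01T00:00:00Z",
)
print(url)
```

A nightly job would pass the timestamp of its previous successful run, so only changed observations travel over the network.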
SDMX at OECD
• OECD.Stat SDMX web service is used for:
– Data resellers receive data in standard
format, easy to process
– Incremental updates are possible by
slicing data
– Querying autonomously. Standard API
is easy to use in programs
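Slicing through a standard API might look like the sketch below. It assumes the SDMX 2.1 REST key syntax (dimension values separated by ".", alternatives joined with "+", an empty position meaning "all values") and the `startPeriod`/`endPeriod` parameters. The base URL, dataflow ID and dimension names are hypothetical.

```python
from urllib.parse import urlencode


def build_key(dimension_order, selection):
    """Turn {dimension: [codes]} into an SDMX REST key such as 'A.FRA+DEU.'."""
    parts = []
    for dim in dimension_order:
        codes = selection.get(dim, [])  # no selection means wildcard (empty slot)
        parts.append("+".join(codes))
    return ".".join(parts)


def data_url(base_url, flow, key, start_period=None, end_period=None):
    """Build a data query URL, optionally restricted to a time range."""
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/data/{flow}/{key}{query}"


# Hypothetical dimension order for an illustrative dataflow.
DIMENSIONS = ["FREQ", "REF_AREA", "SECTOR"]
key = build_key(DIMENSIONS, {"FREQ": ["A"], "REF_AREA": ["FRA", "DEU"]})
url = data_url("https://example.org/sdmx/rest", "NA_MAIN", key, "2010", "2014")
print(key)
print(url)
```

Because the key and parameters follow the standard, the same two functions could target any compliant service by changing only the base URL.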
SDMX at OECD
<Demo of OECD.Stat web service>
How to start with SDMX?
Data Structure Definition
Data set
Structure specific data set
Structure specific time series data set
Generic time series data set
Generic data set
Data flow
Data flow definition
Category
Category map
Category scheme
Category scheme map
Code
Code map
Codelist
Codelist map
Hierarchy
Hierarchical code
Hierarchical codelist
Hybrid code map
Hybrid codelist map
Concept
Concept map
Concept scheme
Concept scheme map
Metadata structure definition
Metadata set
Metadata flow
Metadata flow definition
Metadata concept
Metadata concept scheme
Reporting category
Reporting category map
Structure map
Structure set
Structure usage
Constraints
Annotation
Representation
Identifiable artefact ref
Maintainable artefact ref
Structure ref
International string
Localised string
Agency
Agency scheme
Contact
Provision agreement
Data and metadata provisioning
Data provider
Data provider scheme
Data provider ref
Data consumer
Data consumer scheme
Organisation map
Organisation unit
Organisation unit scheme
Organisation scheme map
Metadata target
Attribute descriptor
Data attribute
Metadata report
Report structure
Metadata attribute
Measure descriptor
Primary measure
Component map
Transition
Enumerated attribute value
XHTML attribute value
Text attribute value
Other non enumerated attribute value
Target data key
Target object key
Level
Coding format
Source code
Source hierarchical code
Source codelist
Source hierarchical codelist
Hierarchical code reference
Target code
Target codelist
Target hierarchical code
Target hierarchical codelist
Dimension descriptor
Dimension
Time dimension
Measure dimension
Group dimension descriptor
Data set target
Target data set
Report period target
Target report period
Dimension description values target
Identifiable object target
Target identifiable object
Constraint content target
Reporting taxonomy
Reporting taxonomy map
Series key
Group key
Reporting year start day
Attachment constraint
No specified relationship
Primary measure relationship
Group relationship
Dimension relationship
Measure key value
Coded key value
Uncoded key value
Time key value
Time dimension value
Component value
Observation
Uncoded observation
Coded observation
Uncoded attribute value
Coded attribute value
Scheme map
To text format
To value type
Data key set
Data key
Metadata key set
Metadata key
Constraint role
Content constraint
Cube region
Metadata target region
Constraint role type
Reference period
Release calendar
Member selection
Member value
Range period
Start period
End period
Before period
After period
Registration
Process
Process step
Process artefact
Simple datasource
Rest datasource
Web service datasource
Computation
Transition
Transformation
Transformation scheme
Operator scheme
Reference node
Constant node
Operator
Operator node
Parameter
How to start with SDMX?
• Not much needed, but at least:
– Understand the business case – the value in doing the project
– SDMX.org Learning and working groups:
• Understand the basic SDMX terms, but don’t try to understand the whole of the standard…
SDMX Information Model
• What is an Information Model? Examples:
– Excel: objects are Sheets, Cells, Rows; used by Formulae, VBA
– Relational database: objects are Database, Table, Column; used by SQL, Interface
– OECD metadata: objects are 42 categories; used by OECD.Stat, Metastore
• SDMX IM designed for statistical data and metadata
exchange
• SDMX IM focused on aggregated data, but can be used
for microdata
SDMX Information Model
• Benefits of having an information model:
– Common vocabulary (Code list, Concept, Dataset)
– IM objects are fit-for-purpose
• Clearly defined relationships between objects and their
usage
– SDMX formats and tools are built around the IM
• Interoperable tools
• IM is highly structured, easier to use a part of it rather
than implementing full SDMX standard
Basic SDMX Artefacts
• DSD: Data Structure Definition
– Defines a cube/dataset for a domain such as National Accounts
– States dimensions, their members, and attributes
– Understand the difference between a dimension and an attribute
• Concept
– Either a dimension or attribute, e.g.
• Dimensions: Age, Location, Sector, Time
• Attributes: Observation status, Unit multiplier
• Code list
– Dimension or attribute members
– Each code list item has a code and description
• Concept Scheme
– List of all concepts for domain before splitting them into DSDs
Basic SDMX Artefacts
Example with National Accounts
• Concept Scheme: National Accounts
• DSD: NA Main
– Concept: Frequency, with Code List: Frequency CL
– Concept: Reference area, with Code List: Area CL
– Concept: Sector, with Code List: Sector CL
– Concept: Observation Status, with Code List: Observation Status CL
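The National Accounts example can be sketched in code to show how these artefacts fit together. This is a simplified illustration, not the SDMX Information Model itself: the class names, component IDs and the handful of codes are assumptions, and the time dimension is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class CodeList:
    id: str
    codes: dict[str, str]  # code -> description


@dataclass
class Component:
    concept: str          # concept ID taken from the concept scheme
    code_list: CodeList   # allowed members for this concept
    role: str             # "dimension" or "attribute"


@dataclass
class DataStructureDefinition:
    id: str
    components: list[Component] = field(default_factory=list)

    def validate(self, record: dict[str, str]) -> list[str]:
        """Return a list of problems; an empty list means the record fits the DSD."""
        errors = []
        for c in self.components:
            value = record.get(c.concept)
            if value is None:
                # Dimensions identify the data, so they are always required here.
                if c.role == "dimension":
                    errors.append(f"missing dimension {c.concept}")
            elif value not in c.code_list.codes:
                errors.append(f"{c.concept}: unknown code {value!r}")
        return errors


# Illustrative codes only; real code lists are much longer.
freq_cl = CodeList("CL_FREQ", {"A": "Annual", "Q": "Quarterly"})
area_cl = CodeList("CL_AREA", {"FR": "France", "DE": "Germany"})
sector_cl = CodeList("CL_SECTOR", {"S1": "Total economy", "S13": "General government"})
status_cl = CodeList("CL_OBS_STATUS", {"A": "Normal value", "E": "Estimated value"})

na_main = DataStructureDefinition("NA_MAIN", [
    Component("FREQ", freq_cl, "dimension"),
    Component("REF_AREA", area_cl, "dimension"),
    Component("SECTOR", sector_cl, "dimension"),
    Component("OBS_STATUS", status_cl, "attribute"),
])

good = na_main.validate({"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1", "OBS_STATUS": "E"})
bad = na_main.validate({"FREQ": "M", "REF_AREA": "FR"})
print(good)
print(bad)
```

The second record fails on two counts: "M" is not in the frequency code list, and the Sector dimension is missing. The absent Observation Status is tolerated because it is an attribute.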
How is SDMX data exchanged?
Web Services used for automation
• Web Service: a web site without a user
interface
• Instead of user interface there is an API
(Application Programming Interface)
• Used for machine-to-machine processing
• SDMX has a standard API
– Means that same software can use API from
many locations
How is SDMX data exchanged?
• Push mode
– Data provider sends data files to each collector
– Each collector gets the data
• Pull mode
– Data provider publishes the data once
– Each collector gets the data from the provider
• Data hub
– Data published to a central location (the hub)
– Consumers get notification when data is published
• Pull mode offers more efficient dissemination and collection of data,
enables client-driven slicing, and increases timeliness of data
• SDMX uses web services to support Pull mode
Content-oriented Guidelines
• Common code lists:
– Country
– Observation Status
– Currency
– Etc.
• Rules in coding
• Guidelines for SDMX projects and creating new DSDs, etc.
• Benefits:
– Promote best practices in artefact creation and governance
– Alignment between domains
– Speed up SDMX projects by providing a shopping list of existing code lists
• Help SDMX projects with recommendations
SDMX Project Steps
• Map data flows between organisations
– Data formats
– Reporting forms or tables
– Mailbox/web
• List domain concepts for entire domain
– Becomes Concept Scheme
• Define code lists
– Codify all items using SDMX guidelines
– Hierarchy can come later
• Concepts: dimension or attribute
– Dimension uniquely identifies data
– Attribute adds info to data, e.g. flags
• Create DSDs from concepts. Use data flows
– Each dimension grouping is a DSD
– Ways to avoid many DSDs
• Pilot DSDs
– 1st pilot: Reporters provide feedback on DSD structures
– 2nd pilot: Reporters send data, Consumers process it
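One practical test when deciding whether a concept is a dimension or an attribute is whether the chosen dimensions alone identify every observation. The sketch below checks that on a small sample; the dimension names and the sample values are illustrative assumptions.

```python
from collections import Counter

# Hypothetical dimension set for a draft DSD.
DIMENSIONS = ("FREQ", "REF_AREA", "SECTOR", "TIME_PERIOD")


def obs_key(obs, dimensions=DIMENSIONS):
    """The tuple of dimension values that should uniquely identify an observation."""
    return tuple(obs[d] for d in dimensions)


def duplicate_keys(observations, dimensions=DIMENSIONS):
    """Return keys shared by more than one observation."""
    counts = Counter(obs_key(o, dimensions) for o in observations)
    return [k for k, n in counts.items() if n > 1]


sample = [
    {"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1", "TIME_PERIOD": "2013",
     "OBS_VALUE": 100.0, "OBS_STATUS": "A"},
    {"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1", "TIME_PERIOD": "2013",
     "OBS_VALUE": 101.5, "OBS_STATUS": "E"},  # same key, differs only by flag
    {"FREQ": "A", "REF_AREA": "DE", "SECTOR": "S1", "TIME_PERIOD": "2013",
     "OBS_VALUE": 250.0, "OBS_STATUS": "A"},
]

dups = duplicate_keys(sample)
print(dups)
```

A duplicate key means either the data contains a genuine duplicate or a concept currently treated as an attribute should really be a dimension. This is the kind of finding a first pilot is meant to surface.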
SDMX Main Tools
• SDMX Registry: directory of the structural metadata
• SDMX Converter: converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure (SDMX-RI): SDMX export and mapping for existing database
SDMX Tools
<Demo of Global Registry>
Future of SDMX
• SDMX Validation Language
– Automate basic level of data validation e.g. a+b=c
– Transform data
• More standard code lists
– E.g. Seasonal adjustment
• Better, more reusable tools, e.g. Mapping
– Plug-and-play modules to transform, validate messages
• More guidelines and harmonised structures
– Such as Global DSDs
• Use SDMX for reference metadata exchange
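The kind of basic rule mentioned above (a+b=c) can be illustrated with a small check. This is plain Python, not the syntax of any SDMX validation language. The function name, the field names and the tolerance are assumptions chosen for the example.

```python
import math


def check_sum_rule(record, parts, total, rel_tol=1e-9):
    """Return True when the listed components add up to the reported total."""
    computed = sum(record[p] for p in parts)
    # Compare with a tolerance so floating-point rounding does not raise false alarms.
    return math.isclose(computed, record[total], rel_tol=rel_tol)


consistent = {"a": 10.0, "b": 5.5, "c": 15.5}
inconsistent = {"a": 10.0, "b": 5.0, "c": 16.0}

ok = check_sum_rule(consistent, ["a", "b"], "c")
broken = check_sum_rule(inconsistent, ["a", "b"], "c")
print(ok, broken)
```

Expressing such rules in a shared language, rather than in each organisation's own scripts, is what would let reporters and collectors run identical checks.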
Thank you
Any questions?
David Barraclough
OECD SDMX Coordinator