SDMX Basics
David Barraclough
OECD SDMX Coordinator

Overview
• What is SDMX?
• Why SDMX?
• SDMX at OECD
• How to start with SDMX?
• Some SDMX concepts
  – How is data exchanged
  – The main tools
  – Content-oriented guidelines
• Future of SDMX

What is SDMX (not)?
Not simply a technical format!

What is SDMX?
• Statistical Data and Metadata eXchange
• Released in 2002
“SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
  – BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank

What is SDMX?
• Format: XML and EDI (rebranded GESMES)
• SDMX Information model
• Web service standards: APIs
• SDMX Registry standards
• Content-oriented guidelines

Why SDMX? The Business Case
• Reusable, open-source (free) tools save money and time
• Standard codes and naming help improve reuse and save time
  – Reuse of categories
  – Less mapping and data processing, which saves effort
  – Shopping list of concepts when defining structures
• Strongly-typed structures help improve validation and processing
  – Heavy-lifting processing of data messages can be automated
  – Text format is human-readable
  – Easier to create new tools around the agreed format

Why SDMX? The Business Case
• Standard technical architecture promotes more timely, better quality data
  – Timely because less manual conversion is needed
  – Quality because automated processing means less human error
• SDMX Information model
  – Provides a common terminology
  – Makes tool development much easier
  – Information model described later

What’s in it for Data Reporters?
• SDMX Registry helps structure metadata
• SDMX Tools exist and are free
• One dissemination channel instead of packaging data for multiple consumers
• Can easily disseminate SDMX from an existing data warehouse with SDMX-RI
• Lots of SDMX methodology available and growing

Why not use…?
Format: Issues
• CSV
  – Not structured, hard to validate
  – No metadata
• Excel
  – Metadata tied to presentation
  – Proprietary format
  – Licensing
  – Hard to process and automate
• FAME, SAS, STATA
  – Proprietary format
  – Licensing
• GESMES
  – No information model
  – Proprietary format
  – Few tools or international support
• XML
  – No context to tags
  – SDMX adds context to XML
• XBRL, DDI
  – Not focused on aggregated data exchange

The SDMX XML Data file format:

“Global DSDs”
Domains at various stages of implementation:
• National Accounts
• Balance of Payments
• Foreign Direct Investment (FDI)
In draft:
• Harmonized Trade (IMTS)
• R&D
• Education
Many other “Shared” DSDs.

SDMX at OECD
Harmonized Trade data
• Synchronised from the UN database every night
• Only the “delta” is synchronised into our database; this is required because the trade database is huge
• SDMX standards support querying the delta for a given date

SDMX at OECD
• OECD.Stat SDMX web service is used for:
  – Data resellers: they receive data in a standard format that is easy to process
  – Incremental updates: possible by slicing data
  – Autonomous querying: the standard API is easy to use in programs

SDMX at OECD
<Demo of OECD.Stat web service>

How to start with SDMX?
Data Structure Definition, Data set, Structure specific data set, Structure specific time series data set, Generic time series data set, Generic data set, Data flow, Data flow definition, Category, Category map, Category scheme, Category scheme map, Code, Code map, Codelist, Codelist map, Hierarchy, Hierarchical code, Hierarchical codelist, Hybrid code map, Hybrid codelist map, Concept, Concept map, Concept scheme, Concept scheme map, Metadata structure definition, Metadata set, Metadata flow, Metadata flow definition, Metadata concept, Metadata concept scheme, Reporting category, Reporting category map, Structure map, Structure set, Structure usage, Constraints, Annotation, Representation, Identifiable artefact ref, Maintainable artefact ref, Structure ref, International string, Localised string, Agency, Agency scheme, Contact, Provision agreement, Data and metadata provisioning, Data provider, Data provider scheme, Data provider ref, Data consumer, Data consumer scheme, Organisation map, Organisation unit, Organisation unit scheme, Organisation scheme map, Metadata target, Attribute descriptor, Data attribute, Metadata report, Report structure, Metadata attribute, Measure descriptor, Primary measure, Component map, Transition, Enumerated attribute value, XHTML attribute value, Text attribute value, Other non enumerated attribute value, Target data key, Target object key, Level, Coding format, Source code, Source hierarchical code, Source codelist, Source hierarchical codelist, Hierarchical code reference, Target code, Target codelist, Target hierarchical code, Target hierarchical codelist, Dimension descriptor, Dimension, Time dimension, Measure dimension, Group dimension descriptor, Data set target, Target data set, Report period target, Target report period, Dimension description values target, Identifiable object target, Target identifiable object, Constraint content target, Reporting taxonomy, Reporting taxonomy map, Series key, Group key, Reporting year start day, Attachment constraint, No specified relationship, Primary measure relationship, Group relationship, Dimension relationship, Measure key value, Coded key value, Uncoded key value, Time key value, Time dimension value, Component value, Observation, Uncoded observation, Coded observation, Uncoded attribute value, Coded attribute value, Scheme map, To text format, To value type, Data key set, Data key, Metadata key set, Metadata key, Constraint role, Content constraint, Cube region, Metadata target region, Constraint role type, Reference period, Release calendar, Member selection, Member value, Range period, Start period, End period, Before period, After period, Registration, Process, Process step, Process artefact, Simple datasource, Rest datasource, Web service datasource, Computation, Transition, Transformation, Transformation scheme, Operator scheme, Reference node, Constant node, Operator, Operator node, Parameter

How to start with SDMX?
• Not much needed, but at least:
  – Understand the business case: the value in doing the project
  – SDMX.org Learning and working groups:
• Understand the basic SDMX terms, but don’t try to understand the whole of the standard…

SDMX Information Model
• What is an Information Model?
Examples:
• Excel
  – Objects: Sheets, Cells, Rows
  – Used by: Formulae, VBA
• Relational database
  – Objects: Database, Table, Column
  – Used by: SQL, Interface
• OECD metadata
  – Objects: 42 categories
  – Used by: OECD.Stat, Metastore
• SDMX IM designed for statistical data and metadata exchange
• SDMX IM focused on aggregated data, but can be used for microdata

SDMX Information Model
• Benefits of having an information model:
  – Common vocabulary (Code list, Concept, Dataset)
  – IM objects are fit-for-purpose
    • Clearly defined relationships between objects and their usage
  – SDMX formats and tools are built around the IM
    • Interoperable tools
• IM is highly structured, so it is easier to use a part of it than to implement the full SDMX standard

Basic SDMX Artefacts
• DSD: Data Structure Definition
  – Defines a cube/dataset for a domain such as National Accounts
  – States dimensions, their members, and attributes
  – Understand the difference between a dimension and an attribute
• Concept
  – Either a dimension or an attribute, e.g.
    • Dimensions: Age, Location, Sector, Time
    • Attributes: Observation status, Unit multiplier
• Code list
  – Dimension or attribute members
  – Each code list item has a code and a description
• Concept Scheme
  – List of all concepts for a domain before splitting them into DSDs

Basic SDMX Artefacts
Example with National Accounts
Concept Scheme: National Accounts
DSD: NA Main
• Concept: Frequency, Code List: Frequency CL
• Concept: Reference area, Code List: Area CL
• Concept: Sector, Code List: Sector CL
• Concept: Observation Status, Code List: Observation Status CL

How is SDMX data exchanged?
Web Services used for automation
• Web Service: a web site without a user interface
• Instead of a user interface there is an API (Application Programming Interface)
• Used for machine-to-machine processing
• SDMX has a standard API
  – Means that the same software can use the API from many locations

How is SDMX data exchanged?
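To make the idea of a standard API concrete, here is a minimal Python sketch of how a client might build a data query URL. It only builds the URL and does not call any service. The path layout (`/data/{flow}/{key}`) and the `startPeriod`, `endPeriod` and `updatedAfter` parameters follow the SDMX REST conventions as commonly documented; the endpoint, dataflow name and codes are hypothetical, and a given service may differ, so check its documentation.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow, key_parts,
                     start_period=None, end_period=None, updated_after=None):
    """Build a data query URL in the style of the SDMX REST API.

    key_parts holds one list of codes per dimension, in DSD order.
    An empty list means "any value" for that dimension.
    """
    if any(key_parts):
        # Codes for one dimension are joined with "+", dimensions with "."
        key = ".".join("+".join(codes) for codes in key_parts)
    else:
        key = "all"  # no filter on any dimension

    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if updated_after:
        # Ask only for data changed since a given date (the "delta")
        params["updatedAfter"] = updated_after

    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint, dataflow and codes, for illustration only.
print(build_data_query(
    "https://example.org/sdmx",
    "NA_MAIN",
    [["A"], ["FR", "DE"], []],
    start_period="2010",
))
```

Because the URL layout is the same everywhere, the same helper could in principle be pointed at another provider by changing only `base_url` and the dataflow name.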
• Push mode
  – Data provider sends data files to each collector
  – Each collector gets the data
• Pull mode
  – Data provider publishes the data once
  – Each collector gets the data from the provider
• Data hub
  – Data published to a central location (the hub)
  – Consumers get notification when data is published
• Pull mode offers more efficient dissemination and collection of data, enables client-driven slicing, and increases timeliness of data
• SDMX uses web services to support Pull mode

Content-oriented Guidelines
• Common code lists:
  – Country
  – Observation Status
  – Currency
  – Etc.
• Rules in coding
• Guidelines for SDMX projects and creating new DSDs, etc.
• Benefits:
  – Promote best practices in artefact creation, governance
  – Alignment between domains
  – Speed up SDMX projects. Provide shopping list of existing code lists
• Help SDMX projects with recommendations

SDMX Project Steps
Map data flows between organisations
• Data formats
• Reporting forms or tables
• Mailbox/web
List domain concepts for entire domain
• Becomes Concept Scheme
Define code lists
• Codify all items using SDMX guidelines
• Hierarchy can come later
Concepts: dimension or attribute
Create DSDs from concepts. Use data flows
• Dimension uniquely identifies data
• Attribute adds info to data, e.g. flags
• Each dimension grouping is a DSD
• Ways to avoid many DSDs
Pilot DSDs
• 1st pilot: Reporters provide feedback on DSD structures
• 2nd pilot: Reporters send data, Consumers process it

SDMX Main Tools
• SDMX Registry
  Directory of the structural metadata
• SDMX Converter
  Converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure
  SDMX export and mapping for existing database
Mapping

SDMX Tools
<Demo of Global Registry>

Future of SDMX
• SDMX Validation Language
  – Automate basic level of data validation, e.g. a+b=c
  – Transform data
• More standard code lists
  – E.g. Seasonal adjustment
• Better, more reusable tools, e.g. Mapping
  – Plug-and-play modules to transform, validate messages
• More guidelines and harmonised structures
  – Such as Global DSDs
• Use SDMX for reference metadata exchange

Thank you
Any questions?
David Barraclough
OECD SDMX Coordinator
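Backup: the "a+b=c" kind of basic validation mentioned under Future of SDMX can be pictured with a small sketch. This is plain Python, not the syntax of any SDMX validation language; the series names A, B and C and all the figures are made up for illustration.

```python
def check_sum_rule(observations, parts, total, tolerance=1e-9):
    """Return the periods in which the parts do not add up to the total.

    observations maps a period to a dict of series name -> value.
    """
    failures = []
    for period, values in sorted(observations.items()):
        expected = sum(values[name] for name in parts)
        if abs(expected - values[total]) > tolerance:
            failures.append(period)
    return failures


# Made-up figures: the rule A + B = C holds in 2012 but not in 2013.
obs = {
    "2012": {"A": 10.0, "B": 5.0, "C": 15.0},
    "2013": {"A": 11.0, "B": 6.0, "C": 18.0},
}
print(check_sum_rule(obs, parts=["A", "B"], total="C"))  # ['2013']
```

A standard validation language would let such a rule be written once and shared between organisations, instead of being re-coded in each tool.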