NAME OF THE SHARE-PSI WORKSHOP: USES OF OPEN DATA WITHIN GOVERNMENT FOR INNOVATION AND EFFICIENCY
TITLE OF THE BEST PRACTICE: …PUBLISHING STATISTICAL DATA IN LINKED DATA FORMAT
1 OUTLINE OF THE BEST PRACTICE
Linked Open Data (LOD) is a growing movement for organizations to make their existing data available in a machine-readable format. There are two equally important viewpoints on LOD: publishing and consuming. This best practice refers to both sub-processes; however, more of the examples reported by government organizations relate to the publication process than to consumption.
2 MANAGEMENT SUMMARY
2.1 CHALLENGE
Statistical data is used as the foundation for policy prediction, planning and adjustment, and therefore has a significant impact on society (from citizens to businesses to governments). The process of collecting and monitoring socio-economic indicators can be considerably improved if the data produced by government organizations such as Statistical Offices, National Banks and Employment Services is published in Linked Data format.
2.2 SOLUTION
The Linked Data paradigm has opened new possibilities and perspectives for government organizations to open data and interchange information. Data is open if it is technically open (available in a machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer application) and legally open (explicitly licensed in a way that permits commercial and non-commercial use and re-use without restrictions); see the World Bank Open Data Essentials, http://opendatatoolkit.worldbank.org/en/essentials.html
The Linked Data approach enables datasets to be linked together through references to common concepts. A dataset is represented as a graph, using the Resource Description Framework (RDF) as a general-purpose language.
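To make the graph representation concrete for statistical data, the following is a minimal sketch, assuming Python 3 with only its standard library, that writes a small table of observations as RDF triples in N-Triples syntax. It uses the W3C RDF Data Cube vocabulary (qb:) together with the SDMX-RDF dimension and measure properties commonly paired with it. The base IRI http://example.org/stats/, the dataset name, the IRI patterns and the sample figures are hypothetical placeholders, not real statistics or any office's actual naming scheme.

```python
from typing import Iterable, List, Tuple

# Vocabulary namespaces used with the RDF Data Cube vocabulary.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/stats/"  # hypothetical base IRI for the publisher's resources

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    """Render an absolute IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    """Render a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def cube_to_ntriples(dataset_id: str, rows: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (area_code, year, value) rows as a Data Cube dataset in N-Triples."""
    ds = EX + "dataset/" + dataset_id
    dsd = ds + "/structure"  # hypothetical IRI pattern for the structure definition
    triples: List[Triple] = [
        (iri(ds), iri(RDF_TYPE), iri(QB + "DataSet")),
        (iri(ds), iri(QB + "structure"), iri(dsd)),
        (iri(dsd), iri(RDF_TYPE), iri(QB + "DataStructureDefinition")),
    ]
    # One component specification per dimension and per measure.
    components = [
        ("area", QB + "dimension", SDMX_DIM + "refArea"),
        ("period", QB + "dimension", SDMX_DIM + "refPeriod"),
        ("value", QB + "measure", SDMX_MEASURE + "obsValue"),
    ]
    for name, role, prop in components:
        spec = f"{dsd}/component/{name}"
        triples.append((iri(dsd), iri(QB + "component"), iri(spec)))
        triples.append((iri(spec), iri(role), iri(prop)))
    # Each input row becomes one qb:Observation linked to the dataset.
    for area, year, value in rows:
        obs = f"{ds}/obs/{area}-{year}"
        triples += [
            (iri(obs), iri(RDF_TYPE), iri(QB + "Observation")),
            (iri(obs), iri(QB + "dataSet"), iri(ds)),
            (iri(obs), iri(SDMX_DIM + "refArea"), iri(EX + "area/" + area)),
            (iri(obs), iri(SDMX_DIM + "refPeriod"), literal(year, XSD + "gYear")),
            (iri(obs), iri(SDMX_MEASURE + "obsValue"), literal(value, XSD + "decimal")),
        ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    # Made-up placeholder figures, not real statistics.
    sample = [("A", "2013", "12.5"), ("A", "2014", "11.9")]
    print(cube_to_ntriples("unemployment-rate", sample), end="")
```

Each row becomes one qb:Observation node linked to its qb:DataSet, and the dataset points to a data structure definition declaring which properties act as dimensions and which as the measure. A production pipeline would more likely use an RDF library and check its output against the integrity constraints defined in the Data Cube specification; plain string output is used here only to keep the sketch dependency-free.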
The Linked Data publication process comprises a set of activities related to the extraction, transformation, validation, exploration and publication on the Web of RDF datasets originating from different sources (e.g., databases). The ready-to-use RDF datasets can either be stored locally or registered in a metadata catalog, e.g. one built with the open-source CKAN tool. In 2014, the RDF Data Cube Vocabulary was published by the W3C Government Linked Data Working Group as a Recommendation for publishing multi-dimensional data on the Web.
3 BEST PRACTICE IDENTIFICATION
3.1 WHY IS THIS A BEST PRACTICE? WHAT'S THE IMPACT OF THE BEST PRACTICE?
The approach contributes to standardizing the process of publishing and re-using multidimensional data on the Web. It is based on the RDF Data Cube vocabulary, which is mature enough to be used for publishing statistical data: it improves interoperability and allows comparison of data from different statistical sources. The vocabulary builds on the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations, and it provides a layer on top of the data to describe domain semantics, dataset metadata, and other crucial information needed in the process of statistical data exchange.
3.2 LINK TO THE PSI DIRECTIVE
(Please use one or more of the categories listed on the last page of this document, as many as relevant)
- Policies and legislation (legal requirements, licenses etc.) / Licensing of information/data and metadata
- Open Data platform(s) / Publication and deployment of information/data and metadata
- Techniques w.r.t. opening up of data / Technical requirements and tools
- Dataset structures, formats, APIs / Structuring of information/data, formats, APIs
- Encouraging (commercial) re-use
- Documentation of information/data, creation of metadata
3.3 WHY IS THERE A NEED FOR THIS BEST PRACTICE?
To spread experience and encourage government organizations to follow existing approaches.
4 WHAT DO YOU NEED FOR THIS BEST PRACTICE?
This best practice is based on a set of tools for automating the data extraction and publication process. More generally, the EU research community has delivered many open-source tools for publishing statistical data in Linked Data format; see, e.g., the LOD2 Statistical Workbench (https://www.w3.org/2013/sharepsi/wiki/images/6/65/Samos_Workshop_2014_-_IMP_submission.pdf).
5 APPLICABILITY BY OTHER MEMBER STATES?
Many EU Member States (especially their Statistical Offices) already publish data in Linked Data format. Most often these services are available on national Web portals, while the metadata is harvested at the European level, e.g. by Publicdata.eu. Additionally, the European Commission maintains the Open Data Portal as a metadata catalogue available as Linked Data; see http://open-data.europa.eu/en/linked-data.
6 CONTACT INFO - RECORD OF THE PERSON TO BE CONTACTED FOR ADDITIONAL INFORMATION OR ADVICE.
Valentina Janev, Institute Mihajlo Pupin, valentina.janev@institutepupin.com
CATEGORIES FOR USE IN SECTION 3.2
- Policies and legislation (legal requirements, licenses etc.) / Licensing of information/data and metadata
- Open Data platform(s) / Publication and deployment of information/data and metadata
- Dataset criteria and priorities and value and scope w.r.t. datasets
- Charging issues and proposals
- Techniques w.r.t. opening up of data / Technical requirements and tools
- Organisational structures and skills
- Dataset structures, formats, APIs / Structuring of information/data, formats, APIs
- Encouraging (commercial) re-use
- Persistence and maintenance of information/data and metadata
- Data quality issues and solutions / Quality assurance, feedback channels and evaluation
- Documentation of information/data, creation of metadata
- Selection of information/data to be published according to various criteria
- Data discoverability