NAME OF THE SHARE-PSI WORKSHOP: USES OF OPEN DATA WITHIN
GOVERNMENT FOR INNOVATION AND EFFICIENCY
TITLE OF THE BEST PRACTICE: …PUBLISHING STATISTICAL DATA IN LINKED
DATA FORMAT
1 OUTLINE OF THE BEST PRACTICE
Linked Open Data (LOD) is a growing movement in which organizations make their existing data available in
a machine-readable format. There are two equally important sides to LOD: publishing and
consuming. This best practice covers both; however, most of the examples reported by
government organizations concern the publication process.
2 MANAGEMENT SUMMARY
2.1 CHALLENGE
Statistical data is used as the foundation for policy prediction, planning and adjustment, and therefore
has a significant impact on society (from citizens to businesses to governments). The process of
collecting and monitoring socio-economic indicators can be considerably improved if the data produced
by government organizations such as statistical offices, national banks, employment services, etc. is
published in Linked Data format.
2.2 SOLUTION
The Linked Data paradigm has opened new possibilities and perspectives for government organizations to
open data and interchange information. Data is open if it is technically open (available in a
machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer
application) and legally open (explicitly licensed in a way that permits commercial and non-commercial
use and re-use without restrictions); see the World Bank Open Data Essentials,
http://opendatatoolkit.worldbank.org/en/essentials.html.
The Linked Data approach enables datasets to be linked together through references to common
concepts. A dataset is represented in the form of a graph, using the Resource Description Framework
(RDF) as a general-purpose language. The Linked Data publication process refers to a set of activities related
to the extraction, transformation, validation, exploration and publication on the Web of RDF datasets
originating from different sources (e.g., databases). The ready-to-use RDF datasets can be either stored
locally or registered in a metadata catalog, e.g. one built with the open-source CKAN tool.
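The extraction, transformation and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the workflow of any particular tool. The CSV content, the `http://example.org/` base URI, the property names and all values are invented placeholders.

```python
import csv
import io

# Hypothetical source table with made-up values; in practice the rows
# would be extracted from a database or a statistical office's export.
SOURCE_CSV = """area,year,unemployment_rate
A1,2013,10.0
A2,2013,12.5
"""

BASE = "http://example.org/"  # assumed namespace, not a real publisher URI
XSD = "http://www.w3.org/2001/XMLSchema#"


def extract(text):
    """Extraction: read rows from the tabular source."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Validation: reject rows whose year or value is not well-formed."""
    for row in rows:
        if not (row["year"].isdigit() and len(row["year"]) == 4):
            raise ValueError(f"bad year: {row['year']!r}")
        float(row["unemployment_rate"])  # raises ValueError if not numeric
    return rows


def transform(rows):
    """Transformation: turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"<{BASE}obs/{row['area']}-{row['year']}>"
        triples.append((subject, f"<{BASE}prop/area>",
                        f"<{BASE}area/{row['area']}>"))
        triples.append((subject, f"<{BASE}prop/year>",
                        f'"{row["year"]}"^^<{XSD}gYear>'))
        triples.append((subject, f"<{BASE}prop/unemploymentRate>",
                        f'"{row["unemployment_rate"]}"^^<{XSD}decimal>'))
    return triples


def serialize(triples):
    """Serialization: one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


ntriples = serialize(transform(validate(extract(SOURCE_CSV))))
print(ntriples)
```

Each row becomes a small set of triples about one observation. Because the area is a URI rather than a string, another dataset that refers to the same URI is linked to this one through that common concept.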
In 2014, the RDF Data Cube Vocabulary was published by the W3C Government Linked Data Working
Group as a Recommendation for publishing multi-dimensional data on the Web.
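To give a sense of what the vocabulary looks like in practice, the sketch below builds a single observation in Turtle syntax. The `qb:`, `sdmx-dimension:` and `sdmx-measure:` terms come from the Data Cube vocabulary and its companion SDMX vocabularies. The `ex:` namespace, the dataset name, the area code and the value are hypothetical placeholders. A complete cube would also declare a `qb:DataSet` and a `qb:DataStructureDefinition`, which are omitted here for brevity.

```python
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .
"""


def observation(dataset, area, period, value):
    """Return one qb:Observation as a Turtle snippet.

    dataset and area are local names in the assumed ex: namespace;
    period is a year; value is the measured number given as a string.
    """
    return (
        f"ex:obs-{area}-{period} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset} ;\n"
        f"    sdmx-dimension:refArea ex:{area} ;\n"
        f'    sdmx-dimension:refPeriod "{period}"^^xsd:gYear ;\n'
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .\n'
    )


# Made-up example values, for illustration only.
turtle = PREFIXES + "\n" + observation("exampleDataset", "A1", 2014, "12.5")
print(turtle)
```

The two dimension properties locate the observation in the cube (where and when), and the measure property carries the observed value.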
3 BEST PRACTICE IDENTIFICATION
3.1 WHY IS THIS A BEST PRACTICE? WHAT'S THE IMPACT OF THE BEST PRACTICE?
The approach contributes to standardizing the publication and re-use of multi-dimensional data on the Web.
It is based on the RDF Data Cube vocabulary, which is mature
enough to be used for publishing statistical data: it improves interoperability and allows comparison
of data from different statistical sources. The vocabulary is compatible with the cube model that underlies
SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data
and metadata among organizations, and provides a layer on top of the data to describe domain semantics,
dataset metadata, and other crucial information needed in the process of statistical data exchange.
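The interoperability argument can be illustrated with a small sketch: when two publishers describe their observations with the same dimension properties, their figures can be aligned without any source-specific mapping. The two sources, the area URIs and all numbers below are invented for illustration. Only the `sdmx-dimension` and `sdmx-measure` property URIs are real.

```python
REF_AREA = "http://purl.org/linked-data/sdmx/2009/dimension#refArea"
REF_PERIOD = "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"

# Hypothetical observations from two different publishers (made-up values).
source_a = [
    {REF_AREA: "http://example.org/area/A1", REF_PERIOD: "2014", OBS_VALUE: 12.5},
    {REF_AREA: "http://example.org/area/A2", REF_PERIOD: "2014", OBS_VALUE: 8.0},
]
source_b = [
    {REF_AREA: "http://example.org/area/A1", REF_PERIOD: "2014", OBS_VALUE: 12.0},
]


def index_by_dimensions(observations):
    """Key each observation by its shared dimension values."""
    return {(o[REF_AREA], o[REF_PERIOD]): o[OBS_VALUE] for o in observations}


def compare(obs_a, obs_b):
    """Pair up the values that both sources report for the same area and period."""
    a = index_by_dimensions(obs_a)
    b = index_by_dimensions(obs_b)
    return {key: (a[key], b[key]) for key in sorted(a.keys() & b.keys())}


comparison = compare(source_a, source_b)
for (area, period), (value_a, value_b) in comparison.items():
    print(area, period, value_a, value_b)
```

The comparison works only because both sources use the same property URIs and the same area identifiers. Agreeing on those shared terms is what the common vocabulary provides.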
3.2 LINK TO THE PSI DIRECTIVE
(Please use one or more of the categories listed on the last page of this document, as many as relevant)
- Policies and legislation (legal requirements, licenses, etc.) / Licensing of information/data and metadata
- Open Data platform(s) / Publication and deployment of information/data and metadata
- Techniques w.r.t. opening up of data / Technical requirements and tools
- Dataset structures, formats, APIs / Structuring of information/data, formats, APIs
- Encouraging (commercial) re-use
- Documentation of information/data, creation of metadata
3.3 WHY IS THERE A NEED FOR THIS BEST PRACTICE?
To spread experience and encourage government organizations to follow existing approaches.
4 WHAT DO YOU NEED FOR THIS BEST PRACTICE?
This best practice relies on a set of tools for automating the data extraction and publication process.
The EU research community has delivered many open-source tools for publishing statistical
data in Linked Data format; see e.g. the LOD2 Statistical Workbench (https://www.w3.org/2013/sharepsi/wiki/images/6/65/Samos_Workshop_2014_-_IMP_submission.pdf).
5 APPLICABILITY BY OTHER MEMBER STATES?
Many EU Member States (especially their statistical offices) already publish their data in Linked Data format. Most
often these services are available on national Web portals, while the metadata is harvested at the European
level, e.g. by Publicdata.eu. Additionally, the European Commission maintains the Open Data Portal
as a metadata catalogue available as Linked Data; see http://open-data.europa.eu/en/linked-data.
6 CONTACT INFO - RECORD OF THE PERSON TO BE CONTACTED FOR
ADDITIONAL INFORMATION OR ADVICE.
Valentina Janev, Institute Mihajlo Pupin, valentina.janev@institutepupin.com
CATEGORIES FOR USE IN SECTION 3.2
- Policies and legislation (legal requirements, licenses, etc.) / Licensing of information/data and metadata
- Open Data platform(s) / Publication and deployment of information/data and metadata
- Dataset criteria and priorities and value and scope w.r.t. datasets
- Charging issues and proposals
- Techniques w.r.t. opening up of data / Technical requirements and tools
- Organisational structures and skills
- Dataset structures, formats, APIs / Structuring of information/data, formats, APIs
- Encouraging (commercial) re-use
- Persistence and maintenance of information/data and metadata
- Data quality issues and solutions / Quality assurance, feedback channels and evaluation
- Documentation of information/data, creation of metadata
- Selection of information/data to be published according to various criteria
- Data discoverability