To Boldly Go
PC-Axis Reference Group, Copenhagen, 2014
Central Statistics Office, Cork, Ireland
Kevin Healy, kevin.healy@cso.ie, (00 353 21) 453 5719
Eoin MacCuirc, eoin.mccuirc@cso.ie, (00 353 21) 453 5504

Linked Open Data

The Tower of Babel
“If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other.”

Tim Berners-Lee – Founder of the Web
“In an extreme view, the world can be seen as only connections, nothing else. We think of a dictionary as the repository of meaning, but it defines words only in terms of other words. I liked the idea that a piece of information is really defined only by what it's related to, and how it's related. There really is little else to meaning. The structure is everything. There are billions of neurons in our brains, but what are neurons? Just cells. The brain has no knowledge until connections are made between neurons. All that we know, all that we are, comes from the way our neurons are connected.”

How open is the data? - Linked Open Data star scheme
Tim Berners-Lee suggested a 5-star deployment scheme for Linked Open Data and Ed Summers provided a nice rendering of it. In the following, examples are given for each level. The example data used throughout is 'the temperature forecast for Galway, Ireland for the next 3 days':
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

Linked Open Data cloud
Domains shown in the cloud diagram: Media, User-generated, Government, Publications, Cross-domain, Geo, Life sciences
http://lod-cloud.net/

Linked open data - The Semantic Web
Copenhagen – 99,100,000 hits
looking for a needle in a haystack

URI – Uniform Resource Identifier
give the thing a name and an address
The following picture shows the desired relationships between a resource and its representing documents:

Tim’s cool URIs
Cool URIs don't change
What makes a cool URI? A cool URI is one which does not change.
What sorts of URI change? URIs don't change: people change them.
It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment.

The Web of Things – The Internet of Things
The Internet of Things is coming, but it needs a semantic backbone to flourish. With some 25 billion devices expected to be connected to the Internet by 2015 and 50 billion by 2020, providing interoperability among the things on the IoT “is one of the most fundamental requirements to support object addressing, tracking, and discovery as well as information representation, storage, and exchange.”
So write the authors of Semantics for the Internet of Things: Early Progress and Back to the Future, Payam Barnaghi and Wei Wang, Centre for Communication Systems Research, University of Surrey, Guildford, UK and Cory Henson, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing.
“The suite of technologies developed in the Semantic Web … such as ontologies, semantic annotation, Linked Data and semantic Web services … can be used as principal solutions for the purpose of realizing the IoT,” they state.
“Defining an ontology and using semantic descriptions for data will make it interoperable for users and stakeholders that share and use the same ontology.”

Where is the CSO with all this?
• In partnership with DERI/NUIG/INSIGHT
• One of the first NSIs in the world to upload census data as linked open data – data.cso.ie – Census 2011
• One of the organisations involved in the EU Open Cube pilot projects
• Launched apps4gaps competition

data.cso.ie Census – Linked Open Data
• 12 million RDF triples from Census
• Geographical entities (counties, cities, etc.)
• Codelists

CSO/NUIG collaboration summary position
• Most technical work done by students/interns at NUIG
• CSO supplied data, use cases, and expertise
• Lots of manual work and ad-hoc solutions
• Results not fully “owned” by CSO
• Skills needed to maintain/extend are mostly in NUIG
18-19 November 2013 OpenCube kick-off meeting

Open Cube Project Pilots

OpenCube Pilots
Pilot: DCLG
– Focus: Publish
– Tool/platform: Swirrl’s PublishMyData
– Data sets: 50-100 open datasets regarding finance, planning performance, land use, housing and homelessness
– Type of users: Public servants (members of the DCLG statistical data management team) as well as statisticians/researchers
– Number of users: 3-4 members of the data management team and 5 test users (statisticians, research analysts)
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21
Pilot: Flemish Gov
– Focus: Publish/Reuse
– Tool/platform: FluidOps’ IWB
– Data sets: 1100 open datasets VRIND
– Type of users: A varied audience ranging from public servants to data scientists
– Number of users: 5-10
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21
Pilot: Central Statistics Office
– Focus: Publish/Reuse
– Tool/platform: OpenCube toolkit
– Data sets: 2011 Census dataset & StatBank dataset
– Type of users: Public servants
– Number of users: 25 employees
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21

Open Cube business case for the CSO
• Publishing statistics from StatBank as linked data
• Publishing statistics from StatBank as SDMX-ML
• Facilitate the creation of general reports aimed at the general public
• Assist with answering queries from the public
• Help third parties to tell stories with CSO data

CSO goals (independent from OpenCube)
• Own the data.cso.ie process and technology
– Enable in-house maintenance, changes, etc.
• Publish StatBank* data as Linked Open Data
– Ongoing publication process
– Adhering to release schedule is critical
– Publish data that are regularly updated (monthly, quarterly, annual) as linked open data (in contrast to the static Census 2011 data)
• Deploy tools that enable analytics and exploitation of linked data
– Both internally and externally
*StatBank is the CSO published time series database (PC Axis)

The Role of the CSO in the Future of Linked Data in Ireland
As the technology trends that drive adoption of Linked Data continue, and the importance of Open Data increases, the CSO is well-positioned to play a leading role as a “hub” in the Irish data Web. Some key steps include:
1. Proactively encourage the adoption of standard classifications and metadata for Open Data that are published by different public bodies within Ireland.
The CSO is already documenting classifications on its StatCentral (Portal) website, and has more experience in disseminating data on the Web than perhaps any other organisation in the public sector. Ideally, the classifications themselves would be published as Linked Data.
2. Going beyond pure classifications, encourage the use of standard identifiers (URIs) for geographical areas.
3. Support Linked Data as a new dissemination format for the CSO StatBank. Key economic and demographic statistics are necessary in all sorts of data analysis tasks, and ideally they should be published as Linked Data directly by the source (CSO).

Application Programming Interface (API)
StatBank API
StatBank API – by theme
StatBank API – Download
http://www.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/AAA01

Key Indicators, quick tables and multi-quick tables
Key Economic Indicators
http://www.cso.ie/indicators/Maintable.aspx
Quicktables
http://www.cso.ie/Quicktables/GetQuickTables.aspx?FileName=CNA13.asp&TableName=Population+1901+-+2011&StatisticalProduct=DB_CN
Multi-quicktables
http://www.cso.ie/multiquicktables/quickTables.aspx?id=qnq34

Public Sector Statistics Network (PSSN)
PSSN – Organisations hosted

OGP as a driver
http://www.ogpireland.ie/
data.gov.ie – Irish OGP portal
http://data.gov.ie/dataset

Context and Impact Indicators
CSO - Context and Impact
Indicator                                                     2011        2012        2013
Printed output
– No. of releases and publications                             238         306         304
Online output – CSO website
– Visits                                                 2,387,000   2,303,441   2,718,287
– Page views                                            10,070,000  13,997,031  17,034,035
– Downloaded files                                       1,539,000   1,733,833   1,856,176
– StatBank table accesses                                  400,400   1,042,750   1,282,674
Online output – StatCentral site
– Visits                                                   131,400     158,117     179,527
– Page views                                               300,200     418,564     451,788
Publication of statistics on social media
– Followers (at year-end)                                    3,030       5,644       8,548
Burden Reduction
– Annual reduction in statistical burden on business          -28%       -4.7%         n/a

Questions?
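
The download address on the StatBank API slide ends in a table code (AAA01). The following is a minimal Python sketch of how such addresses might be built and fetched. It assumes that other StatBank tables follow the same URL pattern and that the service returns JSON; neither is confirmed by the slides. The helper names statbank_json_url and fetch_statbank_table are hypothetical and are not part of any CSO library.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address copied from the download URL on the StatBank API slide.
STATBANK_JSON_BASE = (
    "http://www.cso.ie/StatbankServices/StatbankServices.svc"
    "/jsonservice/responseinstance/"
)


def statbank_json_url(table_code: str) -> str:
    """Return the download URL for a StatBank table code such as 'AAA01'.

    Assumption: every table is addressed the same way as AAA01.
    """
    code = table_code.strip()
    if not code.isalnum():
        raise ValueError(f"unexpected table code: {table_code!r}")
    return STATBANK_JSON_BASE + quote(code)


def fetch_statbank_table(table_code: str, timeout: float = 30.0) -> object:
    """Download one table and decode the body as JSON.

    Hypothetical helper: needs network access, and the layout of the
    decoded object is not described in the slides, so it is returned as-is.
    """
    with urlopen(statbank_json_url(table_code), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the address; no network request is made here.
    print(statbank_json_url("AAA01"))
```

Keeping the URL construction separate from the download lets the address be checked without contacting the service.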