Public Administration Select Committee
Study 6: Statistics and Open Data
Submission on behalf of the Demographics User Group
2 September 2013

Executive Summary

The Demographics User Group (DUG) [1] represents 14 major commercial companies – Barclays, Boots, Camelot, Centrica, Co-operative Group, E.ON, Everything Everywhere, GSK, John Lewis, Marks & Spencer, Sainsbury’s, Tesco, and Whitbread – which make extensive use of government statistics and geographical data to understand local markets and consumers, and to make decisions about large investments in delivering better services. These companies are the tip of an iceberg of 2.3 million businesses in the UK, many of which could increase their efficiency, and grow, by using data gathered by government. Such data have the great advantage of being collected consistently across the whole country.

The key themes of this note are to:
- Recognise the importance of government open data to business
- Welcome the acceleration in progress in recent years
- Alert PASC to the fact that users outside the public sector (such as businesses and charities) are not able to enjoy all the free access arrangements which have been made for public sector users
- Urge PASC to press for the National Address Gazetteer and the Postcode Address File to be made open data

PASC’s questions

1. Why is open data important?

1.1 Companies require the best possible information to understand their markets and to make major investment decisions, such as the opening of new retail outlets. Government collects huge volumes of data about citizens and the country’s physical infrastructure, and this collection often has the additional advantage of being consistent for the whole of Great Britain, or even the entire United Kingdom. We endorse the Shakespeare Review’s approach that information collected at public expense should be publicly available (subject to confidentiality limitations).
1.2 The case for open data often focuses on the opportunities for new information-based companies to start up and grow, but we believe that the benefits of increasing the efficiency of existing business-to-consumer companies (such as retailers) are much larger. For example, using a better address register to reduce the failure rate of millions of home deliveries would result in considerable savings in a single year.

[1] http://www.demographic.co.uk/dug.html

2. Why does the Government need an open data strategy?

2.1 The great benefit of the Government’s open data strategy is that it sets the tone in the public services, creating the expectation that data will be released unless there is good reason not to.

3. What should the Government’s aims be for the release of open data? Are the Government’s stated key outcomes in its Open Data Strategy the right ones?

3.1 We strongly support the Government’s policy on open data, believing that as the use of data increases and extends to new users, new value is created.

4. How can those engaged in open data, and those engaged in producing government statistics, work together effectively to produce new data?

4.1 This requires continuous dialogue, so that data owners can better understand users’ needs. The Office for National Statistics has long put significant effort into consulting users of the Census (the biggest of all open datasets), and this has been very successful. The Royal Statistical Society’s Statistics User Forum provides an excellent mechanism for understanding the needs of a wide range of user groups, and the recently established Open Data User Group provides a valuable vehicle for identifying users’ needs and priorities.

5. How can more statistics and administrative data of all kinds become more freely available?

5.1 The use of government administrative files to create new statistics needs to be accelerated.
Administrative files accumulated by departments such as HMRC, DWP, Education, the NHS, and the Home Office are immensely rich potential sources of information about the population and its social characteristics. In recent years more use has been made of such files to produce aggregate statistics for small areas, but we believe that there is scope to create much more value at relatively low cost in two ways:
- Existing statistics, but for smaller areas. Although the 2001 Census produced many statistics down to Output Area (OA) level (c.120 households), statistics produced in the following decade from administrative files have been created only for larger, cruder areas. Simply aggregating administrative record files to OA level would increase the value of the information, and be a quick win.
- New statistics from underutilised administrative files. This was encouraged in the Treasury Select Committee’s 2008 report “Counting the Population”, and is being taken forward by ONS in its “Beyond 2011” investigation of alternatives to another Census, which is creating a major opportunity to use government administrative information for statistical purposes. In particular, HMRC is the obvious source of information on income and wealth.

6. Is open data presented well and of adequate quality?
a. Are the formats of the data being published accessible, useable and understandable to the public?
b. What metadata is needed to make releases useful?
c. Who will use the data released?

6.1 The top priority for users of open data has been to see the principle of free access established, and datasets published, even if their quality is less than perfect and their formats not ideal. This leaves scope for improvement, and also for better ways to search for and find relevant data. The website www.data.gov.uk now has more than 10,000 datasets, and effort now needs to be made to highlight those which are of particular interest to certain categories of users.
For example, the ESRC’s Retail Research Data website (http://www.retailresearchdata.org/Default.aspx) enables insight and store location analysts working in retail organisations to get easy access to (just) those free datasets which may be of value to their businesses: less means more.

6.2 In this way, usage can spread from a small number of specialists to more mainstream analysts, and then to the much more numerous wider public, greatly increasing use and hence the value of the data. This has been very apparent in the increased use of the Census over the last two decades.

7. How successful has the Government’s Open Data initiative been in changing behaviour in the Civil Service and wider public sector?

7.1 The Government’s initiative has had significant success in changing behaviour in Whitehall departments, and DUG greatly appreciates its involvement in Transparency Boards such as Welfare (DWP) and Tax (HMRC).

7.2 However, in the wider public sector, progress has in some cases been fiercely resisted by BIS and the Treasury. Two matters are of great concern to us:
- Firstly, the defence of the Trading Fund model, which can result in prohibitively high prices aimed at a very small captive market of business users: this approach is still used by Ordnance Survey for some of its data.
- Secondly, arrangements have been made through the Public Sector Mapping Agreement for public sector bodies to have free access at the point of use to Ordnance Survey’s mapping, the National Address Gazetteer, and Royal Mail’s Postcode Address File, and this is an enlightened policy. But it does not extend to other users such as businesses or charities: this is iniquitous, especially when the government is encouraging businesses to grow.
It is to be hoped that the Shakespeare Review’s proposal for National Core Reference Data will finally provide a mechanism to solve the problem, but, having seen the case rebuffed for 15 years, we are not optimistic.
7.3 We also feel that many civil servants do not realise that “commercial users” are of two distinct types: value-added resellers (VARs, which sell data to other businesses), and business-to-consumer companies (which provide services to very large numbers of citizens). For the latter, key government datasets are “business as usual” resources, and are therefore structurally important in maintaining and growing profitability, jobs, and tax revenues. It follows that government should not be content to seek only the views of VARs when wishing to understand the needs of commercial users.

8. Which datasets are the most important?
a. What are the best examples of data being made open and resultant benefits to business or society?

8.1 The decision to make the 2001 Census freely available was a great milestone on the journey to open data: use by commercial companies greatly increased, and it has been used as the basis for thousands of investment decisions. The arrival of the 2011 Census is just as significant. More broadly, members of DUG have welcomed many new open data sources, especially (some) Ordnance Survey mapping, postcode directories, DWP statistics for small areas, Land Registry house prices, and GP prescribing information.

8.2 The scope for further progress is summarised in our Data Manifesto (see Annex). Of the items listed there, our top priorities are:
- The National Address Gazetteer
- All the mapping, including boundaries, needed by government, and provided for in the Public Sector Mapping Agreement

9. How effective is the work being undertaken by the Cabinet Office to monitor the progress of Departments in publishing their agreed datasets?

9.1 It appears to be very effective indeed.
Keith Dugmore
Director, Demographics User Group
20 Russell House, Cambridge Street, London SW1V 4EQ
Tel: 020 7834 0966 (landline); 07976 750094 (mobile)
Email: dugmore@demographic.co.uk

PASC – Open Data – DUG Submission

Annex: DUG members’ needs for data from government – a manifesto (Updated January 2013)

This is the latest version of the priorities identified by the 14 large commercial companies who are members of the Demographics User Group (www.demographicsusergroup.co.uk). We believe that meeting them would also benefit many of the country’s other 2.3 million businesses and, indeed, other organisations such as charities, and citizens generally.

Introduction: In many cases individual unit records are the ideal flexible data source, but if they need to be protected for confidentiality, tagging them with an Output Area code, or aggregating them to OA statistics, maximises value.

General issues: access / timeliness / format.

Six topics are identified as priorities; they are numbered (1) to (6) below. In some cases (marked #) the 2011 Census will provide new information (but only as at March 2011).

Each broad category below lists its specific data needs, followed by possible government sources where these have been identified.

Geographical backdrop
- (1) All the mapping, including boundaries, needed by government, and provided for in the Public Sector Mapping Agreement. Sources: Ordnance Survey
- Infrastructure developments & plans. Sources: LAs / CLG / OS
- Flood maps. Sources: Environment Agency

Retail centres
- Retail outlets: numbers & type; speedily updated, inc. pop-up shops; historical data. Sources: Valuation Office Agency / LAs?

Workplaces (# & see some information in the 2011 Census)
- Locations, and numbers of workers: head offices; local units; business & science parks. Sources: Inter Departmental Business Register?

People’s movements / transport / location / commuting (# & see some information in the 2011 Census). Sources: Department for Transport for all those in this section?
- Traffic flows: mode (road, rail, bus, tram, bike, + pedestrians) and destinations (workplace, retail, etc.)
- Car parks; congestion charge areas
- (2) Counts of people at locations (& mobile phone coverage)
- Time-based data: seasonality; weekday / weekend; day & day part

Telecommunication
- Mobile coverage / phones per cell by time
- Broadband access / usage / speed
- Cable & broadband exchange traffic
- Better data from OFCOM: more recent, and for all UK

Addresses – home & others
- National Statistics Postcode Directory: omitted fields, e.g. delivery points. Sources: ONS
- PAF, & postcode changes. Sources: Royal Mail
- (3) The National Address Gazetteer (& see Govt news 29 Nov 2011 [2]). Sources: GeoPlace / OS

Properties – housing & business
- Addresses of premises (schools, hospitals, (various) surgeries, clinics, etc.)
- Council Tax bands for domestic properties, & receipts. Sources: VOA & LAs
- Housing stock, & sales & their prices. Sources: Land Registry (& see Govt news 29 Nov 2011)
- House rents. Sources: LAs / CLG?
- House building & conversion completions. Sources: LAs / CLG
- Planning applications: domestic and business properties. Sources: LAs / CLG
- Valuation lists for business properties. Sources: VOA

People & their circumstances (# & see some information in the 2011 Census)
- Aggregate data from government data silos: person & household; ideally a single customer / citizen view. Sources: DWP, HMRC, Education, etc.
- Electoral Roll (if not opted out). Sources: LAs
- County Court Judgments for debt: personal & corporate. Sources: MoJ & Registry Trust
- (4) Household income & disposable income / cost of housing / wealth. Sources: HMRC, ONS
- Immigration / migration; house occupancy / multiple occupation

Business, the economy & investment
- Company information. Sources: Companies House (& see Govt news 29 Nov 2011)
- Efficiency by area; GDP by area. Sources: ONS / HMT / BIS?
- Levels of government investment: geographical location & nature. Sources: ONS / HMT / BIS?

Weather
- Historic and current data, and also forecasts. Sources: Met Office (& see Govt news 29 Nov 2011)

ONS’s statistics
- Neighbourhood Statistics, and new statistics created from administrative databases: (5) Recreate existing statistics at Output Area level (c.f. the current higher / less valuable Super Output Area level), and create new OA-level statistics starting with the topics identified by the Beyond 2011 project. Sources: ONS and government departments

Government’s existing sample surveys (e.g. those held at the University of Essex)
- (6) Provide access to anonymised unit records for analytical purposes. All surveys should be coded with ONS’s Output Area Classification (OAC). The Living Costs and Food Survey, the Wealth & Assets Survey, Understanding Society, and National Well-Being (the “Happiness Index”) are of particular interest to commercial companies. Sources: ONS / ESRC

[2] http://www.cabinetoffice.gov.uk/resource-library/open-data-measures-autumn-statement-2011

DUG’s Data Manifesto – January 2013, Keith Dugmore