UK Data Access Practices
Felix Ritchie

Overview
• The legislative model
• The data model
• The security model
• Developments
• Current key concerns

The legislative model (1)
• Mixture of statutes and common law until…
• Statistics and Registration Services Act 2007
  – Didn't abolish existing gateways for research
  – Created a new gateway: 'Approved Researchers'
  – Allowed for cross-govt data sharing…
    …but not for research purposes unless specifically agreed
  – Clarified limits of European data sharing
  – ONS given a statutory duty to support research

The legislative model (2)
• No theoretical limits on who can have access to an enormous range of govt data
  – both within govt and in academia
• …but not a free-for-all
• ONS has a duty to protect confidentiality
  – even for Approved Researchers
  – data release has to be consistent with need
→ the data model

The data model (1)
• 'Spectrum' of access points balancing
  – value of data
  – ease of use
  – disclosure risk
• for a given level of confidentiality, maximise data use and convenience
• no 'one-size-fits-all' solution
  – no absolute prohibitions
  – trade-off is made explicit
  – users determine appropriate level of access

Use of confidential data: the access spectrum
[Diagram: access points range from the most restricted (ONS sites, govt sites, RDCs such as the VML) through the Secure Data Service, special licences and the licensed data archive to the internet. Moving along the spectrum, SDC of inputs goes from none to complete, restrictions on users from many to none, and SDC of outputs from complete to none. Examples for census and enterprise data run from original data, identified data for ONS linking and identifiable data for analysis (ONS contractors, govt users only) to anonymised CD-ROMs and web tables.]

The data model (2)
• Options should cover most cases
  – Can't be perfect in every case
  – But the jump from one solution to another reflects data utility and patterns of research use
• Pretty efficient
  – Fairly transparent
  – Users balance their own costs/benefits
  – Economies of scale delivering mass solutions, e.g. UKDA, VML
• How do we define/describe access points?
→ the security model

The security model (VML version)
• valid statistical purpose: safe projects
• trusted researchers: + safe people
• anonymisation of data: + safe data
• technical controls around data: + safe setting
• disclosure control of results: + safe outputs
→ safe use

Use of confidential data: the access spectrum for ONS data at present

| Safety criterion | VML | SDS (provisional) | One-off cases | "Special Licence" | UK Data Archive | Internet |
|---|---|---|---|---|---|---|
| People* | ARs/Civil Servants | ARs | ? | ARs | UK academics | Anyone |
| Projects | Scrutiny by MRP | Scrutiny by MRP | Scrutiny by MRP | Scrutiny by MRP | Academic projects | None |
| Data (in theory) | Any | Unidentified | Anonymised, low risk of identification | Anonymised, almost no risk of identification | Anonymised, no risk of identification | N/A |
| Data (in practice) | Unidentified | Unlinkable? | Anonymised, low risk of identification | Anonymised, almost no risk of identification | Anonymised, no risk of identification | N/A |
| Settings | Secure thin client | Secure thin client | ? | Use on restricted IT systems | Use by academics only | None |
| Outputs | ONS staff checked | SDS staff checked, ONS guidelines | ? | Researchers agree to follow ONS guidelines | No checking | No checking |

*AR = Approved Researcher

Access: a summary
• No theoretical restrictions
• wide-ranging and flexible legal basis

Remote access in the UK: the VML (1)
• Probably the most important research data resource in the UK after the UK Data Archive (and the internet)
• Expanding access from other govt depts.
• Data acquisitions:
  – internal ONS versions of social datasets
  – other government dept data
  – administrative data
  – Census 2011 detailed microdata?
Remote access in the UK: the VML (2)
• Highly theorised
  – particularly in disclosure control
• Strong researcher relationship
  – compulsory training gives initial investment in researcher buy-in
• Next stage: full cost-benefit analysis
  – planning model in context of new alternatives
  – CBA to include purpose of RDC

Developments in remote access
• VML clones being set up in academia
  – possibly elsewhere in govt too
  – no possibility of VML being accessible over the internet in the near future
  – likely to develop into a two-tier system
• VML practices and models adopted
  – for an increasing range of data
  – across a wider range of operations

Current key concerns
• IT
  – lack of resource
  – still some basic operational issues unresolved
• Delays in increasing access points
  – partly money, partly IT, partly culture
• Demand growth
  – 30%-50% each year 2003-2008
  – likely to be higher 2009-10

Current potential concerns
• Potential in Statistics Act
  – possibility for ONS's policies to be challenged
  – surprising (unwelcome) demands for information?
    • social data in VML partially a pre-emptive response
• New data types bringing new rules
• Fragmentation of RDC practice in UK

Background concern: fear of the new
• Relative risk still poorly understood
  – Example
    • VML temporarily closed for potential security flaw
    • One data area returned to old non-VML solution: letting external visitors log on using ONS staff usernames
    • VML was reopened for ONS staff after a week, but for external visitors only three weeks later
    • But the flaw could only be exploited by ONS staff…
• Resistance to virtual solutions in favour of familiar ones
  – remote access always seen as a limitation despite much better data quality
  – 'distributed access' no substitute for 'distributed data'

Not current key concerns
• Staff resources
  – fast training time
  – supportive researcher base
    • researcher buy-in => very lean processes
• Methodological issues
  – RDC-specific SDC methods proving robust
• Legal issues
  – Statistics law so far proving flexible enough to provide reasonable responses to all needs
    • "reasonable" = ONS and researchers happy that the balance between access and confidentiality is fair

Summary
• Clear legislative model and strong theoretical basis
  – policy decisions relatively easy
• Main difficulty for ONS is managing expansion of demand
  – meeting ONS internal needs (just, for now)
  – long way off meeting external demands

Contact
• Felix Ritchie felix.ritchie@ons.gsi.gov.uk
• Microdata Analysis and User Support maus@ons.gsi.gov.uk

VML resources
[Chart: VML staffing, showing minimum, current and target (June 09) levels. Roles shown: strategic management, operational management, operations, support, operations and analysis, strategic resources. Grades shown run from G6 and G7 through SEO, HEO/RO and EO to AO/AA.]
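A note on the demand growth figure under 'Current key concerns': annual growth of 30%-50% compounds quickly. As a rough illustration only, assuming the figure means five successive year-on-year increases between 2003 and 2008 (an interpretation, not stated on the slide), with D_t a hypothetical index of demand in year t and g the annual growth rate:

```latex
D_{2008} = D_{2003}\,(1+g)^{5}, \qquad
(1.30)^{5} \approx 3.71, \qquad
(1.50)^{5} \approx 7.59
```

On that reading, demand in 2008 would be roughly 3.7 to 7.6 times its 2003 level, which is consistent with the summary's point that managing expansion of demand is the main difficulty.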