Creating a National Remote Access System for Register-based Research
Marianne Johnson, Statistics Finland
Statistical Data Confidentiality Work Session, Oct 2015

Finnish administrative registers
• several comprehensive national registers
• contain unit-level data on individuals, families, housing, enterprises
• compiled and maintained for administrative or statistical purposes, e.g.
– Population Register Centre (VRK): Population Information System
– Social Insurance Institution (KELA): registers on obtained social benefits
– National Institute for Health and Welfare (THL): Medical Birth Register, Care Registers for Social Welfare and Health Care (HILMO), Finnish Cancer Register
– Ministry of Labour (TEM): register of job seekers
– Statistics Finland (Tilastokeskus)

21.9.2015 Statistics Finland / Researcher Services

Secondary usage of administrative registers
• Production of official statistics in Finland is to a large extent based on registers
- the population and housing census has been based entirely on register sources since 1990
- Handbook: Use of Registers and Administrative Data Sources for Statistical Purposes – Best Practices of Statistics Finland
• Register-based research
– 20 % of doctoral theses in medicine in Finland include data from national registers

Prerequisites for register-based research
• Common personal identification number (PIN) in all registers
– first used in 1964 (between 1964 and 1970 two different systems)
– since 1971 a digital population register
– all Finns have a PIN -> data from different registers can be linked by PIN, e.g. for research purposes
• Legislation that allows the use of confidential personal data for scientific research
• Trust in register keepers and researchers
• Comprehensive, well-documented registers

Legislative basis for research use of data from Statistics Finland
- Statistics Act (280/2004)
- In 2013 the Statistics Act was amended to better facilitate the use of data gathered at Statistics Finland for research purposes.
- New objective of the Act
– To extend the use of the data collected for statistical purposes in scientific studies and statistical surveys on social conditions.
- Possibility for researchers to gain access to confidential data from which only the direct identifiers have been removed.
– Before 2013 statistical authorities could not grant access to confidential data from which the statistical unit could be indirectly identified.
– Gain access = view and analyse data through a remote access system

Remote access system (FIONA)
- In use at Statistics Finland since 2009; development project 2014-2015
- Model taken from Sweden, Denmark and the Netherlands
- Researchers use data on Statistics Finland's server from their own workplace via a secured Internet connection; the data remains at SF
- Researchers use a Windows remote desktop and have access to the data they have permission to use, as well as to metadata
- Researchers have access to a wide range of statistical programs: STATA, SPSS, R, SAS, Python Anaconda, …
- Each research project has its dedicated folders and storage space in the system
- Technical maintenance of the FIONA system transferred to CSC - IT Centre for Science in 2015
- The number of users and data sets in the remote access system is growing steadily; currently about 150 active users

Confidentiality
- Research data sets are stored on Statistics Finland's / CSC's servers
- Only mouse, keyboard and graphic signals are transferred
- Access to the system only from pre-approved IP addresses
- A one-time SMS password is sent each time the researcher logs in to FIONA
- All data transfers to and from FIONA are handled by personnel at the Researcher Services of SF
– Outputs are checked so that direct or indirect identification is not possible, and files are saved for possible future reference
- Access to data is terminated when the permit for the project expires
- The FIONA environment is separated from the production network
- The system will be audited in autumn 2015 after being transferred to CSC

A typical process in applying for sensitive research data
[Flow chart; steps in order]
- A researcher applies for a licence to access data for a research project
- The application must include a research plan and a pledge of secrecy
- The Ethics Committee is consulted in cases involving large datasets with confidential data
- If the data can be given out, the licence is granted (possibly with modifications)
- A contract is signed specifying the dataset and the fee as well as the date of delivery
- The data is put together, edited and uploaded to the remote access system
- The researcher uses a remote connection to analyse the data and sends the results to Researcher Services
- The results are checked to make sure that no units (persons, companies) can be identified
- The results are sent to the researcher and they can be used in publications

Present process for obtaining register data for research
[Diagram: the researcher connected over the Internet to several authorities, the data protection authority and Statistics Finland; labels as follows]
- Searching for data sets and applying for permits from several different authorities, with varying practices
- Researcher responsible for data security and disposal of data sets
- Delivering data using varying practices
- Possible corrections and re-sending
• Handling permit applications
• Control and specification
• Compiling data sets

FMAS
[Diagram: the researcher uses the FMAS services; Organizations A, B, C, D and E are connected through an interface service; labels as follows]
- Service levels shown: remote access system, services that require permit, services that require registration, public services
• Remote desktop for analysing data (programs and tools)
• Separated server space for data and metadata
• Output service for results, input service for researcher's data
• Centralized digital permit application service
• Data catalogue
• Helpdesk for research and tuition
- Interface service for data and metadata, pseudonymization
- Administration services for user rights
- Commonly agreed metadata standards
– Data warehouse
- Archive of multiple user files

Linking data from different sources
- Present method
– Register keepers send the data requested by the researcher to Statistics Finland over a secure connection, by registered mail, by courier service, etc.
– The data includes the Finnish PIN or BIN (or a pseudocode created by the register keeper, with the key sent separately)
– Statistics Finland creates a project-specific pseudocode, replaces the PIN (BIN) in the research data sets with it, and uploads the data to the remote access system
- Aim
– Pseudocodes should be used in all data deliveries
– Register keepers should be able to upload their data directly to the remote access system using a standard pseudonymization method

Pseudonymization – project specific
[Diagram: two register keepers de-identify their data with the same common code and project code, and the results meet in FIONA]
- Statistics Finland (Common9843, Project 211)
– 123456-111A, woman -> nvaoepanwzl, woman
– 234567-222C, man -> bleokldawgs, man
- Other register keeper (Common9843, Project 211)
– 123456-111A, age 15 -> nvaoepanwzl, age 15
– 234567-222C, age 44 -> bleokldawgs, age 44
- FIONA
– nvaoepanwzl: woman, age 15
– bleokldawgs: man, age 44

To be developed…
- We see a problem with the fixed pseudocodes of the 'ready-made' data files
• Solution 1: Create a project-specific pseudocode also for projects that use the 'ready-made' files
– Problem: A copy of the 'ready-made' data sets has to be made for each project -> much additional disk space is needed
• Solution 2: Send the seed code that has been used for the 'ready-made' files to the other register keepers
– Problem: The PIN/BIN-to-pseudocode key used by Statistics Finland will be widely known
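
Illustration: project-specific pseudocodes
The slides do not say how the pseudocodes are computed. The sketch below is one possible reading of the "Common9843 / Project 211" diagram, assuming a keyed hash (HMAC-SHA256) over the project identifier and the PIN. The function names, the 11-letter code format and the use of HMAC are assumptions made for illustration, not the method used in FIONA, and the codes it produces will not match the ones shown on the slide.

```python
import hashlib
import hmac
import string

ALPHABET = string.ascii_lowercase
CODE_LENGTH = 11  # assumption: same length as the example codes on the slide


def project_pseudocode(identifier: str, project_id: str, common_key: bytes) -> str:
    """Derive a project-specific pseudocode from a PIN or BIN (illustrative only)."""
    # Length-prefix the project id so different (project, identifier) pairs
    # can never produce the same message.
    message = f"{len(project_id)}:{project_id}:{identifier}".encode("utf-8")
    digest = hmac.new(common_key, message, hashlib.sha256).digest()
    # Map the first CODE_LENGTH digest bytes onto lowercase letters
    # (slightly biased, which is acceptable for a sketch).
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:CODE_LENGTH])


def pseudonymize(rows, project_id, common_key):
    """Replace the identifier in each (identifier, value) row with its pseudocode."""
    return [(project_pseudocode(pin, project_id, common_key), value)
            for pin, value in rows]


# Placeholder key taken from the slide; a real key would be long and random.
common_key = b"Common9843"
statistics_finland = [("123456-111A", "woman"), ("234567-222C", "man")]
other_register = [("123456-111A", "age 15"), ("234567-222C", "age 44")]

sf_rows = pseudonymize(statistics_finland, "Project 211", common_key)
other_rows = pseudonymize(other_register, "Project 211", common_key)

# Both deliveries carry the same pseudocode per person, so they can be
# linked inside the remote access system without the PIN.
linked = {}
for code, value in sf_rows + other_rows:
    linked.setdefault(code, []).append(value)
```

Under these assumptions, changing the project identifier gives unrelated codes for the same PIN, which is the property a project-specific scheme relies on. Anyone who holds the common key can recompute the codes, which is the exposure noted under Solution 2.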