Statistics, Development and Human Rights
Session I-Pa 8c
Challenges and Lessons of a National Policy of Data Protection in the Field of Statistics
Paul CHEUNG
Government Chief Statistician, Singapore Department of Statistics
100, High Street #05-01 The Treasury, 179434 Singapore, Singapore
T. + 65 332 7691, F. + 65 334 3464/7689, paul_cheung@singstat.gov.sg
Montreux, 4. – 8. 9. 2000

ABSTRACT
Singapore is one of the few developing countries that have progressively used a networked system of databases as the cornerstone of the national statistical infrastructure. Databases on people, housing, commercial establishments and land provide multi-dimensional information for statistical compilation and analysis. The availability of the population database, for example, has enabled us to successfully complete our first-ever register-based Census for the year 2000. Moving towards this system of databases required a coherent national policy of data protection. Safeguarding data privacy is particularly important in our decentralized statistical system, where multiple agencies are involved in data collection and compilation. Drawing on the experiences of countries with strong data protection traditions, Singapore has firmly established a progressive data protection regime covering the entire statistical system. The principles and provisions of this regime are described in the paper.

1. Introduction
With the advent of modern information technology, National Statistical Offices (NSOs) in developing countries are moving rapidly towards new platforms for capturing, editing and archiving statistical data. Personal computers, client-server systems, hand-held data entry units, GPS devices and many other modern tools give these countries technological capabilities in data capture and storage never seen before.¹ The integration of multiple datasets to create new information, commonly done in some developed countries, is now technically possible and can be readily implemented.
While hardware has improved and new capabilities have been nurtured, NSOs now face a new challenge: developing a set of ethical codes to accompany these technological advances. A key ethical concern is the protection of the data privacy rights of the companies and individuals who supply the data. The need for data privacy protection has probably existed for as long as data have been collected for statistical purposes on a voluntary basis. The promise of privacy protection is part of the social contract between NSOs and respondents. Under this social contract, in exchange for respondents’ willingness to reveal valued information to the government, the data collected will be adequately protected and used only for legitimate and approved purposes. Whether respondents perceive that an NSO can be trusted to fulfil this contract, through strict standards on the confidentiality of individual returns, significantly affects their willingness to co-operate with it.

As more data are amassed, NSOs have acquired a new area of responsibility in protecting data privacy: the security of databases and of data transmission. This responsibility has three dimensions. First, NSOs have to ensure the security of the databases and the information they contain. Second, they have to manage access and usage, as huge amounts of confidential data are at stake. Third, as new modes of data collection emerge, they have to address respondents’ concerns that the processes of gathering and compiling the data are sufficiently secure. Respondents have placed implicit trust in the statistical system that the detailed information they provide by mail or over the Internet will not get lost along the way. NSOs will need a new road map to address these three areas of responsibility.
This paper presents Singapore’s experiences in exploiting information technology for data management and in addressing data privacy issues. It has two parts. The first part examines the need for data protection in the evolution of Singapore’s statistical system: the trend towards greater decentralization, the proliferation of databases, and the reliance on technology to facilitate data submission by respondents. The second part describes the principles and provisions of the progressive data protection regime that Singapore, drawing on the experiences of countries with strong data protection traditions, has established across the entire statistical system.

¹ A good example is the ESCAP project on testing new information technology applications in developing countries. GPS devices were deployed in Bangladesh for mapping activities, hand-held devices were tested for data entry in the Philippines, and OCR technology was used in Indonesia for data capture in surveys. Similar examples can be found in other developing countries.

2. Increasing Need for Data Protection
2.1 Data Protection in a Decentralised System
In Singapore, a coherent national policy of data privacy protection is particularly important in view of its decentralised statistical system. A large number of statistical agencies, as well as government departments and statutory boards, collect and compile official statistics, and many of them deal directly with respondents by issuing questionnaires. A decentralized system has many advantages. However, it carries a much higher risk of data leakage, as there are more data centers and many more people involved in handling the data.
To coordinate statistical activities in this decentralized system, the Government Chief Statistician is designated by law as the national statistical coordinator, with the Singapore Department of Statistics (DOS) serving as the coordinating center. Adopting sound data protection principles and practices across the system is recognised as essential to maintaining the high level of trust that the general public places in official statistics.

2.2 Data Protection in Record Linkage and Database Development
The need for greater data protection has become more pressing in recent years because of the prevalence of IT applications in Singapore’s administrative and statistical system. The rapid introduction of powerful computer workstations, the availability of data management systems, the use of identification variables in different databases, and the capacity to store huge amounts of data have made record linkage technically feasible and quicker to implement. The high quality of Singapore’s administrative data has also made it a viable source for statistical purposes. Because various government agencies use common protocols and national statistical standard classifications in their computer systems, record linkage has become a viable means of integrating data from different sources. Singapore has now developed databases on people, housing, commercial establishments and land which jointly provide multi-dimensional information for statistical compilation and analysis: the National Database of Dwellings (NDD), the Household Registration Database (HRD), the Commercial Establishment Information System (CEIS) and the Integrated Land Use Database (ILUD). The availability of the population database, for example, has enabled us to successfully complete our first-ever register-based Census for the year 2000.
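In its simplest deterministic form, record linkage of the kind described above joins records from different sources on a shared identifier. The Python below is a minimal sketch, not DOS’s actual system: it assumes each record is a dictionary keyed by a unique identification number, and the field name `uin`, the function `link_records` and the sample records are all hypothetical. Source records that find no match are set aside for review rather than added to the register.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def link_records(base: List[Record], source: List[Record],
                 key: str = "uin") -> Tuple[List[Record], List[Record]]:
    """Exact (deterministic) record linkage on a shared identifier.

    Assumes `key` is unique within `base`. Each `source` record whose
    identifier matches a base record updates a copy of that record; records
    with no match are returned separately for review instead of being added.
    """
    # Index copies of the base records by identifier; the input list is not modified.
    index: Dict[str, Record] = {rec[key]: dict(rec) for rec in base}
    unmatched: List[Record] = []
    for rec in source:
        target = index.get(rec[key])
        if target is None:
            unmatched.append(rec)
        else:
            target.update(rec)  # newer source values overwrite older ones
    return list(index.values()), unmatched


# Hypothetical register and update feed; identifiers and fields are illustrative only.
register = [
    {"uin": "ID001", "name": "A", "address": "Blk 1"},
    {"uin": "ID002", "name": "B", "address": "Blk 2"},
]
feed = [
    {"uin": "ID002", "address": "Blk 9"},
    {"uin": "ID999", "address": "Blk 5"},
]
updated, review = link_records(register, feed)
print(updated)  # ID002's address is now "Blk 9"
print(review)   # [{'uin': 'ID999', 'address': 'Blk 5'}]
```

Exact matching of this kind only works because the same identifiers are used consistently across agencies; holding unmatched records back, rather than silently inserting them, is one simple way to keep doubtful data out of the database.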
These statistical databases are updated continuously by linking with existing government databases through unique identification numbers. While the move towards a networked system of databases reduces costs and respondent burden, it also means that a breach in data security, whether illegal access to the data or their accidental or intentional alteration or destruction, will have more serious consequences. This is because a great deal of information on persons and establishments is stored together, in one way or another, with a unique identification code attached.

2.3 Data Protection with New Modes of Data Transmission
Given the increasing use of Internet services among households and commercial establishments, the Singapore statistical system has had to respond to this new technology and evolve its data collection strategies. DOS has introduced two new Internet-based systems to facilitate the collection and capture of statistical information in its establishment surveys and population census. The first service, known as ‘E-Survey’, was introduced in August 1999. It allows companies and businesses to submit statistical information via the Internet and is now available to all respondents of the annual surveys of commerce and services. These surveys cover about 16,000 business establishments and were previously conducted using the conventional mail-out and mail-back method. E-Survey offers respondents a convenient, fast and easy alternative to mail and fax submission. The second service uses the Internet to collect information from households for the Population Census 2000. With the database in the background, households can check their records in the database for census enumeration and update them if necessary. New information can also be provided through the Internet.
While such initiatives provide substantial manpower savings and other benefits, they raise new security issues, such as:
• Ensuring the security of data stored on the server;
• Ensuring login security to prevent unauthorised access to information; and
• Ensuring that data are not intercepted as they are sent over the Internet.

3. Data Protection Measures
Singapore’s statistical system takes a three-pronged approach to data protection, comprising legal, administrative and technical measures. These measures protect data against illegal access and use, accidental or intentional alteration or destruction, accidental or intentional disclosure, and theft. Experience has shown that they are effective in the Singapore context.

3.1 Legal Measures
The Statistics Act is the principal legislation governing the collection, compilation and publication of statistics by DOS and by gazetted Research and Statistics Units (RSUs) in the public sector. It empowers DOS and the RSUs to collect and access information, and it also contains several provisions for data protection. Its guarantees of data confidentiality apply to the whole Singapore statistical system, and no other legislation has the power to override them. The main data protection provisions of the Statistics Act are summarised below:
(a) Information collected is to be used only for statistical purposes. Releasing data on individual persons or establishments for administrative decision-making or similar purposes is forbidden.
(b) Data are to be published only in a way that prevents the identification of the individual person, household, company or institution, unless their written authorisation is obtained. Data are therefore released in aggregate form, and cases where responses might enable indirect identification are grouped into broad categories to prevent disclosure.
(c) Where the information has been obtained from a third party that is bound by any written law restricting the disclosure of such information, it can be disclosed only with the written permission of that third party.
(d) If information is also obtainable under another written law that restricts the disclosure of the information obtained, the Minister can provide for the application of that law, with or without adaptations.
(e) While DOS can request identifiable statistical information from RSUs and other agencies specified in the Statistics Act, individual data may not be transferred from DOS to RSUs or other government agencies. Nor are RSUs allowed access to individual records held by other RSUs.
(f) Anyone found violating the Statistics Act is liable on conviction to a fine not exceeding $5,000, to imprisonment for a term not exceeding one year, or to both.

Another piece of legislation that provides data protection at a more general level is the Official Secrets Act, which prohibits government servants from releasing official information to unauthorised persons. While no one has so far been prosecuted under the Statistics Act for the release or abuse of information, the Official Secrets Act has been invoked in a number of cases of unauthorised release of official information. Within the government service, there is therefore a clear understanding that legal provisions are taken seriously and that illegal behaviour will be prosecuted to the full extent of the law.

3.2 Administrative Measures
While the legal measures provide clear guidelines on accountability and penalties in case of a breach of data security, general administrative measures are also needed to further reduce the risk of disclosure.
Examples of such measures include:
(a) inducting new employees into an organisational culture that emphasises the importance of, and need for, data protection;
(b) keeping critical files under lock and key;
(c) classifying information or material according to its sensitivity and security implications, and according it the appropriate level of protection; and
(d) enforcing the “need-to-know” principle, whereby knowledge, possession of or access to information or data is strictly confined to authorised persons.

To limit linkage with administrative records to what is necessary for the production of statistics, DOS carefully evaluates the nature of the records to be linked. Linkage is permitted provided that certain conditions are fulfilled: first, the linkage serves meaningful statistical purposes consistent with the Statistics Act; second, the confidentiality provisions of relevant laws are followed; third, there are substantial savings in cost or respondent burden; fourth, data timeliness is improved; and, most importantly, fifth, there is no detriment to the privacy of the respondents.

The extensive use of data from external sources to enhance statistical databases on individual persons has necessitated the upgrading of administrative measures for data protection. Each month, substantial amounts of new information are merged into the databases and existing records are updated with more current information via record linkage. Most datasets received from external agencies are on tapes, diskettes and cartridges, and are collected from the agencies in person by the Department’s officers. This prevents any tampering with the datasets by electronic means. Datasets received electronically are transmitted via an established secure network. Before individual information is uploaded and merged, the datasets must be registered, and only two authorised officers can do so, by keying in the system input identification code and date.
This prevents any unauthorised uploading or updating of information in the database. Protocols have also been established requiring review and endorsement by the appropriate officers for any change in the processing of records via record linkage, and for each transaction that uploads, updates or processes the source data. In addition, the database that stores individual information is located at a secure physical site and is not connected to any external network. The site has restricted access and is out of bounds to all unauthorised visitors. Access rights to the database are limited to a few officers directly involved in managing and maintaining it. Authorised users are briefed and screened according to an established set of security protocols. They are given unique userids and passwords, which are revoked if they are not used for 3 months. Any change in the procedures for granting access rights has to be reviewed and endorsed by the database owner in the Department.

Strict measures are also taken in the management and use of statistical databases on companies and businesses. For data merging and record linkage, data from external sources are put through validity checks and verified to be error-free before being merged into the database. Even within the Department, user access to the databases is restricted by user classification, with controlled information accessibility.

3.3 Technical Measures
With IT becoming an integral part of DOS’s organisational processes, IT security controls play an increasingly important role in data security. At the general level, examples of such controls are:
(a) Computer systems require users to identify themselves with unique userids and passwords. Passwords are changed at least once every 30 days. Employees must not disclose or share their userids and passwords.
(b) Firewalls are installed to protect the connections between internal and external systems against unauthorised entry.
(c) The sender of classified or sensitive email has to ascertain that the intended recipient is authorised to receive it. A secure email system using government-approved encryption standards is deployed to send and receive sensitive email.
(d) Laptops and other portable computing devices used for official work are to be physically locked in a desk or cabinet when unattended.
Computer systems are also installed with devices and programs that prevent the propagation and execution of viruses and other harmful code.

In addition, specific measures provide added protection for statistical data collection conducted via the Internet. For business surveys conducted via the web-based E-Survey, access accounts for respondents are pre-created. Respondents are informed by mail of their unique company reference number and a randomly generated Survey PIN, together with a hard copy of the survey forms. Both keys, the company reference number and the Survey PIN, are required before a respondent can log in to the E-Survey server. E-Survey uses the Secure Sockets Layer (SSL) encryption protocol to ensure that information transferred over the Internet is secure and protected.

E-Survey is one of the modules of the Survey Answering Guide Expert (SAGE) project initiated by DOS. The SAGE system design uses intelligent systems technology, form processing technology and Intelligent Character Recognition (ICR) technology. SAGE is a sophisticated software tool that allows end users to design, create and maintain survey systems. It runs in a Windows-based environment with user-friendly graphical user interfaces, and supports multiple modes of data collection: mail (paper), fax and the Internet. One of the major components of the SAGE system is the Security Manager (SM).
This module defines and governs the security and access control aspects of the entire system. It controls the list of users granted access to SAGE and manages their authorisation levels within the different SAGE modules.

For the Population Census 2000, a 20% sample survey is conducted to collect additional topics, using a tri-modal data collection strategy: Internet enumeration, computer-assisted telephone interviewing (CATI) and field work. Singapore is the first country in the world to collect census information from households via the Internet with on-line database updating. In taking this risky step, DOS has put in place an integrated system of security measures to ensure the confidentiality and security of the information provided. These include the following:
(a) To ensure confidentiality, all selected households receive a notification letter with a unique, randomly generated password and a uniquely generated House ID. The passwords are generated by a unique algorithm and use 128 byte encryption techniques.
(b) Apart from the password and House ID, respondents need to key in the Unique Identification Number (UIN) of two persons living in the house before they can log in and retrieve the Internet Census questionnaire and their household record in the database via the Census website. Single-member households need only key in their password, House ID and personal UIN to log in and access the form.
(c) The password is checked securely with Privylink, which uses the password as a key to generate a random sequence on the respondent’s computer. This random sequence is then transmitted over the Internet. As the respondent’s password itself is never sent across the Internet, it cannot be intercepted and read. At the server end, the random sequence received is decrypted with a key server. If the decrypted sequence matches what the server expects, the respondent is authenticated and granted access.
(d) All data entered by respondents and transmitted over the Internet are encrypted.
(e) The on-line database is protected from hacking by a DMZ (Demilitarised Zone) using two layers of computer firewalls.

4. Data Access and Utilisation
In general, aggregated statistical data at broad levels are disseminated for general information through hardcopy publications, an on-line time series database service, and the websites of the Department and various other government agencies. More detailed statistical data can be provided to specific users on request, but care is taken to ensure that breakdowns at more disaggregated levels do not inadvertently reveal individual information. In a small economy like Singapore, where certain industries are dominated by a few companies, data for such industries are not released separately at all; they are combined with those for related industries or with the residual group of industries for dissemination. As a general rule of thumb, aggregated statistics are released only where the number of establishments is more than ten.

Access to individual information in statistical databases is restricted to authorised officers within the section responsible for managing the databases. No access is given to anyone outside the section.

For the statistical databases on dwellings and establishments, which are used as sampling frames, sample selection services are provided to external agencies for a fee. Internally, the computerized sampling system performs stratification where relevant, by appropriate variables such as type of dwelling for households and industry for establishments. For sampling of dwellings, only the addresses of selected dwellings are provided, not individual information such as the persons residing in them.
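The dwelling sample selection just described can be sketched as a stratified simple random sample that hands back addresses and nothing else. The Python below is a minimal illustration under stated assumptions, not the Department’s sampling system: the frame is a list of dictionaries, and the function name `select_dwelling_sample`, the stratum field `dwelling_type`, the per-stratum sample sizes and the records are hypothetical. Python’s seeded `random.Random` stands in for whatever selection method is actually used. The property it demonstrates is that other fields of the frame never leave the function.

```python
import random
from collections import defaultdict
from typing import Dict, List


def select_dwelling_sample(frame: List[Dict[str, str]],
                           allocation: Dict[str, int],
                           seed: int = 0) -> List[str]:
    """Draw a stratified simple random sample of dwellings.

    `frame` holds one record per dwelling (hypothetical fields); `allocation`
    maps each stratum (here, type of dwelling) to the number of dwellings to
    select. Only addresses are returned; no other field of the frame is released.
    """
    # Group the sampling frame into strata by type of dwelling.
    strata: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in frame:
        strata[record["dwelling_type"]].append(record)

    rng = random.Random(seed)  # seeded so that a selection can be reproduced
    addresses: List[str] = []
    for stratum, n in allocation.items():
        # Sampling without replacement; raises ValueError if n exceeds the stratum size.
        chosen = rng.sample(strata[stratum], n)
        addresses.extend(record["address"] for record in chosen)
    return addresses


# Hypothetical frame: odd-numbered blocks are flats, even-numbered blocks are condominiums.
frame = [
    {"address": f"Blk {i}",
     "dwelling_type": "flat" if i % 2 else "condominium",
     "occupants": "confidential"}
    for i in range(1, 11)
]
sample = select_dwelling_sample(frame, {"flat": 2, "condominium": 1}, seed=42)
print(sample)  # a list of three address strings
```

Returning plain address strings, rather than filtered copies of the records, makes it structurally impossible for the function to leak occupant details to the requesting agency.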
For sampling of establishments, only non-sensitive individual information such as names, addresses and business activities is provided.

In the past, datasets from selected surveys had been provided to university researchers after individual identifiers were removed. However, this is not commonly done, and such datasets are provided only under stringent conditions. The present norm is that if a researcher wishes to use individual-level data for analysis, the programming is done on site by authorised staff on the researcher’s behalf. This practice of limited access has been criticised as overly restrictive, and it runs contrary to practices in some developed countries. Currently, there is no plan to change this conservative policy on public use of datasets.

5. Concluding Remarks
Safeguarding data is of paramount importance to the Singapore statistical system. Intensive IT usage and advanced data applications have heightened data security concerns. Singapore has firmly established a progressive data protection regime that employs legal, administrative and technical measures to safeguard data security. Its provisions are similar to those of developed countries with tight data protection measures. Such provisions are by no means final, however; they need to be reviewed regularly in response to changing technology, respondents’ concerns and user demand for data access.