Statistics, Development and Human Rights
Session I-Pa 8c
Challenges and Lessons of a National Policy of
Data Protection in the Field of Statistics
Paul CHEUNG
Montreux, 4. – 8. 9. 2000
Paul CHEUNG
Government Chief Statistician, Singapore Department of Statistics
100, High Street #05-01
The Treasury
179434 Singapore, Singapore
T. + 65 332 7691 F. + 65 334 3464/7689
paul_cheung@singstat.gov.sg
ABSTRACT
Challenges and Lessons of a National Policy of Data Protection In the Field of Statistics
Singapore is one of the few developing countries that have progressively utilized a networked
system of databases as the cornerstone of the national statistical infrastructure. Databases on
people, housing, commercial establishments, and land provide multi-dimensional information for
statistical compilation and analysis. The availability of the population database, for example, has
enabled us to complete successfully our first-ever register-based Census for the year 2000. In
moving towards this system of databases, a coherent national policy of data protection has to be
established. The safeguard of data privacy is particularly important in our decentralized statistical
system, where multiple agencies are involved in data collection and compilation activities. Drawing
from the experiences of countries with strong data protection traditions, Singapore has firmly
established a progressive data protection regime covering the entire statistical system. The
principles and provisions of our data protection regime are described in the paper.
RESUME
Défis et leçons d’une politique nationale de protection des données dans le domaine des
statistiques
Singapour est l’un des rares pays en voie de développement à avoir progressivement utilisé
un système de banques de données reliées en réseau comme pierre angulaire de l’infrastructure
statistique nationale. Des banques de données sur les personnes, le logement, les établissements
commerciaux et le sol fournissent des informations multidimensionnelles aux fins de compilation et
d’analyse statistiques. La disponibilité de la banque de données sur la population, par exemple,
nous a permis de réaliser avec succès notre tout premier recensement sur la base des registres pour
l’année 2000. En passant à ce système de banques de données, il convient d’instaurer une politique
nationale cohérente de protection des données. La sauvegarde du caractère privé des données est
particulièrement importante dans notre système statistique décentralisé, où de multiples agences
sont impliquées dans les activités de collecte et de compilation des données. Tirant un enseignement
de l’expérience de pays affichant une tradition établie en matière de protection des données,
Singapour a instauré un régime de protection des données progressif englobant l’ensemble du
système statistique. Les principes et dispositions de notre régime de protection des données font
l’objet d’une description dans ce document.
1. Introduction
With the advent of modern information technology, the National Statistical Offices (NSOs) in
developing countries are moving rapidly towards a new platform of capturing, editing and archiving
statistical data. Personal computers, client-server systems, hand-held data entry units, GPS devices,
and many other modern gadgets provide these countries with the technological capabilities in data
capture and storage never seen before [1]. The integration of multiple datasets to create new
information, something that is commonly done in some developed countries, is now technically
possible and can be readily implemented.
While the hardware has improved and new capabilities are being nurtured, the NSOs are now
confronted with new challenges in developing a set of ethical codes to go side by side with these
technological advancements. A key ethical concern centers on the protection of data privacy rights
of the companies and individuals who are the data suppliers. The need for data privacy protection
has probably existed for as long as data have been collected for statistical purposes on a voluntary
basis. The promise of privacy protection is part of the social contract between the NSOs and the
respondents. This social contract specifies that, in exchange for the willingness of the respondents
to reveal valued information to the government, the data collected will be adequately protected and
will be used only for legitimate and approved purposes. The perception by the respondents of
whether an NSO could be trusted in fulfilling the provisions of the social contract, through the
adoption of strict standards on confidentiality of individual returns, would have a significant impact
on their willingness to co-operate with the NSO.
As more data are being amassed, a new area of responsibility on database and data
transmission security has arisen for the NSOs in dealing with data privacy. This responsibility has
three dimensions. First, the NSOs have to deal with the security of the databases and the information
contained therein. Second, they have to manage accessibility and usage, as huge amounts of confidential
data are at stake. Third, they have to address the concerns of the respondents about whether the process of
gathering and compiling the data is secure enough now that new modes of data collection have emerged.
Respondents have placed implicit trust in the statistical system that the detailed information they
provide through the mail or the Internet will not get lost along the way. A new road map is
required for the NSOs to address these three areas of responsibility.
This paper presents Singapore’s experiences in exploiting information technology for data
management and in addressing data privacy issues. It has two parts. It first looks at the need for data
protection in the evolution of Singapore’s statistical system: the tendency towards greater
decentralization, the proliferation of databases, and the reliance on technology to facilitate
submission of data from the respondents. Drawing from the experiences of countries with strong
data protection traditions, Singapore has firmly established a progressive data protection regime
covering the entire statistical system. The principles and provisions of our data protection regime
are described in the second part of the paper.
[1] A good example is the project by ESCAP on testing new information technology applications in
developing countries. GPS devices were deployed in Bangladesh for mapping activities. Hand-held devices were tested for data
entry in the Philippines. OCR technology was used in Indonesia for data capture in surveys. Similar
examples can be found in other developing countries.
2. Increasing Need for Data Protection
2.1 Data Protection in a Decentralised System
In Singapore, a coherent national policy of data privacy protection is particularly important in
view of its decentralised statistical system. A large number of statistical agencies as well as
government departments and statutory boards collect and compile official statistics. Many of them
deal directly with the respondents by issuing questionnaires. The advantages of a decentralized
system are many. However, it carries with it a much higher risk of data leakage as there are more
data centers and many more people are involved in handling the data. To coordinate the statistical
activities in this decentralized system, the Government Chief Statistician is designated by law as the
national statistical coordinator, with the Singapore Department of Statistics (DOS) serving as the
coordinating center. The adoption of sound data protection principles and practices across the
system is recognised as an important task in maintaining the high level of trust that the general public
places in official statistics.
2.2 Data Protection in Record Linkage and Database Development
The need for greater data protection has become more pertinent in recent years because of the
prevalence of IT applications in Singapore’s administrative and statistical system. The rapid
introduction of powerful computer workstations, the availability of data management systems, the
usage of identification variables in different databases, and the capacity to store huge amounts of
data have made record linkage technically possible to implement within a shorter time period. The
high quality of Singapore’s administrative data has also made it a viable source of data for statistical
purposes. With the use of common protocols and national statistical standard classifications in the
computer systems of various government agencies, record linkage has become a viable means
through which data from different sources could be integrated.
Singapore has now developed databases on people, housing, commercial establishments, and
land which jointly provide multi-dimensional information for statistical compilation and analysis.
These databases are the National Database of Dwellings (NDD), Household Registration Database
(HRD), Commercial Establishment Information System (CEIS) and the Integrated Land Use
Database (ILUD). The availability of the population database, for example, has enabled us to
complete successfully our first-ever register-based Census for the year 2000.
These statistical databases are updated continuously, by linking with existing government
databases through unique identification numbers. While the move towards a networked system of
databases brings about savings in costs and respondent burden, it also means that a breach in data
security will have more serious consequences, be it an illegal access to the data or the accidental or
intentional alteration or destruction of the data. This is because a great deal of information on
persons and establishments is stored together in one way or another with a unique identification
code attached to it.
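
The linkage through unique identification numbers described above can be illustrated with a minimal sketch. It assumes two toy record sets that share a hypothetical identifier field `uid`; the field names and values are invented for illustration and do not reflect the schemas of the actual databases.

```python
def link_records(statistical, administrative, key="uid"):
    """Update statistical records with matching administrative records.

    Both inputs are lists of dicts and `key` names the shared identifier
    (a hypothetical field name). Administrative records that have no match
    in the statistical set are ignored.
    """
    admin_by_key = {rec[key]: rec for rec in administrative}
    linked = []
    for rec in statistical:
        merged = dict(rec)  # copy so the input records are not mutated
        merged.update(admin_by_key.get(rec[key], {}))
        linked.append(merged)
    return linked


persons = [{"uid": "A1", "dwelling": "flat"}, {"uid": "B2", "dwelling": "house"}]
updates = [{"uid": "A1", "dwelling": "house", "employed": True}]
linked = link_records(persons, updates)
```

Because every merged record carries the identifier together with all linked attributes, the sketch also shows why a single breach of such a database exposes more than a breach of any one source file.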
2.3 Data Protection with New Modes of Data Transmission
Given the increasing usage of Internet services among households and commercial
establishments, the Singapore statistical system has to respond to the popularity of this new
technology and to evolve our data collection strategies. DOS has introduced two new internet-based
systems to facilitate the collection and capture of statistical information in its establishment
surveys and population census.
The first service, known as ‘E-Survey’, was introduced in August 1999. It allows companies
and businesses to submit statistical information via Internet and is now made available to all the
respondents of the annual surveys of commerce and services. The surveys cover about 16,000
business establishments and were previously conducted using the conventional mail-in and mail-back
method. The E-Survey provides respondents with a convenient, fast and easy alternative to
mail and fax submission.
The second service is the use of the Internet to collect information from households
for the Population Census 2000. With the database in the background, households are
able to check the records in the database for census enumeration and to update them if necessary.
New information can also be provided through the Internet.
While such initiatives provide substantial manpower savings and benefits, they raise new
security issues, such as:
• Ensuring security of data stored on the server;
• Ensuring login security to prevent unauthorised access to information; and
• Ensuring that data is not intercepted as it is sent over the Internet.
3. Data Protection Measures
Singapore’s statistical system employs a three-pronged approach to data protection, namely
legal measures, administrative measures and technical measures. These ensure that data are
protected against illegal access and use, accidental or intentional alteration or destruction,
intentional or accidental disclosure, as well as theft. Experience has shown that these measures
are effective within the Singapore context.
3.1 Legal Measures
The Statistics Act is the principal piece of legislation governing the collection, compilation
and publication of statistics by DOS and gazetted Research and Statistics Units (RSUs) in the public
sector. The Statistics Act empowers DOS and the RSUs to collect and access information.
Concurrently, the Statistics Act contains several provisions for data protection. The guarantees on
data confidentiality apply to the whole Singapore Statistical System. No other legislation has the
power to override these guarantees.
The main provisions of the Statistics Act on data protection are summarised below:
(a) Information collected is to be used only for statistical purposes. The release of data on
individual persons or establishments for the purpose of administrative decision-making,
or other similar purposes is forbidden.
(b) Data is to be published only in a way that prevents the identification of the individual
person, household, company or institution, unless their written authorisation is obtained.
Thus data are released in aggregate form, and cases where responses might enable an
indirect identification are grouped into broad categories to prevent disclosure.
(c) Where the information has been obtained from a third party who is bound under any
written law that restricts the disclosure of information so obtained, such information
can be disclosed only with the written permission of that third party.
(d) If information is also obtainable under other written law that restricts the disclosure of
information obtained, the Minister can provide for the application of that law, with or
without adaptations.
(e) While DOS can request identifiable statistical information from RSUs and other
agencies specified in the Statistics Act, the transfer of individual data from DOS to
RSUs and other government agencies is not allowed. Also, RSUs are not allowed
access to individual records held by other RSUs.
(f) Anyone found violating the Statistics Act is liable on conviction to a fine not exceeding
$5,000 or to imprisonment for a term not exceeding one year or to both.
Another piece of legislation that provides data protection at a more general level is the Official Secrets
Act, which prohibits government servants from releasing official information to unauthorized
persons. While thus far no one has been prosecuted under the Statistics Act for the release or abuse
of information, the Official Secrets Act has been invoked in a number of cases for unauthorized
release of official information. Thus, within the government service, there is clear understanding
that legal provisions are taken seriously and that illegal behaviors will be prosecuted to the full
extent of the law.
3.2 Administrative Measures
While the legal measures provide clear guidelines on accountabilities and penalties in case of
breach of data security, it is necessary to have general administrative measures in place to further
reduce the risk of disclosure. Examples of such measures include:
(a) the induction of new employees to the organizational culture which emphasises the
importance and need for data protection;
(b) keeping critical files under lock and key;
(c) classifying information or material according to its sensitivity and security
implications, and according it the appropriate level of protection; and
(d) enforcing the “need-to-know” principle, whereby the knowledge or possession of, or access to,
information or data is strictly confined to authorised persons.
To limit the linkages with administrative records to what is necessary for the production of
statistics, DOS undertakes careful evaluation of the nature of the records to be linked. Linkage is
permitted provided certain conditions are fulfilled: first, the linkage is for meaningful statistical
purposes consistent with the Statistics Act; second, confidentiality provisions of relevant laws are
followed; third, there is a substantial saving in cost or respondent burden; fourth, data timeliness is
improved; and fifth, most importantly, there is no detriment to the privacy of the respondents.
The extensive use of data from external sources for enhancing statistical databases on
individual persons has necessitated the upgrading of administrative measures for data protection.
There is significant merging of new information and updating of more current information via
record linkage each month. The majority of the datasets received from external agencies are in the
form of tapes, diskettes and cartridges and are collected from the agencies in person by the
Department’s officers. This prevents any tampering of the datasets via electronic means. For some
of the datasets that are received electronically, these are transmitted via an established secure
network. Before the uploading and merging of individual information is undertaken, only two
authorised officers can register the datasets by keying in the system input identification code and
date. This prevents any unauthorised uploading or updating of the information to the database.
Protocols have also been established for review and endorsement by the appropriate officers for any
change in processing of records via record linkage and for each transaction to upload, update or
process the source data.
In addition, the database that stores individual information is located at a secure physical site
and is not connected to any external network. This physical site has restricted access and is out of
bounds to all unauthorised visitors. Access rights to the database are limited to only a few officers
who are directly involved in managing and maintaining the database. The authorised users are
briefed and screened according to an established set of security protocols. They are given unique
userids and passwords which are revoked after 3 months if there is no usage throughout. Any
change in procedures for granting of access rights has to be reviewed and endorsed by the database
owner in the Department.
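
The rule that unused userids are revoked after three months can be sketched as a simple periodic check. This is an illustration only: the 90-day reading of "3 months", the account names and the dates are assumptions, not details of the Department's system.

```python
from datetime import date, timedelta

INACTIVITY_LIMIT = timedelta(days=90)  # assumption: "3 months" taken as 90 days


def accounts_to_revoke(last_used, today):
    """Return the user IDs whose last recorded use is older than the limit."""
    return sorted(
        userid
        for userid, used_on in last_used.items()
        if today - used_on > INACTIVITY_LIMIT
    )


# Hypothetical usage log: user ID -> date of last database access.
last_used = {
    "officer_a": date(2000, 1, 10),
    "officer_b": date(2000, 5, 20),
}
stale = accounts_to_revoke(last_used, today=date(2000, 6, 1))
```

Here `stale` contains only `officer_a`, whose last access was more than 90 days before the check date.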
Strict measures are also taken in the management and utilisation of statistical databases on
companies and businesses. For data merging and record linkage, data from external sources are put
through validity checks and verified to be error-free before being merged and updated into the
database. Even within the Department, user access to the databases is restricted by user
classification with controlled information accessibility.
3.3 Technical Measures
With IT becoming an integral part of DOS’s organisational processes, IT security controls are
playing an increasingly important role in data security. At the general level, examples of such
controls are:
(a) Computer systems require users to identify themselves using unique userids and
passwords. Passwords are changed at least once every 30 days. Employees are not to disclose or
share their userids and passwords.
(b) Firewalls are installed to insulate the connections between internal systems and external
systems against unauthorised entry.
(c) The sender of classified or sensitive email has to know the authority of the intended
recipient to receive such email. A secure email system employing government-approved encryption
standards is deployed to send or receive sensitive email.
(d) Laptop and other portable computing devices used for official work are to be physically
locked in a desk or cabinet when unattended.
(e) Computer systems are installed with devices and programs to prevent the propagation and
execution of viruses and other harmful code.
In addition, specific measures are implemented to provide added protection for statistical data
collections conducted via the internet. For business surveys conducted via the web-based E-Survey,
access accounts for respondents are pre-created. The respondents are informed of their unique
company reference number and randomly generated Survey PIN through the mail, together with a
hard copy of the survey forms. These two keys, the unique company reference number and Survey
PIN, are required before respondents are allowed to log in to the E-Survey server. The E-survey
makes use of the Secure Sockets Layer (SSL) encryption protocol, to ensure that the information
transferred over the Internet is secure and protected.
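
The two-key login for the E-Survey can be sketched as follows. This is a minimal illustration, not the E-Survey implementation: the 8-digit PIN length, the reference number format and the in-memory account table are assumptions, and a production system would store a salted hash of the PIN rather than the PIN itself.

```python
import hmac
import secrets


def generate_pin(length=8):
    """Return a random numeric PIN drawn from a cryptographic source."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def login_allowed(accounts, reference_number, pin):
    """Allow login only when both keys match a pre-created account."""
    expected = accounts.get(reference_number)
    if expected is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), pin.encode())


# Pre-created account table: company reference number -> Survey PIN.
# Stored in plain form here only to keep the sketch short.
accounts = {"CRN-0001": generate_pin()}
```

A respondent who presents the correct reference number with a wrong PIN, or a valid PIN with an unknown reference number, is refused in the same way.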
The E-Survey is one of the modules under the Survey Answering Guide Expert (SAGE)
project initiated by DOS. SAGE utilizes intelligence systems technology, form processing
technology and Intelligent Character Recognition (ICR) technology in the system design. The
SAGE is a sophisticated software tool which allows end users to design, create and maintain survey
systems. It operates in a Windows-based environment with user friendly Graphical User-Interfaces.
The SAGE System supports multiple modes of data collection via mail (paper), fax and Internet.
One of the major components of the SAGE system is the Security Manager (SM). This module
defines and governs the security and access control aspects of the entire system. It controls the list
of users who are granted access to SAGE and manages the access authorisation level within the
different SAGE modules.
For the Population Census 2000, a 20% sample survey is conducted to collect information on additional
topics, using a tri-modal data collection strategy, namely internet enumeration, computer-assisted
telephone interview (CATI) and field work. Singapore is the first country in the world to collect the
Census information from households via the internet with on-line database update. In taking this
risky step, DOS has put in place an integrated system of security measures to ensure confidentiality
and security of the information provided. These include the following:
(a) To ensure confidentiality, all selected households receive a notification letter with a
unique, randomly generated password and a uniquely generated House ID. The
passwords are generated based on a unique algorithm and use 128 byte encryption
techniques.
(b) Apart from the password and House ID, respondents need to key in the Unique
Identification Number (UIN) of two persons living in the house before they can log in
and retrieve the Internet Census questionnaire and their household record in the
database via the Census web site. Single-member households need only key in their
password, House ID and personal UIN to log in and access the form.
(c) The checking of the password is performed in a secure manner with Privylink.
This uses the password as a key to generate a random sequence at the respondent’s
computer. This random sequence is then transmitted over the Internet. As the
respondent’s password is not sent across the Internet, the password cannot be
intercepted and read.
At the server end, the random sequence received is encrypted with a key server. If the
decrypted sequence matches, the respondent is authenticated and granted access.
(d) All the data entered by respondents and transmitted over the Internet are encrypted.
(e) The on-line database is protected from hacking by a DMZ (Demilitarised Zone)
utilising two layers of computer firewalls.
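
The idea of checking a password without sending it over the Internet can be illustrated with a generic challenge-response exchange. The sketch below is not the Privylink protocol, whose details are not given here; it swaps in a standard HMAC-based scheme, and the function names, the 16-byte challenge and the sample password are all assumptions.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Server side: issue a fresh random challenge for one login attempt."""
    return secrets.token_bytes(16)


def client_response(password, challenge):
    """Client side: derive a response keyed by the password.

    Only this response travels over the network, never the password.
    """
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()


def server_verify(stored_password, challenge, response):
    """Server side: recompute the response and compare in constant time."""
    expected = hmac.new(stored_password.encode(), challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = new_challenge()
response = client_response("household-secret", challenge)
```

Because each login uses a new challenge, a response captured in transit cannot be replayed against a later challenge, and the password itself is never exposed to interception.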
4. Data Access and Utilisation
In general, aggregated statistical data at broad levels are disseminated for general information
through hardcopy publications, on-line time series database service and the websites of the
Department and various other government agencies. More detailed statistical data could be
provided to specific users upon request, but care is taken to ensure that breakdowns into more
disaggregated levels do not inadvertently reveal individual information. For a small economy like
Singapore where certain industries are dominated by a few companies, data for such industries are
not released at all. The data for these industries are combined with those for related industries or
the residual group of industries for dissemination. As a general rule of thumb, aggregated statistics
are released only where the number of establishments is more than ten.
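
The rule of thumb above can be sketched as a threshold check applied before release. This is a simplified illustration under stated assumptions: the industry names and figures are invented, and small cells are folded into a single residual group, whereas in practice they may instead be combined with related industries.

```python
MIN_ESTABLISHMENTS = 10  # release only where the count is more than ten


def apply_threshold(cells, residual="Other industries"):
    """Fold cells at or below the threshold into a residual group.

    `cells` maps industry name -> (number of establishments, total value).
    """
    released = {}
    res_count, res_value = 0, 0.0
    for industry, (count, value) in cells.items():
        if count > MIN_ESTABLISHMENTS:
            released[industry] = (count, value)
        else:
            res_count += count
            res_value += value
    if res_count > MIN_ESTABLISHMENTS:
        released[residual] = (res_count, res_value)
    # Otherwise the residual group is itself too small and stays withheld.
    return released


# Hypothetical cells: industry -> (establishments, total value).
cells = {
    "Retail": (120, 950.0),
    "Shipbuilding": (6, 400.0),
    "Refining": (5, 700.0),
}
released = apply_threshold(cells)
```

In this example the two small industries are published only as a combined group of eleven establishments, so neither can be singled out from the released table.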
Access to individual information in statistical databases is restricted to authorized officers
within the section responsible for the management of the databases. No access is given to any
person outside the section. For the statistical databases on dwellings and establishments which are
used as sampling frames, sample selection services are provided to external agencies for a fee.
Internally, the computerized sampling system performs stratification, where relevant, by the
appropriate variables such as type of dwelling in the case of households and industry for
establishments. For sampling of dwellings, only addresses of selected dwellings are provided, but
not individual information such as persons residing in them. For sampling of establishments, only
non-sensitive individual information such as names, addresses and business activities are provided.
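
The sample selection service for dwellings can be sketched as a stratified draw that returns addresses only. The sampling frame, field names, sampling fraction and seed below are assumptions made for illustration; the actual computerized sampling system is not described in this detail.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_key, fraction, seed=0):
    """Draw the same fraction from every stratum and return addresses only."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[stratum_key]].append(unit)
    addresses = []
    for _, units in sorted(strata.items()):
        n = max(1, round(len(units) * fraction))
        for unit in rng.sample(units, n):
            addresses.append(unit["address"])  # every other field is dropped
    return addresses


# Hypothetical frame: 10 flats and 4 houses, each with a residents count
# that must not leave the Department.
frame = [
    {"address": f"Flat {i}", "type": "flat", "residents": 3} for i in range(10)
] + [
    {"address": f"House {i}", "type": "house", "residents": 5} for i in range(4)
]
sample = stratified_sample(frame, "type", fraction=0.5)
```

The function returns plain address strings, so information about the persons residing in the selected dwellings never reaches the external agency.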
In the past, datasets from selected surveys had been provided to university researchers after
removing individual identifications. This however is not commonly done, and the datasets are
provided under stringent conditions. The present norm is that should a researcher wish to utilize
individual level data for analysis, the programming will be done on site by authorized staff on
behalf of the researcher. This practice of limited access has been criticized as being overly
restrictive, and is contrary to the practices in some developed countries. Currently, there is no plan
to change this conservative policy on public use of datasets.
5. Concluding Remarks
Safeguarding of data is of paramount importance to the Singapore statistical system. Intensive
IT usage and advanced data applications have heightened data security concerns. Singapore has
firmly established a progressive data protection regime, which employs legal, administrative, and
technical measures to safeguard data security. The provisions are similar to those of developed countries
with tight data protection measures. It should be noted that such provisions are by no means final,
and they need to be reviewed regularly in response to changing technology, respondents’ concerns
and user demand for data access.