LONDON’S GLOBAL UNIVERSITY
SLMS Health Informatics – Pseudonymisation
ISO/TS 25237:2008 Overview
Document Information
Document Name: ISO/TS 25237:2008 Overview
Author: Shane Murphy
Issue Date: 02/08/2013
Approved By: Chair of SLMS IGSG
Next review: Three years
Document History
Version | Date | Summary of change
0.1 | 13/06/2013 | First draft for discussion
1.0 | 02/08/2013 | Approved by Chair of SLMS IGSG
SLMS-IG14 Health Informatics - Pseudonymisation Overview v1.0
ISO/TS 25237:2008 Overview
Contents
Document Information
Document History
ISO/TS 25237:2008 Overview
Executive Summary
Protection of identities in healthcare
Future requirements for IG Toolkit V12
ISO Technical Requirements
Separation of personal data from payload data
Anonymisation
Pseudonymisation
Reversible Pseudonymisation
Level of assurance for privacy protection
Level 1
Level 2
Level 3
Data Classification
Re-identification of pseudonymised data
Pseudonymisation Service Characteristics
Minimum requirements
Pseudonymisation Process (method and implementation)
Design criteria
Entities in the Model
Model Workflow
Pseudonymisation workflow diagram
Preparation of data
Pseudonymisation Process
Pseudonymisation techniques to protect privacy
Identifiable person variables
Aggregation variables
Outlier variables
Structured data variables
Non-Structured data variables
Image data
Inference risk assessment
Privacy and security
Re-identification process
Re-identification as part of normal procedure
Re-identification as part of an exceptional event
Specification of interoperability of interfaces
Pseudonymisation Policy Framework
Privacy Policy for each study
Operational best practice objectives
Implementation of practices for re-identification
Appendix 1
De-identification Standard for Health and Social Care
Standard de-identification processes
Appendix 2
K-anonymity
Executive Summary
The standard ISO/TS 25237:2008 is linked to ISO 27799 (Health informatics: Information security management in health using ISO/IEC 27002). It therefore uses a series of controls to mitigate risks. The standard is summarised below:
• The standard is considered to be ideal for secondary use of clinical data, including research.
• It defines a basic methodology for pseudonymisation.
• It sets out a policy framework and minimum requirements to manage problem areas, requirements for practices that support integrity, and specifications for the planning and implementation of Pseudonymisation services.
• The workflow specifications also support quality assurance requirements.
• It sets out a policy framework and minimum requirements for controlled re-identification.
• Standards for interfaces are included to facilitate the necessary assurances for interoperability.
• Lastly, risk assessment for re-identification provides guidance to mitigate risks to an acceptable level.
The standard confines itself to the protection of personal data stored in databases. It includes a set of standard terms and definitions.
Protection of identities in healthcare
De-identification is the process of removing the association between identifying data and the data subject. The Information Standards Board for Health & Social Care has recommended the use of mandatory techniques, including K-anonymity, to ensure that de-identification is sufficiently robust (see Appendix 2); this will be part of the IG Toolkit v12 requirements.
Pseudonymisation is a method of linking pseudonymised data to the same data subject across several data records or information systems without revealing the data subject’s identity. Pseudonymisation can be either reversible or irreversible.
Anonymisation does not enable the linking of data to the same subject across several data records or information systems. Consequently, it is impossible to re-identify anonymised data.
The organisational objective of using Pseudonymisation is to mitigate legal, reputational and financial risk factors by preventing the unauthorised or accidental disclosure of personal data about a healthcare data subject.
Future requirements for IG Toolkit V12
The Information Standards Board for Health & Social Care has published an Anonymisation standard (ISB1523 Amd 20/2010). The document sets out its requirements for a de-identification standard and is summarised in Appendix 1. It appears that this standard MUST be followed by Principal Investigators (PIs) and Data Controllers in V12 of the IG Toolkit from 2014/15.
ISO Technical Requirements
Separation of personal data from payload data
Conceptually, personal data can be split into two parts:
• Payload data: anonymous data (data that does not directly identify an individual)
• Identifying data: data that is capable of identifying a known individual
Pseudonymisation techniques should seek to reduce the level of identifying data by aggregating such data into groups.
Every record release must be preceded by a risk analysis that evaluates:
• Purpose for the data release
• Minimum dataset necessary for that purpose
• Disclosure risks, including re-identification
• Release strategies available
A strategy of identification concealment should be agreed upon. This is derived from the release process and the risk analysis.
Anonymisation
Anonymisation is simply the process that removes the association between the identifying data set and the data subject. This can be accomplished through two different methods:
a) Removing or changing characteristics in the associated characteristics data set, so that the association is no longer unique and relates to several different data subjects;
b) Increasing the population of data subjects, so that the association between the data set and the data subject is no longer unique.
Pseudonymisation
Reversible Pseudonymisation:
Reversible pseudonymisation requires a controlled environment and standard operating procedures. The methods used to reverse the pseudonymisation must themselves be secured. A key or lookup table may be used to reverse the pseudonymisation process.
Level of assurance for privacy protection
The level of risk for unauthorised re-identification can be assessed and estimated. It should first be made clear what is meant and understood by identifiability; the risks will then be easier to assess. Re-identification risk assessments should be carried out in line with the relevant IG Policy documents. Privacy impact assessments (PIA) should also be considered where appropriate.
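One simple way to put a number on this risk is to count how many records share the same combination of indirect identifiers; the smallest such group is the k of K-anonymity (see Appendix 2). The sketch below is illustrative only: the function name `smallest_group_size`, the field names and the sample records are assumptions, not part of ISO/TS 25237:2008.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for every listed quasi-identifier (the k in k-anonymity)."""
    if not records:
        raise ValueError("no records to assess")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical sample data, for illustration only.
sample = [
    {"age_band": "30-39", "postcode_area": "NW1", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode_area": "NW1", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode_area": "WC1", "diagnosis": "asthma"},
]

k = smallest_group_size(sample, ["age_band", "postcode_area"])
print(k)
```

A small k means that at least one data subject is nearly unique on those fields, which would normally prompt further aggregation or suppression before release.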
The assurance levels must take into account:
Level 1 – Risks associated with the person identifying data elements: This level applies when a potential attacker is assumed to have poor capabilities.
Level 2 – Risks associated with aggregating data variables: This level applies when the potential attacker is deemed to have external knowledge that can be matched against the pseudonymised data to obtain unauthorised knowledge about data subjects. The risk analysis model should include assumptions about attacks and attackers and the likelihood that external data sets are available.
Level 3 – Risks associated with rare data in the populated database: Rare data is also referred to as grey data in the Caldicott 2 report. This is information that, when processed together with other data, can indirectly lead to identification of a data subject. Rare data is not necessarily medical data; it may be other data, such as postcode data or even observational material, that can lead to the identification of a data subject. ISO/TS 25237:2008 states that Level 3 is an area in which a confident level of assurance is very difficult to achieve. Running risk analyses on populated models is necessary to provide a high level of anonymity.
Data Classification
Data classification covers the following: payload data; observational data; pseudonymised data; anonymised data; research data; healthcare identifiers; victims of violence and VIPs; and genetic information.
The main requirement identified is that the concepts of data protection, data privacy and confidentiality are not diminished. The requirement for informed consent, and the difficulties that this may present, are recognised. However, where pseudonymised data is used, research may only proceed with the involvement of the data subject. Where anonymised data is used, this requirement is relaxed.
Both research data and secondary use of personal information require a risk assessment, and a risk assessment is needed for each additional secondary use.
Genetic information must be afforded the same level of protection as other comparable sensitive medical data. Informed consent should be provided so that the healthcare data subject’s reasonable expectations are managed.
Re-identification of pseudonymised data
The following coded values are to be used in the event that re-identification is required:
1) Data integrity verification/validation
2) Data duplicate record verification/validation
3) Request for additional data
4) Link to supplemental information variables
5) Compliance audit
6) Communicate significant findings
7) Follow-up research
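The coded values above lend themselves to a fixed enumeration, so that a re-identification request can only carry one of the seven recognised reasons. The following is a minimal sketch; the class and function names are hypothetical, and it assumes that the list numbering 1 to 7 is the code value.

```python
from enum import IntEnum


class ReidentificationReason(IntEnum):
    """Reasons for re-identification, numbered as in the list above
    (assumption: the list position is the coded value)."""
    DATA_INTEGRITY_VERIFICATION = 1
    DUPLICATE_RECORD_VERIFICATION = 2
    REQUEST_FOR_ADDITIONAL_DATA = 3
    LINK_TO_SUPPLEMENTAL_INFORMATION = 4
    COMPLIANCE_AUDIT = 5
    COMMUNICATE_SIGNIFICANT_FINDINGS = 6
    FOLLOW_UP_RESEARCH = 7


def parse_reason(code: int) -> ReidentificationReason:
    """Convert a numeric code to a reason, rejecting anything outside 1-7."""
    try:
        return ReidentificationReason(code)
    except ValueError:
        raise ValueError(f"unknown re-identification reason code: {code}") from None


reason = parse_reason(5)
print(reason.name)
```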
This approach gives researchers a controlled way of cleansing data, with the ability to reference source identifiers in the situations where re-identified data is required.
Pseudonymisation Service Characteristics
Minimum requirements:
a) Healthcare research data subjects must have assurances that their information is processed confidentially (NHS Care Record Guarantee; UCL Data Sharing Agreement; UCL staff contracts; UCL Contractor Non-Disclosure agreements);
b) The service must provide physical security protection (UCL Physical Security Risk Assessment; the Pseudonymisation Service physical security measures must be in place and subject to review);
c) Operational security protection must be in place (UCL IG Toolkit compliance in respect of the Information Security Initiative: Information Asset Register; Confidentiality Audits; Incident Management Procedures; Information Risk Assessment Tool; Pseudonymisation Service standard operating procedures; Remote Access Guidelines);
d) Re-identification keys, transformation tables and their protection need to be subject to multi-person controls and/or multi-organisation controls consistent with the assurances claimed by the service (controls employed by PIs and through Pseudonymisation Service standard operating procedures; use of 7-Zip);
e) The Pseudonymisation service shall be under the control of UCL SLMS (contractually and operationally) and may be contractually controlled through a suitably amended UCL Data Sharing Agreement;
f) Any applicable legal constraints concerning the release of re-identification keys and other associated controls that protect the identity of individuals must be identified, such as the Regulation of Investigatory Powers Act 2000, Part III (in respect of the encryption keys and the ability to deliver them if required through a Court Order or under statutory powers). Ideally, these details should be recorded as assurances in documents such as Data Sharing Agreements and SLAs;
g) Quality and availability of service need to be specified and provided in accordance with the information provision and access needs (Data Sharing Agreement and SLA);
h) Some identifiers may be blanked, suppressed (replaced with an x) or blurred to ensure that re-identification possibilities are mitigated to an acceptable level.
Pseudonymisation Process (method and implementation)
Design criteria
When data are being pseudonymised, identifying and payload data must be separated. Identifying data is translated into pseudonyms and the payload data is left unchanged. Pseudonymisation must be capable of either always mapping a given identifier to the same pseudonym (to preserve linkage between records belonging to the same identity) or mapping it to a different pseudonym (context dependent, time dependent or location dependent).
Entities in the Model
The model has four components:
1. Data Source: Prepares and structures the data for submission to the person identification and Pseudonymisation services. The data elements are processed in a pre-defined way. The Data Source submits the data to the person identification service and then to the Pseudonymisation service, and reads and follows up the result code from the Pseudonymisation service.
2. Person Identification Service provider: Manages identities communicated to the Pseudonymisation Service.
3. Pseudonymisation Service: Performs the pseudonymisation. All information on which it needs to base its policy decision shall be present in the session data.
4. Data Target: The entity that receives pseudonymised data from the Pseudonymisation service and performs any further processing of the data. This may include checking for duplicates, statistical analysis of the data set and decryption of the data set.
Model Workflow
The workflow below shows a best practice example. The following events are captured in the workflow:
• Data request from the data target
• Request with the Pseudonymisation service for domain patient identifiers and associated receipt acknowledgement
• Transmission of the pseudonymisation request
• Pseudonymisation request receipt issued by the Pseudonymisation Service
• Pseudonymised data sent to data target. Once the Pseudonymisation service is completed the data are sent to the target applications
• Delivery acknowledgement: receipt of the data is acknowledged by the data target (validation routines on the data format can be run)
• Finally, to ensure the process is complete, the Pseudonymisation service transmits an acknowledgement to the person identification service, which then sends an acknowledgement to the data source.
PIs are to assess and confirm that the Pseudonymisation service conforms. Inclusion of error messages for users of the service is a requirement.
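To make the division of responsibilities concrete, the sketch below walks one record through the four entities: the data source splits identifying data from payload, the person identification service maps identifying data to a personal identifier (PID), the Pseudonymisation service maps the PID to a pseudonym (PSN), and the data target receives only the PSN and payload. It is a simplified illustration, not the UCL service: `LookupService`, `run_workflow` and the field names are hypothetical, the lookup tables live in memory, and the acknowledgement messages are omitted.

```python
import secrets


class LookupService:
    """Maps each input value to a stable random token and keeps the table
    needed to reverse the mapping (a reversible, table-based scheme)."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._forward = {}
        self._reverse = {}

    def translate(self, value):
        # The same input always yields the same token, preserving linkage.
        if value not in self._forward:
            token = f"{self._prefix}-{secrets.token_hex(8)}"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reverse(self, token):
        # In practice this lookup would sit behind re-identification controls.
        return self._reverse[token]


def run_workflow(record, identifying_fields, id_service, psn_service):
    # Data source: split identifying data (IDAT) from payload data.
    idat = tuple(record[field] for field in identifying_fields)
    payload = {k: v for k, v in record.items() if k not in identifying_fields}
    # Person identification service: IDAT -> PID.
    pid = id_service.translate(idat)
    # Pseudonymisation service: PID -> PSN.
    psn = psn_service.translate(pid)
    # Data target receives the pseudonym and the unchanged payload only.
    return {"psn": psn, **payload}


id_service = LookupService("PID")
psn_service = LookupService("PSN")
fields = ["nhs_number", "name"]
record = {"nhs_number": "0000000000", "name": "Jane Example", "diagnosis": "asthma"}

first = run_workflow(record, fields, id_service, psn_service)
second = run_workflow({**record, "diagnosis": "eczema"}, fields, id_service, psn_service)
print(first["psn"] == second["psn"])
```

Because neither service alone holds both the identifying data and the pseudonym, reversing a PSN requires the cooperation of both tables.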
Pseudonymisation workflow diagram
Key for workflow diagram:
DID = De-identification Data
IDAT = Identifying data
ID Service = Identification Service
PID = Personal identifier
PSN = Pseudonym
[Workflow diagram: lanes for Data Source, Person Identification Service, Pseudonymisation Service and Data Target. Steps shown: Data request; Acknowledgement receipt; Request ID Service; Acknowledgement; Transfer IDAT to PID; Request Pseudonymisation service; Acknowledgement; Transfer PID to PSN; Submit DID; Delivery; Delivery acknowledged.]
Preparation of data
This activity is carried out by the PIs or an external data source. The raw personal data must be split in two to populate identifying data and payload data (anonymous data) before submission to the Pseudonymisation service.
Structuring: Techniques such as suppression, reduction in detail, blurring and blocking of unwanted payload data provide greater confidence that the anonymous payload data is rendered safe and without the potential to breach confidence.
The following processes should be in place:
a) Data elements that are used to link, group and match are to be tagged so that the Pseudonymisation service knows where they are and how to handle them.
b) The Privacy Policy shall influence the preparation of data by converting data elements into more generic identifiers (e.g. date of birth is changed to age group bands).
c) The Privacy Policy should identify those elements that are not required for further processing in the target applications and ensure they are discarded.
d) The anonymous part of the raw personal data is placed in the payload part of the personal data element.
Pseudonymisation Process
The process consists of the following:
a) The source submitting personal data to the Pseudonymisation service must split the data into two distinct elements: identifying data and anonymous payload data. What is considered identifying data and what is considered payload data depends upon the target level of anonymity in the security policy of the data collection project.
Pseudonymisation techniques to protect privacy
There is a legal obligation upon PIs to ensure confidentiality of personal data. If such data is disclosed, the Information Commissioner’s Office (ICO) may proceed with an investigation and issue a financial penalty notice. The individual PI Privacy Policy needs to identify the thresholds for the protection of privacy. These thresholds need to be applied to identifying information and non-identifying information. In addition, a stance needs to be taken for each study on re-identification and the finality of such action. Standard operating procedures should be adhered to in order to prevent any unauthorised access.
Key requirements for the protection of privacy include:
• The domains where a pseudonym will be used; and
• Procedures for the protection of the pseudonym key.
Risk assessment is an essential element in the protection of privacy, and there is a legal obligation upon PIs to ensure adequate consideration has been given to preventing re-identification of personal data through other publicly available data sets.
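The two key requirements above (a defined domain for each pseudonym and protection of the pseudonym key) can be illustrated with a keyed hash. In the sketch below, which is an assumption-laden example rather than the method prescribed by ISO/TS 25237:2008, the pseudonym is an HMAC-SHA256 of the identifier bound to a domain name. The function name `domain_pseudonym`, the domain labels and the demonstration key are all hypothetical; a real key would be generated randomly and held under the multi-person controls described in this document.

```python
import hashlib
import hmac


def domain_pseudonym(identifier: str, domain: str, key: bytes) -> str:
    """Derive a pseudonym that is stable within one domain and unlinkable
    across domains for anyone who does not hold the key."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    message = f"{domain}\x1f{identifier}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Placeholder key for demonstration only; never hard-code a real key.
demo_key = b"demo-key-not-for-production"

study_a = domain_pseudonym("patient-001", "study-A", demo_key)
study_a_again = domain_pseudonym("patient-001", "study-A", demo_key)
study_b = domain_pseudonym("patient-001", "study-B", demo_key)
print(study_a == study_a_again, study_a == study_b)
```

A keyed hash of this kind cannot be reversed directly; where reversibility is required, a protected lookup table from pseudonym to identifier would also be needed.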
Identifiable person variables
Care must be afforded to all identifying data and in particular the following (all of which can be expanded): person names; person identifiers; biometrics; digital certificates; mother’s maiden name and any other family links; residential address; electronic communications details (email, phone numbers, device identifiers etc.); date of birth (DOB); admission and discharge dates; episodes of care dates; postal and telephone codes; language spoken; religion; ethnicity; gender; country of birth; occupation; rare diagnoses; uncommon procedures; and deformities. Where there are specific standards that maintain confidentiality, these should also be followed, e.g. DICOM Supplement 55.
Aggregation variables
Absolute data references are best avoided. The following should ideally be aggregated:
• DOB and specific ages should be avoided and a re-identification risk analysis conducted.
• Admission, discharge and episodes of care dates should also be aggregated.
• Postcodes and any other location codes should be truncated. If the truncated code identifies an area of fewer than 20,000 people, the code should be changed to one which does not identify the area.
Demographic data represent indirect identifiers. They must either be removed or aggregated. If such data must remain, PIs have a duty of care to perform a risk analysis for unauthorised re-identification. The following demographic data is included in this requirement:
• Language spoken at home;
• Person’s language of communication;
• Religion;
• Ethnicity;
• Person gender;
• Country of birth;
• Occupation;
• Criminal history;
• Person legal orders;
• Other addresses (work, mailing etc.);
• Birth plurality.
The policy adopted by a PI must ensure that a risk assessment is performed which considers unauthorised individuals combining external data with the pseudonymised data to identify individuals. This is in accord with assurance level 2 privacy protection. Unauthorised individuals may have access to a wide range of data sources to assist in identifying a data subject. Observational data, such as the admission of a patient for an episode of care, should not be ruled out.
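The aggregation rules above can be sketched in a few lines. The example below converts a date of birth to an age band and truncates a postcode to its outward part, substituting a non-identifying code when the area falls below the 20,000 threshold. The function names, the ten-year band width, the `"XXX"` placeholder and the population figures are assumptions for illustration; a real implementation would take population counts from an authoritative source and would need to handle postcodes written without a space.

```python
from datetime import date


def age_band(date_of_birth: date, on: date, width: int = 10) -> str:
    """Replace an exact date of birth with an age band such as '30-39'."""
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    age = on.year - date_of_birth.year - (
        (on.month, on.day) < (date_of_birth.month, date_of_birth.day)
    )
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def truncate_postcode(postcode: str, population_by_area: dict, minimum: int = 20000) -> str:
    """Keep only the outward part of a postcode; if that area is too small
    (or unknown), return a placeholder that identifies no area."""
    outward = postcode.strip().upper().split()[0]
    if population_by_area.get(outward, 0) < minimum:
        return "XXX"  # hypothetical non-identifying code
    return outward


# Hypothetical population counts, for illustration only.
populations = {"NW1": 45000, "ZZ9": 1200}

band = age_band(date(1973, 9, 1), on=date(2013, 8, 2))
large_area = truncate_postcode("nw1 2be", populations)
small_area = truncate_postcode("ZZ9 9ZZ", populations)
print(band, large_area, small_area)
```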
Outlier variables
It is highly recommended that PIs remove the following fields of data based upon a risk assessment:
• Rare diagnoses
• Uncommon procedures
• Some rare occupations
• Recessive traits
• Distinct deformities.
In performing the risk assessment the PI shall incorporate risk mitigation for each associated risk. The assessment will be conducted on at least an annual basis. A template for level 3 privacy protection assessment is available from the IG lead.
Assurance level 3 privacy protection must consider outliers in the data. The re-identification risk can be seriously influenced by the data itself, e.g. by the presence of outliers or rare data. Outliers or rare data can indirectly lead to identification of a data subject. Outliers do not necessarily consist of medical data. For instance, if, on a specific day, only one patient with a specific pathology has visited a clinic, then observational data on who has visited the clinic that day can indirectly lead to identification.
When assessing a Pseudonymisation procedure, a variable model-based risk analysis will help to quantify the vulnerability and take into account the existence and content of different databases. This will help to deliver a higher level of anonymity. Identified risks must be managed in accordance with the SLMS Risk Management procedures. All risks should be mitigated to an acceptable level and approved by IGSG.
Structured data variables
Structured data give some indication of what information can be expected and where it can be expected. It is then up to the re-identification risk analysis, carried out by PIs, to make assumptions about what can lead to (unacceptable) identification risks. Such analysis can range from simple rules of thumb up to analysis of populated databases and inference deductions.
Non-Structured data variables
This typically concerns free-form text, voice data and image data, which in the context of Pseudonymisation must remain suspect and prone to re-identification risks. PIs should adopt the following approach to this form of data:
• Locate any identifiable information
• Remove all identifiable and other information that is not required (in line with processing the minimum data set)
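For free-form text, the locate-and-remove approach above can be partly automated with pattern matching. The sketch below is a deliberately limited illustration: the three patterns (email address, numeric date, ten-digit number in 3-3-4 grouping) and the `redact` function are assumptions, and pattern matching alone will miss names, addresses and many other identifiers, so manual review and risk assessment remain necessary.

```python
import re

# Illustrative patterns only; they do not cover all identifying information.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "NHS_NUMBER": re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 02/08/2013, contact j.smith@example.org, NHS no 000 000 0000."
print(redact(note))
```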
Image data
For NHS-related image data, adherence to DICOM Supplement 55 is expected. Additional risk assessment should consider the identifiable characteristics of the image and any associated notations.
Inference risk assessment
The UCL SLMS IDHS Information Asset Owner and individual PIs are responsible for reviewing pseudonymised data repositories in respect of inference risk and for protecting against disclosure that would inadvertently identify data subjects. As a mitigating measure, it is strongly recommended that the PI places responsibility firmly and contractually upon the information source to protect the confidentiality of the data subject from unintentional disclosures.
Privacy and security
Pseudonymisation provides no absolute guarantee of confidentiality. In view of this risk to the data subject, the data should be considered as “personal data”, and only used for the purpose for which it was obtained.
Re-identification process
ISO/TS 25237:2008 requires two procedures for re-identification. These are:
• Re-identification as part of the normal business processing; and
• Re-identification as part of an exceptional event
Re-identification as part of normal procedure
The following shall apply to this procedure:
• Re-identification is normally done as part of an automated process;
• No pre-authorisation is required on an individual case basis; and
• The process ensures the completeness and integrity of the data.
Re-identification as part of an exceptional event
The following shall apply to this re-identification procedure:
All exceptional event requests for re-identification shall be logged and must include as a minimum the following information:
• Date/time of the request;
• Name of staff requesting;
• Name of authorising member of staff;
• Justification of the re-identification;
• Confirmation from the requestor that the re-identification data will be used for the research purpose and details of any disclosure and the party involved;
• Date and time of the re-identification;
• Security arrangements for the re-identified data; and
• Retention and destruction (date and time) arrangements for the re-identification data.
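The minimum log content listed above maps naturally onto a record type that refuses incomplete entries. The following is a sketch under stated assumptions: the class name `ReidentificationLogEntry` and its field names are hypothetical, and the two checks (no blank text fields, re-identification not earlier than the request) are illustrative validations rather than requirements quoted from the standard.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class ReidentificationLogEntry:
    """One logged exceptional-event re-identification request."""
    requested_at: datetime
    requested_by: str
    authorised_by: str
    justification: str
    purpose_confirmation: str
    reidentified_at: datetime
    security_arrangements: str
    retention_and_destruction: str

    def __post_init__(self):
        # Every text field must carry real content.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"{f.name} must not be empty")
        if self.reidentified_at < self.requested_at:
            raise ValueError("re-identification cannot precede the request")


# Hypothetical example values.
entry = ReidentificationLogEntry(
    requested_at=datetime(2013, 8, 2, 9, 0),
    requested_by="Requesting staff member",
    authorised_by="Authorising staff member",
    justification="Data integrity verification",
    purpose_confirmation="Used for the research purpose only; no disclosure",
    reidentified_at=datetime(2013, 8, 2, 14, 30),
    security_arrangements="Stored in an access-controlled area",
    retention_and_destruction="Destroy by 2013-09-02 17:00",
)
print(entry.requested_by)
```

Making the record immutable (`frozen=True`) reflects the intent that a log entry, once written, is not edited afterwards.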
The re-identification process shall be carried out in a secure manner by the PI or a trusted member of staff, and all necessary precautions must be taken to limit access to lists that connect identifiers and pseudonyms. All due care shall be taken to ensure the integrity and completeness of the re-identified data, especially if such data are to be used for diagnosis or treatment. The re-identified data must include details of its origin and a caveat that the data may be incomplete as it was derived from a clinical research database or data warehouse.
The re-identified data will remain the responsibility of the PI, who will continue to act in the capacity of data controller under the Data Protection Act 1998. The exception will be in circumstances where re-identified data has been disclosed to a third party with appropriate legal authority. Prior to any disclosure, the PI should seek legal advice from the UCL Data Protection Officer (DPO). If the disclosure is authorised, the DPO will provide suitable legal advice to limit the use of the re-identified data to an authorised purpose and to ensure that the third party assumes all the responsibilities of a data controller as specified in the Data Protection Act 1998.
Specification of interoperability of interfaces
The secure mechanism for processing personal data through the Pseudonymisation Service and producing the pseudonymised data is the SLMS Data Safe Haven.
UCL SLMS uses 7-Zip for encryption and recommends AES-256 as the encryption method. Procedures to use for the required passphrase are located here. The passphrase shall under no circumstances be sent with the encrypted data file. It must be sent separately, using a different method of communication.
Note: not in place at present time
UCL SLMS operates a Pseudonymisation service for UCL PIs and can enable access to trusted external PIs. The Pseudonymisation Service is capable of providing functionality to support:
• Conversion of Pseudonymisation results from one or more service providers in a controlled manner without direct re-identification of the data subject; and
• Integration of an external party’s data, so that the data from one data subject processed by either party is linkable without the need for re-identification of the data subject.
Pseudonymisation Policy Framework
Privacy Policy for each study
Where appropriate, each study that uses Pseudonymisation utilises the following PI generic Privacy Policy in association with the UCL Information Security Policy, the UCL Data Protection Policy and the SLMS Information Governance Policy:
• SLMS-IG99 SLMS Privacy Policy
Operational best practice objectives
The following are the operational best practice objectives for ensuring confidentiality and privacy of personal health data:
• Pseudonyms secured from unauthorised re-identification;
• Robust re-identification procedures implemented; and
• Reliable and secure allocation of unique pseudonyms to research subjects.
Operational best practice shall be maintained and subject to continuous service improvement to ensure the on-going trust of the public, healthcare service users and healthcare providers.
The UCL SLMS Pseudonymisation service shall have the following attributes:
• Independent of the organisations supplying source data;
• Guarantee security and trustworthiness of source, processes and integrity of software modules;
• Guarantee security and trustworthiness of operating environment, platforms and infrastructure (restrict all unnecessary network traffic; disable all unnecessary operating system services; provide technical, physical, procedural and personnel controls in accordance with ISO 27799);
• Implementation of monitoring and quality assurance services and programmes to assure quality of service and to monitor against network penetration and malicious attacks;
• Cryptographic key management shall be under multi-person control; identifiers shall be encrypted by two keys, one under the control of the data source and the other under the control of the Pseudonymisation service;
• The Pseudonymisation Service shall be documented, recorded and audited, thus demonstrating system integrity;
• Business continuity for the Pseudonymisation service is assured through backup and a disaster recovery plan;
• Internal audit procedures shall be documented and executed on no less than a monthly basis;
• External audit procedures and requirements are as follows:
  o It can be proven that published operating procedures are complied with to the satisfaction of all relevant parties;
  o The Auditor is totally independent of the audited Pseudonymisation service provider;
  o The Auditor will have no conflict of interests, including financial, involving the Pseudonymisation service provider;
  o The Auditor shall be a qualified information systems auditor and member of the relevant professional body;
  o If the Auditor finds any of the Pseudonymisation services to be deficient, then all relevant parties will be notified immediately.
• All participants of the system shall:
  o Maintain integrity of the Pseudonymisation service key(s);
  o Maintain physical, network, personnel, and technical controls of the associated systems in accordance with ISO 27799;
  o Be responsible for the anonymisation of payload data and privacy protection of any pseudonymised information resources maintained by UCL SLMS.
• Risk assessments shall be conducted regarding access by the data source to the generated pseudonyms, and specification of such restrictions shall be expressed in operational policies.
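The two-key requirement in the attributes above can be sketched in Python. This is only an illustration under stated assumptions, not the UCL SLMS implementation: it substitutes keyed hashing (HMAC-SHA256) for encryption, so the result is one-way and would not by itself support re-identification. The function names, key values and sample identifier are all hypothetical.

```python
import hashlib
import hmac


def source_stage(identifier: str, source_key: bytes) -> bytes:
    """Data source step: derive a keyed token so the raw identifier is not passed on."""
    return hmac.new(source_key, identifier.encode("utf-8"), hashlib.sha256).digest()


def service_stage(token: bytes, service_key: bytes) -> str:
    """Pseudonymisation service step: apply a second key held only by the service."""
    return hmac.new(service_key, token, hashlib.sha256).hexdigest()


def pseudonymise(identifier: str, source_key: bytes, service_key: bytes) -> str:
    """Chain both stages; neither key alone is enough to reproduce the pseudonym."""
    return service_stage(source_stage(identifier, source_key), service_key)


# Hypothetical keys and identifier, for illustration only.
example = pseudonymise("example-identifier", b"source-key", b"service-key")
print(len(example))  # hex digest length of SHA-256
```

Because each party holds only one key, the data source cannot compute the final pseudonym and the service never sees the raw identifier. A reversible scheme, as required for controlled re-identification, would need real encryption or a securely held lookup table instead.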
Implementation of practices for re-identification
The Pseudonymisation service provides support for controlled re-identification. The re-identification procedures identified within this document shall be shared with all relevant parties as and when necessary. The UCL SLMS re-identification practices incorporate the following:
• Re-identification procedures are subject to a segregation of duties that involves a requestor and an authoriser and, where necessary, multi-organisational control and assurances;
• If re-identification is time sensitive, UCL SLMS will make all reasonable endeavours to comply with the communicated time-sensitive requirements;
• Audits shall:
  o Be provided for all re-identification events in accordance with RFC 3881;
  o Minimally include:
    § The party to whom the identity was disclosed;
    § The time/date of the re-identification; and
    § The reason for the re-identification.
• Re-identification from the Pseudonymisation service shall re-identify only the local pseudonym from the source organisation;
• The data controller is responsible for the re-identification of the healthcare research subject, and may further validate the re-identification request.
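The minimum audit fields and the requestor/authoriser segregation of duties listed above can be sketched as follows. This is a hedged illustration only: it does not produce the RFC 3881 audit message format, and the class, function and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ReidentificationAuditRecord:
    """Minimum audit content: who received the identity, when, and why."""
    disclosed_to: str
    occurred_at: datetime
    reason: str
    requestor: str
    authoriser: str


def record_reidentification(
    requestor: str,
    authoriser: str,
    disclosed_to: str,
    reason: str,
    occurred_at: Optional[datetime] = None,
) -> ReidentificationAuditRecord:
    """Build an audit record, refusing events that break segregation of duties."""
    if requestor == authoriser:
        raise ValueError("requestor and authoriser must be different people")
    if not reason.strip():
        raise ValueError("a reason for the re-identification is required")
    return ReidentificationAuditRecord(
        disclosed_to=disclosed_to,
        occurred_at=occurred_at or datetime.now(timezone.utc),
        reason=reason,
        requestor=requestor,
        authoriser=authoriser,
    )
```

Rejecting the event when the two roles coincide keeps the check in one place, so no audit record can exist for a request that was self-authorised.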
Appendix 1
De-identification Standard for Health and Social Care
The standard was developed first through the ICO and the NHS Information Centre and has now been adopted and published by the Information Standards Board for Health & Social Care (ISB1523 Amd 20/2010). This is available here, and will be applicable from 2014/15 in IG Toolkit v12.
The UK legal requirement is that disclosures of data should not identify individuals. There is a duty of care upon data controllers to risk assess the potential for aggregated data to identify individuals. As a minimum, PIs or data controllers must:
• Assess the likelihood of additional information being used to identify an individual.
• Carry out a risk assessment and assess the likelihood of the target audience having access to additional information that may assist in revealing the identity of individuals.
• Know whether the disclosure is to a closed audience or to the public.
• Carry out de-identification plans.
• Be aware of the likelihood of an investigative journalist seeking out and finding additional information to reveal identities.
Standard de-identification processes
1. Identify the nature of the information to publish and the data source(s).
2. SLMS PIs or data controllers MUST carry out a risk assessment regarding the possibility of specific individuals being identified from the published material, either alone or together with other available information.
3. When assessing risk, the PI or data controller MUST follow the procedure outlined in section 4.3 of the NIGB guidance “Assess risk and specify data de-identification”, or otherwise conform to other good practice guidance that can be justified as being of an equal standard.
4. PIs or data controllers MUST carry out de-identification plans.
5. When establishing the de-identification plan, the PI or data controller MUST follow the procedure outlined in section 4.3 of the NIGB guidance “Assess risk and specify data de-identification”, or otherwise conform to other good practice guidance that can be justified as being of an equal standard.
6. When publishing the results, the PI or data controller MUST follow the procedure outlined in section 4.2 of the NIGB guidance “Publish non-identifying data”, or otherwise conform to other good practice guidance that can be justified as being of an equal standard.
7. Six standard de-identification plans are available, covering re-identification risk ranging from normal to high. These plans comprise different standard de-identification techniques. The techniques are mandatory in the given circumstances for each plan.
8. For deriving aggregate data, the following techniques can be used: aggregation and statistical disclosure control.
9. For deriving individual-level data, the following techniques can be used: data suppression; k-anonymity; reduction in detail of indirect identifiers; and suppression of direct identifiers.
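Two of the individual-level techniques named in step 9, suppression of direct identifiers and reduction in detail of indirect identifiers, can be sketched in Python. This is an illustrative assumption, not one of the six standard plans: the field names, the ten-year age bands and the choice to keep only the outward part of a UK postcode are all hypothetical.

```python
# Hypothetical field names treated as direct identifiers and dropped entirely.
DIRECT_IDENTIFIERS = {"name", "nhs_number"}


def age_band(age: int, width: int = 10) -> str:
    """Reduce detail: replace an exact age with a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def outward_code(postcode: str) -> str:
    """Reduce detail: drop the three-character inward part of a UK postcode."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def deidentify(record: dict) -> dict:
    """Suppress direct identifiers, then coarsen the indirect ones that are present."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age_band"] = age_band(out.pop("age"))
    if "postcode" in out:
        out["postcode_district"] = outward_code(out.pop("postcode"))
    return out
```

Whether this level of coarsening is sufficient depends on the risk assessment in steps 2 and 3; the sketch only shows the mechanics.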
Appendix 2
K-anonymity
A dataset anonymised to a k standard has, for every record, at least k − 1 other records with the same values on the potentially identifying variables. If the data is k=5 and the identifying variables are deemed to be age and ethnicity, then the dataset has at least 5 records for each value combination of age and ethnicity that occurs in it.
Information asset owners need to set a value of k that is commensurate with the level of re-identification risk that can be tolerated.
Variables such as gender, date of birth, postal code, and race are commonly used quasi-identifiers.
No absolute guarantee of confidentiality can be given when using anonymity. The most appropriate framework to consider these matters is risk management (Statistical Disclosure Control).
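The k-anonymity check described above can be sketched in Python: group records by their quasi-identifier values and take the size of the smallest group. The function names, field names and sample records are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_of(records, quasi_identifiers):
    """Return the k achieved by the dataset: the size of its smallest group."""
    if not records:
        raise ValueError("cannot compute k for an empty dataset")
    return min(group_sizes(records, quasi_identifiers).values())


def violating_groups(records, quasi_identifiers, k):
    """Return the value combinations with fewer than k records."""
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


# Hypothetical sample: five records in one group, two in another.
sample = (
    [{"age_band": "40-49", "ethnicity": "A"}] * 5
    + [{"age_band": "50-59", "ethnicity": "B"}] * 2
)
print(k_of(sample, ["age_band", "ethnicity"]))
print(violating_groups(sample, ["age_band", "ethnicity"], k=5))
```

Groups returned by `violating_groups` are candidates for further suppression or generalisation before the dataset meets the chosen k.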