Collecting and Protecting
Sensitive Data in Research
CU Morningside IRB
Joyce Plaza MS, MBE, CIP
419 West 119 Street
New York, NY 10027
Fax: 212-851-7044
April 8, 2015
• Review the privacy and confidentiality
protections criteria the IRB must consider
• Provide definitions relevant for
research data
• Provide examples of the
types of protections
• Describe the specific protections used
in the Guilamo-Ramos: High Use Alcohol
Venues Study
45 CFR 46, 21 CFR 56
Require the IRB to ensure that
certain criteria are satisfied prior
to approval of human subjects research
§46.111/ 56.111 Criteria
IRB shall determine that all of the following
requirements are satisfied:
(1) Risks to subjects are minimized.
(2) Risks to subjects are reasonable in
relation to anticipated benefits.
(3) Selection of subjects is equitable.
(4) Informed consent will be sought from
each prospective subject or the subject's
legally authorized representative.
(5) Informed consent will be appropriately
documented.
(6) When appropriate, the research plan
makes adequate provision for monitoring
the data collected to ensure the safety of
subjects.
(7) When appropriate, there are adequate
provisions to protect the privacy of
subjects and to maintain the
confidentiality of data.
Submission Issues
The manner in which data are
collected, recorded and maintained
is reviewed by the IRB and influences
its determinations.
Identifiable data
• Any information about a living individual
that is linked to, associated with, or contains
the name or other details of the individual
and that would allow someone to directly
or indirectly identify the subject
from the information collected.
Sensitive Data with Potential
High Risk
• Information about a living individual
that could cause serious risk
or harm to a subject if there were a
breach of confidentiality (e.g., Social
Security numbers, HIV status,
substance abuse, criminal activity,
negligence in the workplace)
IRB Terminology
Related to Data
• De-identified – identifiers have been removed
from the dataset in such a manner that no member
of the research team is able to identify the
individual from whom the information was
obtained.
• Coded – identifiers have been removed from the
dataset but can readily be found through the use
of a master list that is accessible to the
Anonymous vs. Confidential
• Anonymous – any information about a living
individual that was collected in such a manner that
identifiers were never associated with the
information and no one was ever able to
identify from whom the information was
obtained.
• Confidential (not anonymous) – study
participants' data are protected such that an
individual participant's data will not be
disclosed except to an authorized person.
Definition of Privacy
“The quality of being secluded from
the presence or view of others.”
“Refers to a person’s desire to control
the access of others to themselves.”
• Is there a risk to subjects’ privacy
when collecting the data?
Is there risk to privacy
when recruiting?
Guilamo-Ramos: High Use Alcohol
Venues Study - recruited in alcohol-use
venues; adults unable to participate in
screening or take a detailed flyer due to
privacy concerns were provided with a
card containing only contact information
for the study.
Privacy considerations when
collecting sensitive data
• Are interviews conducted in a private
location?
• Are subjects reminded that they do not have
to answer any questions they do not want to answer?
• Focus groups: Are participants
reminded that they should also keep the
discussion confidential?
Guilamo-Ramos: High Use
Alcohol Venues Study:
Conducted interviews in the home or
at a neutral site chosen by the subject.
Definition of Confidentiality
• “Discretion in keeping information secret”
Refers to the researcher’s handling of the
subject’s identifiable private information.
• Is there a risk of a breach of confidentiality
at any time during study procedures? All
data that could cause harm to
subjects in the event of a breach should have
the subjects' direct identifiers replaced with a
code.
Coded Data
• The link that cross-references the subject’s
identity with the code should be stored in a
separate location from the data and should be
• The Principal Investigator should consider
how many and which staff should have access
to the link. For more sensitive, high-risk data,
the number of staff who have access to the link
should be limited.
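The coding approach described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used by any particular study: the field names (name, phone, drinks_per_week) and the function assign_codes are hypothetical. It splits each record into a coded row and a separate link table, which would then be stored apart from the data with restricted access.

```python
import secrets

# Hypothetical field names; a real study would list its own direct identifiers.
ID_FIELDS = ("name", "phone")

def assign_codes(records, id_fields=ID_FIELDS):
    """Return (coded_rows, link_table) for a list of dict records."""
    coded_rows = []
    link_table = {}
    for record in records:
        # Draw a random code unrelated to the subject's identity; retry on collision.
        code = secrets.token_hex(4)
        while code in link_table:
            code = secrets.token_hex(4)
        # The link table is the only place identifiers and codes appear together.
        link_table[code] = {f: record[f] for f in id_fields if f in record}
        # The coded row keeps the research data and the code, nothing else.
        row = {k: v for k, v in record.items() if k not in id_fields}
        row["code"] = code
        coded_rows.append(row)
    return coded_rows, link_table

records = [
    {"name": "Participant A", "phone": "555-0100", "drinks_per_week": 12},
    {"name": "Participant B", "phone": "555-0101", "drinks_per_week": 3},
]
coded_rows, link_table = assign_codes(records)
```

Because the codes are random rather than derived from the identifiers, someone holding only coded_rows cannot work backward to a subject without the link table.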
Data Protection Plans
Any data collected for research
purposes that could pose risk or harm
to subjects upon a breach of confidentiality
should be protected against a potential
breach. The methods or processes for protecting
the confidentiality of the data should be
proportionate to the level of potential risk of the
Ensure that all study data
is protected
• Any other data that is collected during the
course of a research study, such as that
involving the regulatory or financial
management of the study, must also be
stored in a secure manner.
Guilamo-Ramos: High Use
Alcohol Venues Study:
• On consent forms, tapes, transcripts
and surveys, subjects were identified
by a random code number only.
Anonymous data
Guilamo-Ramos: High Use Alcohol Venues
Study: collected “refusal bias information” that
did not contain identifiers.
Guilamo-Ramos: High Use
Alcohol Venues Study
Links to the subject codes were kept in
locked files on a password-protected
computer.
Guilamo-Ramos: High Use
Alcohol Venues Study:
Personnel Training
All project staff were required to complete
certain levels of training (40 hours) before
they were granted access to the codes. This
included training established by
Dominican and international
organizations on the protection of human
subjects.
Guilamo-Ramos: High Use
Alcohol Venues Study
Personnel signed confidentiality
statements requiring them to report any breach of
confidentiality to the PI.
Training included data safety,
confidentiality of participants, limits of
confidentiality and proper administration
of the protocol.
Storage of Research Data:
Paper files
• Consider separating data files from consent
forms.
• Paper records containing
research data should be stored in a locked
cabinet with access limited to research
personnel.
• The level of security and restriction should
increase with the sensitivity of the data
being captured in the research records.
Computerization of Data
• Electronic records containing research data
should be maintained on password-protected
devices with access limited to research
personnel. The level of security and restriction
(e.g., encryption, hashing) should increase
with the sensitivity of the data being
captured in the electronic research records.
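As one illustration of the hashing mentioned above, the sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. It is an assumption-laden example, not a Columbia requirement: the function name pseudonymize and the sample identifiers are hypothetical. A key is used because identifiers such as names or SSNs come from a small set of possible values, so a plain unkeyed hash can be reversed by simply hashing every candidate value.

```python
import hashlib
import hmac
import secrets

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of an identifier (hypothetical helper)."""
    # Normalize so that trivial formatting differences map to the same digest.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

# The key must be stored separately from the data, like the link to coded data.
key = secrets.token_bytes(32)
digest_a = pseudonymize("  Participant A ", key)
digest_b = pseudonymize("participant a", key)
digest_other_key = pseudonymize("participant a", secrets.token_bytes(32))
```

Hashing of this kind does not replace encryption of the stored files; it only keeps the raw identifier out of the dataset.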
Patient Data: CUMC Policy
• CUMC Information Security Policies
require that all portable data files stored
on USB drives, CDs/DVDs, and laptops that
include PHI be *encrypted* and
*password-protected* at all times.
Breach of Confidentiality
The three biggest sources of a breach of
data stored electronically:
• Laptops
• USB drives
• Web sites
Transferring data
• Electronic transfer: encryption needed
• All electronic transmission of patient
information over the Internet must be
*encrypted*. This includes email, file transfers
and other data transfer modalities.
• Paper transfer: will data be sent by postal mail or FedEx,
or hand-carried by a member of the study team? The
transfer needs to be protected from a breach
(e.g., data transferred separately from consent
forms and codes).
Guilamo-Ramos: High Use
Alcohol Venues Study:
• Data transferred electronically from the
Dominican Republic to the US were
stripped of identifiers and contained only
code numbers.
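A pre-transfer check along these lines can be sketched as follows. This is a hypothetical illustration, not the study's actual procedure: the function prepare_for_transfer and the field names are assumptions. It keeps only an explicit allowlist of fields, so any identifier column that was not anticipated is dropped by default rather than sent.

```python
def prepare_for_transfer(rows, allowed_fields):
    """Keep only allowlisted fields; every row must carry a subject code."""
    allowed = set(allowed_fields)
    if "code" not in allowed:
        raise ValueError("allowed_fields must include the subject code")
    cleaned = []
    for row in rows:
        if "code" not in row:
            raise ValueError("every row needs a subject code before transfer")
        # Anything not explicitly allowed (names, phone numbers, etc.) is dropped.
        cleaned.append({k: v for k, v in row.items() if k in allowed})
    return cleaned

# Hypothetical rows that still carry an identifier by mistake.
rows = [
    {"code": "a1b2c3d4", "name": "Participant A", "drinks_per_week": 12},
    {"code": "e5f6a7b8", "name": "Participant B", "drinks_per_week": 3},
]
outgoing = prepare_for_transfer(rows, allowed_fields=["code", "drinks_per_week"])
```

An allowlist is used instead of a list of fields to remove because a new identifier field added later would otherwise pass through unnoticed. The cleaned file would still need to be encrypted for electronic transmission.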
Guilamo-Ramos: High Use
Alcohol Venues Study:
• Study team identified the most serious risk
as the potential loss of confidentiality.
• Participants were notified of the
confidentiality procedures in the informed
consent.
• Procedure for notifying the IRB of any
adverse events was included in the Study
Guilamo-Ramos: High Use
Alcohol Venues Study:
Collection of Protected Health Information (PHI)
also required HIPAA Form A.
The Health Information Technology for
Economic and Clinical Health (HITECH) Act,
part of the American Recovery and
Reinvestment Act (ARRA) of 2009,
established new notification requirements for
reporting the loss or theft of patient information
(Protected Health Information, PHI) that is not
protected by encryption. These requirements
apply in both the clinical and research context.
Archiving and Long-Term
Storage of Research Data
Data protection plans must consider all
record-keeping processes and storage of
data, from the initial collection to post-study
storage, destruction, or complete
de-identification of the data. Such plans
should include details for all modes of
storage: paper, electronic, video/audio
recordings, films, etc.
Audio/Video Recording of Data
Recordings and Transcriptions
Guilamo-Ramos: High Use Alcohol
Venues Study: after the audio recorded
interviews were transcribed, the
recordings were destroyed. Participants
were not identified by name on the
Physical Security of Data
• Is the computer located in a secure location (e.g.,
a locked office)?
• Who has access to this office?
• Paper files – are they in a locked file
cabinet?
• Are identifying codes and data kept separately?
• Do transcripts contain identifiers?
• Will identifiers be destroyed anytime
Secondary Data
Requires IRB review if it contains
private identifiable information (either
direct identifiers or indirect identifiers).
If the data are sensitive, confidentiality
procedures are required.
Social Security Numbers
New York has enacted legislation to
protect the confidentiality of Social
Security numbers (SSNs). The "NY Social
Security Number Protection Law," which
became effective on January 1, 2008,
imposes harsh penalties on organizations
that fail to protect the confidentiality of
Social Security numbers that they have
collected and stored.
Generally, SSNs should not be collected
unless permitted by Columbia policy.
Any plan to collect SSNs
for research purposes must be submitted to
and approved by the IRB prior to such collection.
The submission must include a justification for
the collection of SSNs and provide the following:
• an explanation of how and where the SSNs will
be stored;
• who will have access to the data;
• the plan to protect the confidentiality and
security of the data.
Certificates of Confidentiality
• To protect the confidentiality of sensitive, higher-risk
data, obtain a Certificate of Confidentiality
(CoC). CoCs are issued by the National Institutes of Health
(NIH), as well as other HHS agencies, to protect
identifiable research information from forced
disclosure.
• Allows the investigator to refuse to disclose
identifying information on research participants
in any civil, criminal, administrative, legislative,
or other proceeding, whether at the federal,
state, or local level.
Study Description: Document
Privacy Protections
Describe how subject privacy will be
protected, and the limits to protection.
Protections should cover, for example, screening
activities, HIPAA provisions, forums such
as focus groups where private information
may be shared, and recordings of research
activities, as applicable. Limitations such
as compelled disclosure and mandatory
reporting should also be described.
Study Description: Document
Data and Safety Monitoring
Describe how data and safety will be monitored
locally to identify unanticipated problems (e.g.,
events, outcomes, or occurrences that are
unexpected, at least possibly related to the
research, and suggest an increase in risk of harm
to subjects or others).
Study Description: Document
Potential Risks
Describe potential risks including data on risks
that have been encountered in past studies.
Study Description: Document
Confidentiality of Study Data
Describe how confidentiality will be maintained (if it is to
be maintained) locally and during transmission
to another site, if applicable. Include a clear
description of how data will be stored,
specifically indicating whether data will contain
direct or indirect identifiers. Describe
protections related to accessing the study data,
whether in an electronic or paper form.
Publication of Research Results
• Any publication of research results must
be done in a manner in which subjects
cannot be identified unless express
written permission has been provided by
the subject(s).
Summary: Collecting
Sensitive Data
• Identify all risks to privacy/confidentiality
• Devise a comprehensive plan of
• Document the details for the IRB
• Train study personnel
• Monitor the data until the identifiable data
are discarded or the data are completely
de-identified.
Questions? Contact the IRB
For contact information see:
or call 212-305-5883
Morningside IRB
For contact information see:
or call 212-851-7040

Joyce Plaza, M.S., M.B.E., CIP, Asst. Manager, MS IRB, Columbia