Report of the study group on data and application security for health care.
Paul Kantor, Gio Wiederhold, Bhavani Thuraisingham, Le Grunewald, Gail-Joon Ahn, Murat
Kantarcioglu, Peng Liu
This discussion takes place in the context of the heightened federal emphasis on moving from a
situation in which most medical records are in the form of handwritten physicians’ or nurses’
notes to one in which electronic records are the primary means for storing and sharing
information about individual health. The President has set this as a high priority and it is widely
believed that introducing such a system will have great benefits in quality and efficiency, and
support the discovery of broad general facts about healthcare which are presently obscured by the
difficulty of assembling the records.
At the same time this transition poses very substantial risks to the perceived level of privacy and
confidentiality currently associated with health records. As an example, it is reported that in the
UK, where healthcare is nationalized, one may buy a person’s health record (excluding the records
of those using private healthcare services) for about £25 on the streets. In the American context, this
concern is exacerbated by a general mistrust of the government, and a desire not to have all of
one’s records available for examination by a government agency. There are laws in the United
States which impede the ability of one agency to examine the records of another. On the other
hand, one might take the position that “privacy is an illusion” and that all records are available to
anyone who is willing to expend the effort to obtain them. In the end, the decision of how much
privacy is desired should be a personal decision and health care systems should record and
respect that decision and provide the desired type and level of security.
Healthcare records have interesting technical characteristics. They contain both structured and
unstructured material, and both forms are essential. There is some semi-structured data, where,
for example, a diagnosis may be selected from a controlled vocabulary, and additional
physician’s notes are added. There are multimedia, including many types of image records. In
addition, there are time-series data: both slow time-series, as in the monitoring of a suspicious
area in a chest X-ray, and rapid series, such as new techniques for examining the beating heart.
The intensity of updates varies greatly: updates are infrequent during healthy periods of a
patient's life and massive during a hospitalization.
The data sets have both temporal and spatial dimensions, with both dimensions varying
enormously in scale.
There are many kinds of security concerns which drive this study.
1. Patients’ records are maintained at specific hospitals, clinics, and physicians’ offices, but
they ought to be available, when needed, at any site. However, wide access increases the
opportunities for information leaks.
2. Electronic devices can be used to support remote healthcare provided for patients at their
homes. The required two-way communication must be reliable and secure.
3. Doctors seek to consult with other doctors, and one can imagine a future in which there
are bulletin board broadcasts by physicians seeking help and information on a particularly
challenging or puzzling case. Again, patient privacy is at risk here.
4. There are cyber-physical systems such as heart monitors, or implanted defibrillators, and
computer-controlled infusion pumps for medications that pose risks. Most of these
devices involve embedded software, and parameters that are subject to external change.
These devices represent unique challenges, as failures and interference with these
applications could cause harm or even death to the patient.
5. Accessible electronic health records will facilitate data aggregation for research, essential
to deal with the variety and complexity of medical conditions, but such aggregations
increase the risk of privacy violations.
6. The medical information system must guard against fraud and billing manipulation.
To be effective, the study of healthcare data applications and security must involve computer
scientists working very closely with healthcare professionals, as well as with experts in policy,
privacy and law.
We can form a grid to represent the main security issues, and place examples into such a grid.
[Grid of the main security issues; only the label “6 aspects” survives from the original layout.]
Story 1. Oklahoma State maintains a children’s healthcare database, which includes records
contributed by physicians at multiple hospitals. The state uses it to generate, or to try to
generate, official state-level statistics. But the database cannot really generate correct
statistics. For example, the same child may have multiple records, being known as “Baby A” or
“Baby B”, or under one last name and then another, and so forth. So there is a database
management problem here of a familiar kind, namely de-duplication, but it is complicated by the
privacy considerations that are associated with healthcare records.
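One way to think about privacy-aware de-duplication is to link records on a keyed hash of attributes that remain stable when the name changes, so that the linking party never sees the attributes themselves. The following is only a minimal sketch under stated assumptions: the field names (`birth_date`, `mother_id`, `birth_order`), the key, and the sample records are all hypothetical and are not taken from the Oklahoma database.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret, held only by parties permitted to link records.
LINKAGE_KEY = b"example-key-not-for-production"

def linkage_token(birth_date: str, mother_id: str, birth_order: int) -> str:
    """Keyed hash of attributes assumed to stay stable when a child's name changes."""
    normalized = f"{birth_date.strip()}|{mother_id.strip().upper()}|{birth_order}"
    return hmac.new(LINKAGE_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def group_duplicates(records):
    """Return lists of record ids that share a token (candidate duplicates)."""
    groups = defaultdict(list)
    for rec in records:
        token = linkage_token(rec["birth_date"], rec["mother_id"], rec["birth_order"])
        groups[token].append(rec["record_id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records: 1 and 2 are the same child under two names.
records = [
    {"record_id": 1, "name": "Baby A", "birth_date": "2008-03-01", "mother_id": "m-17", "birth_order": 1},
    {"record_id": 2, "name": "Jane Doe", "birth_date": "2008-03-01", "mother_id": "M-17 ", "birth_order": 1},
    {"record_id": 3, "name": "Baby B", "birth_date": "2008-03-01", "mother_id": "m-17", "birth_order": 2},
]
candidates = group_duplicates(records)
```

Exact matching on hashed values is brittle when the underlying attributes themselves contain typing errors, so a real system would probably need approximate matching as well; that is outside this sketch.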
In general, records that are collected for administrative purposes are incomplete and biased by the
desire of medical staff to achieve compliance with minimal effort. Quality can only be gained if
the information provides feedback to the providers in order to monitor and improve quality.
Data control at Data Entry Time
Story 2. A patient visited his doctor one day and was told to his surprise “I see you’ve gained 50
pounds since your last visit”. The patient, who knew that at 50 pounds less he would be nothing
but skin and bones was astonished to hear this. The problem is that the data had been entered
incorrectly a year ago, with an “8” converted to a “3”. The underlying problem is that, because
data entry is done by people different from those who are seeing the patient, there is no common
sense check to prevent the entry of absolutely absurd information into the record.
The corresponding research question is to ask whether the systems can have built in sanity
checks, which raise a flag and question the data entry clerk, or the data entry physician, if the
information that’s being provided is too far out of line with reasonable expectations. In the case
of healthcare, this is complicated by the fact that data which are out of line may be extremely
salient. And yet a data clerk, embarrassed to be challenged by the system, might simply adjust the
number until the system does not complain, with consequent harm to the patients.
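A sanity check of this kind might be sketched as follows. This is an illustration only: the function name and the 15% threshold are assumptions, not clinical guidance. Because out-of-line values may be the salient ones, the sketch flags an entry for confirmation rather than rejecting it.

```python
from typing import Optional, Tuple

def check_weight_entry(new_lb: float,
                       previous_lb: Optional[float],
                       max_relative_change: float = 0.15) -> Tuple[bool, str]:
    """Return (flagged, message) for a newly entered weight.

    The 15% default is a hypothetical threshold chosen for illustration.
    A flagged value is not rejected; it is meant to be stored together with
    the flag so that a clinician can confirm or correct it later.
    """
    if previous_lb is None or previous_lb <= 0:
        return False, "no earlier value to compare against"
    change = abs(new_lb - previous_lb) / previous_lb
    if change > max_relative_change:
        return True, f"weight differs by {change:.0%} from the last visit; please confirm"
    return False, "within expected range"

# An "8" keyed as a "3": 180 lb recorded as 130 lb.
flagged, message = check_weight_entry(130.0, 180.0)
```

Keeping the flagged value, instead of forcing the clerk to change it, is one way to reduce the temptation to adjust the number until the system stops complaining.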
In modern settings, if data are collected and entered while the patient is present, a screen should
be available where the patient can inspect what is being entered and can prevent errors from
entering the system in the first place. If that is not possible, then the patient should receive a
record by email or on paper as soon as possible. Here again, privacy, if desired, must be dealt
with. It should be the patient's decision if the risk of data exposure is of greater concern to that
patient than having errors in the record. The decision should not be made by some authority who
fears legal consequences.
Retrospective Conversion: Humans and Computers
Story 3. This is in a way similar to story 2 but addresses the fact that in order to get from the
present paper system to the electronic system there are two steps. First, electronic entry has to be
available to every practitioner everywhere. Second, a reasonable amount of retrospective data
conversion may be needed. This is going to require some kind of new workforce, which will
primarily exist just for purposes of the transition to electronic health records. Most retrospective
data entry may only be done just prior to a scheduled appointment at a health care facility.
Hospital records older than one year are best abstracted.
Such an effort may temporarily double the number of people who are transcribing, and will not be
done by the same people who currently work to convert physician’s notes into text. These new
temporary workers will have to be intelligent and trainable, and they’ll have to have considerable
patience in reading a variety of difficult handwritings, often in a cryptic shorthand.
There are a great many people of high intelligence who used to work in the financial world and
are currently seeking gainful employment, and a major WPA-style project which put them to
work on this might benefit both them and the nation.
There is also a computer science research question which is to develop programs that can
automatically match from billing ICD-9 codes to the terms that are being entered to provide some
level of validation and reality check on the retrospective conversion.
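As a rough illustration of such a reality check, one could test whether each billed code is supported by at least one expected term in the transcribed note. The keyword table below is a tiny hypothetical stand-in; a real system would draw on a full coding vocabulary and far better text matching.

```python
import re

# Hypothetical, hand-made table: ICD-9 code -> words expected somewhere in the note.
ICD9_KEYWORDS = {
    "250.00": {"diabetes", "diabetic"},
    "401.9": {"hypertension", "hypertensive"},
    "493.90": {"asthma", "asthmatic"},
}

def unsupported_codes(billing_codes, transcribed_text):
    """Return billed codes for which no expected keyword appears in the text.

    Codes missing from the table are skipped, since nothing can be said about them.
    """
    words = set(re.findall(r"[a-z]+", transcribed_text.lower()))
    return [code for code in billing_codes
            if code in ICD9_KEYWORDS and not (ICD9_KEYWORDS[code] & words)]

note = "Pt with long-standing diabetes, BP normal."
to_review = unsupported_codes(["250.00", "401.9", "999.99"], note)
```

A code listed in `to_review` is not necessarily wrong; it only marks a record where the transcription and the billing data disagree and a human should look.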
Anonymous Posting to Seek Consultation; Authorization
Story 4. A patient’s primary care physician is puzzled and realizes that some kind of consult is
needed, but is not even sure who would be the most appropriate specialist. Such requests
complement accessing the medical literature, which is less specific. We can imagine that, with
electronic records, it would be possible to post a somewhat anonymized sketch of the
presentation, and obtain comments and guidance on reaching a second opinion. The questions
here are policy questions: how much should the systems disclose? How much is too much? When
there is such an online forum an indirect inference attack could succeed through attribute
aggregation and correlation among related postings. If something like this is going to be done,
should the patient have “control” of this process? The natural inclination is to suggest that the
default mode would be “yes”, and that the patient’s judgment could only be overridden by the
physician for particular kinds of delicate issues.
The research issues here seem to be, except for the question of an attack, primarily economic and
social issues.
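The attack question can be made concrete with a small check: before a new posting is released, combine its attributes with those already disclosed in related postings and count how many people in a reference population still match. The sketch below assumes such a reference population is available and uses a hypothetical threshold `k`; both are assumptions for illustration.

```python
def matching_count(population, disclosed):
    """Number of people whose attributes agree with every disclosed attribute."""
    return sum(
        all(person.get(attr) == value for attr, value in disclosed.items())
        for person in population
    )

def safe_to_post(population, earlier_postings, new_posting, k=2):
    """True if at least k people match the union of all disclosed attributes."""
    combined = {}
    for posting in earlier_postings:
        combined.update(posting)
    combined.update(new_posting)
    return matching_count(population, combined) >= k

# Hypothetical reference population.
population = [
    {"age_band": "40-49", "county": "A", "occupation": "teacher"},
    {"age_band": "40-49", "county": "A", "occupation": "farmer"},
    {"age_band": "40-49", "county": "B", "occupation": "teacher"},
    {"age_band": "50-59", "county": "A", "occupation": "teacher"},
]
first_ok = safe_to_post(population, [{"age_band": "40-49"}], {"county": "A"})
second_ok = safe_to_post(population, [{"age_band": "40-49"}, {"county": "A"}],
                         {"occupation": "farmer"})
```

Each posting may look harmless alone; it is the accumulated set of attributes that narrows the candidates to a single person.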
Purpose Driven Access Control
Story 5. For research purposes, it would be very nice if a provider could multi-cast data driven
requests to various federated partners. As a result, patients’ records would be aggregated and then
possibly used by researchers. Here of course there is a great potential privacy threat. We must
understand how to accommodate patients’ concerns during the data gathering phase. In particular,
since we cannot know in advance which aspects of the data will be valuable to a subsequent
research effort, it is impossible to know a priori what kind of release should be requested from the
patient.
The related complex questions here might be called “privacy aware patient record integration”, as
well as the more familiar “patient record set anonymization”.
We understand that what’s needed is to bring this to an acceptable level, as it can never be
perfect. The corresponding research issue could be called purpose driven access control (PDAC).
We recognize also that the government, which is likely to have a central role in the electronic
health record revolution, may have a very different set of purposes from researchers. And, as
noted above, Americans are, at least officially, very unwilling to grant the government access to
highly personal data.
The research questions implied here seem to be essentially policy requirements, together with the
technical problems of ensuring that permission is indeed given by the people who had been
authorized to give it.
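A minimal sketch of a purpose-driven check might look like the following. The class, the field names, and the purposes are hypothetical; the point is only that a request is tested against both the stated purpose and the authority of whoever granted the permission.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Consent:
    patient_id: str
    granted_by: str          # identity of whoever signed the permission
    purposes: frozenset      # purposes the permission covers

def access_allowed(consent, purpose, authorized_signers):
    """Allow access only if the signer was authorized and the purpose is covered."""
    signers = authorized_signers.get(consent.patient_id, frozenset())
    if consent.granted_by not in signers:
        return False
    return purpose in consent.purposes

# Hypothetical registry of who may give permission for each patient.
authorized_signers = {"patient-1": frozenset({"patient-1", "guardian-9"})}

consent = Consent("patient-1", "guardian-9", frozenset({"treatment", "research"}))
forged = Consent("patient-1", "clerk-3", frozenset({"research"}))
```

How the identity in `granted_by` is itself verified (signatures, credentials) is the harder technical problem and is not addressed here.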
Regional Health Integration
Story 6. Regional Health Information Organizations (RHIO) are being promoted by federal and
state governments to enable providers and practitioners to share patient records. There are many
kinds of threats to privacy here, including query content privacy, data location privacy, patient
location privacy. There are also problems of assuring the trustworthiness of the source, of the
inquirer, and of the transmission method, to a level that will withstand a legal challenge if it has
been trusted, in good faith, by the recipient.
So the technical question is how to construct privacy preserving RHIO systems with adequate
content to serve actual patient care needs. An initial focus on a high-demand population, such as the
elderly, may be appropriate and effective, since that population makes intensive use of local
Provider or Payer Fraud; Over-Testing
Story 7. It is not unheard of for a doctor to double bill multiple insurance companies. Some
occurrences are due to poor recordkeeping, while other instances are motivated by a combination
of laziness and income enhancement. In some cases of misfeasance, a provider may amass a
variety of unneeded information simply in order to maximize billing. The technical problem is
appropriate fraud detection. Such a system must be immune to various kinds of collusion attack.
Another way to think of it is as healthcare information system auditing.
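The simplest form of such an audit can be sketched as a grouping of claims by service, flagging any service billed to more than one payer. The claim fields are hypothetical, and the sketch assumes that claims from several payers can be brought together in one place, which is itself a policy question. It offers no protection against collusion.

```python
from collections import defaultdict

def double_billed(claims):
    """Map each (provider, patient, date, procedure) to its payers when there are several."""
    payers = defaultdict(set)
    for claim in claims:
        key = (claim["provider"], claim["patient"], claim["date"], claim["procedure"])
        payers[key].add(claim["payer"])
    return {key: sorted(names) for key, names in payers.items() if len(names) > 1}

# Hypothetical claims: the first two bill the same service to two insurers.
claims = [
    {"provider": "dr-1", "patient": "p-7", "date": "2009-02-03", "procedure": "xray", "payer": "InsurerA"},
    {"provider": "dr-1", "patient": "p-7", "date": "2009-02-03", "procedure": "xray", "payer": "InsurerB"},
    {"provider": "dr-1", "patient": "p-8", "date": "2009-02-03", "procedure": "xray", "payer": "InsurerA"},
]
suspicious = double_billed(claims)
```

A flagged claim may still be legitimate, for instance where two insurers properly share a bill, so the output is a list for review rather than a verdict.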
There is a related kind of misfeasance which is not quite fraud in the legal sense, but which is a
burden on the system. This includes charging for work such as tests that were actually done at a
related site, but could have been avoided if the available information were trusted and used. To
reduce the cost of health care, physicians have to accept test results done and documented by
others previously, and avoid retesting - a major cost contributor. That can only happen if the
documents can be trusted and are secure. There is also the aspect of fear of legal assault when
errors are made, but that will have to be dealt with by others. Generally, of course, giving lawyers
fewer excuses can only help.
Mission Critical Medical Devices; Attack by Remote Control
Story 8. There are some medical devices which are mission critical in the very specific sense that
people’s lives depend on them and if there are errors in the programs, they could kill people.
There have been documented examples where malfunction of a computer program resulted in a
harmful dose of radiation during the course of what was supposed to have been beneficial
radiation therapy.
The corresponding research question is: as we move towards the possibility of remote healthcare,
could a criminal or an attacker misuse the remote control channel and exploit it to trigger harmful
actions that would affect patients, due to deficiencies in the security design of the system?
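To illustrate the kind of design questions involved, the sketch below shows a hypothetical infusion-pump receiver that accepts a remote command only if it carries a valid message authentication code, has a fresh counter (to resist replay), and stays within a hard safety limit. The key, the limit, and the message layout are assumptions made for this example and do not describe any real device.

```python
import hashlib
import hmac
import json

DEVICE_KEY = b"example-device-key"   # hypothetical shared secret
MAX_RATE_ML_PER_H = 100.0            # hypothetical hard safety limit

def sign_command(key, counter, rate_ml_per_h):
    """Build a command payload and its HMAC-SHA256 tag."""
    payload = json.dumps({"counter": counter, "rate_ml_per_h": rate_ml_per_h},
                         sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag

class PumpReceiver:
    def __init__(self, key):
        self.key = key
        self.last_counter = 0

    def accept(self, payload, tag):
        """Return True only for authentic, fresh, in-range commands."""
        expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False                      # forged or corrupted
        command = json.loads(payload)
        if command["counter"] <= self.last_counter:
            return False                      # replayed or out of order
        if not 0.0 <= command["rate_ml_per_h"] <= MAX_RATE_ML_PER_H:
            return False                      # outside the safe range
        self.last_counter = command["counter"]
        return True

receiver = PumpReceiver(DEVICE_KEY)
payload, tag = sign_command(DEVICE_KEY, 1, 5.0)
first = receiver.accept(payload, tag)
replayed = receiver.accept(payload, tag)
```

The range check matters even for authentic commands: it bounds the harm if the sending side is itself compromised or simply mistaken.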
Data Tampering
Story 9. There are various reasons to want to tamper with data. One is the case in which a
provider has misread or misinterpreted the data that were available, resulting in a diagnosis and
treatment that were ineffective. Faced with the potential for some legal action, the provider might
want to tamper with the data in order to cover up the error.
The related research issue is to develop tamper proof ways of storing data (such as compound
checksum methods) and to ensure that software is “tamper proof” and cannot be used to modify
existing data. In general, every item of data should be time stamped, not only at the source but
also when it enters an accessible record system. Such records should never be overwritten, but
only appended.
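One common way to approximate this, shown here as a sketch rather than as the specific checksum method intended above, is an append-only log in which every entry carries both timestamps and a hash that covers the previous entry. Any later change to an earlier entry then breaks the chain. The class and field names are hypothetical, and the result is tamper-evident, not tamper-proof.

```python
import hashlib
import json
import time

class AppendOnlyLog:
    """Hash-chained log: entries are only appended, never overwritten."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        material = {k: entry[k] for k in ("data", "source_time", "entry_time", "prev_hash")}
        return hashlib.sha256(json.dumps(material, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, data, source_time, entry_time=None):
        """Add an item stamped at the source and again on entering the record system."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {
            "data": data,
            "source_time": source_time,
            "entry_time": time.time() if entry_time is None else entry_time,
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)

    def verify(self):
        """Recompute the chain; False means some entry was altered or removed."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True

log = AppendOnlyLog()
log.append({"test": "glucose", "value": 95}, source_time=1000.0, entry_time=1005.0)
log.append({"test": "glucose", "value": 140}, source_time=2000.0, entry_time=2003.0)
intact = log.verify()
```

A correction is recorded by appending a new entry that supersedes the old one, so the original value and the time of the correction both remain visible.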
2.10 Removing image-based Inferences to Personal Identity
Story 10. There are some kinds of very high tech imaging methods, particularly MRI imaging, in
which it is considered possible to reconstruct a reasonably good image of the face of the patient.
Thus, if data of this type are shared without having been in some sense blurred, or made
non-identifiable, the sharing represents a breach of patient privacy.
At a more mundane level, most X-ray images contain a plate in the corner that has the patient’s
name in radio-opaque ink, to ensure a match of the image to the patient. Technology for removal
of this type of text-bearing sub-image from an image is available and could probably be built into
image distribution software. [Wang, James Z., Gio Wiederhold, and Jia Li: "Wavelet-based
Progressive Transmission and Security Filtering for Medical Image Distribution"; in Stephen
Wong (ed.): Medical Image Databases; Kluwer publishers, 1998, pages 303-324.]
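Once the location of the name plate is known, the removal step itself is simple. The sketch below blanks a rectangular region of an image held as a list of pixel rows; it assumes the region is already known and does not attempt the detection of text-bearing regions, which is the hard part and is the subject of the cited work. The function name and the sample image are hypothetical.

```python
def mask_region(image, top, left, height, width, fill=0):
    """Return a copy of the image with the given rectangle overwritten by `fill`.

    `image` is a list of rows of pixel values. The rectangle is clipped to the
    image bounds, and the original image is left unmodified.
    """
    masked = [row[:] for row in image]
    for r in range(max(top, 0), min(top + height, len(masked))):
        for c in range(max(left, 0), min(left + width, len(masked[r]))):
            masked[r][c] = fill
    return masked

# Hypothetical 4x4 image whose top-right 2x2 corner holds the name plate.
image = [
    [5, 5, 9, 9],
    [5, 5, 9, 9],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
cleaned = mask_region(image, top=0, left=2, height=2, width=2)
```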
2.11 Man in the Middle Attacks on Remote Data
Story 11. For a number of reasons we might want streams of health care data collected in one
region to be delivered in real time to another region. In anticipated models of remote health care,
monitors send a stream of data to a remote doctor. Monitoring of composite data from a stream
of geographically collocated sources could provide early warning of either naturally occurring
epidemics or bio-terror attacks.
As these data flow, correlation attacks could support inference as to a sensitive medical condition.
For various kinds of data, time is a critical parameter, and the data, although privacy protected,
should be able to support time series analysis. Such analysis is clearly more powerful if it is done
“within patients” rather than “between patients”.
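One way to keep “within patients” analysis possible on protected data is to replace each patient identifier with a keyed pseudonym and to shift all of that patient's timestamps by a single per-patient offset, so that intervals within a series are preserved. This is a sketch under assumptions: the key, the one-year offset range, and the reading layout are hypothetical.

```python
import hashlib
import hmac

PSEUDONYM_KEY = b"example-key-not-for-production"   # hypothetical secret
MAX_SHIFT_SECONDS = 365 * 24 * 3600                  # hypothetical offset range

def pseudonymize(readings, key=PSEUDONYM_KEY):
    """Map (patient_id, timestamp, value) to (pseudonym, shifted_timestamp, value).

    The same patient always gets the same pseudonym and the same time shift,
    so intervals within one patient's series are unchanged.
    """
    protected = []
    for patient_id, timestamp, value in readings:
        digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).digest()
        pseudonym = digest[:8].hex()
        offset = int.from_bytes(digest[8:12], "big") % MAX_SHIFT_SECONDS
        protected.append((pseudonym, timestamp - offset, value))
    return protected

readings = [("p1", 1_000_000, 72), ("p1", 1_000_060, 75), ("p2", 1_000_000, 80)]
protected = pseudonymize(readings)
```

The trade-off is that per-patient shifting destroys timing relationships between patients, which would weaken the early-warning use of composite streams described above; a system serving both purposes would need a different arrangement.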
2.12 Pre Approval of Data Release
Story 12. A patient, sitting and talking with Doctor Bob at hospital A, would like to get her
information from hospital B. However, the process at hospital B is very complicated. In the first
instance, they have to confer with the physician who was treating the patient, and created the
record. If that physician is unavailable, which is likely to be the case for three-quarters of the
hours in the week, the question will then be transferred to the legal office. It is much easier for
that office to say “no” than to examine the reasons for possibly saying “yes”.
This situation calls for some kind of complex conditional delegation or agency model, in which
the patient and the physician acting together would specify a range of conditions under which
data would automatically be released, upon request from the patient, or from a recognized
practitioner. Although, as a matter of law, the records belong to the patient, as a practical
matter possession is nine-tenths of the law, and the records are held by the hospital. Typically a
request to a hospital for the records of a patient’s stay is interpreted as a prelude to a legal
assault. The result is that hundreds of pages of poorly copied material are delivered at the very
last minute permitted by law.
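A conditional delegation of this kind could be represented as a set of release rules agreed in advance, which the record holder evaluates automatically when a request arrives. The following sketch is only illustrative; the rule fields, roles, and record types are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ReleaseRule:
    record_types: frozenset      # e.g. lab results, imaging
    requester_roles: frozenset   # who may trigger an automatic release
    expires: date
    agreed_by_patient: bool
    agreed_by_physician: bool

def release_automatically(rules, record_type, requester_role, today):
    """True if some unexpired rule, agreed by both parties, covers the request."""
    for rule in rules:
        if not (rule.agreed_by_patient and rule.agreed_by_physician):
            continue
        if today > rule.expires:
            continue
        if record_type in rule.record_types and requester_role in rule.requester_roles:
            return True
    return False

rules = [
    ReleaseRule(frozenset({"lab_results", "imaging"}),
                frozenset({"patient", "recognized_practitioner"}),
                date(2010, 12, 31), True, True),
]
```

Requests that no rule covers would fall back to the existing manual process, so the rules only remove the cases that the patient and the physician have already decided.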
Adequate Software Infrastructure
Healthcare technology has become greatly dependent on complex software, and medical
applications rely on the general infrastructure that is available. That infrastructure is not of a
quality that instills confidence that security and privacy can be maintained.
Straightforward technology improvements, if adopted, could greatly enhance trust in the
infrastructure. For instance, a simple analysis showed that currently some 48% of all software
attacks by intruding hackers involve buffer overflow. This highlights a need for developers of
healthcare systems to demand better compiler technology. Secure systems must indeed assume
that the underlying software is not perfect, and can be penetrated. But if the software is as rotten
as it seems to be, that effort becomes excessive. There is no reason, for instance, for people to
accept software that ever has buffer overflows. Dan Swinehart pointed out that Xerox PARC's software
was free of those errors, and so was the PL/1 compiler developed at the Stanford Medical School
in 1966 [see ]. The
modest performance hits are well compensated by the reduction of security costs and worries.
Developers in the security area should insist on only purchasing software that includes such
protection. If there are discriminating customers, the compiler folk will follow the money. Maybe
even educators will change their tone. Today the prime metric is the speed of code, and the code is
naively assumed to be used perfectly.