Validating EMR
Audit Automation
Carl A. Gunter
University of Illinois
Accountable Systems Workshop
Root Problem Statement
Situation
• Access to hospital Electronic
Medical Record (EMR) data
carries a risk of high loss when
access is incorrectly refused
(a false negative).
– Example: doctor acting on an
emergency cannot get access to
list of allergies.
• Hospital has highly trained
personnel in whom much
trust is vested.
Consequences
• Hospital access systems give
liberal access to records,
relying on accountability.
• Insider threats are serious
and abuses are widely
documented.
• Accesses are too numerous to
review manually by experts.
• Automated support is
required.
Validation Problem Statement
Ideal Approach
• Obvious approach: develop
anomaly detector (AD) with
rules and train classifiers on
bad and good accesses.
• Run the AD on the audit logs
and investigate positives
manually with domain
experts
Problem
• This requires considerable
dependence on experts.
• Assumes experts know how
to provide labels.
• Assumes experts can
formulate rules.
• Assumes labeled training sets
exist and that researchers will
be able to get access to them.
Primary Validation Approach
• The primary validation approach applied by
researchers in this area can be called the
Random Object Access Model (ROAM).
• ROAM is based on the premise that anomalous
users and accesses look random.
• Strategy
– Develop rules and train classifier on real data set
augmented with synthetic random users and
accesses.
– Test ability to recognize random users or accesses.
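The ROAM strategy above can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the function `add_random_users`, the toy access log, and the patient identifiers are all hypothetical, and "random" is assumed to mean uniform sampling over patients.

```python
import random

def add_random_users(access_log, patient_ids, n_fake, accesses_per_user, seed=0):
    """Return (augmented_log, labels): real users get label 0, synthetic ones 1."""
    rng = random.Random(seed)
    # Copy the real log so the original data set is left untouched.
    augmented = {user: set(patients) for user, patients in access_log.items()}
    labels = {user: 0 for user in access_log}
    for i in range(n_fake):
        fake_id = f"synthetic_{i}"
        # Assumption: a ROAM-style user touches patients chosen uniformly at random.
        augmented[fake_id] = set(rng.sample(patient_ids, accesses_per_user))
        labels[fake_id] = 1
    return augmented, labels

# Toy data: two real users who each stay within one group of patients.
real_log = {
    "nurse_a": {"p1", "p2", "p3"},
    "nurse_b": {"p7", "p8", "p9"},
}
all_patients = [f"p{i}" for i in range(1, 11)]
augmented_log, labels = add_random_users(
    real_log, all_patients, n_fake=2, accesses_per_user=3
)
```

A classifier trained on `augmented_log` would then be scored on how well it recovers the entries labeled 1.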
ROAM Assessment
Pro
• Likely that illegitimate
accesses appear random.
• Good ROAM classifier
prepares for expert review to
identify false positives.
• ROAM classifier may find
legitimate but interesting
hospital information flows.
• Provides a ready testing
strategy reminiscent of
“fuzzing”.
Con
• There is no current quantified
evidence that random
accesses and illegitimate
accesses have strong overlap.
• Indeed, there is evidence that
in some cases legitimate
accesses look random.
• Some illegitimate accesses
may be systematic in ways
that defy detection by ROAM
classifiers.
Beyond ROAMing
• What are the prospects for alternative models?
• Example: introduce specific attacks experienced
“in the wild” similar to network traces enriched
with known attacks.
• Another idea: look at problems like
masquerading and open terminals.
• Behaviors are not random, but may display
learnable characteristics.
Random Topic Access Model
(RTAM)
Explored an alternative validation model based on topic
classification. Idea:
• Patients are “documents” and diagnoses, drugs, etc. are their
“words”.
• Use Latent Dirichlet Allocation (LDA) to learn topics that can be
used to classify patients.
• Use this to characterize users as readers of documents.
• Detect unusual readers.
• Detect readers of random topics.
Modeling and Detecting Anomalous Topic Access, Siddharth Gupta,
Casey Hanson, Carl A. Gunter, Mario Frank, David Liebovitz, and
Bradley Malin. IEEE Intelligence and Security Informatics, June 2013.
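The "users as readers of documents" idea can be illustrated with a small sketch. It assumes that per-patient topic mixtures have already been inferred (for example by LDA) and that a user is represented by the average mixture of the patients they accessed. Both the averaging rule and the numbers below are illustrative assumptions, not taken from the paper.

```python
def user_topic_profile(accessed_patients, patient_topics):
    """Average the topic mixtures of the patients a user accessed."""
    n_topics = len(next(iter(patient_topics.values())))
    totals = [0.0] * n_topics
    for pid in accessed_patients:
        for t, weight in enumerate(patient_topics[pid]):
            totals[t] += weight
    return [total / len(accessed_patients) for total in totals]

# Hypothetical per-patient mixtures over three topics:
# (neoplasm, obstetric, kidney).
patient_topics = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.7, 0.2, 0.1],
    "p3": [0.1, 0.8, 0.1],
    "p4": [0.1, 0.1, 0.8],
}

# A user who reads only neoplasm-heavy patients has a concentrated profile.
oncology_user = user_topic_profile(["p1", "p2"], patient_topics)
# A user who reads across all topics has a flat profile.
mixed_user = user_topic_profile(["p1", "p3", "p4"], patient_topics)
```

Under this representation, an "unusual reader" is a user whose profile lies far from the profiles of others in the same role.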
Topic Distributions
Neoplasm Topic
Obstetric Topic
Diagnosis Topics
Kidney Topic
Multidimensional Scaling:
Patient Diagnosis
RTAM: Random Users
• r ~ Dir(α) with n dimensions, where n is the number of topics.
a.) Direct or Masquerading User (α<1) : an anomalous user of some specialty
gains sole access to the terminal of another user in the hospital.
b.) Purely Random User (α=1): user is characterized by completely random
behavior, with little semantic congruence to the hospital setting.
c.) Indirect User: user type resembles an even blend of the topics of many
specialized users.
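Drawing r ~ Dir(α) can be sketched with the standard library by normalizing independent gamma draws. The helper name `sample_symmetric_dirichlet`, the choice of five topics, and the specific α values are assumptions for illustration. In particular, the slide gives no α for the indirect user, so the large value used for the "even blend" case is a guess based on how the Dirichlet distribution behaves.

```python
import random

def sample_symmetric_dirichlet(alpha, n_topics, rng):
    """Draw one topic vector r ~ Dir(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)

# alpha < 1: mass tends to concentrate on a few topics (masquerading-style user).
concentrated = sample_symmetric_dirichlet(0.1, 5, rng)
# alpha = 1: uniform over the simplex (purely random user).
uniform_draw = sample_symmetric_dirichlet(1.0, 5, rng)
# Large alpha (assumed here): close to an even blend of all topics.
blended = sample_symmetric_dirichlet(50.0, 5, rng)
```

Each returned vector is non-negative and sums to 1, so it can be read directly as a synthetic user's position in topic space.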
Random Topic Access Detection
(RTAD)
• Random Topic Access Detection (RTAD): an anomaly detection framework that
generates synthetic users using RTA and applies a standard spatial outlier, k-nearest neighbor (k-NN) detection scheme for classification.
• Methodology
1. LDA: define patient topics, and user typing to represent users in the topic
space.
2. RTA user injection: generate three types of anomalous users and insert into
each role at a 5% mix rate.
3. Detection (k-NN): if the ratio of the avg. distance from a user to its k nearest
spatial neighbors to the avg. pairwise distance among those neighbors is
greater than a threshold, call the user anomalous.
4. Evaluation Metric: best Area Under the Curve (AUC) for each 𝛼 , role
combination.
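Steps 3 and 4 can be sketched as below. This is one reading of the ratio described in step 3, using Euclidean distance and a pairwise-ranking form of AUC. The toy users, k = 3, and the threshold of 5.0 are hypothetical values chosen for illustration. The sketch needs Python 3.8 or later for `math.dist`, and it assumes k ≥ 2 and that the chosen neighbors are not all identical.

```python
import math

def knn_ratio_score(point, others, k):
    """Avg distance from point to its k nearest neighbors, divided by
    the avg pairwise distance among those neighbors."""
    neighbors = sorted(others, key=lambda o: math.dist(point, o))[:k]
    to_neighbors = sum(math.dist(point, nb) for nb in neighbors) / k
    pair_dists = [math.dist(a, b)
                  for i, a in enumerate(neighbors)
                  for b in neighbors[i + 1:]]
    among_neighbors = sum(pair_dists) / len(pair_dists)
    return to_neighbors / among_neighbors

def auc(scores, labels):
    """Chance that a random anomalous user outscores a random normal one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy users in a 2-topic space; the last one is the injected anomaly.
users = [(0.90, 0.10), (0.85, 0.15), (0.80, 0.20), (0.88, 0.12), (0.20, 0.80)]
labels = [0, 0, 0, 0, 1]

scores = [knn_ratio_score(u, users[:i] + users[i + 1:], k=3)
          for i, u in enumerate(users)]
flagged = [s > 5.0 for s in scores]   # hypothetical threshold
area = auc(scores, labels)
```

Sweeping the threshold instead of fixing it traces out the ROC curve whose area `auc` summarizes.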
Results - I
The best AUC across all evaluated dimensions is plotted for each role performing
poorly for 𝛼 > 1.
Results - II
The best AUC across all evaluated dimensions is plotted for each role performing
well or near average for 𝛼 > 1.
Discussion and Conclusions
• Other strategies besides ROAM may capture new
types of threats.
• Good progress on technical measures of
validation; need links to expert review and
ground truth.
• More evaluation studies are needed.
• Important to integrate access audit with general
business intelligence: understanding the roles
and workflows of the organization.