Current Trends in Data Security
Dan Suciu
Joint work with Gerome Miklau
1
Data Security
Dorothy Denning, 1982:
• Data Security is the science and study of
methods of protecting data (...) from
unauthorized disclosure and modification
• Data Security = Confidentiality + Integrity
2
Data Security
• Distinct from systems and network security
– Assumes these are already secure
• Tools:
– Cryptography, information theory, statistics, …
• Applications:
– An enabling technology
3
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
4
Traditional Data Security
• Security in SQL = Access control + Views
• Security in statistical databases = Theory
5
[Griffiths&Wade'76, Fagin'78]
Access Control in SQL
GRANT privileges ON object TO users
[WITH GRANT OPTION]
privileges = SELECT | INSERT | DELETE | . . .
object = table | attribute
REVOKE privileges ON object FROM users
[CASCADE]
6
Views in SQL
A SQL View = (almost) any SQL query
• Typically used as:
CREATE VIEW pmpStudents AS
SELECT * FROM Students WHERE…
GRANT SELECT ON pmpStudents TO DavidRispoli
7
Summary of SQL Security
Limitations:
• No row level access control
• Table creator owns the data: that’s unfair !
Access control = great success story of the DB community...
… or spectacular failure:
• Only 30% assign privileges to users/roles
– And then to protect entire tables, not columns
8
Summary (cont)
• Most policies in middleware: slow, error prone:
– SAP has 10**4 tables
– GTE over 10**5 attributes
– A brokerage house has 80,000 applications
– A US government entity thinks that it has 350K
• Today the database is not at the center of the
policy administration universe
9
[Rosenthal&Winslett’2004]
[Adam&Wortmann’89]
Security in Statistical DBs
Goal:
• Allow arbitrary aggregate SQL queries
• Hide confidential data
Should be hidden:
SELECT name
FROM Patient
WHERE age=42
and sex=‘M’
and diagnostic=‘schizophrenia’
OK:
SELECT count(*)
FROM Patients
WHERE age=42
and sex=‘M’
and diagnostic=‘schizophrenia’
10
[Adam&Wortmann’89]
Security in Statistical DBs
What has been tried:
• Query restriction
– Query-size control, query-set overlap control, query monitoring
– None is practical
• Data perturbation
– Most popular: cell combination, cell suppression
– Other methods, for continuous attributes: may introduce bias
• Output perturbation
– For continuous attributes only
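One reason query restriction is hard to make practical: query-size control alone can be sidestepped by subtracting two permitted aggregates. A minimal sketch in Python, assuming a hypothetical in-memory Patients table and an illustrative size threshold (the data, the threshold and the function name are not from the slides):

```python
# Hypothetical data; every name, age and diagnosis is invented for illustration.
PATIENTS = [
    {"name": "p1", "age": 42, "sex": "M", "diagnostic": "schizophrenia"},
    {"name": "p2", "age": 42, "sex": "F", "diagnostic": "flu"},
    {"name": "p3", "age": 35, "sex": "M", "diagnostic": "flu"},
    {"name": "p4", "age": 51, "sex": "F", "diagnostic": "asthma"},
    {"name": "p5", "age": 29, "sex": "M", "diagnostic": "flu"},
    {"name": "p6", "age": 63, "sex": "F", "diagnostic": "schizophrenia"},
]
MIN_QUERY_SET = 2  # assumed threshold for query-size control


def guarded_count(predicate):
    """Answer count(*) only when the query set is neither too small nor too large."""
    n = sum(1 for row in PATIENTS if predicate(row))
    if n < MIN_QUERY_SET or n > len(PATIENTS) - MIN_QUERY_SET:
        raise PermissionError("query set size outside the allowed range")
    return n


def direct(row):
    # The query the attacker really wants: it matches a single patient.
    return (row["age"] == 42 and row["sex"] == "M"
            and row["diagnostic"] == "schizophrenia")


try:
    guarded_count(direct)
    direct_blocked = False
except PermissionError:
    direct_blocked = True

# Differencing: pad the forbidden query with an unrelated group, then subtract it.
padded = guarded_count(
    lambda r: r["sex"] == "F"
    or (r["age"] == 42 and r["diagnostic"] == "schizophrenia")
)
padding = guarded_count(lambda r: r["sex"] == "F")
leaked = padded - padding  # count of 42-year-old male schizophrenia patients
```

Both padded queries pass the size check, yet their difference is exactly the answer the direct query was refused.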
11
Summary on Security in Statistical DB
• Original goal seems impossible to achieve
• Cell combination/suppression are popular,
but do not allow arbitrary queries
12
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
13
[Chris Anley, Advanced SQL Injection In SQL]
SQL Injection
Your health insurance company lets you see the claims online:
First login:
User: fred
Password: ********
Now search through the claims:
Search claims by: Dr. Lee
SELECT…FROM…WHERE doctor=‘Dr. Lee’ and patientID=‘fred’
14
SQL Injection
Now try this:
Search claims by: Dr. Lee’ OR patientID = ‘suciu’; --
…..WHERE doctor=‘Dr. Lee’ OR patientID=‘suciu’; --’ and patientID=‘fred’
Better:
Search claims by: Dr. Lee’ OR 1 = 1; --
15
SQL Injection
When you’re done, do this:
Search claims by: Dr. Lee’; DROP TABLE Patients; --
16
SQL Injection
• The DBMS works perfectly. So why is
SQL injection possible so often ?
• Quick answer:
– Poor programming: use stored procedures !
• Deeper answer:
– Move policy implementation from apps to DB
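The "poor programming" point can be sketched with Python's sqlite3 module. This uses parameterized queries, a related fix to the stored procedures named above, not the same thing. The Claims table, its rows and both function names are assumptions made for the example; the payload is the slide's, minus the `;` to keep it a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Claims (doctor TEXT, patientID TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Claims VALUES (?, ?, ?)",
    [("Dr. Lee", "fred", 100), ("Dr. Lee", "suciu", 250), ("Dr. Kim", "suciu", 75)],
)


def search_vulnerable(doctor, patient_id):
    # User input is pasted into the SQL text, so it can change the query itself.
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           "WHERE doctor='" + doctor + "' AND patientID='" + patient_id + "'")
    return conn.execute(sql).fetchall()


def search_parameterized(doctor, patient_id):
    # Placeholders keep user input as data; it is never parsed as SQL.
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           "WHERE doctor=? AND patientID=?")
    return conn.execute(sql, (doctor, patient_id)).fetchall()


payload = "Dr. Lee' OR 1 = 1 --"
leaked = search_vulnerable(payload, "fred")    # the comment drops the patientID check
safe = search_parameterized(payload, "fred")   # no doctor has this literal name
```

With the payload, the concatenated query returns every row in Claims, while the parameterized one returns nothing.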
17
Latanya Sweeney’s Finding
• In Massachusetts, the Group Insurance
Commission (GIC) is responsible for
purchasing health insurance for state
employees
• GIC has to publish the data:
GIC(zip, dob, sex, diagnosis, procedure, ...)
18
Latanya Sweeney’s Finding
• Sweeney paid $20 and bought the voter
registration list for Cambridge,
Massachusetts:
GIC(zip, dob, sex, diagnosis, procedure, ...)
VOTER(name, party, ..., zip, dob, sex)
19
Latanya Sweeney’s Finding
zip, dob, sex
• William Weld (former governor) lives in
Cambridge, hence is in VOTER
• 6 people in VOTER share his dob
• only 3 of them were men (same sex)
• Weld was the only one in that zip
• Sweeney learned Weld’s medical records !
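The linkage amounts to a join of GIC and VOTER on (zip, dob, sex). A minimal sketch in Python, using invented records (every name, date and diagnosis below is hypothetical, and `link` is an illustrative helper, not Sweeney's procedure):

```python
# Hypothetical records; every value below is invented for illustration.
GIC = [
    {"zip": "02138", "dob": "1950-01-01", "sex": "M", "diagnosis": "dx-A"},
    {"zip": "02138", "dob": "1950-01-01", "sex": "F", "diagnosis": "dx-B"},
    {"zip": "02139", "dob": "1950-01-01", "sex": "M", "diagnosis": "dx-C"},
    {"zip": "02138", "dob": "1962-07-15", "sex": "M", "diagnosis": "dx-D"},
]
VOTER = [
    {"name": "Target Person", "zip": "02138", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02138", "dob": "1950-01-01", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter Four", "zip": "02138", "dob": "1962-07-15", "sex": "M"},
    {"name": "Voter Five", "zip": "02138", "dob": "1962-07-15", "sex": "M"},
]
QUASI_ID = ("zip", "dob", "sex")


def link(gic, voter):
    """Join on the quasi-identifier; keep only matches with exactly one voter."""
    names_by_key = {}
    for v in voter:
        key = tuple(v[a] for a in QUASI_ID)
        names_by_key.setdefault(key, []).append(v["name"])
    reidentified = {}
    for g in gic:
        names = names_by_key.get(tuple(g[a] for a in QUASI_ID), [])
        if len(names) == 1:  # a unique voter means the medical record is named
            reidentified[names[0]] = g["diagnosis"]
    return reidentified


reidentified = link(GIC, VOTER)
```

Voters who share (zip, dob, sex) with someone else stay ambiguous; the unique ones are re-identified.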
20
Latanya Sweeney’s Finding
• All systems worked as specified, yet
important data leaked
• How do we protect against that ?
Some of today’s research in data security addresses breaches
that happen even if all systems work correctly
21
Summary on Attacks
SQL injection:
• A correctness problem:
– Security policy implemented poorly in the application
Sweeney’s finding:
• Beyond correctness:
– Leakage occurred when all systems work as specified
22
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
23
Research Topics in Data Security
Rest of the talk:
• Information Leakage
• Privacy
• Fine-grained access control
• Data encryption
• Secure shared computation
24
[Samarati&Sweeney’98, Meyerson&Williams’04]
Information Leakage:
k-Anonymity
Definition: each tuple is equal to at least k-1 others
Anonymizing: through suppression and generalization
Original table:
First     Last    Age  Race
Harry     Stone   34   Afr-Am
John      Reyser  36   Cauc
Beatrice  Stone   47   Afr-Am
John      Ramos   22   Hisp

2-anonymous table:
First  Last   Age    Race
*      Stone  30-50  Afr-Am
John   R*     20-40  *
*      Stone  30-50  Afr-Am
John   R*     20-40  *

Hard: NP-complete for suppression only
Approximations exist
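The definition can be checked mechanically. A minimal sketch in Python, using the four-row example as read from this slide; the function name `is_k_anonymous` is an assumption for illustration:

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """True if every tuple occurs at least k times, i.e. equals at least k-1 others."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())


# (First, Last, Age, Race) before anonymization
original = [
    ("Harry", "Stone", "34", "Afr-Am"),
    ("John", "Reyser", "36", "Cauc"),
    ("Beatrice", "Stone", "47", "Afr-Am"),
    ("John", "Ramos", "22", "Hisp"),
]

# After suppression (*) and generalization (age ranges, R*)
anonymized = [
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
]

original_ok = is_k_anonymous(original, 2)
anonymized_ok = is_k_anonymous(anonymized, 2)
```

Checking a given table is easy; the hard part noted above is finding an anonymization that suppresses as little as possible.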
25
[Miklau&S’04, Miklau&Dalvi&S’05,Yang&Li’04]
Information Leakage:
Query-view Security
Have data:
TABLE Employee(name, dept, phone)

Secret Query              View(s)                        Disclosure ?
S(name)                   V(name,phone)                  total
S(name,phone)             V1(name,dept), V2(dept,phone)  big
S(name)                   V(dept)                        tiny
S(name) where dept=‘HR’   V(name) where dept=‘RD’        none
26
Summary on Information Disclosure
• The theoretical research:
– Exciting new connections between databases
and information theory, probability theory,
cryptography
[Abadi&Warinschi’05]
• The applications:
– many years away
27
Privacy
• “Is the right of individuals to determine for
themselves when, how and to what extent
information about them is communicated to
others” [Agrawal’03]
• More complex than confidentiality
28
Privacy
Involves:
• Data
• Owner
• Requester
• Purpose
• Consent
Example: Alice gives her email
to a web service
alice@a.b.com
Privacy policy: P3P
29
Hippocratic Databases
DB support for implementing privacy policies.
• Purpose specification
• Consent
• Limited use
• Limited retention
• …
Protection against:
✓ Sloppy organizations
✗ Malicious organizations
(Diagram: alice@a.b.com stored in a Hippocratic DB; privacy policy: P3P)
[Agrawal’03, LeFevre’04]
30
Privacy for Paranoids
• Idea: rely on trusted agents
(Diagram: an Agent stands between alice@a.b.com and the aliases aly1@agenthost.com and lice27@agenthost.com)
Protection against:
✓ Sloppy organizations
✓ Malicious attackers
foreign keys ?
31
[Aggarwal’04]
Summary on Privacy
• Major concern in industry
– Legislation
– Consumer demand
• Challenge:
– How to enforce an organization’s stated policies
32
Fine-grained Access Control
Control access at the tuple level.
• Policy specification languages
• Implementation
33
Policy Specification Language
No standard, but usually based on parameterized views.
CREATE AUTHORIZATION VIEW PatientsForDoctors AS
SELECT Patient.*
FROM Patient, Doctor
WHERE Patient.doctorID = Doctor.ID
and Doctor.login = %currentUser
Context parameters: %currentUser
34
Implementation
User query:
SELECT Patient.name, Patient.age
FROM Patient
WHERE Patient.disease = ‘flu’
Rewritten by the DBMS (e.g. Oracle):
SELECT Patient.name, Patient.age
FROM Patient, Doctor
WHERE Patient.disease = ‘flu’
and Patient.doctorID = Doctor.ID
and Doctor.login = %currentUser
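The rewriting can be imitated in Python with sqlite3. This is only a sketch of the idea, not how Oracle implements it; the tables, rows and the `flu_patients` helper are assumptions, and the filter on Doctor.login follows the PatientsForDoctors authorization view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doctor (ID INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE Patient (name TEXT, age INTEGER, disease TEXT, doctorID INTEGER);
INSERT INTO Doctor VALUES (1, 'drlee'), (2, 'drkim');
INSERT INTO Patient VALUES
  ('p1', 30, 'flu', 1), ('p2', 41, 'flu', 2), ('p3', 55, 'asthma', 1);
""")


def flu_patients(current_user):
    """Run the user's flu query with the authorization predicate added."""
    sql = """
        SELECT Patient.name, Patient.age
        FROM Patient, Doctor
        WHERE Patient.disease = 'flu'
          AND Patient.doctorID = Doctor.ID
          AND Doctor.login = :currentUser
    """
    # :currentUser plays the role of the %currentUser context parameter.
    return conn.execute(sql, {"currentUser": current_user}).fetchall()


lee_rows = flu_patients("drlee")
kim_rows = flu_patients("drkim")
```

Each doctor sees only their own flu patients, although both issued the same query text.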
35
Two Semantics
• The Truman Model = filter semantics
– transform reality
– ACCEPT all queries
– REWRITE queries
– Sometimes misleading results
SELECT count(*)
FROM Patients
WHERE disease=‘flu’
• The non-Truman model = deny semantics
– reject queries
– ACCEPT or REJECT queries
– Execute query UNCHANGED
– May define multiple security views for a user
36
[Rizvi’04]
Summary of Fine Grained Access Control
• Trend in industry: label-based security
• Killer app: application hosting
– Independent franchises share a single table at
headquarters (e.g., Holiday Inn)
– Application runs under requester’s label, cannot
see other labels
– Headquarters runs Read queries over them
• Oracle’s Virtual Private Database
37
[Rosenthal&Winslett’2004]
Data Encryption for Publishing
Scientist wants to publish
medical research data on the Web
• Users and their keys:
– All authorized users: Kuser
– Patient: Kpat
– Doctor: Kdr
– Nurse: Knu
– Administrator: Kadmin
• Complex Policies:
– Doctor researchers may access trials
– Nurses may access diagnostic
– Etc…
What is the encryption granularity ?
38
[Miklau&S.’03]
Data Encryption for Publishing
An XML tree protection:
Doctor: Kuser, Kdr
Nurse: Kuser, Knu
Nurse+admin: Kuser, Knu, Kadm
(Figure: XML tree rooted at <patient>; elements <privateData>, <diagnostic> flu, <name> JoeDoe, <age> 28, <address> Seattle, <drug> Tylenol, <trial>, <placebo> Candy; subtrees guarded by the key expressions Kuser, Kdr, Kpat, Kmaster, Knu ∨ Kdr, Kpat ∨ (Knu ∧ Kadm))
39
Summary on Data Encryption
• Industry:
– Supported by all vendors:
Oracle, DB2, SQL-Server
– Efficiency issues still largely unresolved
• Research:
– Hard theoretical security analysis
[Abadi&Warinschi’05]
40
Secure Shared Processing
• Alice has a database DB_A
• Bob has a database DB_B
• How can they compute Q(DB_A, DB_B), without
revealing their data ?
• Long history in cryptography
• Some database queries are easier than general case
41
[Agrawal’03]
Secure Shared Processing
Task: find intersection
without revealing the rest
Alice: a b c d
Bob: c d e
Compute one-way hash:
Alice: h(a) h(b) h(c) h(d)
Bob: h(c) h(d) h(e)
Exchange:
Alice receives: h(c) h(d) h(e)
Bob receives: h(a) h(b) h(c) h(d)
What’s wrong ?
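One well-known weakness: when the domain of values is small, either party can hash every candidate and match it against what it received. A minimal sketch in Python; SHA-256 and the 26-letter domain are assumptions chosen for the example:

```python
import hashlib


def h(value):
    """One-way hash of a string value (SHA-256 is an assumed choice)."""
    return hashlib.sha256(value.encode()).hexdigest()


alice = ["a", "b", "c", "d"]
bob = ["c", "d", "e"]

alice_sent = {h(x) for x in alice}  # what Bob receives
bob_sent = {h(x) for x in bob}      # what Alice receives

# The intended result: Alice finds which of her items Bob also has.
intersection = {x for x in alice if h(x) in bob_sent}

# The leak: Bob hashes the whole (small) domain and recovers all of Alice's items.
domain = [chr(c) for c in range(ord("a"), ord("z") + 1)]
recovered_by_bob = {x for x in domain if h(x) in alice_sent}
```

Bob learns a and b as well, which the task said should stay hidden.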
42
[Agrawal’03]
Secure Shared Processing
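commutative encryption:
h(x) = E_A(E_B(x)) = E_B(E_A(x))
Alice: a b c d
Bob: c d e
Each encrypts its own items:
Alice applies E_A: E_A(a) E_A(b) E_A(c) E_A(d)
Bob applies E_B: E_B(c) E_B(d) E_B(e)
Exchange:
Alice receives: E_B(c) E_B(d) E_B(e)
Bob receives: E_A(a) E_A(b) E_A(c) E_A(d)
Each encrypts what it received:
Alice applies E_A: h(c) h(d) h(e)
Bob applies E_B: h(a) h(b) h(c) h(d)
Exchange:
Alice receives: h(a) h(b) h(c) h(d)
Bob receives: h(c) h(d) h(e)
43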
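One common way to obtain a commutative cipher is modular exponentiation, E_k(x) = x^k mod p. A toy sketch in Python under that assumption; the prime, the hash-to-group step and all function names are illustrative choices, not the construction from [Agrawal’03], and the parameters are not secure for real use:

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy parameter, not a secure choice


def to_group(item):
    """Map a string to an integer in [1, P-1] via SHA-256 (assumed encoding)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen():
    """Pick a secret exponent coprime to P-1, so encryption is a permutation."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def enc(key, value):
    # (x^a)^b = (x^b)^a mod P, which is what makes the cipher commutative.
    return pow(value, key, P)


alice, bob = ["a", "b", "c", "d"], ["c", "d", "e"]
ka, kb = keygen(), keygen()

alice_once = [enc(ka, to_group(x)) for x in alice]  # sent to Bob
bob_once = [enc(kb, to_group(x)) for x in bob]      # sent to Alice

bob_items_twice = {enc(ka, y) for y in bob_once}    # Alice applies E_A
alice_items_twice = [enc(kb, y) for y in alice_once]  # Bob applies E_B, returns in order

# Alice matches her own items (kept in order) against Bob's doubly encrypted set.
intersection = {x for x, hx in zip(alice, alice_items_twice)
                if hx in bob_items_twice}
```

Neither side can run the dictionary attack on the singly encrypted values, because it lacks the other party's key.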
Summary on Secure Shared Processing
• Secure intersection, joins, data mining
• But are there other examples ?
44
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
45
Conclusions
• Traditional data security confined to one server
– Security in SQL
– Security in statistical databases
• Attacks possible due to:
– Poor implementation of security policies: SQL injection
– Unintended information leakage in published data
46
Conclusions
• State of the industry:
– Data security policies: scattered throughout applications
– Database no longer center of the security universe
– Needed: automatic means to translate complex policies into
physical implementations
• State of research: data security in global data sharing
– Information leakage, privacy, secure computations, etc.
– Database research community has an increased appetite for
cryptographic techniques
47
Questions ?
48