A View Based Security Framework for XML

Wenfei Fan, Irini Fundulaki, Floris Geerts,
Xibei Jia, Anastasios Kementsietsidis
University of Edinburgh
Digital Curation Center
Introduction

XML data management
 Its importance is clearly demonstrated by the wide adoption of XML-related technologies in eScience projects.

Selective exposure of information in XML
 A primary concern for data providers, curators and consumers.
 Needed to safeguard data confidentiality, privacy and intellectual property.
Introduction --- Security View

Security View: multiple user groups wish to query the same XML document.
 Different access policies may be imposed on each group, specifying the portions of the document its users are granted or denied access to.

Security views are necessarily virtual.
 It is prohibitively expensive to materialize and maintain a large number of views.
Example: a medical records XML database
[Figure: document tree. Element labels: Hospital, Psychiatry, Genetics, Record, Date, Doctor, Bill, Patient, Diagnosis, Name, Sex. Sample values include 'David', 'Mark', 'Mary' and 'Angela'.]
 The security admin can access the whole db.
 Patient "Mary" can access her own medical records.
 Doctor "David" can only see the records of his patients.
Insurer’s view
[Figure: the insurer's view of the tree. Hospital contains Record elements with Date, Bill and Patient; each Patient exposes Name and Sex. Both records shown belong to 'Mary'.]
An insurer can only read his customers' billing info.
Researcher’s View
[Figure: the researcher's view of the tree. Hospital contains three Record elements, each with Date, Doctor and Patient; only Diagnosis and Sex appear at the level below.]
A medical researcher can retrieve the diagnosis data for research purposes, but not the information on doctors or patients.
System Architecture
[Figure: system architecture. Security admins write a security specification S in the Security Spec. Editor. View Derivation takes S and the XSD D of document T and produces virtual security views VD, VP, ..., VR for roles UD, UP, ..., UR, each with its own XSD (DD, DP, ..., DR). Researchers pose a query QR on view VR in the Query Editor. Query Rewriting turns QR into a query QT on T, which goes through Query Optimization and Query Evaluation over the XML document T, with an Indexer, and the Result Viewer shows the answer. Legend: input module, output module, core module, optional module, virtual view, XML schema, XML database, XML data flow, other data flow.]
Languages
 security spec. lang. LS: used by admins.
 view spec. lang. LV: transparent to users.
 view query lang. LQV: used by users.
 doc query lang. LQR: transparent to users.
Security Specification

Document DTD
 hospital -> patient*
 patient -> pname, visit*, parent*
 parent -> patient
 visit -> treatment, date
 treatment -> test + medication
Access annotations
 (hospital, patient) = [visit/treatment/medication = 'autism']
 (patient, pname) = N
 (patient, visit) = N
 (visit, treatment) = [medication]
 (treatment, test) = N
[Figure: DTD graph over hospital, patient, pname, visit, parent, treatment, date, test and medication, with * marking repeated children.]
Security Specification

Classify the nodes in the XML document
 accessible nodes
 inaccessible nodes
 conditionally accessible nodes
Support
 inheritance
 overriding
 content-based access privilege
 context-dependency
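The classification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's implementation: the `Node` class, `has_path`, `SPEC` and `accessible_paths` are hypothetical names, the sample document is made up, and the semantics is a simplified reading in which an unannotated node inherits its parent's accessibility, an explicit annotation overrides it, and a false condition hides the whole subtree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def has_path(node: Node, path: List[str], text: Optional[str] = None) -> bool:
    """True if following the child tags in `path` from `node` reaches a node
    (whose text equals `text`, when `text` is given)."""
    if not path:
        return text is None or node.text == text
    return any(has_path(c, path[1:], text)
               for c in node.children if c.tag == path[0])


# An annotation is "Y", "N", or a predicate on the annotated child node.
Annotation = Union[str, Callable[[Node], bool]]

# Keys are (parent tag, child tag), mirroring the annotations on the slide.
SPEC: Dict[tuple, Annotation] = {
    ("hospital", "patient"):
        lambda n: has_path(n, ["visit", "treatment", "medication"], "autism"),
    ("patient", "pname"): "N",
    ("patient", "visit"): "N",
    ("visit", "treatment"): lambda n: has_path(n, ["medication"]),
    ("treatment", "test"): "N",
}


def accessible_paths(root: Node, spec: Dict[tuple, Annotation]) -> List[str]:
    out: List[str] = []

    def walk(node: Node, parent_tag: Optional[str], inherited: bool, path: str) -> None:
        ann = spec.get((parent_tag, node.tag))
        if ann is None:
            acc = inherited        # inheritance: no annotation on this edge
        elif ann == "Y":
            acc = True             # overriding
        elif ann == "N":
            acc = False            # overriding
        else:
            acc = ann(node)        # content-based: evaluate the condition
            if not acc:
                return             # assumed context dependency: hide the subtree
        here = f"{path}/{node.tag}"
        if acc:
            out.append(here)
        for child in node.children:
            walk(child, node.tag, acc, here)

    walk(root, None, True, "")     # the root is assumed accessible
    return out


# Made-up sample document: one patient on 'autism' medication, one with a test only.
doc = Node("hospital", children=[
    Node("patient", children=[
        Node("pname", "Ann"),
        Node("visit", children=[
            Node("treatment", children=[Node("medication", "autism")]),
            Node("date", "2006-01-01"),
        ]),
    ]),
    Node("patient", children=[
        Node("pname", "Bob"),
        Node("visit", children=[
            Node("treatment", children=[Node("test", "blood")]),
            Node("date", "2006-02-01"),
        ]),
    ]),
])

for p in accessible_paths(doc, SPEC):
    print(p)
```

On this sample, only the first patient survives, and within it only the patient node, the treatment and its medication are accessible: pname and visit are denied, date inherits the denial from visit, and treatment overrides it through its own condition.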
View derivation module
Schema availability
 The availability of an XML schema that specifies the structure of the accessible data is critical to users, who can then formulate queries over this schema only.
View Specification

View DTD
 hospital -> patient*
 patient -> treatment*, parent*
 parent -> patient
 treatment -> medication
View annotations (path on the document for each edge of the view DTD)
 (hospital, patient) = patient[visit/treatment/medication = 'autism']
 (patient, treatment) = visit/treatment[medication]
 (patient, parent) = parent
 (parent, patient) = patient
 (treatment, medication) = medication
[Figure: view DTD graph over hospital, patient, treatment, parent and medication, with * marking repeated children.]
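To make the role of the view annotations concrete, here is a minimal Python sketch that rewrites a simple path over the view into a path over the document by substituting the annotation of each view edge. It is an assumption-laden illustration: `SIGMA` and `rewrite` are hypothetical names, and only plain child steps are handled, with no filters and no closure. The framework itself rewrites Regular XPath queries using automata.

```python
# View annotations from the slide, keyed by (parent tag, child tag) in the view DTD.
SIGMA = {
    ("hospital", "patient"): "patient[visit/treatment/medication='autism']",
    ("patient", "treatment"): "visit/treatment[medication]",
    ("patient", "parent"): "parent",
    ("parent", "patient"): "patient",
    ("treatment", "medication"): "medication",
}


def rewrite(view_path: str, root: str = "hospital") -> str:
    """Translate a child-step path over the view into a path over the document."""
    steps = view_path.split("/")
    if steps[0] != root:
        raise ValueError(f"view paths must start at {root}")
    out = [root]
    for parent, child in zip(steps, steps[1:]):
        if (parent, child) not in SIGMA:
            # The view DTD has no such edge, so the user cannot even ask for it.
            raise ValueError(f"{child} is not a child of {parent} in the view")
        out.append(SIGMA[(parent, child)])
    return "/".join(out)


print(rewrite("hospital/patient/treatment/medication"))
```

The printed document path is `hospital/patient[visit/treatment/medication='autism']/visit/treatment[medication]/medication`. A view path such as `hospital/patient/pname` raises `ValueError`, because pname is not part of the view DTD.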
Query Over the View

Regular XPath Query
 A mild extension of XPath that supports the general Kleene closure (.)* instead of the limited recursion "//".
 Why: XPath is not closed under query rewriting, i.e. for an XPath query on a recursively defined view there may not exist an equivalent XPath query on the underlying document.
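The difference between a Kleene closure and "//" is that the closure constrains the labels along the path. The following Python sketch evaluates a closure such as (parent/patient)* over a small tree. The `Node` class and the `follow` and `closure` helpers are hypothetical names introduced for illustration, and the sample family is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def follow(nodes: List[Node], tags: List[str]) -> List[Node]:
    """Evaluate a sequence of child steps, e.g. ["parent", "patient"]."""
    for tag in tags:
        nodes = [c for n in nodes for c in n.children if c.tag == tag]
    return nodes


def closure(nodes: List[Node], tags: List[str]) -> List[Node]:
    """Evaluate (tags)*: zero or more repetitions of the step sequence."""
    result = list(nodes)                 # zero repetitions: the start nodes
    seen = {id(n) for n in result}
    frontier = list(nodes)
    while frontier:
        fresh = []
        for n in follow(frontier, tags):
            if id(n) not in seen:        # keep each node once
                seen.add(id(n))
                fresh.append(n)
        result.extend(fresh)
        frontier = fresh
    return result


# Made-up sample: Carl's parent is Beth, whose parent is Ada.
ada = Node("patient", children=[Node("pname", "Ada")])
beth = Node("patient", children=[Node("pname", "Beth"),
                                 Node("parent", children=[ada])])
carl = Node("patient", children=[Node("pname", "Carl"),
                                 Node("parent", children=[beth])])

line = closure([carl], ["parent", "patient"])
print([follow([p], ["pname"])[0].text for p in line])
```

Starting from Carl, (parent/patient)* reaches Carl, Beth and Ada in that order. The loop terminates on any finite tree because each node enters the frontier at most once.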
Query Over the Document

Regular XPath Query
 However, the size of the rewritten query QT, if directly represented in Regular XPath, may be exponential in the size of the input query QV.
 We overcome this challenge by employing an automaton characterization of QT, denoted by MFA (mixed finite state automata), which is linear in the size of QV.

Query Rewriting Module
MFA: Internal Query Representation
 NFA: captures selecting paths
 AFA: captures filters
[Figure: MFA for the query below, with numbered states, transitions labelled hospital, patient, pname, parent, visit, treatment, test and medication, "and" nodes, and a TEXT_EQUAL 'headache' test.]
Query:
hospital/patient[(parent/patient)*/visit/treatment/test and
visit/treatment[medication/text()='headache']]/pname
Query Evaluation: HyPE


We propose a novel algorithm, HyPE (Hybrid Pass Evaluation), for processing Regular XPath queries represented by MFAs.
A unique feature of HyPE is that it needs only a single top-down depth-first traversal of the XML tree, during which HyPE both
 evaluates predicates of the input query (equivalently, AFAs of the MFA) and
 identifies potential answer nodes (by evaluating the NFA of the MFA).
Previous systems need to traverse the XML document at least twice to evaluate XPath queries.
HyPE: Cans (candidate answers)
The potential answer nodes are collected and stored in an auxiliary structure, referred to as Cans (candidate answers), which is often much smaller than the XML document tree.
 A pass over Cans is needed to retrieve the real result nodes.
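The idea of collecting candidates during one traversal and resolving them afterwards can be sketched as follows. This is not the HyPE algorithm: it hard-codes a single query, any patient with a visit/treatment/medication equal to 'headache', selecting its pname, instead of running an MFA. The `collect_candidates` and `resolve` names and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document following the hospital DTD used in the slides.
DOC = """
<hospital>
  <patient>
    <pname>Ann</pname>
    <visit><treatment><medication>headache</medication></treatment><date>d1</date></visit>
    <parent><patient><pname>Zed</pname></patient></parent>
  </patient>
  <patient>
    <pname>Bob</pname>
    <visit><treatment><test>blood</test></treatment><date>d2</date></visit>
  </patient>
</hospital>
"""


def collect_candidates(root):
    """Single depth-first pass: record each pname together with a mutable
    guard that stands for the still-unresolved filter of its patient."""
    cans = []

    def visit(node, trail, guard):
        trail = trail + [node.tag]
        if node.tag == "patient":
            guard = {"holds": False}          # filter not yet known to hold
        if guard is not None:
            if trail[-2:] == ["patient", "pname"]:
                cans.append((node.text, guard))   # candidate answer
            if (trail[-4:] == ["patient", "visit", "treatment", "medication"]
                    and node.text == "headache"):
                guard["holds"] = True         # filter resolved for this patient
        for child in node:
            visit(child, trail, guard)

    visit(root, [], None)
    return cans


def resolve(cans):
    """Pass over the candidates only, keeping those whose filter held."""
    return [name for name, guard in cans if guard["holds"]]


cans = collect_candidates(ET.fromstring(DOC))
print([name for name, _ in cans])   # candidates
print(resolve(cans))                # real answers
```

Ann's pname is seen before her visit, so at that point the filter is still unknown. The candidate is kept with its guard and confirmed only in the final pass, which touches three candidates rather than the whole document.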

HyPE
[Figure: a HyPE run on a sample hospital tree (hospital, patient, pname, parent, visit, treatment, test, medication, with text values "cold" and "headache"). Tree nodes are annotated with the NFA and AFA states reached during the traversal, and the Cans structure is shown alongside. Legend: candidate answer, real answer, states in AFA, a state in NFA, a state in NFA annotated by a false AFA, pruned subtree.]
SMOQE:
A Reference Implementation
We have developed a reference implementation, called SMOQE (Secure MOdular Query Engine), for the security framework proposed in this paper.
 It is implemented in Java.
 It was demonstrated at VLDB 2006.

Conclusion

A generic, flexible view-based access control framework for protecting XML data, and its implementation: SMOQE
 Able to enforce fine-grained access policies according to the structure and values of the protected XML data
  schema availability
  view derivation
 Efficient enforcement of security constraints during XML query evaluation
  query rewriting
  automaton-based representation
  evaluation using HyPE and optimization
Thank you!