A Data Masking Technique for Data Warehouses

INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM
A Data Masking Technique for Data Warehouses
Ricardo Jorge Santos & Marco Vieira
CISUC – DEI – FCTUC
University of Coimbra - Portugal
Jorge Bernardino
CISUC – DEIS – ISEC
Polytechnic Institute of Coimbra - Portugal
ISEL, Lisbon – September/2011
Agenda
• Background
• Motivation
• MOBAT: A MOD Based Data Masking Technique
• Optimization Features
• Experimental Results
• Conclusions and Future Work
Ricardo J. Santos – A Data Masking Technique for Data Warehouses – IDEAS 2011 – ISEL, Lisbon – September/2011
Security Concerns in Data Warehousing
• A Data Warehouse (DW) is a critical asset for many enterprises
• It stores all the relevant historical and current business information needed to support decision making (sensitive data)
• DWs are therefore main targets for attackers seeking to steal or compromise sensitive data
• The rate and complexity of attacks have increased in the recent past
Data Security Domains
• Data Confidentiality: only the right users should access the right data
• Data Integrity: data should always be correct, authentic and consistent
• Data Availability: users should always be able to access data whenever needed
Data Privacy Issues in Today’s DWs (Our Focus)
• Existing masking solutions are not considered acceptable
• Encryption techniques introduce too much overhead in:
    • Storage space
    • Data loading time
    • Query response time
• Important feature: fact columns in DWs are mainly numerical
MOBAT – MOd BAsed data masking Technique for DWs
[Figure: MOBAT System Architecture]
Suppose a table T with a set of N numerical columns to mask, Ci ∈ {C1, C2, C3, …, CN}, and a total of M rows, Rj ∈ {R1, R2, R3, …, RM}.
Each value to mask is identified by a pair (Rj, Ci), where Rj and Ci are the row and the column to which the value refers.
As the indices indicate, K1 is a single key, K2,i is specific to column i and K3,j is specific to row j.
Each masked value (Rj, Ci)' is obtained by applying formula (1) to row j and column i of table T:
    (Rj, Ci)' = (Rj, Ci) - ((K3,j MOD K1) MOD K2,i) + K2,i    (1)
The inverse formula (2) retrieves the original value:
    (Rj, Ci) = (Rj, Ci)' + ((K3,j MOD K1) MOD K2,i) - K2,i    (2)
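Formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustration, assuming non-negative integer keys and values; the function names and the sample numbers are hypothetical and do not come from the slides.

```python
def offset(k1: int, k2_i: int, k3_j: int) -> int:
    """Row- and column-dependent term ((K3,j MOD K1) MOD K2,i) used by both formulas."""
    return (k3_j % k1) % k2_i


def mask(value: int, k1: int, k2_i: int, k3_j: int) -> int:
    """Formula (1): masked = value - offset + K2,i."""
    return value - offset(k1, k2_i, k3_j) + k2_i


def unmask(masked: int, k1: int, k2_i: int, k3_j: int) -> int:
    """Formula (2): value = masked + offset - K2,i."""
    return masked + offset(k1, k2_i, k3_j) - k2_i


# Hypothetical keys and value, chosen only to exercise the round trip.
k1, k2_i, k3_j = 9342, 12, 1_000_003
original = 17
stored = mask(original, k1, k2_i, k3_j)
assert unmask(stored, k1, k2_i, k3_j) == original
```

Because the offset always lies between 0 and K2,i - 1, a masked value under these assumptions exceeds the original by at least 1 and at most K2,i.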
MOBAT – Example Dataset
Supposing K1 = 7432, K2,1 = 34 and K2,2 = 17252
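To make the arithmetic concrete, the sketch below applies formulas (1) and (2) with the keys given on this slide. The row key K3 = 123456 and the two column values (10 and 5000) are invented for illustration and are not the slide's dataset.

```python
K1 = 7432
K2 = {1: 34, 2: 17252}   # K2,1 and K2,2 from the slide

k3 = 123456              # hypothetical row key K3,j
row = {1: 10, 2: 5000}   # hypothetical original values for columns C1 and C2

base = k3 % K1           # 123456 MOD 7432 = 4544
# Formula (1) per column, then formula (2) to recover the originals.
masked = {i: v - (base % K2[i]) + K2[i] for i, v in row.items()}
restored = {i: m + (base % K2[i]) - K2[i] for i, m in masked.items()}

print(masked)    # {1: 22, 2: 17708}
print(restored)  # {1: 10, 2: 5000}
```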
MOBAT – Example Dataset
Supposing K1 = 9264, K2,1 = 12 and K2,2 = 78254
MOBAT – Querying
Using the TPC-H benchmark with four numerical fact columns (L_Quantity, L_ExtendedPrice, L_Tax and L_Discount) masked by MOBAT.
A new column, L_KeyK3, holds the K3,j key for each row j of the LineItem table.
Keys:
    K1 = 9342
    K2,L_Quantity = 12
    K2,L_ExtendedPrice = 51234
    K2,L_Tax = 6
    K2,L_Discount = 4
Original query:
    SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue
    FROM LineItem
    WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD') AND
          L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD') AND
          L_Discount BETWEEN 0.05 AND 0.07 AND L_Quantity<24
Rewritten query (each masked column is replaced by its inverse formula (2)):
    SELECT SUM((L_ExtendedPrice+MOD(MOD(L_KeyK3,9342),51234)-51234) *
               (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4)) AS Total_Revenue
    FROM LineItem
    WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD') AND
          L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD') AND
          (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4) BETWEEN 0.05 AND 0.07 AND
          (L_Quantity+MOD(MOD(L_KeyK3,9342),12)-12)<24
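The rewriting step above is mechanical, so it can be sketched as a small text substitution. The following Python is a naive illustration, not the authors' rewriter: the function names are hypothetical, and a plain regular expression would also touch column names inside string literals or aliases, which a real implementation would have to handle with a SQL parser.

```python
import re


def unmask_expr(column: str, k1: int, k2: int, key_col: str = "L_KeyK3") -> str:
    """SQL text for formula (2) applied to one masked column."""
    return f"({column}+MOD(MOD({key_col},{k1}),{k2})-{k2})"


def rewrite(sql: str, k1: int, k2_by_column: dict) -> str:
    """Replace each whole-word reference to a masked column with its unmasking expression."""
    names = "|".join(re.escape(name) for name in k2_by_column)
    pattern = re.compile(r"\b(" + names + r")\b")
    # re.sub makes a single pass, so the inserted text is not rewritten again.
    return pattern.sub(
        lambda m: unmask_expr(m.group(1), k1, k2_by_column[m.group(1)]), sql
    )


K2 = {"L_Quantity": 12, "L_ExtendedPrice": 51234, "L_Tax": 6, "L_Discount": 4}
query = (
    "SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue FROM LineItem "
    "WHERE L_Discount BETWEEN 0.05 AND 0.07 AND L_Quantity<24"
)
rewritten = rewrite(query, 9342, K2)
print(rewritten)
```

With the keys from this slide, the generated expressions match the ones in the rewritten query, for example `(L_Quantity+MOD(MOD(L_KeyK3,9342),12)-12)<24`.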
MOBAT – Optimizing Features & Performance
• Including K3,j requires additional storage space
• K3,j can be created in several ways, each with a different impact on performance:
    • Simply adding a new column to the existing fact table
    • Recreating the fact table with K3,j included from the start
    • Using a 128-bit integer column that already exists in the fact table (typically the primary key column)
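The third option can be sketched with Python's built-in `sqlite3` module standing in for the DW, assuming the fact table's integer primary key doubles as K3,j so that no extra column is stored. The table name, column names, keys and data are all hypothetical, and SQLite's `%` operator is used in place of Oracle's `MOD`.

```python
import sqlite3

K1, K2_QTY = 9342, 12                              # hypothetical keys
rows = [(100001, 5), (100002, 17), (100003, 23)]   # (primary key, true quantity)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, qty INTEGER)")
# Store only masked quantities, using the primary key as K3,j (formula (1)).
con.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(pk, q - ((pk % K1) % K2_QTY) + K2_QTY) for pk, q in rows],
)

# A direct query sees masked values; the rewritten one applies formula (2).
raw_sum = con.execute("SELECT SUM(qty) FROM fact").fetchone()[0]
true_sum = con.execute(
    "SELECT SUM(qty + ((id % ?) % ?) - ?) FROM fact", (K1, K2_QTY, K2_QTY)
).fetchone()[0]

assert true_sum == sum(q for _, q in rows)
assert raw_sum != true_sum
```

The trade-off in this sketch is that the masking depends on a column whose values are not secret, so the secrecy rests entirely on K1 and K2,i.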
Experimental Evaluation
• 2.8GHz CPU, 2GB RAM (512MB for Oracle SGA), 1.5TB SATA HD
• Oracle 11g DBMS
• One standard benchmark and one real-world DW:
    • TPC-H Decision Support Benchmark at the 1GB and 10GB scale sizes
    • Real-world Sales DW (2GB storage size)
Conclusions
• Our technique reduces data storage space and processing overheads while still providing a significant level of security
• The method is transparent and adds minimal network bandwidth overhead, since it only rewrites queries
• It is easy to implement in any DBMS / DW, at low cost
• Querying the database directly returns only realistic-looking masked values (stored data is masked at all times)
Future Work
• Extending the technique to also mask alphanumeric values
• Assessing its security strength in comparison with other solutions
• Increasing the technique's security strength by:
    • Using larger keys
    • Enabling data integrity checks
    • Implementing false data injection
THANK YOU!
Questions and Comments?
Ricardo Jorge Santos
lionsoftware.ricardo@gmail.com