HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES

Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
University of California, Santa Barbara
CCSW 2010
Data Security Concern: Back-End Databases of Web-based Applications
• Form-based query interfaces provide entrance to both
users and attackers.
• Traditional Attacks
• Submit malicious requests to break into the hidden database through vulnerabilities in the application, e.g., SQL injection [Vale05].
• Many can be detected by prior work.
10/8/2010
Data Security Concern: Back-End Databases of Web-based Applications
• Data Harvesting Attacks
• Iteratively submit legitimate queries to extract data inventory or
infer sensitive aggregate information.
• E.g., 1: A competitor of car rental company A harvested A’s inventory of a popular car.
• E.g., 2: Terrorists inferred that a flight was relatively empty and could be a hijacking target.
Anatomy of Data Harvesting Attacks
• General strategy
• Iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries, with the goal of maximizing information gain from a limited number of queries.
• Two types of harvesting attacks to consider
• Crawling Attack
• Performed by deep web crawling [Madh08]
• Sampling Attack
• Performed by uniform random sampling over query results of size at most K [Dasg09]
How To Defend Against Data Harvesting Attacks
• Database inference control [Denn83]?
• Query set restriction is not effective, especially on sampling
attacks.
• Query set restriction and data perturbation [Dasg09] hurt usability.
• Web robot detection [Tan02]?
• Data harvesters can camouflage themselves by mimicking normal users’ HTTP traffic patterns.
Our Approach
• Detection based on search behaviors within sessions
• Attackers’ search behaviors
• Diversity
• Queries are not concentrated or localized, and they reflect very distinct intents.
• Broadness
• The results of the queries cover a broad scope of the underlying data.
HengHa: Detecting Data Harvesting Attacks at Single Session Level
• Identify data harvesting attackers by examining if their search
behaviors in a session show relatively significant diversity
and broadness.
• Diversity -> query correlation
• Broadness -> result coverage
[Figure: HengHa architecture. Labels: Web Application, query, Heng (query correlation observer), result, DB, Ha (result coverage monitor), Detector, suspicious.]
Heng: Query Correlation Observer
• Key idea
• Frequent predicate value sets as indications of correlations
among queries
[Example table: Queries in a Session That Plans a Trip to Chicago]
• Intuitively, if a session has more frequent predicate value sets with
higher supports, and those predicate value sets are more similar to
the queries, the queries in this session are more correlated.
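The idea of frequent predicate value sets can be illustrated with a minimal sketch. It assumes each query is a dictionary of attribute/value predicates and that support is the fraction of the session's queries containing a given set; the function name, the `min_support` and `max_size` parameters, and the sample session are hypothetical and not taken from the slides. It only enumerates the frequent sets by brute force and does not compute the correlation score itself, which the slides do not define.

```python
from collections import Counter
from itertools import combinations


def frequent_predicate_value_sets(queries, min_support=0.5, max_size=2):
    """Return {predicate value set: support} for the sets that occur in at
    least `min_support` of the queries (brute-force enumeration)."""
    if not queries:
        return {}
    counts = Counter()
    for query in queries:
        items = sorted(query.items())  # (attribute, value) pairs
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    n = len(queries)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical session that plans a trip to Chicago.
session = [
    {"dest": "Chicago", "type": "flight"},
    {"dest": "Chicago", "type": "hotel"},
    {"dest": "Chicago", "type": "car"},
]
frequent = frequent_predicate_value_sets(session)

# Hypothetical session with unrelated queries (no shared predicate values).
scattered = [{"dest": "A"}, {"dest": "B"}, {"dest": "C"}]
none_frequent = frequent_predicate_value_sets(scattered)
```

In this toy session, the only set that clears the threshold is `dest = Chicago`, with support 1.0, while the scattered session yields no frequent sets at all, which is the kind of contrast the observer relies on.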
Ha: Result Coverage Monitor
• Key idea
• Sort the multi-attribute data D in a total order that preserves locality, e.g., a z-curve.
• Create a coverage bit vector (CBV) whose bits correspond to the data records in that total order.
• When a data record is accessed, set its bit.
• Training
• Cluster CBVs to model
different data access patterns
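The z-curve ordering and the coverage bit vector can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function and class names are hypothetical, the data is assumed to live on a small two-attribute integer grid, and the choice of which attribute takes the low-order bit is arbitrary (either choice gives a valid z-curve). The clustering step used for training is not shown.

```python
def z_order(x, y, bits=1):
    """Interleave the bits of x and y into a z-curve index.
    Assumption: y supplies the even (low) bit positions, x the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((y >> i) & 1) << (2 * i)
        z |= ((x >> i) & 1) << (2 * i + 1)
    return z


class CoverageBitVector:
    """One bit per data record, indexed by the record's z-curve position."""

    def __init__(self, size):
        self.bits = [0] * size

    def mark(self, position):
        """Record that the data item at `position` was returned to the session."""
        self.bits[position] = 1

    def coverage(self):
        """Fraction of the data touched so far."""
        return sum(self.bits) / len(self.bits)

    def __str__(self):
        return "".join(str(b) for b in self.bits)


# Hypothetical session on a 2x2 grid: three of the four cells are accessed.
cbv = CoverageBitVector(size=4)
for x, y in [(0, 1), (0, 0), (1, 0)]:
    cbv.mark(z_order(x, y))

print(cbv)             # 1110
print(cbv.coverage())  # 0.75
```

Because the z-curve keeps nearby cells close together in the ordering, a session that focuses on one region sets a compact run of bits, whereas a harvesting session tends to set bits across the whole vector.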
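[Figure: z-curve ordering of a 2x2 grid over attributes x and y (cells numbered 0 to 3), with example coverage bit vectors 0000, 0100, 1100, 1110.]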
Experiment
• Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset
• Synthesized 1,000 attack sessions [Madh08, Dasg09]
• Ran on a server with a 2.4 GHz Intel CPU, 3 GB of RAM, and FC 8 as the OS
• Performed four-fold cross-validation
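For readers unfamiliar with the evaluation protocol, here is a minimal sketch of a four-fold split. The slides do not say how the folds were formed, so the shuffling, the fixed seed, and the function name `k_fold_splits` are assumptions made for illustration only.

```python
import random


def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical session IDs standing in for the real and synthesized sessions.
splits = list(k_fold_splits(range(8)))
```

Each of the four rounds trains on three folds and validates on the remaining one, so every session is used for validation exactly once.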
[Figure: Effectiveness of Detection in Four Validations]
[Figure: Efficiency of Detection in Four Validations]
Conclusion & Future Work
• Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e., crawling attacks and sampling attacks.
• Detection identifies attackers’ distinctive search behaviors at the single-session level: diversity is checked by the query correlation observer, and broadness by the result coverage monitor.
• Detecting cross-session data harvesting attacks is left for future work.
References
• [Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.
• [Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.
• [Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.
• [Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.
• [Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.
Thanks for Listening