Invalidation Clues for Database Scalability Services

Amit Manjhi*1, Phillip B. Gibbons‡, Anastassia Ailamaki*,
Charles Garrod*, Bruce M. Maggs*†, Todd C. Mowry*‡,
Christopher Olston©*, Anthony Tomasic*, Haifeng Yu§
* Carnegie Mellon University
‡ Intel Research Pittsburgh
† Akamai Technologies
§ National University of Singapore
© Yahoo! Research
1 Buxfer, Inc.
Databases
@Carnegie Mellon
Typical Architecture of Dynamic Web Applications
[Figure: users send requests over the Internet to the home server (web server, app server, DB); the home server executes code, accesses the DB, and returns the response.]
Dynamic Web applications need to provision for
variable and unpredictable load
Content Delivery Networks
[Figure: users, the Internet, and CDN nodes.]
• Scales central web server
• Works well for static content
CDN Application Services
[Figure: users, the Internet, and CDN nodes.]
Database server is still a bottleneck
Database Scalability Service (DBSS) Architecture
[Figure: DBSS architecture diagram (labels: Users, Internet).]
User queries answered from DB cache
How to guarantee privacy of data?
Privacy concerns dictate that the DBSS is provided only encrypted data:
• Caching base tables does not work
• Cache query results and invalidate them on updates
Home server maintains the master copy and handles updates directly
A Simple Example
Table: comments (id, rating, story)
Q: SELECT id FROM comments WHERE story='Wintel' AND rating>0
U: UPDATE comments SET rating=2 WHERE id=15
Home server database: comments 11 and 15 both have story Wintel and rating 1; U changes the rating of comment 15 to 2.
• Nothing is encrypted: the DBSS node caches the result of Q (id=11,15) and can see that U does not change it. No invalidations.
• Results are encrypted: the DBSS node cannot inspect the cached result of Q, so it invalidates the result when U arrives.
More encryption can lead to more invalidations
Privacy-Scalability Space for Query Result Caching
[Figure: plot of scalability against privacy. Points shown: no encryption; encrypt data not useful for invalidation (our prior work, SIGMOD 2006); encrypt everything (maximum privacy, read-only scalability). Labeled levels: No, Prior, Full. Annotation: "Want solutions in this space".]
Our Approach: Invalidation Clues
[Figure: the home server (database) answers queries and applies updates; a query clue travels with each query result and an update clue with each update to the DBSS, which decides invalidations from the pair (query clue, update clue).]
• Limit unnecessary invalidation
• Limit revealed information
• Limit home server overhead
Invalidation clues offer a more general, flexible framework
Example Bulletin-Board Application
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?   (example parameters: rating=2, id=5)
1. Extra invalidation in the no-encryption scenario: cached results
with rating_param<2 that do not contain id=5
2. Example clue:
• story of comment being updated (update clue)
Invalidation clues enable more precise invalidations
than the “No” encryption scenario
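The invalidation rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, field names, and the `must_invalidate` function are hypothetical, and the sketch assumes the DBSS can see the query parameters and the ids in the cached result.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CachedQuery:
    """Cached instance of: SELECT id FROM comments WHERE story=? AND rating>?"""
    story: str                  # query parameter (assumed visible to the DBSS)
    min_rating: int             # query parameter (assumed visible to the DBSS)
    result_ids: FrozenSet[int]  # ids in the cached result


@dataclass(frozen=True)
class RatingUpdate:
    """Instance of: UPDATE comments SET rating=? WHERE id=?"""
    comment_id: int
    new_rating: int
    story_clue: Optional[str] = None  # update clue: story of the updated comment


def must_invalidate(q: CachedQuery, u: RatingUpdate) -> bool:
    # With an update clue, results cached for a different story are safe.
    if u.story_clue is not None and u.story_clue != q.story:
        return False
    in_result = u.comment_id in q.result_ids
    qualifies_now = u.new_rating > q.min_rating
    # The cached result can change only if the comment enters or leaves it.
    return in_result != qualifies_now
```

Without the clue, a cached result for another story with rating_param<2 and no id=5 must be invalidated when comment 5 is set to rating 2; with the story clue attached to the update, that invalidation is avoided.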
Privacy-Scalability Space for Query Result Caching
[Figure: plot of scalability against privacy with clues added. Points shown: no encryption (code-analysis privacy, maximum scalability); encrypt data not useful for invalidation (our prior work, SIGMOD 2006); encrypt everything (maximum privacy, read-only scalability). Labeled levels: Database, No, Prior, Full. Annotations: "clues offer fine-grained tradeoff"; "Want solutions in this space".]
Outline
• Introduction to invalidation clues framework
• Improving scalability in the clues framework
• Improving privacy in the clues framework
• Evaluation results
• Related work and summary
Improving Scalability in the Clues Framework
As a first cut: fewer invalidations → more scalability
What is the “most precise” invalidation that can be done?
Database Inspection Strategy: Invalidate as if
using the database
Extra data (database clues) can either be attached to query
results (query result clue) or updates (update clue)
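To make "invalidate as if using the database" concrete, here is a minimal sketch that decides invalidation by checking whether an update actually changes a query result. It uses an in-memory SQLite database purely for illustration; the sample rows and the helper names `make_db` and `result_changes` are assumptions, and a real DBSS holds encrypted data and would rely on clues rather than on direct database access.

```python
import sqlite3

QUERY = "SELECT id FROM comments WHERE story=? AND rating>? ORDER BY id"
UPDATE = "UPDATE comments SET rating=? WHERE id=?"


def make_db() -> sqlite3.Connection:
    """Build a small illustrative comments table (sample rows are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, rating INTEGER, story TEXT)"
    )
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?)",
        [(11, 1, "Wintel"), (15, 1, "Wintel"), (20, 0, "Linux")],
    )
    return conn


def result_changes(conn, query_params, update_params) -> bool:
    """Apply the update and report whether the query result changed."""
    before = conn.execute(QUERY, query_params).fetchall()
    conn.execute(UPDATE, update_params)
    after = conn.execute(QUERY, query_params).fetchall()
    return before != after
```

A cached result needs invalidation exactly when `result_changes` returns True, which is the precision that database clues aim to approach.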
Database Clues and Beyond
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?
Query clue: story of ALL comments (id, story), kept as an auxiliary view; issues: 1. Consistency, 2. Privacy
Update clue: story of the comment being updated, computed on-the-fly
Still better: Opportunistic Strategy – use database clues only when benefit exceeds overhead
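The opportunistic rule can be read as a simple cost comparison. The sketch below is only an assumption about how such a comparison might look: the slides do not give a cost model, and the function name, parameters, and units are all hypothetical.

```python
def use_database_clue(expected_invalidations_avoided: float,
                      cost_per_cache_miss: float,
                      clue_cost: float) -> bool:
    """Hypothetical rule: attach a database clue only if it is expected to pay off.

    benefit  = home-server work saved by avoiding needless invalidations
    overhead = home-server work spent computing the clue
    """
    benefit = expected_invalidations_avoided * cost_per_cache_miss
    return benefit > clue_cost
```

For example, a clue that costs 5 units to compute and is expected to avoid 3 cache misses of 10 units each would be used, while one expected to avoid 0.1 misses would not.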
Attack Model of the DBSS
1. DBSS learns from query clues, update clues, and invalidations – ciphertext-only attack
2. DBSS can pose as a user – chosen-plaintext attack
Results on Improving Privacy
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?
Invalidation decision involves equality on id and story; order comparison on rating
Key idea: needless invalidations can improve privacy
Extreme: if all query results are always invalidated, the DBSS can't distinguish between any two query results
Paper has details on improving privacy for equality and order comparisons
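One way to picture the tradeoff for an equality comparison is a coarse clue that reveals only a bucket of the story value. This is an illustrative assumption, not the scheme from the paper: the keyed-hash bucketing, the secret `key` held by the home server, and the names `story_bucket` and `may_affect` are all hypothetical.

```python
import hashlib
import hmac


def story_bucket(key: bytes, story: str, num_buckets: int) -> int:
    """Map a story value to one of num_buckets coarse buckets with a keyed hash."""
    digest = hmac.new(key, story.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def may_affect(query_bucket: int, update_bucket: int) -> bool:
    """DBSS-side check: invalidate whenever the two buckets collide."""
    return query_bucket == update_bucket
```

With a single bucket every update invalidates every cached result, matching the extreme case above; more buckets mean fewer needless invalidations but let the DBSS tell more query results apart.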
Benchmark Applications
• Auction (RUBiS, from Rice)
• Bulletin board (RUBBoS, from Rice)
• Bookstore (TPC-W, from UW-Madison)
Evaluation Methodology
Scalability: maximum number of concurrent users with acceptable response times
[Figure: test setup connecting users, the CDN and DBSS, and the home server; link latencies labeled 5 ms and 100 ms.]
Scalability (number of concurrent users supported)
[Figure: chart of scalability for the Auction, Bboard, and Bookstore benchmark applications under four configurations: No clues; Clues (no DB clues); Clues (incl. DB clues); Opportunistic. Axis ticks from 0 to 900.]
1. Clues help
2. Opportunistic has the best scalability
Related Work
• Outsource database: [Hacigumus+ 2002], [Hacigumus+ 2002], [Agrawal+ 2004]
• Outsource database scalability: DBCache [Luo+ 2002, Altinel+ 2003], DBProxy [Amiri+ 2003], NEC cache portal [Li+ 2003], MTCache [Larson+ 2004], [Manjhi+ 2006]
Related Work
• View invalidation strategies: [Levy and Sagiv 1993], [Candan+ 2002], [Choi and Luo 2004]
• Privacy: [Agrawal+ 2004], [Hore+ 2004], [Manjhi+ 2006]
Summary
• Invalidation clues: general framework for limiting
  – Unnecessary invalidation
  – Revealed information
  – Home server overhead
• Fine-grained tradeoff between privacy and scalability
• Database clues
  – Update clues better than query clues
  – Opportunistic use of database clues → best scalability
• Evaluation on three application benchmarks