Invalidation Clues for Database Scalability Services

Amit Manjhi*1, Phillip B. Gibbons‡, Anastassia Ailamaki*, Charles Garrod*, Bruce M. Maggs*†, Todd C. Mowry*‡, Christopher Olston©*, Anthony Tomasic*, Haifeng Yu§

* Carnegie Mellon University
‡ Intel Research Pittsburgh
† Akamai Technologies
§ National University of Singapore
© Yahoo! Research
1 Buxfer, Inc.

Databases @Carnegie Mellon

Typical Architecture of Dynamic Web Applications
[Diagram: users send requests over the Internet to the home server (web server, app server, DB); the app server executes code and accesses the DB, and the response returns to the user]
• Dynamic Web applications need to provision for variable and unpredictable load

Content Delivery Networks
[Diagram: users reach CDN nodes over the Internet]
• Scales central web server
• Works well for static content

CDN Application Services
[Diagram: users reach CDN nodes over the Internet]
• Database server is still a bottleneck

Database Scalability Service (DBSS) Architecture
[Diagram: users, Internet]
• User queries answered from DB cache
• How to guarantee privacy of data?
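The idea of answering user queries from a cache at the DBSS can be pictured with a minimal sketch. It assumes a cache keyed by the exact query text and a plain callable standing in for the home server; the names `DBSSNode`, `home_server`, `answer`, and `misses` are illustrative assumptions and do not come from the slides.

```python
from typing import Callable, Dict, List


class DBSSNode:
    """Hypothetical DBSS node that caches query results (illustrative only)."""

    def __init__(self, home_server: Callable[[str], List[tuple]]) -> None:
        self.home_server = home_server           # stand-in for the remote home server
        self.cache: Dict[str, List[tuple]] = {}  # query text -> cached result rows
        self.misses = 0                          # queries forwarded to the home server

    def answer(self, query: str) -> List[tuple]:
        """Serve from the cache when possible, otherwise ask the home server."""
        if query not in self.cache:
            self.misses += 1
            self.cache[query] = self.home_server(query)
        return self.cache[query]
```

Only cache misses reach the home server in this sketch, which is where the scalability benefit would come from; keeping cached results correct when the database is updated is a separate concern.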

Privacy concerns dictate that DBSS is provided encrypted data
[Diagram: users, Internet]
• Cache base tables: does not work
• Cache query results: invalidate on updates
• Home server maintains master copy and handles updates directly

A Simple Example
Table: comments (id, rating, story)
Q: SELECT id FROM comments WHERE story='Wintel' AND rating>0
U: UPDATE comments SET rating=2 WHERE id=15

Home server database (U changes the rating of comment 15 from 1 to 2):
id  rating  story
11  1       Wintel
15  1 → 2   Wintel

• Nothing is encrypted: the DBSS node, initially empty, caches Q with result id=11,15; U causes no invalidations
• Results are encrypted: the DBSS node, initially empty, caches Q with an opaque result; U invalidates it
• More encryption can lead to more invalidations

Privacy-Scalability Space for Query Result Caching
[Figure: scalability (vertical axis) versus privacy (horizontal axis; ticks No, Prior, Full)]
• No encryption
• Encrypt data not useful for invalidation (our prior work, SIGMOD 2006)
• Encrypt everything (maximum privacy, read-only scalability)
• Want solutions in this space

Our Approach: Invalidation Clues
[Diagram: queries and updates go to the home server database; the DBSS, initially empty, caches each query result together with its query clue; each update reaches the DBSS with an update clue; invalidations are decided from (query clue, update clue)]
• Limit unnecessary invalidation
• Limit revealed information
• Limit home server overhead
• Invalidation clues offer a more general, flexible framework

Example Bulletin-Board Application
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?   (example parameters: rating=2, id=5)
1. Extra invalidation in the no-encryption scenario: cached results with rating_param<2 and no id=5 in the result
2. Example clue: story of the comment being updated (update clue)
• Invalidation clues enable more precise invalidations than the "No" encryption scenario

Privacy-Scalability Space for Query Result Caching
[Figure: scalability (vertical axis) versus privacy (horizontal axis; ticks No, Prior, Full), with an additional label "Database"]
• No encryption (code-analysis privacy, maximum scalability)
• Encrypt data not useful for invalidation (our prior work, SIGMOD 2006)
• Encrypt everything (maximum privacy, read-only scalability)
• Want solutions in this space
• Clues offer fine-grained tradeoff

Outline
• Introduction to invalidation clues framework
• Improving scalability in the clues framework
• Improving privacy in the clues framework
• Evaluation results
• Related work and summary

Improving Scalability in the Clues Framework
• As a first cut: fewer invalidations → more scalability
• What is the "most precise" invalidation that can be done?
• Database Inspection Strategy: invalidate as if using the database
• Extra data (database clues) can be attached either to query results (query result clue) or to updates (update clue)

Database Clues and Beyond
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?
• Query clue: story of ALL comments (auxiliary view with columns id, story)
  1. Consistency
  2. Privacy
• Update clue: story of the comment being updated (on-the-fly)
• Still better: Opportunistic Strategy, which uses database clues only when benefit exceeds overhead

Attack Model of the DBSS
[Diagram: users, Internet]
1. DBSS learns from query clues, update clues, and invalidations: ciphertext-only attack
2. DBSS can pose as a user: chosen-plaintext attack

Results on Improving Privacy
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?
• Invalidation decision involves equality on id and story; order comparison on rating
• Key idea: needless invalidations can improve privacy
• Extreme: if all query results are always invalidated, DBSS can't distinguish between any two query results
• Paper has details on improving privacy for equality and order comparisons

Benchmark Applications
• Auction (RUBiS, from Rice)
• Bulletin board (RUBBoS, from Rice)
• Bookstore (TPC-W, from UW-Madison)

Evaluation Methodology
• Scalability: max # concurrent users with acceptable response times
[Diagram: Users, CDN and DBSS, Home server; link latencies of 5 ms and 100 ms]

Scalability (number of concurrent users supported)
[Bar chart: number of concurrent users supported (vertical axis, 0 to 900) for the Auction, Bboard, and Bookstore benchmark applications; series: No clues, Clues (no DB clues), Clues (incl. DB clues), Opportunistic]
1. Clues help
2. Opportunistic has the best scalability

Related Work
• Outsource database: [Hacigumus+ 2002], [Hacigumus+ 2002], [Agrawal+ 2004]
• Outsource database scalability: DBCache [Luo+ 2002, Altinel+ 2003], DBProxy [Amiri+ 2003], NEC cache portal [Li+ 2003], MTCache [Larson+ 2004], [Manjhi+ 2006]

Related Work
• View invalidation strategies: [Levy and Sagiv 1993], [Candan+ 2002], [Choi and Luo 2004]
• Privacy: [Agrawal+ 2004], [Hore+ 2004], [Manjhi+ 2006]

Summary
• Invalidation clues: general framework for limiting
  • Unnecessary invalidation
  • Revealed information
  • Home server overhead
• Fine-grained tradeoff between privacy and scalability
• Database clues
  • Update clues better than query clues
  • Opportunistic use of database clues gives the best scalability
• Evaluation on three application benchmarks

Back-up slides…
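The invalidation decision in the bulletin-board example can be sketched in code. This is only an illustration under stated assumptions: nothing is encrypted, the DBSS sees the query parameters, the cached result ids, and the update parameters, and the update clue is simply the story of the comment being updated. The names `CachedQuery`, `Update`, `must_invalidate`, and `update_story`, and the story value "Linux", are hypothetical and are not taken from the slides or the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CachedQuery:
    """Cached instance of: SELECT id FROM comments WHERE story=? AND rating>?"""
    story: str
    rating_param: int
    result_ids: FrozenSet[int]


@dataclass(frozen=True)
class Update:
    """Instance of: UPDATE comments SET rating=? WHERE id=?"""
    new_rating: int
    comment_id: int


def must_invalidate(q: CachedQuery, u: Update,
                    update_story: Optional[str] = None) -> bool:
    """Return True if the cached result of q may be stale after u.

    update_story is the (hypothetical) update clue: the story of the
    comment being updated. None means no clue was supplied.
    """
    qualifies_after = u.new_rating > q.rating_param
    if u.comment_id in q.result_ids:
        # The row is already in the result; it stays only if it still qualifies.
        return not qualifies_after
    if not qualifies_after:
        # The row was not in the result and still fails the rating predicate.
        return False
    if update_story is None:
        # Story unknown: the row might now enter the result, so be conservative.
        return True
    # With the clue, only a query on the same story can be affected.
    return update_story == q.story


q = CachedQuery(story="Wintel", rating_param=0, result_ids=frozenset({11, 15}))
print(must_invalidate(q, Update(new_rating=2, comment_id=15)))                       # False
print(must_invalidate(q, Update(new_rating=2, comment_id=5)))                        # True
print(must_invalidate(q, Update(new_rating=2, comment_id=5), update_story="Linux"))  # False
```

The second call is the extra invalidation from the no-encryption scenario (rating_param<2 and no id=5 in the result); the third shows how a story update clue lets the cached result survive when the stories differ.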