CryptDB: A Practical Encrypted Relational DBMS
Raluca Ada Popa, Nickolai Zeldovich, and Hari Balakrishnan
MIT CSAIL
New England Database Summit 2011

Hackers
Curious DB administrators
Physical attacks
Both on public clouds and private data centers
Regulatory laws

Approach
Perform SQL query processing on encrypted data

  User queries -> Client frontend -> Database server

  Client frontend
    Trusted
    Stores schema, master key
    No query execution

  Database server (DB)
    Not trusted to keep data private
    Stores the database and processes SQL queries

  1. Support standard SQL queries on encrypted data
  2. Process queries completely at the server
  3. No change to existing DBMS

Example
  Application query:  SELECT * FROM emp WHERE salary = 100
  Frontend rewrites:  SELECT * FROM table1 WHERE col1 = x5a8c34
  (the slide also shows a ≥ variant of the predicate)

  [Figure: table emp (columns rank, name, salary; salaries 60, 100, 800, 100) is stored at the server as opaque ciphertexts, e.g., x934bc1, x4be219, x95c623]

Two techniques
  1. SQL-aware encryption strategy
     – Different encryption schemes provide different functionality
  2. Adjustable query-based encryption
     – Adapt encryption of data based on user queries

1. SQL-aware encryption

  Highest privacy at top

  Scheme   Operation   Details
  RND      None        AES in UFE
  HOM      +, *        e.g., Paillier
  DET      equality    AES in CTR
  JOIN     join        new
  SEARCH   ILIKE       Song et al. ’00
  OPE      order       Boldyreva et al. ’09

  equality: e.g., =, !=, GROUP BY, IN, COUNT, DISTINCT
  order: e.g., >, <, ORDER BY, SORT, MAX, MIN
  first practical implementation

Onions of encryptions
  Onion 1 (any value): RND, DET, SEARCH, JOIN
  Onion 2 (any value): RND, OPE, OPE-JOIN
  Onion 3 (int value): HOM
  Each column has the same key in a given layer of an onion

2. Adjustable query-based encryption
  Start out the database with the most secure encryption scheme
  Adjust encryption dynamically
  Strip off levels of the onions: frontend gives key to server using a UDF

Example
  emp: rank, name, salary
  Onion layers: RND, DET, SEARCH, JOIN (any value)

  SELECT * FROM emp WHERE salary = 100000
  UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
  SELECT * FROM table1 WHERE col3onion1 = x5a8c34

JOIN needs new crypto
  Challenge: do not know which columns will be joined

  [Diagram: the client frontend sends the join key Col1-Col2 to the server, which can then evaluate Col1 = Col2]

  – Data items not revealed, cannot join without join key

Further components
  Inserts, updates, deletes, nested queries
  Indexes
  Transactions, auto-increments
  Optimizations to speed up performance
  Not supported: A.a + A.b > B.c

Security converges… to maximum privacy for query mix
  Onion levels stripped only when new operations needed
  Steady state: no decryptions at server
  Practical: typical SQL processing on enlarged tuples

Privacy Guarantees
  Formal privacy definition and proof
  Implications, for emp: rank, name, salary

  If query has                     Server learns
  • equality predicate on name     repeats
  • order predicate on name        order
  • aggregation on salary          nothing
  • no filter on a column          nothing

  Never reveal plaintext
  Server cannot compute unrequested queries requiring new relationships

Privacy (cont’d)
  DB owner can specify minimum security level for some fields
  CREATE TABLE emp (SSN text ≥ DET, name text, …)

Implementation
  [Diagram: SQL Interface: Query -> Frontend -> Encrypted Query -> Server (unmodified DBMS); Encrypted Results -> Frontend -> Results. Components shown: CryptDB PK tables, CryptDB UDFs (user-defined functions)]

  No change to the DBMS
  Should work on most SQL DBMS

Portability
  Ported CryptDB from Postgres to MySQL with 86 lines of code
  No change to MySQL
  Code changes were limited to the server connection and UDF declarations

Low overhead on TPC-C
  • Supports all queries in TPC-C without change
  Throughput loss 27%

  [Figure: microbenchmarks from TPC-C]

Adjustable encryption
  Steady state of columns for TPC-C: 71% of columns remain encrypted with RND
  Importance of adjustable query-based encryption to privacy
  In practice, we expect most sensitive fields to remain at RND or DET (e.g., credit cards)

Related work
  Theoretical approaches [Gennaro et al., ’10]
    – Inefficient
  Search on encrypted data (e.g., [Chang, Mitzenmacher ’05], [Evdokimov, Guenther ’07])
    – Restricted set of queries, inefficient
  Systems proposals (e.g., [Hacigumus et al., ’02])
    – Lower degree of security, rewrite the DBMS, client-side processing

Conclusions
  CryptDB is the first practical DBMS for running most standard queries on encrypted data
    – Runs queries completely at server
    – Provides provable privacy guarantees
    – Modest overhead
    – Does not change the DBMS or client applications

Thanks!
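
The onion idea from the slides (a DET ciphertext wrapped in an RND layer, with the RND layer stripped only once an equality query arrives) can be illustrated with a small Python sketch. This is not CryptDB's code: the function names `encrypt_det`, `encrypt_rnd`, `decrypt_rnd`, and `to_onion` are hypothetical, and the HMAC-based toy cipher is only a stand-in for the AES-based schemes named on the slides. It should not be used for real encryption.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256(key, iv || counter); a stand-in for AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_det(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic layer: the IV is derived from the plaintext,
    so equal plaintexts yield equal ciphertexts (hypothetical helper)."""
    iv = hmac.new(key, b"det-iv|" + plaintext, hashlib.sha256).digest()[:16]
    return iv + _xor(plaintext, _keystream(key, iv, len(plaintext)))


def encrypt_rnd(key: bytes, data: bytes) -> bytes:
    """Randomized layer: a fresh random IV hides equality (hypothetical helper)."""
    iv = os.urandom(16)
    return iv + _xor(data, _keystream(key, iv, len(data)))


def decrypt_rnd(key: bytes, ciphertext: bytes) -> bytes:
    """Strip the randomized layer; plays the role of the DecryptRND UDF."""
    iv, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, iv, len(body)))


def to_onion(det_key: bytes, rnd_key: bytes, value: int) -> bytes:
    """Frontend side: wrap the DET ciphertext of a value in an RND layer."""
    return encrypt_rnd(rnd_key, encrypt_det(det_key, str(value).encode()))


# Frontend encrypts the salary column before it reaches the server.
det_key, rnd_key = os.urandom(32), os.urandom(32)
salaries = [60, 100, 800, 100]
column = [to_onion(det_key, rnd_key, s) for s in salaries]

# At the RND level, the two rows with salary 100 look unrelated.
rnd_equal = column[1] == column[3]

# An equality query arrives: the frontend releases rnd_key and the
# server strips the RND layer, leaving DET ciphertexts.
column = [decrypt_rnd(rnd_key, c) for c in column]

# The frontend rewrites the constant 100 into its DET ciphertext;
# the server compares ciphertexts without seeing any plaintext.
needle = encrypt_det(det_key, b"100")
matches = [i for i, c in enumerate(column) if c == needle]
```

In this sketch the server only ever holds `rnd_key`, never `det_key`, so after the adjustment it can tell which rows repeat but still cannot recover the salaries themselves.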