Why and How to Build a Trusted Database System on Untrusted Storage?
Radek Vingralek, STAR Lab, InterTrust Technologies
In collaboration with U. Maheshwari and W. Shapiro

What?
Trusted storage can be read and written only by trusted programs.

Why?
Digital rights management
[Figure: a contract governing access to content]

What? Revisited
[Figure: hardware model: a processor with volatile memory, a large untrusted storage, and a small (< 50 B) trusted storage]

What? Refined
Must also protect against accidental data corruption
• atomic updates
• efficient backups
• type-safe interface
• automatic index maintenance
Must run in an embedded environment
• small footprint
Must provide acceptable performance

What? Refined
Can assume a single-user workload
• no or only simple concurrency control
• optimized for response time, not throughput
• lots of idle time (can be used for database reorganization)
Can assume a small database
• 100 KB to 10 MB
• can cache the working set
  – no-steal buffer management

A Trivial Solution
[Figure: plaintext data is encrypted and hashed before it reaches a COTS DBMS; the database lives in untrusted storage, the key and H(db) in trusted storage]
Critique:
• does not protect metadata
• cannot use sorted indexes

A Better Solution
[Figure: a (COTS) DBMS operates on plaintext data; the entire database is encrypted and hashed below the DBMS; the database lives in untrusted storage, the key and H(db) in trusted storage]
Critique:
• must scan, hash, and encrypt the entire database to read or write

Yet a Better Solution
[Figure: a hash tree over the database: trusted storage holds the key and H(A); each chunk stores the hashes of its children (e.g., A holds H(B) and H(C)); chunks A through G live in untrusted storage]
Open issues:
• could we do better than a logarithmic overhead?
• could we integrate the tree search with data location?

TDB Architecture
• Backup Store: full / incremental backups, validated restore
• Collection Store: index maintenance; scan, match, and range queries; exports collections of objects
• Object Store: object cache, concurrency control; exports objects (abstract types)
• Chunk Store: encryption, hashing, atomic updates; exports chunks (byte sequences of 100 B to 100 KB)
• Underneath: untrusted storage and trusted storage

Chunk Store – Specification
Interface
• allocate() -> ChunkId
• write( ChunkId, Buffer )
• read( ChunkId ) -> Buffer
• deallocate( ChunkId )
Crash atomicity
• commit = [ write | deallocate ]*
Tamper detection
• raise an exception if chunk validation fails

Chunk Store – Storage Organization
Log-structured storage organization
• no static representation of chunks outside of the log
• the log lives in untrusted storage
Advantages
• traffic analysis cannot link updates to the same chunk
• atomic updates for free
• easily supports variable-sized chunks
• copy-on-write snapshots for fast backups
• integrates well with hash verification (see the chunk map below)
Disadvantages
• destroys clustering (but the working set is cacheable)
• cleaning overhead (but plenty of idle time is expected)

Chunk Store – Chunk Map
Integrates the hash tree and the location map
• Map: ChunkId -> Handle
• Handle = ‹Hash, Location›
• MetaChunk = Array[Handle]
[Figure: trusted storage holds H(R); meta chunk R points to meta chunks S and T, which point to data chunks X and Y]

Chunk Store – Read
Basic scheme: dereference handles from the root down to X
• use the location to fetch the chunk
• use the hash to validate it
Optimized
• trusted cache: ChunkId -> Handle
• look for a cached handle upward from X
• dereference handles down to X
• avoids validating the entire path
[Figure: the tree R, S, T with data chunks X and Y; a handle on the path to X is cached in trusted storage]
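To make the read path above concrete, here is a minimal, self-contained sketch. It is illustrative only, not the actual TDB code: std::hash stands in for the truncated SHA-1, a std::map stands in for the untrusted log, decryption is omitted, and the "tree" has just one meta chunk.

```cpp
// Sketch of the basic Chunk Store read path: dereference handles from the
// trusted root down to a data chunk, validating every fetched chunk version
// against the hash stored in its handle.  All names and types here are
// illustrative placeholders, not the TDB API.
#include <cstdint>
#include <cstring>
#include <functional>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

struct Handle {
    std::size_t  hash;      // placeholder for the 12-byte truncated SHA-1
    std::int64_t location;  // placeholder for the chunk version's log offset
};

using Chunk = std::string;                    // a chunk is a byte sequence
std::map<std::int64_t, Chunk> untrustedLog;   // stand-in for untrusted storage

std::size_t hashOf(const Chunk& c) { return std::hash<Chunk>{}(c); }

// Fetch a chunk version by location and validate it before use.
Chunk readValidated(const Handle& h) {
    const Chunk& raw = untrustedLog.at(h.location);
    if (hashOf(raw) != h.hash)
        throw std::runtime_error("tamper detected: hash mismatch");
    return raw;                               // decryption omitted in this sketch
}

int main() {
    // Build a two-level "tree": data chunk X, and meta chunk R whose single
    // handle covers X.  R's own handle would live in trusted storage.
    Chunk x = "data chunk X";
    untrustedLog[100] = x;
    Handle hx{hashOf(x), 100};

    Chunk r(sizeof hx, '\0');                 // meta chunk = array of handles
    std::memcpy(&r[0], &hx, sizeof hx);
    untrustedLog[200] = r;
    Handle trustedRoot{hashOf(r), 200};       // kept in trusted storage

    // Read path: validate the meta chunk, extract X's handle, validate X.
    Chunk meta = readValidated(trustedRoot);
    Handle fetched;
    std::memcpy(&fetched, meta.data(), sizeof fetched);
    std::cout << readValidated(fetched) << "\n";   // prints "data chunk X"
}
```

The optimized scheme on the slide starts from the deepest cached handle on the path rather than from the root, which avoids re-validating the upper part of the tree.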
Chunk Store – Write
Basic scheme: write chunks from X up to the root.
Optimized
• buffer the dirty handle of X in the cache
• defer upward propagation
[Figure: tree with root R, meta chunks S (dirty) and T, and data chunks X and Y]

Chunk Store – Checkpointing the Map
When dirty handles fill the cache
• write the affected meta chunks to the log
• write the root chunk last
[Figure: log containing ... X ... X ... T S R, with H(R) held in trusted storage]

Chunk Store – Crash Recovery
Process the log starting from the last root chunk
• residual log (the part written after the last root chunk)
• checkpointed log (the part up to and including the last root chunk)
The residual log must be validated.
[Figure: log containing ... X ... X ... T S R ... Y ..., followed by a crash; the part after R is the residual log]

Chunk Store – Validating the Log
Keep an incremental hash of the residual log in trusted storage
• updated after each commit
This hash protects all current chunks
• in the residual log: directly
• in the checkpointed log: through the chunk map
[Figure: trusted storage holds H*(residual-log) over the part of the log written after the checkpoint]

Chunk Store – Counter-Based Log Validation
A commit chunk is written with each commit
• contains a sequential hash of the commit set
• signed with the system secret key
A one-way counter is used to prevent replays.
Benefits
• allows a bounded discrepancy between trusted and untrusted storage
• does not require writing to trusted storage after each transaction
[Figure: commit chunks c.c. 73 and c.c. 74 in the residual log, each covering its commit set with a hash]

Chunk Store – Log Cleaning
The log cleaner creates free space by reclaiming obsolete chunk versions.
Segments
• the log is divided into fixed-sized regions called segments (~100 KB)
• segments are securely linked in the residual log for recovery
Cleaning step (a short sketch follows below)
• read one or more segments
• check the chunk map to find the live chunk versions
  – ChunkIds are stored in the headers of chunk versions
• write the live chunk versions to the end of the log
• mark the segments as free
Segments in the residual log may not be cleaned.

Chunk Store – Multiple Partitions
• Partitions may use separate crypto parameters (algorithms, keys).
• Partitions enable fast copy-on-write snapshots and efficient backups.
• Partitions make it more difficult for the cleaner to test chunk-version liveness.
[Figure: a partition map over partitions P and Q, per-partition position maps, and the data chunks they share]

Chunk Store – Cleaning and Partition Snapshots
[Figure: a log timeline: P updates c; P is snapshotted as Q; the cleaner moves Q's copy of c; checkpoint; crash; residual log]
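The following sketch makes the cleaner's liveness test from the Log Cleaning slide concrete. It is illustrative only, not TDB's cleaner: the chunk map is reduced to a ChunkId-to-location table, and hashes, partitions, and the rule against cleaning residual-log segments are omitted. A version found in a segment is live only if the map still points at that exact location; live versions are appended at the log tail and the map is repointed, after which the segment can be marked free.

```cpp
// Illustrative sketch of the log cleaner's liveness test (not TDB's code).
#include <cstdint>
#include <unordered_map>
#include <vector>

using ChunkId = std::uint64_t;

struct ChunkVersion {
    ChunkId id;                         // taken from the chunk version's header
    std::int64_t location;              // where this version sits in the log
    std::vector<std::uint8_t> bytes;
};

struct Segment {                        // a ~100 KB region of the log
    std::vector<ChunkVersion> versions;
};

// Chunk map reduced to ChunkId -> current location (hash omitted here).
using ChunkMap = std::unordered_map<ChunkId, std::int64_t>;

std::vector<ChunkVersion> logTail;      // stand-in for the end of the log
std::int64_t nextLocation = 1'000'000;

// Append a live chunk version at the log tail and return its new location.
std::int64_t appendToLogTail(const ChunkVersion& v) {
    ChunkVersion copy = v;
    copy.location = nextLocation++;
    logTail.push_back(copy);
    return copy.location;
}

void cleanSegment(const Segment& seg, ChunkMap& map) {
    for (const ChunkVersion& v : seg.versions) {
        auto it = map.find(v.id);
        // Live iff the chunk map still points at this very version.
        if (it != map.end() && it->second == v.location) {
            it->second = appendToLogTail(v);   // relocate and repoint the map
        }
        // Otherwise the version is obsolete; its space is reclaimed when the
        // whole segment is marked free.
    }
}

int main() {
    ChunkMap map;
    Segment seg;
    seg.versions.push_back({42, 10, {1, 2, 3}});   // obsolete version of chunk 42
    seg.versions.push_back({42, 20, {4, 5, 6}});   // current version of chunk 42
    map[42] = 20;                                  // map points at location 20
    cleanSegment(seg, map);                        // only the live version moves
    return logTail.size() == 1 ? 0 : 1;
}
```

In TDB the map entry also carries the chunk's hash, and, as the Log Cleaning slide notes, segments that belong to the residual log are never cleaned.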
Backup Store
Creates and restores full and incremental backups of partitions.
• Backup creation uses snapshots to guarantee backup consistency (with respect to concurrent updates) without locking.
• During a restore, the Backup Store must verify
  – the integrity of the backup (using a signature)
  – the correctness of the incremental restore sequencing

Object Store
Provides type-safe access to named C++ objects
• objects provide pickle and unpickle methods for persistence
• no transparent persistence
Implements full transactional semantics
• in addition to atomic updates
Maps each object into a single chunk
• less data written to and read from the log
• simplifies concurrency control
Provides an in-memory cache of decrypted, validated, unpickled, type-checked C++ objects
Implements a no-steal buffer management policy

Collection Store
Provides access to indexed collections of C++ objects using scan, exact-match, and range queries
Performs automatic index maintenance during updates
• implements insensitive iterators
Uses functional indexes
• an extractor function is used to obtain a key from an object (a minimal sketch appears at the end of these notes)
Collections and indexes are represented as objects
• index nodes are locked according to 2PL

Performance Evaluation – Benchmark
Compared TDB to BerkeleyDB using TPC-B.
TPC-B was used because
• an implementation is included with BerkeleyDB
• BerkeleyDB's functionality limited the choice of benchmarks (e.g., one index per collection)

Performance Evaluation – Setup
Evaluation platform
• 733 MHz Pentium II, 256 MB of memory
• Windows NT 4.0, NTFS files
• EIDE disk, 8.9 ms (read) / 10.9 ms (write) seek time
• 7200 RPM (4.2 ms avg. rotational latency)
• one-way counter: a file on NTFS
Both systems used a 4 MB cache.
Crypto parameters (for the secure version of TDB)
• SHA-1 for hashing (hash truncated to 12 B)
• 3DES for encryption

Performance Evaluation – Results
Response time (average over 100,000 transactions in steady state); TDB utilization was set to 60%.
[Bar chart: average response time (ms) for BerkeleyDB, TDB, and TDB-S; the values 3.8, 5.8, and 6.8 ms appear in the chart]

Response Time vs. Utilization
Measured response times for different TDB utilizations.
[Line chart: average response time (ms) vs. utilization (0.5 to 0.9) for TDB and BerkeleyDB]

Related Work
Theoretical work
• Merkle tree, 1980
• checking the correctness of memory (Blum et al., 1992)
Secure audit logs (Schneier & Kelsey, 1998)
• append-only data
• read sequentially
Secure file systems
• Cryptographic FS (Blaze, 1993)
• Read-only SFS (Fu et al., 2000)
• Protected FS (Stein et al., 2001)

A Retrospective Instead of Conclusions
• Got lots of mileage from using log-structured storage.
• Partitions add lots of complexity.
• Cleaning was not a big problem.
• Crypto overhead is small on modern PCs (< 6%).
• The code footprint is too large for many embedded systems
  – needs to be within 10 KB
  – see GnatDb (described in a technical report)
For more information
• OSDI 2000, "How to Build a Trusted Database System on Untrusted Storage," U. Maheshwari, R. Vingralek, W. Shapiro
• technical reports available at http://www.star-lab.com/tr/

Database Size vs. Utilization
[Line chart: database size (MB) vs. utilization (0.5 to 0.9) for TDB and BerkeleyDB]
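End note, referenced from the Collection Store slide: a minimal, self-contained illustration of the functional-index idea, where an extractor function derives the index key from a stored object. This is not TDB's interface; the Account type, the FunctionalIndex template, and the use of std::multimap are illustrative stand-ins.

```cpp
// Sketch of a functional index: the extractor maps object -> key, so the
// index can be maintained automatically whenever an object is updated.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Account {                 // an application object (in TDB it would
    int         id;              // also provide pickle/unpickle methods)
    std::string owner;
    long        balance;
};

template <typename Object, typename Key>
class FunctionalIndex {
public:
    explicit FunctionalIndex(std::function<Key(const Object&)> extractor)
        : extract_(std::move(extractor)) {}

    void insert(const Object& obj) { index_.emplace(extract_(obj), obj); }

    // Exact-match lookup; scan and range queries would iterate over the map.
    std::vector<Object> match(const Key& k) const {
        std::vector<Object> result;
        for (auto [it, end] = index_.equal_range(k); it != end; ++it)
            result.push_back(it->second);
        return result;
    }

private:
    std::function<Key(const Object&)> extract_;
    std::multimap<Key, Object> index_;   // stand-in for TDB's index nodes
};

int main() {
    // Index accounts by owner name via an extractor function.
    FunctionalIndex<Account, std::string> byOwner(
        [](const Account& a) { return a.owner; });
    byOwner.insert({1, "alice", 100});
    byOwner.insert({2, "bob", 250});
    for (const Account& a : byOwner.match("bob"))
        std::cout << a.id << " " << a.balance << "\n";
}
```

In TDB, collections and their indexes are themselves stored as objects in the Object Store; this sketch keeps everything in memory and ignores locking of index nodes.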