ObliVM: A Programming Framework for Secure Computation
Chang Liu
Joint work with Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi

Dating: Genetically
Good match?
Not leaking their sensitive genomic data to anyone else!

Problem Abstraction
Alice holds x. Bob holds y. Public function f.
z = f(x, y)
Security requirement: Reveal z but nothing more!

Nina Taft, Distinguished Scientist
5 researchers, 4 months to develop an (efficient) oblivious matrix factorization algorithm over secure computation

Generic protocols: Low design cost, Flexible
Customized protocols: Efficient, requires Expertise

Can generic secure computation be practical?
Challenge 1: Efficiency: time & space
Challenge 2: Programmability: for nonexpert programmers

ObliVM: Achieve the Best of Both Worlds
Programs by non-specialists achieve the performance of customized designs.

Programmer’s favorite model
    def binSearch(a, x):
        lo, hi = 0, len(a) - 1      # inclusive search bounds
        res = -1
        while lo <= hi:
            mid = (lo + hi) // 2
            midval = a[mid]
            if midval < x:
                lo = mid + 1
            elif midval > x:
                hi = mid - 1
            else:
                res = mid
                break               # found: stop searching
        return res

Cryptographer’s favorite model
Circuits of XOR, AND, OR gates
Accessing a secret index may leak information!

How do secret indexes leak information?
Memory holds records such as Breast cancer, Liver problem, Kidney problem. The address that the circuit for f(x, y) touches reveals which record was read.
A naive solution (in generic approaches) is to linearly scan through the entire memory for each memory access. Extremely slow!

Crypto Tool: Oblivious RAM
• Hide access patterns
• Poly-logarithmic cost per access: O(poly log N)
• Redundancy
• Data Shuffling
Diagram: Garbled Circuit issues Read M[i]; the ORAM Scheme takes [i] and returns [M[i]].
[Shi, et al., 2011] Oblivious RAM with O((log N)^3) Worst-Case Cost. In ASIACRYPT 2011.
[Stefanov et al., 2013] Path ORAM: An Extremely Simple Oblivious RAM Protocol. In CCS 2013.
[Wang, et al., 2015] Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound.

Source Program → Oblivious Program: Challenge!
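The naive linear-scan solution mentioned on the earlier slide can be sketched in a few lines of Python. This is only an illustration of the idea, not ObliVM code: the names mux and linear_scan_read are hypothetical, the trace list is an assumed stand-in for what an observer of memory addresses would see, and plain Python integers stand in for values that a real protocol would keep encrypted or secret-shared.

```python
def mux(cond, a, b):
    """Return a if cond == 1 else b, using arithmetic only (no branch on cond)."""
    return cond * a + (1 - cond) * b


def linear_scan_read(mem, secret_idx, trace=None):
    """Read mem[secret_idx] while touching every cell in the same fixed order.

    The cost is O(N) per access, which is why the slide calls it extremely slow.
    """
    result = 0
    for j in range(len(mem)):           # j is public: it only depends on len(mem)
        if trace is not None:
            trace.append(j)             # record the address an observer would see
        hit = int(j == secret_idx)      # in a circuit this would be an equality gate
        result = mux(hit, mem[j], result)
    return result


mem = [13, 7, 42, 99]
trace_a, trace_b = [], []
value_a = linear_scan_read(mem, 2, trace_a)
value_b = linear_scan_read(mem, 0, trace_b)
print(value_a, value_b, trace_a == trace_b)
```

Both reads touch addresses 0 through 3 in the same order, so the address sequence says nothing about which index was requested. ORAM aims for the same property at poly-logarithmic rather than linear cost per access.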
Oblivious Program → Circuit: Easy

ObliVM: A Programming Framework for Oblivious Computation
• Program-specific optimizations through static analysis [LHS-CSF’13] [LHSKH-Oakland’14] [LHMHTS-ASPLOS’15]
• Programming abstractions for oblivious computation [LWNHS-Oakland’15]

Example: FindMax
    int max(public int n, secret int h[]) {
      public int i = 0;
      secret int m = 0;
      while (i < n) {
        if (h[i] > m) then m = h[i];
        i++;
      }
      return m;
    }
h[] need not be in ORAM. Encryption suffices.

Dynamic Memory Accesses: Main loop in Dijkstra
dis[]: Not in ORAM
vis[], e[][]: Inside ORAM
    for(int i=1; i<n; ++i) {
      int bestj = -1, bestdis = -1;
      // pick the unvisited vertex with the smallest distance
      for(int j=0; j<n; ++j)
        if(!vis[j] && (bestdis < 0 || dis[j] < bestdis)) {
          bestj = j;
          bestdis = dis[j];
        }
      vis[bestj] = 1;
      // relax the edges leaving bestj
      for(int j=0; j<n; ++j)
        if(!vis[j] && (bestdis + e[bestj][j] < dis[j]))
          dis[j] = bestdis + e[bestj][j];
    }
Our compiler automates this analysis.

Do we need to place all variables/data inside one ORAM?
Key observation: Accesses that do not depend on secret inputs need not be hidden.
A memory-trace obliviousness type system ensures the security of the target program. [LHS-CSF’13, LHSKH-Oakland’14, LHMHTS-ASPLOS’15]
[LHS-CSF’13] Memory Trace Oblivious Program Execution. In CSF 2013.
[LHSKH-Oakland’14] Automating RAM-model Secure Computation. In Oakland 2014.
[LHMHTS-ASPLOS’15] GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation. In ASPLOS 2015.

Analogy to Parallel Computation
Approach 1: A program written in C → Compile. Limited opportunities for compile-time optimizations.
Approach 2: A program written in MapReduce → Compile. MapReduce is a parallel programming abstraction.

Programming Abstractions for Oblivious Computation
Approach 1: Limited opportunities for compile-time optimizations.
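Returning to the earlier key observation (accesses that do not depend on secret inputs need not be hidden), the following Python sketch compares the address trace of FindMax with that of a lookup at a secret index. It is a toy model under stated assumptions: TracedArray, find_max, lookup, and the run_* helpers are hypothetical names, the trace list models what an observer of memory addresses sees, and the values would be encrypted in a real execution.

```python
class TracedArray:
    """List wrapper that logs every index read, modelling the observable trace."""

    def __init__(self, data, trace):
        self._data = list(data)
        self._trace = trace

    def __getitem__(self, i):
        self._trace.append(i)
        return self._data[i]


def find_max(n, h):
    """FindMax as on the slide: the index i is public, h's contents and m are secret."""
    m = 0
    for i in range(n):
        v = h[i]
        gt = int(v > m)              # secret comparison bit
        m = gt * v + (1 - gt) * m    # branch-free select in place of the `if`
    return m


def lookup(h, secret_idx):
    """Direct access at a secret index: the address itself becomes observable."""
    return h[secret_idx]


def run_find_max(data):
    trace = []
    return find_max(len(data), TracedArray(data, trace)), trace


def run_lookup(data, secret_idx):
    trace = []
    return lookup(TracedArray(data, trace), secret_idx), trace


max_a, trace_a = run_find_max([3, 9, 4, 1])
max_b, trace_b = run_find_max([8, 2, 2, 7])
_, trace_c = run_lookup([3, 9, 4, 1], 1)
_, trace_d = run_lookup([3, 9, 4, 1], 3)
print(trace_a == trace_b, trace_c == trace_d)
```

FindMax produces the same trace for both secret arrays, which matches the slide's claim that h[] only needs encryption. The secret-index lookup produces traces that differ with the index, which is the case where an array such as vis[] or e[][] in the Dijkstra loop would be placed inside ORAM.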
A program written in C → Compile → Oblivious representation using ORAM
Approach 2: We provide oblivious programming abstractions. [NWIWTS-Oakland’15] [LWNHS-Oakland’15]
A program written in ObliVM → Compile → Oblivious representation using ORAM (generic) and oblivious algorithm abstractions (problem specific, but efficient)

Interactions between PL and algorithms
Programming abstractions ↔ Oblivious algorithms
• The expected: Find common patterns, generalize into abstractions
• The unexpected: New insights lead to new algorithms
Interactions between PL and algorithms allowed us to solve open problems in oblivious algorithms design!
• Depth-First Search
• Shortest path
• Minimum spanning tree

Loop Coalescing
Block 1 ×n
Block 2 ×m
Block 3 ×n
Gives oblivious Dijkstra and MST for sparse graphs

Hand-crafting vs. Automated Compilation
2013 (Nina Taft, Distinguished Scientist):
• Matrix Factorization [NIWJTB-CCS’13]: 5 researchers, 4 months
• Ridge Regression [NWIJBT-IEEE S&P ’13]: 5 researchers, 3 weeks
ObliVM Today [LWNHS-IEEE S&P ’15] (This work):
• Same tasks: 1 graduate student-day
• 10x-20x better performance

Speedup for More Applications
[Figure: bar chart of speedup (log scale, 1 to 10^6) per application, broken down by source: Backend, PL, Circuit ORAM. Applications: Dijkstra, MST, K-Means, Heap, Map/Set, BSearch, AMS Sketch, CountMin Sketch. Data sizes range from 768KB to 10GB.]
Earlier non-tree-based ORAMs perform worse than linear scans of memory [HFKV-CCS’12].

ObliVM: Binary Search on 1GB Database
Reference point: ~24 hours in 2012 [HFKV-CCS’12]
ObliVM Today: 7.3 secs/query
2 EC2 virtual cores, 60GB memory, 10MBps bandwidth
With cryptographic extensions (projected): 0.3 secs/query
2 EC2 virtual cores, 60GB memory, 300MBps bandwidth
[HFKV-CCS’12] Holzer et al. Secure Two-Party Computations in ANSI C. In CCS ’12.

Overhead w.r.t. Insecure Baseline
• Distributed GWAS: 130× slowdown
• Hamming Distance: 1.7×10^4× slowdown
• K-Means: 9.3×10^6× slowdown
Opportunities for further optimizations:
• Hardware acceleration
• Parallelism
• Faster cryptography
• …

ObliVM Adoption
www.oblivm.com
• Privacy-preserving data mining and recommendation system
• Computational biology, privacy-preserving microbiome analysis
• Privacy-preserving Software-Defined Networking
• Cryptographic MIPS processor
• iDash secure genome analysis competition (Won an “HLI Award for Secure Multiparty Computing”)

Future Work: From ObliVM to A Unified Programming Framework for Modern Cryptography
ObliVM: Compiling Programs into Circuits
• Secure Multiparty Computation
• Program Obfuscation (DARPA Safeware)
• Fully Homomorphic Encryption
• Functional Encryption
• Verifiable Computation