Privacy Preserving Similarity Evaluation of Time Series Data
Haohan Zhu, Xianrui Meng, George Kollios
Boston University
3/27/2014

Time Series Data are Everywhere
- Sensitive time-series data:
  - Physiological Medical Data
  - Signature Data
  - Trajectory Data
  - Financial Data

Privacy Concerns for Time Series Data
- Time series data include sensitive information:
  - Physiological Medical Data
  - Signature Data
  - Trajectory Data
  - Financial Data
- We need to preserve privacy when using time series data.
- Existing work on this topic is limited (e.g., differential privacy) and focuses mostly on aggregation/summarization queries.
- Here we consider similarity queries: the exact distance between two specific time series.

Privacy Preserving Similarity Evaluation
- Two-Party Computation
- [Figure: each party supplies its input to the protocol; the parties communicate, and the protocol produces the output.]

Similarity Evaluation of Time Series Data
- Distance Functions:
  - Euclidean Distance
  - LCSS Distance
  - ERP Distance
  - Fréchet Distance
  - Dynamic Time Warping (DTW)
- Dynamic Time Warping:
  $D(i, j) = d(i, j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$, where $d(i, j) = |x_i - y_j|$
- Example DTW matrix (X along the bottom, Y along the left; the DTW distance is the top-right entry, 3):

    Y
    7 | 11   7   4   5   4   3
    5 |  7   4   2   3   3   4
    6 |  5   3   2   4   2   3
    4 |  2   1   2   2   4   7
    2 |  1   3   6   8  12  17
      +------------------------
         3   4   5   4   6   7   X

Computational Model Settings
- Two-Party Computation
  - Two parties: Server and Client (with different computational power)
  - Inputs: each party has its own time series data
  - Output: one party or both parties get the distance
  - Each party learns only the distance from the protocol
- Semi-Honest Threat Model (also called
Honest-But-Curious Model)
  - Each party executes the protocol correctly as specified (no deviations, malicious or not).
  - However, each party may try to learn as much as possible about the other party's input from its view of the protocol.

What Do We Need to Hide?
- The original time series
- The matrix entries
- The optimal path
- Potential risk during computation: filling the matrix
- [Figure: DTW matrix for the example series X and Y]

Protect the Matrix
- Cipher Matrix and Secret Key

    Y (encrypted)   Matrix entries
    Enc(7)          Enc(11)  Enc(7)  Enc(4)  Enc(5)
    Enc(5)          Enc(7)   Enc(4)  Enc(2)  Enc(3)
    Enc(6)          Enc(5)   Enc(3)  Enc(2)  Enc(4)
    Enc(4)          Enc(2)   Enc(1)  Enc(2)  Enc(2)
    Enc(2)          Enc(1)   Enc(3)  Enc(6)  Enc(8)
    X (plaintext)   3        4       5       4

- Owner of Cipher Matrix: Client. Owner of Secret Key: Server.

Protection when Filling the Matrix 1/3
- Cipher Matrix and Secret Key
- Only the Y values are present, in encrypted form; no matrix entry is filled yet:

    Y (encrypted)   Matrix entries
    Enc(7)
    Enc(5)
    Enc(6)
    Enc(4)
    Enc(2)
    X (plaintext)   3        4       5       4

- Owner of Cipher Matrix: Client. Owner of Secret Key: Server.

Protection when Filling the Matrix 2/3
- Cipher Matrix and Secret Key
- The first column and the bottom row are filled:

    Y (encrypted)   Matrix entries
    Enc(7)          Enc(11)
    Enc(5)          Enc(7)
    Enc(6)          Enc(5)
    Enc(4)          Enc(2)
    Enc(2)          Enc(1)   Enc(3)  Enc(6)  Enc(8)
    X (plaintext)   3        4       5       4

- Owner of Cipher Matrix: Client. Owner of Secret Key: Server.

Protection when Filling the Matrix 3/3
- Cipher Matrix and Secret Key
- All entries are filled:

    Y (encrypted)   Matrix entries
    Enc(7)          Enc(11)  Enc(7)  Enc(4)  Enc(5)
    Enc(5)          Enc(7)   Enc(4)  Enc(2)  Enc(3)
    Enc(6)          Enc(5)   Enc(3)  Enc(2)  Enc(4)
    Enc(4)          Enc(2)   Enc(1)  Enc(2)  Enc(2)
    Enc(2)          Enc(1)   Enc(3)  Enc(6)  Enc(8)
    X (plaintext)   3        4       5       4

- Owner of Cipher Matrix: Client. Owner of
Secret Key: Server.

How to Fill One Entry of the Matrix
- Dynamic Time Warping
  - Secure Euclidean Distance Computation (Part 1)
  - Privacy Preserving Comparison of Ciphertexts (Part 2)
- The new entry is built from its three encrypted neighbors. Part 1 yields d(X2, Y2) and Part 2 yields Min(A, B, C):

          X1        X2
    Y2    Enc(B)    Enc( d(X2, Y2) + Min(A, B, C) )
    Y1    Enc(A)    Enc(C)

Secure Euclidean Distance Computation: Part 1 of the Protocol
- Paillier Encryption (partially homomorphic encryption):
  $Enc((m_1 + m_2) \bmod n) = Enc(m_1) \cdot Enc(m_2) \bmod n^2$
  $Enc((m_1 \cdot k) \bmod n) = Enc(m_1)^k \bmod n^2$
- Square of Euclidean Distance Computation:
  $Enc(\delta_{Eu}^2(p, q)) = Enc\left(\sum_{i=1}^{d} p_i^2 + \sum_{i=1}^{d} q_i^2 - 2\sum_{i=1}^{d} p_i q_i\right)$
  - $Enc(\sum_{i=1}^{d} p_i^2)$ can be computed by the owner of $p_i$ (the Server).
  - $Enc(\sum_{i=1}^{d} q_i^2)$ can be computed by the owner of $q_i$ (the Client).
  - $Enc(-2\sum_{i=1}^{d} p_i q_i) = \prod_{i=1}^{d} Enc(p_i)^{-2q_i}$ can be evaluated by the owner of $q_i$ (the Client) given $Enc(p_i)$.
- The owner of $q_i$ (the Client) is the owner of the matrix and cannot decrypt ciphertexts. The Client is also the evaluator of Part 1 of the protocol.

Privacy Preserving Comparison: Part 2 of the Protocol
- We need a protocol to compute the minimum of the three values $D(i-1, j)$, $D(i, j-1)$ and $D(i-1, j-1)$.
- Inputs & Outputs:
  - 3 ciphertext inputs from the client: Enc(A), Enc(B), Enc(C)
  - 1 ciphertext output to the client: Enc(min(A, B, C))
- Solution: using random offsets
  - One-round communication protocol

Privacy Preserving Comparison cont.
- The Client Generates Candidates:
  - The Client generates k random values: r1, r2, r3, ..., rk (r1 is the minimum).
  - The Client creates k+2 candidates: Enc(A+r1), Enc(B+r1), Enc(C+r1), Enc(X2+r2), Enc(X3+r3), ..., Enc(Xk+rk), where each Xi is one of A, B and C, by using the homomorphic addition of Paillier encryption.
  - The Client permutes the k+2 candidates and sends them to the Server.
- The Server Decrypts and Compares Candidates:
  - The Server decrypts the k+2 candidates with the secret key and finds the minimal plaintext M among the k+2 plaintexts.
  - The Server re-encrypts the minimal plaintext M and returns Enc(M) to the Client.
- The Client Computes the Result:
  - The Client computes Enc(min(A, B, C)) = Enc(M - r1) by using the homomorphic addition of Paillier encryption.

Full Protocol for Each Entry
- Secure Euclidean Distance Computation (Part 1): one-way (half round) communication
- Privacy Preserving Comparison of Ciphertexts (Part 2): one round of communication
- Server:
  - Encrypt $Y_j$ and $Y_j^2$
  - 1. Decrypt candidates 2. Compare plaintexts 3. Re-encrypt 4. Return $Enc(M)$
- Client:
  - Compute $Enc(D(X_i, Y_j))$
  - Create k+2 candidates
  - Compute $Enc(D(X_i, Y_j) + \dots$

Performance Analysis
- Communication cost: mn(d + k + 4)
- Computation cost:
  - Server: mn(d + 2) encryptions and mn(k + 2) decryptions
  - Client: mn(k + 1) encryptions
- Parameters:
  - m: length of the server's time series
  - n: length of the client's time series
  - d: dimensionality of a single data point
  - k: size of the random set
- For low dimensions, the server bears more computational cost than the client, because encryptions and decryptions cost more than other operations, and decryptions cost more than encryptions.

Security Analysis
- The client learns nothing during the computation since the client cannot decrypt any ciphertext.
- The server learns nothing during the first phase of the protocol. However:
  - The server decrypts k+2 candidates.
  - The server may create a permuted linear equation system. Solving this system is NP-hard.
  - The Client needs to select the range of the random values close to the actual values and needs to make sure that there is no wrap-around.
- We show that our protocol can preserve at least half of the information entropy of the plaintexts.

How to Fill One Entry of the Matrix
- Fréchet Distance
  - Three-phase protocol
  - Two comparison protocols (Max and Min)

          X1        X2
    Y2    Enc(B)    Enc( Max( d(X2, Y2), Min(A, B, C) ) )
    Y1    Enc(A)    Enc(C)

Experiments
- Datasets:
  - Real world data
  - Synthetic data
- Distance Functions:
  - Dynamic Time Warping
  - Discrete Fréchet Distance
- Parameters:
  - n: length of time series data
  - d: dimensionality of a single data point
  - k: size of the random set

Experiments
- [Figure: Performance vs Sequence Size (Different Phases)]

Experiments
- [Figure: Performance vs Sequence Size (Different Parties)]

Experiments
- [Figure: Performance vs Dimensionality (Different Phases)]

Experiments
- [Figure: Performance vs Size of Random Set (Different Phases)]

Thanks
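
Supplementary sketch: the random-offset comparison (Part 2 of the protocol) can be illustrated with a toy Paillier implementation. This is only a sketch under stated assumptions, not the authors' implementation: the choice of Python, the tiny primes 1009 and 1013, the offset range `max_offset`, and all function names (`encrypt`, `decrypt`, `add_plain`, `client_make_candidates`, `server_min`, `client_unmask`, `secure_min`) are hypothetical and chosen for illustration. Real deployments would need cryptographically sized keys.

```python
import math
import random

# Toy Paillier parameters. ASSUMPTION: these tiny primes are for illustration
# only and offer no security.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                            # standard simple generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                 # valid because G = N + 1

def encrypt(m):
    """Encrypt plaintext m (taken mod N) under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Decrypt ciphertext c with the secret key (LAM, MU)."""
    return (((pow(c, LAM, N2) - 1) // N) * MU) % N

def add_plain(c, k):
    """Homomorphic addition: turn Enc(m) into Enc(m + k) without decrypting."""
    return (c * pow(G, k % N, N2)) % N2

def client_make_candidates(enc_abc, k, max_offset=1000):
    """Client step: mask Enc(A), Enc(B), Enc(C) with r1 and add k-1 decoys."""
    offsets = sorted(random.randrange(1, max_offset) for _ in range(k))
    r1 = offsets[0]                                  # r1 is the minimum offset
    candidates = [add_plain(c, r1) for c in enc_abc]
    for r in offsets[1:]:                            # Enc(Xi + ri), Xi in {A, B, C}
        candidates.append(add_plain(random.choice(enc_abc), r))
    random.shuffle(candidates)                       # permute before sending
    return candidates, r1

def server_min(candidates):
    """Server step: decrypt, find the minimal plaintext M, re-encrypt it."""
    return encrypt(min(decrypt(c) for c in candidates))

def client_unmask(enc_m, r1):
    """Client step: Enc(M - r1) = Enc(min(A, B, C))."""
    return add_plain(enc_m, -r1)

def secure_min(enc_abc, k=5):
    """One round of the comparison: client -> server -> client."""
    candidates, r1 = client_make_candidates(enc_abc, k)
    return client_unmask(server_min(candidates), r1)
```

For one DTW entry the client would then add the Part 1 result homomorphically to the output of `secure_min`. Because every decoy has the form Xi + ri with ri at least r1, the smallest plaintext the server sees is always min(A, B, C) + r1, which is why removing r1 recovers the minimum. The plaintexts plus offsets must stay below the modulus so that no wrap-around occurs.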