Privacy Preserving Similarity Evaluation of Time Series Data
Haohan Zhu, Xianrui Meng, George Kollios
Boston University
Time Series Data are Everywhere
 Sensitive Time-series Data
   Physiological Medical Data
   Signature Data
   Trajectory Data
   Financial Data
3/27/2014
Privacy Concerns for Time Series Data
 Time series data include sensitive information
   Physiological Medical Data
   Signature Data
   Trajectory Data
   Financial Data
 Need to preserve privacy when using time series data
 Existing work on this topic is limited (e.g., differential privacy)
 It has focused mostly on aggregation/summarization queries
 Here we consider Similarity Queries
 Exact distance between two specific time series
Privacy Preserving Similarity Evaluation
 Two-Party Computation
[Figure: two parties, each with an Input, run the PROTOCOL over a Communication channel and obtain the Output]
Similarity Evaluation of Time Series Data
 Distance Functions
   Euclidean Distance
   LCSS Distance
   ERP Distance
   Fréchet Distance
   Dynamic Time Warping (DTW)
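 Dynamic Time Warping
 $D(i, j) = d(i, j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$
 where $d(i, j) = |x_i - y_j|$
Example matrix of $D(i, j)$ values (X along the bottom, Y along the left):
  Y
  7 | 11   7   4   5   4   3
  5 |  7   4   2   3   3   4
  6 |  5   3   2   4   2   3
  4 |  2   1   2   2   4   7
  2 |  1   3   6   8  12  17
    +------------------------
       3   4   5   4   6   7   X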
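The recurrence can be checked against the example matrix with a short plaintext sketch. The series X = (3, 4, 5, 4, 6, 7) and Y = (2, 4, 6, 5, 7) are read off the axes of the figure. The function name `dtw_matrix` and the handling of the first row and column (only the single available neighbour is used) are assumptions for illustration; the slides do not state the boundary conditions.

```python
def dtw_matrix(x, y):
    """Fill the cumulative DTW matrix D, one row per element of y."""
    rows, cols = len(y), len(x)
    D = [[0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            d = abs(x[i] - y[j])  # point distance d(i, j) = |x_i - y_j|
            if i == 0 and j == 0:
                best = 0  # first cell has no predecessor
            elif j == 0:
                best = D[j][i - 1]  # first row: only the left neighbour
            elif i == 0:
                best = D[j - 1][i]  # first column: only the neighbour below
            else:
                best = min(D[j - 1][i], D[j][i - 1], D[j - 1][i - 1])
            D[j][i] = d + best
    return D


X = [3, 4, 5, 4, 6, 7]
Y = [2, 4, 6, 5, 7]
D = dtw_matrix(X, Y)
for row in reversed(D):  # print with the last Y value on top, as in the figure
    print(row)
print("DTW distance:", D[-1][-1])
```

The last entry of the last row is the DTW distance between the two series. In the privacy preserving setting, every entry computed here in the clear would instead be held as a ciphertext.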
Computational Model Settings
 Two-Party Computation
   Two Parties: Server and Client (different computational power)
   Inputs: each party has its own time series data
   Output: one party or both parties get the distance
   Each party learns only the distance from the protocol
 Semi-Honest Threat Model (also called the Honest-But-Curious Model)
   Each party executes the protocol correctly as specified (no deviations, malicious or otherwise)
   However, each party may try to learn as much as possible about the other party's input from its view of the protocol
What do we need to hide?
 The Original Time Series
 The Matrix
   Entries
   Optimal Path
 Potential Risk during Computation
   Filling the Matrix
[Figure: the example DTW matrix for X = (3, 4, 5, 4, 6, 7) and Y = (2, 4, 6, 5, 7)]
Protect the Matrix
 Cipher Matrix and Secret Key
  Enc(7) | Enc(11)  Enc(7)  Enc(4)  Enc(5)
  Enc(5) | Enc(7)   Enc(4)  Enc(2)  Enc(3)
  Enc(6) | Enc(5)   Enc(3)  Enc(2)  Enc(4)
  Enc(4) | Enc(2)   Enc(1)  Enc(2)  Enc(2)
  Enc(2) | Enc(1)   Enc(3)  Enc(6)  Enc(8)
         +--------------------------------
            3        4       5       4
Owner of Cipher Matrix: Client
Owner of Secret Key: Server
Protection when Filling the Matrix 1/3
 Cipher Matrix and Secret Key
  Enc(7) |
  Enc(5) |
  Enc(6) |
  Enc(4) |
  Enc(2) |
         +--------------------------------
            3        4       5       4
Owner of Cipher Matrix: Client
Owner of Secret Key: Server
Protection when Filling the Matrix 2/3
 Cipher Matrix and Secret Key
  Enc(7) | Enc(11)
  Enc(5) | Enc(7)
  Enc(6) | Enc(5)
  Enc(4) | Enc(2)
  Enc(2) | Enc(1)   Enc(3)  Enc(6)  Enc(8)
         +--------------------------------
            3        4       5       4
Owner of Cipher Matrix: Client
Owner of Secret Key: Server
Protection when Filling the Matrix 3/3
 Cipher Matrix and Secret Key
  Enc(7) | Enc(11)  Enc(7)  Enc(4)  Enc(5)
  Enc(5) | Enc(7)   Enc(4)  Enc(2)  Enc(3)
  Enc(6) | Enc(5)   Enc(3)  Enc(2)  Enc(4)
  Enc(4) | Enc(2)   Enc(1)  Enc(2)  Enc(2)
  Enc(2) | Enc(1)   Enc(3)  Enc(6)  Enc(8)
         +--------------------------------
            3        4       5       4
Owner of Cipher Matrix: Client
Owner of Secret Key: Server
How to Fill One Entry of the Matrix
 Dynamic Time Warping
 Secure Euclidean Distance Computation (Part 1)
 Privacy Preserving Comparison of Ciphertexts (Part 2)
  Y2 | Enc(B)   Enc( d(X2, Y2) + Min(A, B, C) )
  Y1 | Enc(A)   Enc(C)
     +----------------------------------------
       X1       X2
Part 1 yields the distance term d(X2, Y2); Part 2 yields Min(A, B, C).
Secure Euclidean Distance Computation
 Part 1 of the Protocol
 Paillier Encryption (Partially Homomorphic Encryption)
   $Enc((m_1 + m_2) \bmod n) = Enc(m_1) \cdot Enc(m_2) \bmod n^2$
   $Enc(m_1 \cdot k \bmod n) = Enc(m_1)^k \bmod n^2$
 Square of Euclidean Distance Computation
   $Enc(\delta_{Eu}^2(p, q)) = Enc\left(\sum_{i=1}^{d} p_i^2 + \sum_{i=1}^{d} q_i^2 - 2\sum_{i=1}^{d} p_i q_i\right)$
   $Enc\left(\sum_{i=1}^{d} p_i^2\right)$ can be computed by the owner of $p_i$ (The Server)
   $Enc\left(\sum_{i=1}^{d} q_i^2\right)$ can be computed by the owner of $q_i$ (The Client)
   $Enc\left(-2\sum_{i=1}^{d} p_i q_i\right) = \prod_{i=1}^{d} Enc(p_i)^{-2 q_i}$ can be evaluated by the owner of $q_i$ (The Client) with information of $Enc(p_i)$
 The owner of $q_i$ (The Client) is the owner of the matrix, who cannot decrypt ciphertexts. The Client is also the evaluator of Part 1 of the protocol.
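The Part 1 computation can be sketched end to end with a toy Paillier implementation. Everything below is an illustrative assumption rather than the authors' code: the tiny primes 1009 and 1013 (far too small to be secure), the generator choice g = n + 1, and the helper names `enc`, `dec`, `server_message` and `client_distance`. The sketch only shows that the Client can assemble the encrypted squared distance from Enc(p_i) and Enc(sum of p_i^2) without ever decrypting.

```python
import math
import random

# Toy Paillier keypair with tiny primes (illustration only, not secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); valid for g = N + 1


def enc(m):
    """Encrypt m under the public key (N, g = N + 1)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    """Decrypt with the secret key (LAM, MU); only the Server holds it."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def server_message(p):
    """Server (owner of p): send Enc(p_i) for each i, plus Enc(sum p_i^2)."""
    return [enc(v) for v in p], enc(sum(v * v for v in p))


def client_distance(q, enc_p, enc_p_sq):
    """Client (owner of q): build Enc(squared distance) without decrypting."""
    # Enc(sum p_i^2) * Enc(sum q_i^2) = Enc(sum p_i^2 + sum q_i^2)
    acc = enc_p_sq * enc(sum(v * v for v in q)) % N2
    for c, v in zip(enc_p, q):
        # Enc(p_i)^(-2 q_i) = Enc(-2 p_i q_i); the exponent is reduced mod N
        acc = acc * pow(c, (-2 * v) % N, N2) % N2
    return acc


p = [3, 7]  # Server's data point (d = 2)
q = [4, 2]  # Client's data point
enc_p, enc_p_sq = server_message(p)
result = client_distance(q, enc_p, enc_p_sq)
print(dec(result))  # decrypted here only to check the sketch
```

Only the Server's message crosses the wire in this step, which matches the one-way communication described for Part 1. The final `dec` call stands in for a check and is not part of the protocol.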

Privacy Preserving Comparison
 Part 2 of the Protocol
 We need a protocol to compute the minimum of the three values $D(i-1, j)$, $D(i, j-1)$ and $D(i-1, j-1)$
 Inputs & Outputs:
   3 ciphertext inputs from the client: Enc(A), Enc(B), Enc(C)
   1 ciphertext output to the client: Enc(min(A, B, C))
 Solution Using Random Offsets
   One-round communication protocol
Privacy Preserving Comparison cont.
 The Client Generates Candidates:
   The Client generates k random values: r1, r2, r3, …, rk (r1 is the minimum).
   The Client creates k+2 candidates: Enc(A+r1), Enc(B+r1), Enc(C+r1), Enc(X2+r2), Enc(X3+r3), …, Enc(Xk+rk) (each Xi is one of A, B and C) by using the homomorphic addition of Paillier encryption.
   The Client permutes the k+2 candidates and sends them to the Server.
 The Server Decrypts and Compares Candidates
   The Server decrypts the k+2 candidates with the secret key and finds the minimal plaintext M among the k+2 plaintexts.
   The Server re-encrypts the minimal plaintext M and returns Enc(M) to the Client.
 The Client Computes the Result
   Since every ri is at least r1, no decoy Xi+ri can fall below min(A, B, C) + r1, so M = min(A, B, C) + r1.
   The Client computes Enc(min(A, B, C)) = Enc(M - r1) by using the homomorphic addition of Paillier encryption.
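The three steps above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy Paillier key (primes 1009 and 1013, insecure), the offset range 1 to 100, and the function names `client_make_candidates`, `server_min` and `client_result` are all hypothetical. The offsets are kept small so that no sum wraps around the modulus.

```python
import math
import random

# Toy Paillier keypair with tiny primes (illustration only, not secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); valid for g = N + 1


def enc(m):
    """Encrypt m under the public key (N, g = N + 1)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    """Decrypt with the secret key; only the Server holds it."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """Homomorphic addition: Enc(a) * Enc(b) mod N^2 = Enc(a + b)."""
    return c1 * c2 % N2


def client_make_candidates(enc_abc, k, r_low, r_high):
    """Client: blind Enc(A), Enc(B), Enc(C) with r1 and add k-1 decoys."""
    rs = sorted(random.randint(r_low, r_high) for _ in range(k))
    r1 = rs[0]  # r1 is the minimum offset
    cands = [add(c, enc(r1)) for c in enc_abc]
    for r in rs[1:]:
        # Decoy Enc(Xi + ri), where Xi is one of A, B, C and ri >= r1
        cands.append(add(random.choice(enc_abc), enc(r)))
    random.shuffle(cands)  # permute before sending
    return cands, r1


def server_min(cands):
    """Server: decrypt all candidates, re-encrypt the smallest plaintext M."""
    return enc(min(dec(c) for c in cands))


def client_result(enc_m, r1):
    """Client: remove the offset, Enc(M - r1) = Enc(min(A, B, C))."""
    return add(enc_m, enc(-r1))


A, B, C = 7, 4, 5
k = 5
cands, r1 = client_make_candidates([enc(A), enc(B), enc(C)], k, 1, 100)
enc_min = client_result(server_min(cands), r1)
print(len(cands), dec(enc_min))  # dec is used here only to check the sketch
```

The Server sees only the k+2 shuffled, offset plaintexts, and the Client sees only ciphertexts. How much the Server can infer from those plaintexts depends on how the offsets are chosen.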
Full Protocol for Each Entry
 Secure Euclidean Distance Computation (Part 1)
   One-way (Half Round) Communication
 Privacy Preserving Comparison of Ciphertexts (Part 2)
   One Round Communication
Server:
  Encrypt $Y_j$ and $Y_j^2$
  1. Decrypt Candidates
  2. Compare Plaintexts
  3. Re-Encrypt
  4. Return $Enc(M)$
Client:
  Compute $Enc(D(X_i, Y_j))$
  Create k+2 Candidates
  Compute $Enc(D(X_i, Y_j) + \min(A, B, C))$
Performance Analysis
 Communication Cost: mn(d + k + 4)
 Computation Cost
   Server: mn(d + 2) encryptions and mn(k + 2) decryptions
   Client: mn(k + 1) encryptions
   m: length of time series data of server
   n: length of time series data of client
   d: dimensionality of single data point
   k: size of random set
 For low dimensions, the server has a higher computational cost than the client, because encryptions and decryptions cost more than other operations, and decryptions cost more than encryptions.
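To get a feel for these counts, the formulas can be evaluated for concrete sizes. The helper name `protocol_costs` and the sample values are assumptions for illustration; the slide does not state the unit of the communication cost, so the sketch reports the raw count.

```python
def protocol_costs(m, n, d, k):
    """Evaluate the cost formulas from the slide for given sizes."""
    entries = m * n  # one matrix entry per pair of points
    return {
        "communication": entries * (d + k + 4),
        "server_encryptions": entries * (d + 2),
        "server_decryptions": entries * (k + 2),
        "client_encryptions": entries * (k + 1),
    }


# Hypothetical sizes: two series of length 10, 1-dimensional points, k = 5
costs = protocol_costs(m=10, n=10, d=1, k=5)
for name, value in costs.items():
    print(name, value)
```

With small d, the Server's decryptions grow with k just as the Client's encryptions do, which fits the remark that the Server carries the heavier load.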
Security Analysis
 The client learns nothing during the computation since the client cannot decrypt any ciphertext.
 The server learns nothing during the first phase of the protocol. However, the server decrypts k+2 candidates.
   The server may create a permuted linear equation system. Solving this system is NP-hard.
   The Client needs to select the range for random values close to the actual values and needs to make sure that there is no wrap-around.
   We show that our protocol can preserve at least half of the information entropy of the plaintexts.
How to Fill One Entry of the Matrix
 Fréchet Distance
 Three-Phase Protocol
 Two Comparison Protocols (Max and Min)
  Y2 | Enc(B)   Enc( Max( d(X2, Y2), Min(A, B, C) ) )
  Y1 | Enc(A)   Enc(C)
     +----------------------------------------------
       X1       X2
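In plaintext, the entry rule Max(d, Min(A, B, C)) gives the discrete Fréchet distance. The sketch below is an assumption-laden illustration: the function name `discrete_frechet`, the use of |x_i - y_j| as the point distance, and the treatment of the first row and column (max with the single available neighbour) are not taken from the slides.

```python
def discrete_frechet(x, y):
    """Discrete Fréchet distance via the Max(d, Min(A, B, C)) recurrence."""
    rows, cols = len(y), len(x)
    F = [[0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            d = abs(x[i] - y[j])  # point distance for 1-dimensional data
            if i == 0 and j == 0:
                F[j][i] = d
            elif j == 0:
                F[j][i] = max(d, F[j][i - 1])  # first row
            elif i == 0:
                F[j][i] = max(d, F[j - 1][i])  # first column
            else:
                # A, B, C are the three previously filled neighbours
                F[j][i] = max(d, min(F[j - 1][i], F[j][i - 1], F[j - 1][i - 1]))
    return F[-1][-1]


X = [3, 4, 5, 4, 6, 7]
Y = [2, 4, 6, 5, 7]
print(discrete_frechet(X, Y))
```

Compared with DTW, the sum is replaced by a maximum, which is why the encrypted version needs a Max comparison in addition to the Min comparison.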
Experiments
 Datasets:
   Real world data
   Synthetic data
 Distance Functions
   Dynamic Time Warping
   Discrete Fréchet Distance
 Parameters
   n: length of time series data
   d: dimensionality of single data point
   k: size of random set
Experiments
 Performance vs Sequence Size (Different Phases)
Experiments
 Performance vs Sequence Size (Different Parties)
Experiments
 Performance vs Dimensionality (Different Phases)
Experiments
 Performance vs Size of Random Set (Different Phases)
Thanks