Semantics for Provenance Security Stephen Chong Harvard University Symposium on Provenance in Software Systems Monday March 30, 2009 Provenance security Some data are sensitive Must ensure provenance does not reveal sensitive data E.g., “John participated in medical study S” reveals “John has disease D” Semantics for Provenance Security, Stephen Chong, Harvard University. 2 Provenance security Some data are sensitive Must ensure provenance does not reveal sensitive data E.g., “John participated in medical study S” reveals “John has disease D” Some provenance is sensitive Must ensure output does not reveal sensitive provenance E.g., Journal referee reports should not contain name/email of referee Must ensure provenance does not reveal sensitive provenance E.g., If student in Disciplinary Hearing, then student’s advisor must attend. “Prof. Smith participated as an Advisor” may reveal “John participated as respondent” Semantics for Provenance Security, Stephen Chong, Harvard University. 2 Provenance security Some data are sensitive Must ensure provenance does not reveal sensitive data E.g., “John participated in medical study S” reveals “John has disease D” Some provenance is sensitive Must ensure output does not reveal sensitive provenance E.g., Journal referee reports should not contain name/email of referee Must ensure provenance does not reveal sensitive provenance E.g., If student in Disciplinary Hearing, then student’s advisor must attend. “Prof. Smith participated as an Advisor” may reveal “John participated as respondent” How do we know if we have security right? Complex interaction between information security and provenance Not well-understood Semantics for Provenance Security, Stephen Chong, Harvard University. 2 Semantics for provenance security Goal: precise, useful, intuitive definitions of provenance security understand provenance security principles and mechanisms to apply in practice This work: Formal definitions for provenance security public data does not reveal sensitive provenance public provenance does not reveal sensitive provenance public provenance does not reveal sensitive data (public data does not reveal sensitive data) Semantics for Provenance Security, Stephen Chong, Harvard University. 3 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v E.g., 〈l1=3,l2=5, l3=7 ; x = l1; if (x) then l2 else l3〉 Semantics for Provenance Security, Stephen Chong, Harvard University. 5 4 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v Provenance T describes execution 〈l1=v1, …, ln=vn ; c〉! v " T E.g., 〈l1=3,l2=5,l3=7 ; x = l1; if (x) then l2 else l3〉 5 " x=l1 ; cond(x,true,l2) Semantics for Provenance Security, Stephen Chong, Harvard University. 4 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v Provenance T describes execution 〈l1=v1, …, ln=vn ; c〉! v " T Partial provenance: allow parts of T to be elided E.g., 〈l1=3,l2=5,l3=7 ; x = l1; if (x) then l2 else l3〉 5 " x=l1 ; cond(x,true,l cond(x,true,l22)) Semantics for Provenance Security, Stephen Chong, Harvard University. 4 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v Provenance T describes execution 〈l1=v1, …, ln=vn ; c〉! v " T Partial provenance: allow parts of T to be elided E.g., 〈l1=3,l2=5,l3=7 ; x = l1; if (x) then l2 else l3〉 5 " x=l1 ; cond(x,true,#) cond(x,true,l2) Semantics for Provenance Security, Stephen Chong, Harvard University. 4 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v Provenance T describes execution 〈l1=v1, …, ln=vn ; c〉! v " T Partial provenance: allow parts of T to be elided E.g., 〈l1=3,l2=5,l3=7 ; x = l1; if (x) then l2 else l3〉 5 " x=l1 ; cond(x,#,#) cond(x,true,l2) Semantics for Provenance Security, Stephen Chong, Harvard University. 4 Language model Simple language-based model (based on Cheney, Acar, Ahmed [2008]) Program c has input locations, produces single output 〈l1=v1, …, ln=vn ; c〉! v Provenance T describes execution 〈l1=v1, …, ln=vn ; c〉! v " T Partial provenance: allow parts of T to be elided E.g., 〈l1=3,l2=5,l3=7 ; x = l1; if (x) then l2 else l3〉 5 " x=l1 ; #cond(x,true,l2) Semantics for Provenance Security, Stephen Chong, Harvard University. 4 Information and traces T contains less information than T’ if T elides parts of T’ E.g. x= # contains less information than x=l1 cond(x, #, #) contains less information than cond(x, true, T) T;# contains less information than T;T’ # contains less information than any T ! # Theorem: If 〈l1=v1, …, ln=vn ; c〉 v " T’ and T contains less information than T’ then 〈l1=v1, …, ln=vn ; c〉 v " T Semantics for Provenance Security, Stephen Chong, Harvard University. 5 Security policies Each input location has security policy for data and provenance e.g., !(l1) = LL Semantics for Provenance Security, Stephen Chong, Harvard University. !(l2) = LH !(l3) = HH 6 Security policies Each input location has security policy for data and provenance e.g., !(l1) = LL !(l2) = LH Data security: H : High security (secret) L : Low security (public) Semantics for Provenance Security, Stephen Chong, Harvard University. !(l3) = HH Provenance security: H : High provenance (secret) L : Low provenance (public) 6 Security policies Each input location has security policy for data and provenance e.g., !(l1) = LL !(l2) = LH !(l3) = HH User knows low security inputs, and is given output and partial provenance trace User should not learn high security data User should not learn which high provenance locations involved in computation What (partial) provenance can we give to user? Semantics for Provenance Security, Stephen Chong, Harvard University. 6 First attempt We think T is secure for execution 〈l1=v1, …, ln=vn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v if: 7 First attempt We think T is secure for execution 〈l1=v1, …, ln=vn ; c〉 〈l1=v1, …, ln=vn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v " T v if: and 7 First attempt We think T is secure for execution 〈l1=v1, …, ln=vn ; c〉 〈l1=v1, …, ln=vn ; c〉 v " T v if: and T does not contain any high provenance locations. Semantics for Provenance Security, Stephen Chong, Harvard University. 7 First attempt We think T is secure for execution 〈l1=v1, …, ln=vn ; c〉 〈l1=v1, …, ln=vn ; c〉 v if: v " T and T does not contain any high provenance locations. E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l1,true, l2+l3) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 7 First attempt We think T is secure for execution 〈l1=v1, …, ln=vn ; c〉 〈l1=v1, …, ln=vn ; c〉 v if: v " T and T does not contain any high provenance locations. E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l1,true, #+l l2+l33)) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 7 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: Semantics for Provenance Security, Stephen Chong, Harvard University. 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v " T and 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj 〈l1=w1, …, ln=wn ; c〉 v " T Semantics for Provenance Security, Stephen Chong, Harvard University. and and 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v Looks the same Semantics for Provenance Security, Stephen Chong, Harvard University. 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 Looks the same Semantics for Provenance Security, Stephen Chong, Harvard University. v but li not involved 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v Neither output v nor provenance T reveal which high provenance input locations were used. Semantics for Provenance Security, Stephen Chong, Harvard University. 8 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l1,true, l2+l3) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 9 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l cond(l11,true, ,true, ll22+l +l33)) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 9 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l cond(l11,true, ,true, #+l l2+l33) ) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 9 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l cond(l11,true, ,true, #) l2+l3) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 9 Provenance security T satisfies provenance security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any high provenance li, there is an execution 〈l1=w1, …, ln=wn ; c〉 v such that if lj is low security then vj = wj and 〈l1=w1, …, ln=wn ; c〉 v " T and li involved in 〈l1=v1, …, ln=vn ; c〉 v iff li not involved in 〈l1=w1, …, ln=wn ; c〉 v E.g., 〈… ; if (l1) then l2 +l3 else l4 +l5〉 5 " cond(l cond(l11,,true, #, #) l2+l3) !(l1) = HL !(l2) = HH !(l3) = HL !(l4) = HH !(l5) = HL Semantics for Provenance Security, Stephen Chong, Harvard University. 9 Provenance security Public data does not reveal sensitive provenance ! Public provenance does not reveal sensitive ! provenance Public provenance does not reveal sensitive data (Public data does not reveal sensitive data) Semantics for Provenance Security, Stephen Chong, Harvard University. 10 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: Semantics for Provenance Security, Stephen Chong, Harvard University. 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. v " T and 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 Semantics for Provenance Security, Stephen Chong, Harvard University. w 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 if w vj = wj for all low security lj Semantics for Provenance Security, Stephen Chong, Harvard University. 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 w if vj = wj for all low security lj then 〈l1=w1, …, ln=wn ; c〉 w " T Semantics for Provenance Security, Stephen Chong, Harvard University. 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 w if vj = wj for all low security lj then 〈l1=w1, …, ln=wn ; c〉 w " T If inputs look the same Semantics for Provenance Security, Stephen Chong, Harvard University. 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 w if vj = wj for all low security lj then 〈l1=w1, …, ln=wn ; c〉 w " T If inputs look the same Semantics for Provenance Security, Stephen Chong, Harvard University. then T describes execution 11 Data security T satisfies data security for execution 〈l1=v1, …, ln=vn ; c〉 v if: 〈l1=v1, …, ln=vn ; c〉 v " T and for any execution〈l1=w1, …, ln=wn ; c〉 w if vj = wj for all low security lj then 〈l1=w1, …, ln=wn ; c〉 w " T Provenance T does not reveal which high security data used. Semantics for Provenance Security, Stephen Chong, Harvard University. 11 Enforcing security Given〈l1=v1, …, ln=vn ; c〉 v is there always a maximum trace that satisfies security? Semantics for Provenance Security, Stephen Chong, Harvard University. 12 Enforcing security Given〈l1=v1, …, ln=vn ; c〉 v is there always a maximum trace that satisfies security? In general, no Semantics for Provenance Security, Stephen Chong, Harvard University. 12 Enforcing security Given〈l1=v1, …, ln=vn ; c〉 v is there always a maximum trace that satisfies security? In general, no Is there always some trace that satisfies security? Semantics for Provenance Security, Stephen Chong, Harvard University. 12 Enforcing security Given〈l1=v1, …, ln=vn ; c〉 v is there always a maximum trace that satisfies security? In general, no Is there always some trace that satisfies security? Yes for data security; in general, no for provenance security Semantics for Provenance Security, Stephen Chong, Harvard University. 12 Enforcing security Given〈l1=v1, …, ln=vn ; c〉 v is there always a maximum trace that satisfies security? In general, no Is there always some trace that satisfies security? Yes for data security; in general, no for provenance security If public inputs determine inclusion of high provenance li then not even # will satisfy provenance security E.g., !(l1) = LL !(l2) = LH !(l3) = HL x = l2 " " if (l1) then l2 else 0 if (l3) then l2 else 0 ! Semantics for Provenance Security, Stephen Chong, Harvard University. 12 Eliding a trace Assume public inputs do not determine inclusion of high provenance location any high provenance location is nested in conditional with highconfidentiality test Semantics for Provenance Security, Stephen Chong, Harvard University. 13 Eliding a trace Assume public inputs do not determine inclusion of high provenance location any high provenance location is nested in conditional with highconfidentiality test Assume output does not reveal high confidentiality information. I.e., no declassification Semantics for Provenance Security, Stephen Chong, Harvard University. 13 Eliding a trace Assume public inputs do not determine inclusion of high provenance location any high provenance location is nested in conditional with highconfidentiality test Assume output does not reveal high confidentiality information. I.e., no declassification Given execution 〈l1=v1, …, ln=vn ; c〉 v and full trace T, ELIDE(T) satisfies provenance and data security, where ELIDE(x=e) ELIDE(T0; T1) ELIDE(cond(l, v, T)) ELIDE(cond(l, v, T)) = = = = Semantics for Provenance Security, Stephen Chong, Harvard University. x=e ELIDE(T0); ELIDE(T1) cond(l, v, ELIDE(T)) if l is low confidentiality cond(l, #, #) if l is high confidentiality 13 Conclusion Need to understand provenance security, and interactions with data security This talk: Formal definitions for provenance security public data does not reveal sensitive provenance public provenance does not reveal sensitive provenance public provenance does not reveal sensitive data Practical implications: Principles for determining access control for provenance Security policies for data and provenance must be consistent! Semantics for Provenance Security, Stephen Chong, Harvard University. 14