Real Security for Real Provenance is Really Hard
Dr. Adriane Chapman
For Public Release 09-1947 © 2009 The MITRE Corporation. All rights reserved.

Data Security 101
■ Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC)
■ Access to the data is specified by role/privilege predicates
■ Access is administered by data owners
■ If data cannot be released, some systems allow surrogates to stand in for the unreleasable information
  – e.g. a Public version of a Classified document
■ Cryptographic techniques (secure hash functions, etc.) maintain data integrity

Scalable Access Controls for Lineage. Arnon Rosenthal, Len Seligman, Adriane Chapman and Barbara Blaustein, Theory and Practice of Provenance 2009.

Who are the users?
[Figure: lineage graph behind a lineage query result. Node labels: CDC Historical Disease Data; Hospital Admissions Data; Pharmacy Prescription Data; Animal Tests; Bio-Threat Intelligence; TrakTek, Inc. Disease Spread Monitor; Epidemic Projector, v3; EPO Epidemic Forecast; EPO Epidemic Warning Reports. People labels: Invoker: Analyst Smith; Author: Prof. Jones; Author: Agent 009. The slide is shown three times, once for each kind of user asking the lineage query: The Public, The Hospital, Congress.]

How do you specify access?
RBAC
■ RBAC models the world with roles
  – Form aggregates of users (groups) and privileges (roles)
  – Admins authorize groups to use roles
■ Not expressive enough
  – Only the user (group) is tested
  – A rule such as “give hospitals access to more information when threats are high” cannot be expressed
ABAC
■ Predicates on attributes are used to describe access
  can_See_Movie(u, r, e) ≔
      (Age(u) ≥ 21 ∧ Rating(r) ∈ {R, PG13, G})
    ∨ (21 > Age(u) ≥ 13 ∧ Rating(r) ∈ {PG13, G})
    ∨ (Age(u) < 13 ∧ Rating(r) ∈ {G})
■ Instead of explicitly assigning users, decide based on U, R, E (user, resource, environment).
■ Multi-factor policies (RBAC)
  – Every added factor multiplies the number of roles, e.g.,
    ■ Group 1: Director of Surgery at Hospital 123 where status=“emergency”
    ■ Group 2: Director of Surgery at Hospital 123 where status=“normal”

Even Classic ABAC doesn’t cut it
Animal_Testing_Access(user, resource, environment) ≔
  [User.Division = Intelligence
   ∧ User.AssignedProject.Type = Epidemiology
   ∧ Request.SourceDomain is in {.gov, .mil}
   ∧ Experiment.ReleaseMarking = Intel
   ∧ (ExperSubject.Type = inanimate
      ∨ (ExperSubject.Type = animal ∧ experimenterName.pseudonym = true)
      ∨ (ExperSubject.Type = human ∧ releaseOnFile(ExperSubject)))
   ∧ [Request.HasApproval.Level ≥ 4
      ∨ (Request.HasApproval.Level ≥ 2 ∧ threat.Status = Red)]
   ∧ […
[Figure: the lineage graph from “Who are the users?” shown behind the predicate.]
Congress just passed a new Disclosure Act. What parts need to change to update this concern?

The Solution – Extend ABAC
■ Stakeholder Concerns traceable and editable
  – HIPAA wants to protect patient privacy. How does the role “doctor” protect patient privacy?
■ Named Concerns
  – Link directly to access predicates that embody these concerns

Who decides what to secure?
[Figure: the same lineage graph, with two stakeholders of Epidemic Projector, v3 voicing conflicting wishes.]
  – Author: Prof. Jones: “Tell everyone my code was used!”
  – Invoker: Analyst Smith: “Tell no one I ran this code.”

The Solution
■ Stakeholders can have differing opinions!
  – Represent these explicitly
  – Represent their combination explicitly
  – Let them be edited independently
■ Make Administration Manageable!
■ Sharing the Power
  – Unacceptable approaches:
    ■ A single administrator, or a global conflict-resolution rule
    ■ A totally separate formalism for conflict resolution
  – Share power by attribute ownership, derivation
    ■ Combine as derived attribute; delegate right to define derivation rule (see paper)

Our Framework
Resource: Epidemic Projector, v3 (Author: Prof. Jones; Invoker: Analyst Smith)
■ Stakeholder: Prof. Jones (“Tell everyone my code was used!”)
  can_See_Node(u, r, e) ≔ (TRUE)
■ Stakeholder: Analyst Smith (“Tell no one I ran this code.”)
  can_See_Node(u, r, e) ≔ (u.hasSecretClearance ∧ r.threatStatus = Red)
■ Who says how this should be combined?
  – Combiner: VETO
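The framework slide can be illustrated with a minimal Python sketch. It assumes that the VETO combiner means "access is granted only if no stakeholder objects", and it reads Analyst Smith's predicate as requiring both a secret clearance and a Red threat status; both readings are assumptions, not definitions taken from the paper. The class names, field names, and `veto_combiner` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class User:
    name: str
    has_secret_clearance: bool = False


@dataclass(frozen=True)
class Resource:
    name: str
    threat_status: str = "Green"


@dataclass(frozen=True)
class Environment:
    source_domain: str = ".gov"


# An ABAC predicate decides from (user, resource, environment), not from a role.
Predicate = Callable[[User, Resource, Environment], bool]


def jones_can_see_node(u: User, r: Resource, e: Environment) -> bool:
    # Prof. Jones (author): "Tell everyone my code was used!"
    return True


def smith_can_see_node(u: User, r: Resource, e: Environment) -> bool:
    # Analyst Smith (invoker): "Tell no one I ran this code."
    # Assumed reading: cleared users only, and only when the threat status is Red.
    return u.has_secret_clearance and r.threat_status == "Red"


def veto_combiner(concerns: Dict[str, Predicate]) -> Predicate:
    # Assumed VETO semantics: any single stakeholder can block access.
    def combined(u: User, r: Resource, e: Environment) -> bool:
        return all(pred(u, r, e) for pred in concerns.values())

    return combined


# Each stakeholder's concern is named and stored separately, so it can be
# edited without touching the others or the combiner.
concerns = {"Prof. Jones": jones_can_see_node, "Analyst Smith": smith_can_see_node}
can_see_node = veto_combiner(concerns)

projector = Resource("Epidemic Projector, v3", threat_status="Red")
cleared = User("cleared analyst", has_secret_clearance=True)
public = User("member of the public")
env = Environment()

print(can_see_node(cleared, projector, env))  # True
print(can_see_node(public, projector, env))   # False
```

Keeping the concerns in a dictionary keyed by stakeholder is one way to make each concern traceable to its owner; a different combiner could be swapped in without rewriting any individual predicate.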
Surrogate Parenthood: Protected and Informative Lineage Graphs. Barbara Blaustein, Adriane Chapman, Arnon Rosenthal, Len Seligman, M. David Allen, Michael Morse, In Preparation.

Provenance Surrogates
■ Replace nodes with less sensitive information
■ Obscure edge information
[Figure: the lineage graph as released to The Public, with surrogates in place. Labels: CDC Historical Disease Data; Hospital Admissions Data; EPO Epidemic Warning Reports; Invoker Analyst Smith; Author Prof. Jones; Epidemic Projector, v3; EPO Epidemic Forecast; Author: Agent 009; Laboratory Animal Tests Results; Bio-Threat Intelligence; TrakTek, Inc. Disease Spread Monitor; Pharmacy Prescription Data.]

But not straightforward for Provenance – Inference Threats
■ Inferring edges from rather strong clues, such as
  – Parameter labels (role labels)
  – Results of non-graph queries
■ Inferring node information via edges from other nodes
  – e.g., ResultSize(N3) may reveal ResultSize(N1)
  – TimeReceived may reveal TimeProcessed at predecessor
Policy specifies which surrogates are releasable, i.e., what threats are “acceptable” (see the “Who decides what to secure?” point).
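As a rough illustration of node surrogates, the sketch below builds the view of a lineage graph that an uncleared viewer would receive. It is a simplification under stated assumptions: sensitivity is a single boolean, a sensitive node without a surrogate is dropped together with its edges, and the edges and the surrogate label `"Laboratory Results"` are invented for the example. It does nothing about the inference threats listed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LineageNode:
    label: str
    sensitive: bool = False
    surrogate: Optional[str] = None  # less sensitive stand-in label, if one exists


def surrogate_view(
    nodes: Dict[str, LineageNode],
    edges: List[Tuple[str, str]],
    viewer_is_cleared: bool,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return the node labels and edges that may be shown to the viewer."""
    shown: Dict[str, str] = {}
    for node_id, node in nodes.items():
        if viewer_is_cleared or not node.sensitive:
            shown[node_id] = node.label       # release the real node
        elif node.surrogate is not None:
            shown[node_id] = node.surrogate   # release the surrogate instead
        # otherwise the node is withheld entirely
    # Keep only edges whose endpoints are both still present in the view.
    visible_edges = [(src, dst) for src, dst in edges if src in shown and dst in shown]
    return shown, visible_edges


# Hypothetical fragment of the epidemic-forecast lineage graph.
nodes = {
    "n1": LineageNode("Animal Tests", sensitive=True, surrogate="Laboratory Results"),
    "n2": LineageNode("Bio-Threat Intelligence", sensitive=True),
    "n3": LineageNode("Epidemic Projector, v3"),
    "n4": LineageNode("EPO Epidemic Forecast"),
}
edges = [("n1", "n3"), ("n2", "n3"), ("n3", "n4")]

public_nodes, public_edges = surrogate_view(nodes, edges, viewer_is_cleared=False)
print(public_nodes)
print(public_edges)
```

In this toy view the public still learns that something described as laboratory results fed the projector, which is exactly the kind of residual clue a release policy would need to judge acceptable or not.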
Do you know where your data’s been? Fine-Grained Tamper-Evident Data Provenance. Jing Zhang, Adriane Chapman and Kristen LeFevre, In Submission.

What does provenance “integrity” mean?
■ Data “integrity”
  – cryptographic hashing (e.g. SHA-1, MD5, etc.)
  – w.r.t. query answers: provide enough extra information to prove that query results are correct (usually Merkle Hash Trees)
■ Provenance “integrity”
  – Allow users of the data to verify that the provenance has not been tampered with
  – AND that it accurately represents the state of the data

Why is this difficult?
[Figure: provenance of a Drug Efficacy Report. Labels: Endocrine Activity; White Blood Cell Count; Patient Ages And Weights; Collected by Good Stewards Labs; Collected by PCP Paul; Collected by Perfect Saints Clinic; Dataset 1; Dataset 2; Dataset 3; Interim Dataset; TrustUsRx Aggregator; Pamela Updated 1 patient record; Drug Efficacy Report.]
■ Objects are compound
  – Patient records contain several attributes which were obtained via different methods and have different provenance
■ Non-linear sequence of information
  – Provenance is a DAG, not a chain
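To make the DAG point concrete, here is a minimal sketch of tamper evidence over a provenance DAG. It is not the authors' scheme: it swaps in plain, unkeyed SHA-256 hash linking instead of signatures, so it only detects changes relative to a trusted copy of the final checksum. The record fields, the graph shape, and the function names are assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List


def record_checksum(record: dict, predecessor_checksums: List[str]) -> str:
    # Bind the record to the checksums of ALL its predecessors, not just one.
    payload = json.dumps(
        {"record": record, "preds": sorted(predecessor_checksums)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def checksum_dag(records: Dict[str, dict], parents: Dict[str, List[str]]) -> Dict[str, str]:
    """Compute a checksum per record; assumes `parents` describes an acyclic graph."""
    checksums: Dict[str, str] = {}

    def visit(record_id: str) -> str:
        if record_id not in checksums:
            preds = [visit(p) for p in parents.get(record_id, [])]
            checksums[record_id] = record_checksum(records[record_id], preds)
        return checksums[record_id]

    for record_id in records:
        visit(record_id)
    return checksums


# Hypothetical records: three source datasets aggregated into an interim
# dataset, which is then turned into a report.
records = {
    "dataset1": {"seq": 1, "participant": "P1", "op": "insert", "output": "wbc=7.1"},
    "dataset2": {"seq": 1, "participant": "P2", "op": "insert", "output": "age=54"},
    "dataset3": {"seq": 1, "participant": "P3", "op": "insert", "output": "endo=ok"},
    "interim": {"seq": 2, "participant": "Aggregator", "op": "aggregate", "output": "3 rows"},
    "report": {"seq": 3, "participant": "Analyst", "op": "aggregate", "output": "efficacy=0.8"},
}
parents = {"interim": ["dataset1", "dataset2", "dataset3"], "report": ["interim"]}

original = checksum_dag(records, parents)

# Tamper with one upstream record and recompute.
tampered_records = dict(records)
tampered_records["dataset1"] = {**records["dataset1"], "output": "wbc=4.0"}
tampered = checksum_dag(tampered_records, parents)

print(original["report"] != tampered["report"])      # True
print(original["dataset2"] == tampered["dataset2"])  # True
```

Because each checksum folds in every predecessor, a change to one source dataset propagates to the interim dataset and the report, while untouched siblings keep their checksums. A real scheme would additionally need keys or signatures so that a participant cannot simply recompute the whole graph after altering it.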
Solution Sketch
■ A participant may alter data via insert, delete, update and aggregate
■ A provenance record consists of a sequenceID, participant, and the input/output values of the object
■ Developed an extended signature scheme
  – Create a checksum that verifies the integrity of provenance and data

Conclusions
■ Provenance is both a DAG and its nodes
  – There are unique security inference problems!
■ Who gets to control what is released is not straightforward
■ Standard access control methods don’t work unmodified
■ Provenance integrity is necessary to assure the veracity of the information

Acknowledgements
MITRE
■ Arnon Rosenthal
■ Barbara Blaustein
■ Len Seligman
■ David Allen
■ Michael Morse
University of Michigan
■ Kristen LeFevre
■ H.V. Jagadish
■ Jing Zhang