Scalable Trusted Computing: engineering challenge, or something more fundamental?
Ken Birman, Cornell University

Cornell Quicksilver Project
- Krzys Ostrowski: the key player
- Ken Birman, Danny Dolev: collaborators and research supervisors
- Mahesh Balakrishnan, Maya Haridasan, Tudor Marian, Amar Phanishayee, Robbert van Renesse, Einar Vollset, Hakim Weatherspoon: offered valuable comments and criticisms

Trusted Computing
A vague term with many meanings. For individual platforms, it covers:
- Integrity of the computing base
- Availability and exploitation of TPM hardware
- Proofs of correctness for key components
- Security policy specification and enforcement
Scalable trust issues arise mostly in distributed settings.

System model
A world of:
- Actors: Sally, Ted, ...
- Groups: Sally_Advisors = {Ted, Alice, ...}
- Objects: travel_plans.html, investments.xls
- Actions: Open, Edit, ...
- Policies: (Actor, Object, Action) → {Permit, Deny}
- Places: Ted_Desktop, Sally_Phone, ...

Rules
If Emp.place ∈ Secure_Places and Emp ∈ Client_Advisors, then allow Open on Client_Investments.xls.
Can Ted, working at Ted_Desktop, open Sally_Investments.xls? ... yes, if Ted_Desktop ∈ Secure_Places. (A toy version of this check appears in the sketch below.)
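To make the model concrete, here is a minimal sketch of the (Actor, Object, Action) → {Permit, Deny} mapping and the rule above. Every name in it (the Actor class, decide, the two sets) is invented for this illustration; it is not part of any real system.

```python
# Minimal sketch of the (Actor, Object, Action) -> {Permit, Deny} model.
# All names here are illustrative inventions, not a real system's API.

from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    place: str   # e.g., "Ted_Desktop"

# The sets consulted by the rule below.
Secure_Places = {"Ted_Desktop"}
Client_Advisors = {"Ted", "Alice"}

def decide(actor: Actor, obj: str, action: str) -> str:
    """Map an (Actor, Object, Action) triple to Permit or Deny."""
    # Rule: if Emp.place is in Secure_Places and Emp is in
    # Client_Advisors, then allow Open on the investments file.
    if (action == "Open"
            and obj == "Sally_Investments.xls"
            and actor.place in Secure_Places
            and actor.name in Client_Advisors):
        return "Permit"
    return "Deny"  # default-deny everything the rule doesn't cover

# Can Ted, working at Ted_Desktop, open Sally_Investments.xls?
print(decide(Actor("Ted", "Ted_Desktop"), "Sally_Investments.xls", "Open"))
# -> Permit, because Ted_Desktop is in Secure_Places
```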
Miscellaneous stuff
Policy changes all the time, like a database receiving updates, e.g., as new actors are added, old ones leave the system, etc. ... and policies have a temporal scope: "Starting at time t=19 and continuing until now, Ted is permitted to access Sally's file investments.xls."

Order-dependent decisions
Consider rules such as:
- Only one person can use the cluster at a time.
- The meeting room is limited to three people.
- While people lacking clearance are present, no classified information can be exposed.
These are sensitive to the order in which conflicting events occur: a central "clearinghouse" decides what to allow based on the order in which it sees events.

Goal: enforce policy
[Figure: a Read(data) request is checked against the Policy Database before investments.xls is touched.]

... reduction to a proof
Each time an action is attempted, the system must develop a proof either that the action should be blocked or that it should be allowed; for example, it might use the BAN logic. For the sake of argument, let's assume we know how to do all this on a single machine.

Implications of scale
We'll be forced to replicate and decentralize the policy enforcement function:
- For ownership: allows "local policy" to be stored close to the entity that "owns" it
- For performance and scalability
- For fault-tolerance

Decentralized policy enforcement
[Figure: in the original scheme, every Read(data) goes through a single Policy Database guarding investments.xls; in the new scheme, a request can be checked by either replica, Policy DB 1 or Policy DB 2.]

So... how do we decentralize?
Consistency is the bane of decentralization. We want the system to behave as if all decisions occur in a single "rules" database, yet we want the decisions to actually occur in a decentralized way, over a replicated policy database. The system needs to handle concurrent events in a consistent manner.

So... how do we decentralize?
More formally: any run of the decentralized system should be indistinguishable from some run of a centralized system (the analogy is database one-copy serializability). But this is a familiar problem! Database researchers know it as the atomic commit problem; distributed systems people call it:
- State machine replication
- Virtual synchrony
- Paxos-style replication
... and because of this we know a lot about the question!

... replicating data with abcast
This is closely related to the "atomic broadcast" problem within a group:
- Abcast sends a message to all the members of a group
- The protocol guarantees order and fault-tolerance
- It solves consensus...
Indeed, a dynamic policy repository would need abcast if we wanted to parallelize it for speed or replicate it for fault-tolerance!

A slight digression
Consensus is a classical problem in distributed systems:
- N processes
- They start execution with inputs in {0,1}
- Asynchronous, reliable network
- At most one process fails by halting (crash)
- Goal: a protocol whereby all processes "decide" the same value v, and v was some process's input

Distributed consensus
[Cartoon: "Jenkins, if I want another yes-man, I'll build one!" — Lee Lorenz, Brent Sheppard]

Asynchronous networks
- No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different "clocks")
- No way to know how long a message will take to get from A to B
- Messages are never lost in the network

A fault-tolerant protocol
- Collect votes from all N processes
- At most one is faulty, so if one doesn't respond, count that vote as 0
- Compute the majority
- Tell everyone the outcome
- They "decide" (they accept the outcome)
... but this has a problem! Why?

What makes consensus hard?
Fundamentally, the issue revolves around membership. In an asynchronous environment we can't detect failures reliably: a faulty process stops sending messages, but a "slow" message might confuse us. And when the vote is nearly a tie, this confusing situation really matters, as the sketch below illustrates.
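The following single-process simulation of the naive vote-collection protocol shows the failure mode. The Peer class, get_vote, and the "slow" flag are invented stand-ins for real network endpoints: a peer that is merely slow is indistinguishable from one that crashed, which is exactly the ambiguity at issue.

```python
# Simulation of the naive vote-collection protocol sketched above.
# Peer and its "slow" flag are invented stand-ins for real endpoints.

class Peer:
    def __init__(self, vote, slow=False):
        self.vote = vote
        self.slow = slow            # slow = reply arrives after the timeout

    def get_vote(self):
        if self.slow:
            raise TimeoutError      # looks exactly like a crash
        return self.vote

def naive_consensus(my_vote, peers):
    votes = [my_vote]
    for p in peers:
        try:
            votes.append(p.get_vote())
        except TimeoutError:
            votes.append(0)         # "count that vote as 0"
    decision = 1 if sum(votes) * 2 > len(votes) else 0
    return decision                 # everyone is told this and "decides"

# Near-tie: the true votes are [1, 1, 0, 1], a 3-1 majority for 1,
# but the slow peer's vote is counted as 0, producing a 2-2 tie that
# the protocol resolves as 0.
peers = [Peer(1), Peer(0), Peer(1, slow=True)]
print(naive_consensus(1, peers))    # prints 0, not 1
```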
Some bad news
- The FLP result shows that fault-tolerant consensus protocols always have nonterminating runs
- All of the mechanisms we discussed are equivalent to consensus
- Impossibility of non-blocking commit is a similar result from the database community

But how bad is this news?
In practice, these impossibility results don't hold up so well. Both define "impossible" as "not always possible," and in fact, with probabilities, the FLP scenario has probability zero. We must ask: does a probability-zero result even hold in a "real system"? Indeed, people build consensus-based systems all the time...

Solving consensus
Systems that "solve" consensus often use a membership service. This GMS functions as an oracle, a trusted status-reporting function. The consensus protocol then becomes a kind of two-phase protocol that runs over the output of the GMS, and it is known precisely when such a solution will be able to make progress.

More bad news
Consensus protocols don't scale!
- Isis (virtual synchrony) new-view protocol
- Paxos decree protocol:
  - Selects a leader; normally 2 phases, 3 if the leader dies
  - Each phase is a 1-to-n multicast followed by an n-to-1 convergecast (can tolerate n/2 − 1 failures)
  - The basic protocol has no leader and can suffer rollbacks with probability linear in n
- Fast Paxos is isomorphic to the Isis view protocol (!)
... both are linear in group size, and regular Paxos might be O(n²) because of rollbacks.

Work-arounds?
Only run the consensus protocol in the "group membership service," or GMS:
- It has a small number of members, like 3-5
- They run a protocol like the Isis one
- They track membership (and other "global" state) on behalf of the system as a whole
- The scalability of consensus won't matter
But this is centralized. Recall our earlier discussion: any central service running on behalf of the whole system will become burdened if the system gets big enough. Can we decentralize our GMS service?

GMS in a large system
[Figure: global events are inputs to the GMS; its output is the official record of the events that mattered to the system.]

Hierarchical, federated GMS
Quicksilver V2 (QS2) constructs a hierarchy of GMS state machines. In this approach, each "event" is associated with some GMS that owns the relevant official record.
[Figure: GMS0 at the root, with GMS1 and GMS2 below it.]

Delegation of roles
One (important) use of the GMS is to track membership in our rule-enforcement subsystem. But we "delegate" responsibility for classes of actions to subsystems that can own and handle them locally. The GMS "reports" the delegation events; in effect, it tells nodes in the system about the system configuration, that is, about their roles. And as conditions change, it reports new events.

Delegation
[Cartoon: "In my capacity as President of the United States, I authorize John Pigg to oversee this nation's banks." "Thank you, sir! You can trust me."]
[Figure: GMS0 delegates to GMS1, which oversees a policy subsystem.]

Delegation example
IBM might delegate the handling of access to its Kingston facility to the security scanners at the doors. Events associated with Kingston access don't need to pass through the GMS; instead, they "exist" entirely within the group of security scanners.

... giving rise to pub/sub groups
Our vision spawns lots and lots of groups that own various aspects of trust enforcement:
- The scanners at the doors
- The security subsystems on our desktops
- The key management system for a VPN
- ... etc.
This is a nice match with publish-subscribe.

Publish-subscribe in a nutshell
- Publish("topic", message)
- Subscribe("topic", handler)
The basic idea: the platform invokes handler(message) each time a topic match arises. Fancier versions also support history mechanisms (letting a joining process catch up). A toy version follows.
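Here is a toy, in-process version of that Publish/Subscribe interface, using file-path-like topic names of the kind discussed later in this section. The Broker class and the topic layout are invented for illustration; a real platform such as QSM delivers matches over the network.

```python
# Toy in-process publish-subscribe broker. The class and the topic
# layout are invented for illustration, not QSM's actual API.

from collections import defaultdict

class Broker:
    def __init__(self):
        self.handlers = defaultdict(list)     # topic -> subscribed handlers

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.handlers[topic]:  # invoke on each topic match
            handler(message)

broker = Broker()
# Topics structured like file-system paths:
broker.subscribe("/policy/actors/hires", lambda m: print("update:", m))
broker.publish("/policy/actors/hires", "IBM hired Sally")
```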
Publish-subscribe in a nutshell
The concept was first mentioned by Willy Zwaenepoel in a paper on multicast in the V system. The first implementation was Frank Schmuck's Isis "news" tool; it was later re-invented in the TIB message bus. Also known as "event notification," it is very popular.

Other kinds of published events
Changes in the user set, or the group set:
- IBM hired Sally.
- Jeff left his job at the CIA. Halliburton snapped him up.
- Jeff will be handling the Iraq account.
Or changes in the rules:
- Jeff will have access to the secret archives.
- Sally is no longer allowed to access them.

But this raises problems
If "actors" only have partial knowledge (e.g., the Cornell library door-access system only knows things normally needed by that door), then we will need to support out-of-band interrogation of remote policy databases in some cases.

A scalable trust architecture
[Figure: a GMS hierarchy tracks configuration events; a pub/sub framework carries role delegations; a master enterprise policy DB tracks overall policy; slave systems apply policy, each with knowledge limited to locally useful policy. Together these form the enterprise policy system for some company or entity.]

A scalable trust architecture
Enterprises talk to one another when decisions require non-local information.
[Figure: a PeopleSoft inquiry at Cornell University triggers a policy query to the FBI about www.zombiesattackithaca.com.]

Open questions? Minimal trust
A problem reminiscent of zero-knowledge. Example: the FBI is investigating reports of zombies in Cornell's Mann Library, and Mulder is assigned to the case. The Cornell Mann Library must verify that he is authorized to study the situation. But does the FBI need to reveal to Cornell that the Cigarette Man actually runs the show?

Other research questions
Pub-sub systems are organized around topics, to which applications subscribe. But in a large-scale security policy system, how would one structure these topics? Topics are like file names ("paths"), but we would still need an agreed-upon layout.

Practical research question
"State transfer" is the problem of initializing a database or service when it joins the system after an outage. How would we implement a rapid and secure state transfer, so that a joining security-policy enforcement module can quickly come up to date? Once it's online, the pub-sub system reports updates on the topics that matter to it. One common shape for this is sketched below.
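The following sketch shows one standard shape for state transfer (snapshot plus replay), with invented names throughout; it illustrates the problem, not QS2's actual mechanism. A joining module installs a snapshot taken at some version, then applies only the pub/sub updates that are newer than that version.

```python
# Sketch of state transfer as snapshot + replay. PolicyReplica and its
# data shapes are invented for illustration, not QS2's mechanism.

class PolicyReplica:
    def __init__(self, snapshot, version):
        self.db = dict(snapshot)   # policy state as of `version`
        self.version = version

    def on_update(self, version, key, value):
        # pub/sub delivery handler; subscribing *before* fetching the
        # snapshot ensures no update falls in the gap between the two
        if version > self.version:
            self.db[key] = value
            self.version = version

# Join after an outage: snapshot at version 41, then replayed updates.
replica = PolicyReplica({("Ted", "investments.xls"): "Permit"}, 41)
replica.on_update(41, ("Ted", "investments.xls"), "Permit")  # duplicate: ignored
replica.on_update(42, ("Sally", "archives"), "Deny")         # new rule: applied
print(replica.db, replica.version)
```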
Practical research question
Designing secure protocols for inter-enterprise queries. This could draw on the secured Internet transaction architecture:
- A hierarchy of credential databases
- Used to authenticate enterprises to one another so that they can share keys
- The keys are then employed to secure "queries"

Recap?
We've suggested that scalable trust comes down to "emulation" of a trusted single-node rule-enforcement service by a distributed service, and that this service needs to deal with dynamics such as a changing actor set, object set, rule set, and group membership.

Recap?
Concerns that any single node:
- Would be politically unworkable
- Would impose a maximum capacity limit
- Wouldn't be fault-tolerant
... pushed us toward a decentralized alternative. We needed to make a decentralized service emulate a centralized one.

Recap?
This led us to recognize that our problem is an instance of an older one: replication of a state machine or an abstract data type. The problem reduces to consensus... and hence is impossible. ... but we chose to accept "Mission Impossible: V"...

Impossible? Who cares!
We decided that the impossibility results are irrelevant to real systems:
- Federation is addressed by building a hierarchy of GMS services
- Each is supported by a group of servers
- Each GMS owns a category of global events
- We can now create pub/sub topics for the various forms of information used in our decentralized policy database
... enabling decentralized policy enforcement.

QS2: a work in progress
We're building Quicksilver, V2 (aka QS2), under development by Krzys Ostrowski at Cornell with help from Ken Birman and Danny Dolev (HUJI). Some parts already exist and can be downloaded now: Quicksilver Scalable Multicast (QSM). Its focus is on reliable and scalable message delivery, even with huge numbers of groups or severe stress on the system.

Quicksilver architecture
Our solution assumes low latencies and IP multicast. It is a layered platform, hosted natively on .NET:
- Applications (any language)
- Quicksilver pub-sub API / GMS (our platform)
- Strongly-typed .NET group endpoints
- Properties framework that endows groups with stronger properties
- Quicksilver Scalable Multicast (C# / .NET)

Quicksilver: major ideas
- Maps overlapping groups down to "regions"; members of a region have "similar" group membership (a toy version of this mapping is sketched at the end of this section)
- Multicast is done via IP multicast, per region
- Engineering challenge: an application may belong to thousands of groups, so the efficiency of the mapping is key
- Discovers failures using circulating tokens
- Local repair avoids overloading the sender
- Eventually it will support a strong reliability model too
- Novel rate-limited sending scheme
[Figure: nodes signed up to 300 groups (A1..A100, B1..B100, C1..C100) map onto regions such as A, AB, ABC, AC, B, BC, C. QSM runs its protocols over regions rather than groups, improving scalability; in traditional group multicast systems, groups run independently. An intra-region protocol runs within each region, and hierarchical aggregation via an inter-region protocol handles groups that span multiple regions.]

[Figure: throughput (1 group, 1000-byte messages) vs. number of nodes. QSM utilizes roughly 80% of network bandwidth with 1 sender and 90% with 2 senders; JGroups is far slower.]
[Figure: throughput (1 sender, 1000-byte messages) vs. number of groups. QSM on 110 nodes sustains its throughput out to thousands of groups with modest memory usage; JGroups on 8 nodes crashed with 512 groups.]
[Figure: reaction to a 10-second freeze (110 nodes, rate 7500/s). One node freezes; the sender detects delays and suspends sending; the node catches up; the sender resumes; all nodes come back into sync. Series: sent, received (undisturbed), received (disturbed), completed.]
[Figure: QSM latency (110 nodes), cumulative distribution. Send-to-receive: average ≈ 19 ms, median ≈ 17 ms, maximum ≈ 340 ms. Send-to-ack: average ≈ 2.6 s, median ≈ 2.3 s, maximum ≈ 6.9 s.]

Connections to type theory
We're developing a new high-level language for endowing groups with "types," such as security or reliability properties. Internally, QS2 will "compile" from this language down to protocols that amortize costs across groups. Externally, we are integrating QS2 types with the types in the operating system / runtime environment (right now, Windows .NET). There are many challenging research topics in this area!
http://www.cs.cornell.edu/projects/quicksilver/

Open questions?
Not all policy databases are amenable to decentralized enforcement: we must have "enough" information at the point of enforcement to construct proofs. Is this problem tractable? What is its complexity? More research is needed on the federation of policy databases with "minimal disclosure."

Open questions?
We lack a constructive logic of distributed, fault-tolerant systems. Part of the issue is exemplified by the FLP problem: logic has yet to deal with the pragmatics of real-world systems. Part of the problem resides in type theory: we lack true "distributed" type mechanisms.
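Finally, the promised toy version of QSM's group-to-region mapping. It is a simplification of the slide above in two ways that should be flagged: all names and data shapes are invented, and it groups nodes with identical membership signatures, whereas real QSM regions hold nodes with merely "similar" membership.

```python
# Toy group-to-region mapping: nodes with the same group-membership
# signature form a region, and one protocol instance runs per region
# instead of per group. Invented names; a simplification of QSM.

from collections import defaultdict

def map_to_regions(membership):
    """membership: node name -> set of group names it belongs to."""
    regions = defaultdict(list)              # signature -> member nodes
    for node, groups in membership.items():
        regions[frozenset(groups)].append(node)
    return regions

membership = {
    "n1": {"A", "C"},        # region AC
    "n2": {"A", "B", "C"},   # region ABC
    "n3": {"A", "B"},        # region AB
    "n4": {"B", "C"},        # region BC
}
for signature, nodes in map_to_regions(membership).items():
    # a group spanning several regions is handled by hierarchical
    # aggregation: an inter-region protocol over the per-region ones
    print(sorted(signature), nodes)
```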