Scalable Trust: Engineering Challenge or Complexity Barrier?

Scalable Trusted Computing: Engineering challenge, or something more fundamental?
Ken Birman
Cornell University
Cornell Quicksilver Project



Krzys Ostrowski: the key player
Ken Birman, Danny Dolev: collaborators and research supervisors
Mahesh Balakrishnan, Maya Haridasan, Tudor Marian, Amar Phanishayee, Robbert van Renesse, Einar Vollset, Hakim Weatherspoon: offered valuable comments and criticisms
Trusted Computing

A vague term with many meanings…
- For individual platforms, integrity of the computing base
- Availability and exploitation of TPM h/w
- Proofs of correctness for key components
- Security policy specification, enforcement
Scalable trust issues arise mostly in distributed settings
System model

A world of
- Actors: Sally, Ted, …
- Groups: Sally_Advisors = {Ted, Alice, …}
- Objects: travel_plans.html, investments.xls
- Actions: Open, Edit, …
- Policies: (Actor, Object, Action) → { Permit, Deny }
- Places: Ted_Desktop, Sally_Phone, …

Rules
- If Emp.place ∈ Secure_Places and Emp ∈ Client_Advisors then Allow Open Client_Investments.xls
- Can Ted, working at Ted_Desktop, open Sally_Investments.xls?
  … yes, if Ted_Desktop ∈ Secure_Places
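
To make the model concrete, here is a minimal sketch (not from the original slides; the sample data and helper names such as decide and rule_advisor_in_secure_place are invented) of the (Actor, Object, Action) → {Permit, Deny} map plus one place-based rule:

```python
# Minimal sketch of the system model: actors, groups, objects, actions,
# places, and a policy map (Actor, Object, Action) -> {Permit, Deny}.
# All names here are illustrative, not part of any real system.

PERMIT, DENY = "Permit", "Deny"

groups = {
    "Sally_Advisors": {"Ted", "Alice"},
}

places = {"Ted": "Ted_Desktop", "Sally": "Sally_Phone"}
secure_places = {"Ted_Desktop"}

# Explicit policy entries: (actor, object, action) -> Permit/Deny
policy = {
    ("Sally", "investments.xls", "Open"): PERMIT,
}

def rule_advisor_in_secure_place(actor, obj, action):
    """If Emp.place is in Secure_Places and Emp is in Sally_Advisors,
    then allow Open on Sally_Investments.xls."""
    if (action == "Open" and obj == "Sally_Investments.xls"
            and actor in groups["Sally_Advisors"]
            and places.get(actor) in secure_places):
        return PERMIT
    return None  # rule does not apply

def decide(actor, obj, action):
    # Explicit policy entries take precedence; otherwise consult rules.
    if (actor, obj, action) in policy:
        return policy[(actor, obj, action)]
    outcome = rule_advisor_in_secure_place(actor, obj, action)
    return outcome if outcome is not None else DENY

# Can Ted, working at Ted_Desktop, open Sally_Investments.xls?
print(decide("Ted", "Sally_Investments.xls", "Open"))  # Permit
```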
Miscellaneous stuff

Policy changes all the time
- Like a database receiving updates
- E.g. as new actors are added, old ones leave the system, etc.
… and policies have a temporal scope
- Starting at time t=19 and continuing until now, Ted is permitted to access Sally’s file investments.xls
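
The temporal scope might be pictured roughly as follows (a hypothetical sketch; the TimedGrant name and its fields are invented):

```python
# Sketch: a policy grant with a temporal scope.  Field names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimedGrant:
    actor: str
    obj: str
    action: str
    start: float                 # time the grant takes effect
    end: Optional[float] = None  # None = "continuing until now"

    def applies(self, actor, obj, action, t):
        in_window = self.start <= t and (self.end is None or t <= self.end)
        return in_window and (actor, obj, action) == (self.actor, self.obj, self.action)

# Starting at time t=19 and continuing until now,
# Ted is permitted to access Sally's file investments.xls.
grant = TimedGrant("Ted", "investments.xls", "Open", start=19)
print(grant.applies("Ted", "investments.xls", "Open", t=25))  # True
print(grant.applies("Ted", "investments.xls", "Open", t=10))  # False
```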
Order-dependent decisions

Consider rules such as:
- Only one person can use the cluster at a time
- The meeting room is limited to three people
- While people lacking clearance are present, no classified information can be exposed
These are sensitive to the order in which conflicting events occur
- A central “clearinghouse” decides what to allow based on the order in which it sees events
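
A toy illustration of why order matters (the Clearinghouse name is invented, not an actual system component): the clearinghouse processes events strictly in arrival order, so conflicting requests are resolved by whichever it sees first.

```python
# Sketch: a central clearinghouse enforcing "only one person can use the
# cluster at a time", deciding strictly in the order it sees events.
class Clearinghouse:
    def __init__(self):
        self.cluster_user = None

    def handle(self, event):
        kind, who = event
        if kind == "acquire":
            if self.cluster_user is None:
                self.cluster_user = who
                return (who, "Permit")
            return (who, "Deny")          # someone else already holds it
        if kind == "release" and self.cluster_user == who:
            self.cluster_user = None
            return (who, "Released")
        return (who, "Ignored")

ch = Clearinghouse()
# The outcome depends entirely on the order of conflicting events:
for ev in [("acquire", "Sally"), ("acquire", "Ted"),
           ("release", "Sally"), ("acquire", "Ted")]:
    print(ch.handle(ev))
# Sally gets the cluster, Ted is denied, then Ted succeeds after the release.
```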
Goal: Enforce policy
[Diagram: a Read(data) request on investments.xls is checked against the policy database]
… reduction to a proof

- Each time an action is attempted, the system must develop a proof either that the action should be blocked or allowed
- For example, it might use the BAN logic
- For the sake of argument, let’s assume we know how to do all this on a single machine
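
As a rough single-machine sketch of this reduction (names invented; a real proof system such as the BAN logic is far richer), the enforcement point can return the decision together with the chain of facts and rules that justify it:

```python
# Sketch: every decision is accompanied by the facts and rules used to
# derive it -- a stand-in for a real proof (e.g. in the BAN logic).
def decide_with_proof(actor, obj, action, facts, rules):
    proof = []
    for name, rule in rules:
        outcome = rule(actor, obj, action, facts)
        if outcome is not None:
            proof.append(f"rule '{name}' fired -> {outcome}")
            return outcome, proof
        proof.append(f"rule '{name}' did not apply")
    proof.append("no rule applied -> default Deny")
    return "Deny", proof

facts = {"places": {"Ted": "Ted_Desktop"}, "secure": {"Ted_Desktop"},
         "advisors": {"Ted", "Alice"}}

def advisor_rule(actor, obj, action, facts):
    if (action == "Open" and obj == "Sally_Investments.xls"
            and actor in facts["advisors"]
            and facts["places"].get(actor) in facts["secure"]):
        return "Permit"
    return None

decision, proof = decide_with_proof("Ted", "Sally_Investments.xls", "Open",
                                    facts, [("advisor_in_secure_place", advisor_rule)])
print(decision)
for step in proof:
    print(" ", step)
```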
Implications of scale

We’ll be forced to replicate and decentralize the policy enforcement function
- For ownership: allows “local policy” to be stored close to the entity that “owns” it
- For performance and scalability
- For fault-tolerance
Decentralized policy enforcement
[Diagram, original scheme: a Read(data) request on investments.xls is checked against a single central policy database]

Decentralized policy enforcement
[Diagram, new scheme: the same Read(data) request can be checked against either of two replicated policy databases, Policy DB 1 or Policy DB 2]
So… how do we decentralize?

Consistency: the bane of decentralization
- We want the system to behave as if all decisions occur in a single “rules” database
- Yet we want the decisions to actually occur in a decentralized way… a replicated policy database
- The system needs to handle concurrent events in a consistent manner
So… how do we decentralize?

More formally:
- Any run of the decentralized system should be indistinguishable from some run of a centralized system
- Analogy: database 1-copy serializability
But this is a familiar problem!

- Database researchers know it as the atomic commit problem
- Distributed systems people call it:
  - State machine replication
  - Virtual synchrony
  - Paxos-style replication
- … and because of this we know a lot about the question!
… replicated data with abcast

Closely related to the “atomic broadcast” problem within a group
- Abcast sends a message to all the members of a group
- The protocol guarantees order and fault-tolerance
- It solves consensus…
Indeed, a dynamic policy repository would need abcast if we wanted to parallelize it for speed or replicate it for fault-tolerance!
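
A minimal sketch of why abcast gives us a consistent replicated policy database (the Replica and abcast names here are illustrative, not a real protocol): if every replica applies the same totally ordered stream of updates, all replicas reach the same state.

```python
# Sketch: replicated policy database driven by atomic broadcast (abcast).
class Replica:
    def __init__(self, name):
        self.name = name
        self.policy = {}          # (actor, obj, action) -> Permit/Deny

    def deliver(self, update):    # called in abcast delivery order
        key, value = update
        if value is None:
            self.policy.pop(key, None)
        else:
            self.policy[key] = value

def abcast(update, replicas):
    # A real abcast protocol guarantees every replica delivers every update
    # in the same total order, despite failures; here we simply loop.
    for r in replicas:
        r.deliver(update)

replicas = [Replica("DB1"), Replica("DB2")]
abcast((("Ted", "investments.xls", "Open"), "Permit"), replicas)
abcast((("Sally", "secret_archives", "Open"), "Deny"), replicas)
assert replicas[0].policy == replicas[1].policy   # identical state everywhere
print(replicas[0].policy)
```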
A slight digression

Consensus is a classical problem in distributed systems
- N processes
- They start execution with inputs in {0,1}
- Asynchronous, reliable network
- At most 1 process fails by halting (crash)
- Goal: a protocol whereby all “decide” the same value v, and v was an input
Distributed Consensus
[Cartoon by Lee Lorenz and Brent Sheppard: “Jenkins, if I want another yes-man, I’ll build one!”]
Asynchronous networks

- No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”)
- No way to know how long a message will take to get from A to B
- Messages are never lost in the network
Fault-tolerant protocol

Collect votes from all N processes
- At most one is faulty, so if one doesn’t respond, count that vote as 0
- Compute the majority
- Tell everyone the outcome
- They “decide” (they accept the outcome)
… but this has a problem! Why?
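
Here is roughly what that naive protocol looks like (an illustrative sketch only, not a correct algorithm): the weak point is deciding when to stop waiting, since in an asynchronous network a silent process might be crashed or merely slow.

```python
# Sketch of the naive fault-tolerant voting protocol.  The weak point is
# the timeout: asynchrony means we can never be sure a silent process
# has really crashed rather than just being slow.
import random

def collect_votes(inputs, responds):
    votes = []
    for p, v in inputs.items():
        if responds[p]:
            votes.append(v)
        else:
            votes.append(0)       # "count that vote as 0"
    return votes

def naive_consensus(inputs, responds):
    votes = collect_votes(inputs, responds)
    decision = 1 if sum(votes) > len(votes) // 2 else 0
    return decision               # told to everyone, who then "decide"

inputs = {f"p{i}": random.choice([0, 1]) for i in range(5)}
responds = {p: True for p in inputs}
responds["p3"] = False            # p3 is silent: crashed?  or merely slow?
print(inputs, "->", naive_consensus(inputs, responds))
# If p3 was only slow and its vote would have swung a near-tie, a second run
# that waits for it can decide differently; that is the heart of the difficulty.
```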
What makes consensus hard?

Fundamentally, the issue revolves around membership
- In an asynchronous environment, we can’t detect failures reliably
- A faulty process stops sending messages, but a “slow” message might confuse us
- Yet when the vote is nearly a tie, this confusing situation really matters
Some bad news

- The FLP result shows that fault-tolerant consensus protocols always have non-terminating runs
- All of the mechanisms we discussed are equivalent to consensus
- Impossibility of non-blocking commit is a similar result from the database community
But how bad is this news?

In practice, these impossibility results don’t hold up so well
- Both define “impossible” as “not always possible”
- In fact, with probabilities, the FLP scenario has probability zero
- … we must ask: does a probability-zero result even hold in a “real system”?
Indeed, people build consensus-based systems all the time…
Solving consensus

Systems that “solve” consensus often use a membership service
- This GMS functions as an oracle, a trusted status-reporting function
- The consensus protocol then involves a kind of 2-phase protocol that runs over the output of the GMS
- It is known precisely when such a solution will be able to make progress
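
Very roughly (a sketch with invented names such as GMS.current_view; real protocols are considerably more careful), the GMS acts as the oracle that says who is currently a member, and the two-phase exchange only waits for votes from the members of the view it reports:

```python
# Sketch: consensus "relative to" a GMS.  The GMS is the oracle that says
# who is a member; the 2-phase exchange then only waits for those members.
class GMS:
    def __init__(self, members):
        self.view = 0
        self.members = set(members)

    def exclude(self, p):                 # GMS declares p faulty
        self.members.discard(p)
        self.view += 1

    def current_view(self):
        return self.view, frozenset(self.members)

def two_phase_decide(gms, proposals):
    view, members = gms.current_view()
    # Phase 1: collect proposals from every member of the current view.
    votes = [proposals[p] for p in members]
    decision = max(votes)                 # any deterministic choice rule works
    # Phase 2: announce the decision, tagged with the view it was made in.
    return {"view": view, "decision": decision}

gms = GMS(["p0", "p1", "p2"])
proposals = {"p0": 0, "p1": 1, "p2": 1}
gms.exclude("p2")                         # oracle says p2 is down: stop waiting for it
print(two_phase_decide(gms, proposals))   # progress is possible again
```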
More bad news

Consensus protocols don’t scale!
- Isis (virtual synchrony) new view protocol
  - Selects a leader; normally 2-phase, 3 phases if the leader dies
  - Each phase is a 1-to-n multicast followed by an n-to-1 convergecast (can tolerate n/2-1 failures)
- Paxos decree protocol
  - The basic protocol has no leader and could have rollbacks with probability linear in n
  - Faster-Paxos is isomorphic to the Isis view protocol (!)
- … both are linear in group size
  - Regular Paxos might be O(n²) because of rollbacks
Work-arounds?

Only run the consensus protocol in the “group membership service” or GMS
- It has a small number of members, like 3-5
- They run a protocol like the Isis one
- They track membership (and other “global” state) on behalf of the system as a whole
- Scalability of consensus won’t matter
But this is centralized

Recall our earlier discussion
- Any central service running on behalf of the whole system will become burdened if the system gets big enough
- Can we decentralize our GMS service?
GMS in a large system
[Diagram: global events are inputs to the GMS; its output is the official record of events that mattered to the system]
Hierarchical, federated GMS

- Quicksilver V2 (QS2) constructs a hierarchy of GMS state machines
- In this approach, each “event” is associated with some GMS that owns the relevant official record
[Diagram: a hierarchy of GMS instances GMS0, GMS1, GMS2]
Delegation of roles

- One (important) use of the GMS is to track membership in our rule enforcement subsystem
- But it “delegates” responsibility for classes of actions to subsystems that can own and handle them locally
  - The GMS “reports” the delegation events
  - In effect, it tells nodes in the system about the system configuration – about their roles
  - And as conditions change, it reports new events
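
A sketch of delegation as reported events (the DelegatingGMS class and event tuples are invented for illustration): the GMS records which subsystem owns which class of actions and publishes a configuration event each time that changes.

```python
# Sketch: the GMS records delegations ("subsystem X owns action class Y")
# and reports each change as an event that interested nodes can consume.
class DelegatingGMS:
    def __init__(self):
        self.owners = {}          # action class -> owning subsystem
        self.listeners = []

    def subscribe(self, handler):
        self.listeners.append(handler)

    def delegate(self, action_class, subsystem):
        self.owners[action_class] = subsystem
        event = ("delegation", action_class, subsystem)
        for h in self.listeners:  # "the GMS reports the delegation events"
            h(event)

    def owner_of(self, action_class):
        return self.owners.get(action_class)

gms = DelegatingGMS()
gms.subscribe(lambda e: print("configuration event:", e))
gms.delegate("kingston_door_access", "kingston_door_scanners")
print(gms.owner_of("kingston_door_access"))
```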
Delegation
[Cartoon: “In my capacity as President of the United States, I authorize John Pigg to oversee this nation’s banks.” “Thank you, sir! You can trust me.”]

Delegation
[Diagram: GMS0 delegates to GMS1 and to a policy subsystem]
Delegation example

- IBM might delegate the handling of access to its Kingston facility to the security scanners at the doors
- Events associated with Kingston access don’t need to pass through the GMS
- Instead, they “exist” entirely within the group of security scanners
… giving rise to pub/sub groups

Our vision spawns lots and lots of groups that own various aspects of trust enforcement
- The scanners at the doors
- The security subsystems on our desktops
- The key management system for a VPN
- … etc.
A nice match with publish-subscribe
Publish-subscribe in a nutshell

- Publish(“topic”, message)
- Subscribe(“topic”, handler)

Basic idea:
- The platform invokes handler(message) each time a topic match arises
- Fancier versions also support history mechanisms (lets a joining process catch up)
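
A toy pub-sub core matching this API shape (the PubSub class is hypothetical, not the Quicksilver API): handlers are invoked on each topic match, and a simple per-topic history lets a late subscriber catch up.

```python
# Toy publish-subscribe broker: handler(message) is invoked on each topic
# match, and a per-topic history lets a joining subscriber catch up.
from collections import defaultdict

class PubSub:
    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = defaultdict(list)

    def publish(self, topic, message):
        self.history[topic].append(message)
        for handler in self.handlers[topic]:
            handler(message)

    def subscribe(self, topic, handler, replay_history=False):
        if replay_history:
            for old in self.history[topic]:   # catch-up for late joiners
                handler(old)
        self.handlers[topic].append(handler)

bus = PubSub()
bus.publish("policy/updates", "Jeff will have access to the secret archives")
bus.subscribe("policy/updates", lambda m: print("got:", m), replay_history=True)
bus.publish("policy/updates", "Sally is no longer allowed to access them")
```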
Publish-subscribe in a nutshell

- The concept was first mentioned by Willy Zwaenepoel in a paper on multicast in the V system
- The first implementation was Frank Schmuck’s Isis “news” tool
- Later re-invented in the TIB message bus
- Also known as “event notification”… very popular
Other kinds of published events

- Changes in the user set
  - For example, IBM hired Sally. Jeff left his job at CIA. Halliburton snapped him up
- Or the group set
  - Jeff will be handling the Iraq account
- Or the rules
  - Jeff will have access to the secret archives
  - Sally is no longer allowed to access them
But this raises problems

If “actors” only have partial knowledge
- E.g. the Cornell library door access system only knows things normally needed by that door
… then we will need to support out-of-band interrogation of remote policy databases in some cases
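
One possible shape for that fallback (all names invented, e.g. RemotePolicyDB.query standing in for an out-of-band request): the door controller answers from its local slice of policy when it can, and interrogates a remote policy database only when it lacks the needed entry.

```python
# Sketch: a door controller with only partial knowledge falls back to an
# out-of-band query against a remote policy database when needed.
class RemotePolicyDB:
    def __init__(self, entries):
        self.entries = entries

    def query(self, actor, obj, action):       # models an out-of-band request
        return self.entries.get((actor, obj, action), "Deny")

class DoorController:
    def __init__(self, local_policy, remote_db):
        self.local_policy = local_policy        # only what this door needs
        self.remote_db = remote_db

    def check(self, actor, obj, action):
        key = (actor, obj, action)
        if key in self.local_policy:
            return self.local_policy[key], "local"
        return self.remote_db.query(*key), "remote"

remote = RemotePolicyDB({("Mulder", "Mann_Library", "Enter"): "Permit"})
door = DoorController({("Sally", "Mann_Library", "Enter"): "Permit"}, remote)
print(door.check("Sally", "Mann_Library", "Enter"))   # ('Permit', 'local')
print(door.check("Mulder", "Mann_Library", "Enter"))  # ('Permit', 'remote')
```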
A Scalable Trust Architecture
[Diagram: an enterprise policy system for some company or entity. A GMS hierarchy tracks configuration events; role delegation flows over a pub/sub framework; a master enterprise policy DB (the central database) tracks overall policy, while slave systems apply policy, with knowledge limited to locally useful policy]
A Scalable Trust Architecture

Enterprises talk to one another when decisions require non-local information
[Diagram: an inquiry flows between enterprise policy systems such as Cornell University, the FBI (policy), and PeopleSoft, concerning www.zombiesattackithaca.com]
Open questions?

Minimal trust
- A problem reminiscent of zero-knowledge
- Example:
  - The FBI is investigating reports of zombies in Cornell’s Mann Library… Mulder is assigned to the case
  - The Cornell Mann Library must verify that he is authorized to study the situation
  - But does the FBI need to reveal to Cornell that the Cigarette Man actually runs the show?
Other research questions

- Pub-sub systems are organized around topics, to which applications subscribe
- But in a large-scale security policy system, how would one structure these topics?
  - Topics are like file names – “paths”
  - But we would still need an agreed-upon layout
Practical research question

- “State transfer” is the problem of initializing a database or service when it joins the system after an outage
- How would we implement a rapid and secure state transfer, so that a joining security policy enforcement module can quickly come up to date?
  - Once it’s online, the pub-sub system reports updates on the topics that matter to it
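
A sketch of the usual shape of such a state transfer (snapshot and catch_up are invented names; securing the transfer is exactly the open problem): install a snapshot taken at some version, then replay the pub-sub updates published after that version.

```python
# Sketch of state transfer: install a snapshot at version v, then replay
# the updates published after v from the pub-sub stream.
class PolicyReplica:
    def __init__(self):
        self.version = 0
        self.state = {}

    def apply(self, update):
        version, key, value = update
        if version > self.version:        # skip updates already in the snapshot
            self.state[key] = value
            self.version = version

    def snapshot(self):
        return self.version, dict(self.state)

    def install(self, snapshot):
        self.version, self.state = snapshot[0], dict(snapshot[1])

def catch_up(joiner, donor, published_updates):
    joiner.install(donor.snapshot())      # bulk transfer (would need securing!)
    for u in published_updates:           # then replay the pub-sub backlog
        joiner.apply(u)

donor, joiner = PolicyReplica(), PolicyReplica()
updates = [(1, "rule_a", "Permit"), (2, "rule_b", "Deny"), (3, "rule_c", "Permit")]
for u in updates[:2]:
    donor.apply(u)
catch_up(joiner, donor, updates)          # joiner ends up at version 3
print(joiner.version, joiner.state)
```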
Practical research question

- Designing secure protocols for inter-enterprise queries
- This could draw on the secured Internet transaction architecture
  - A hierarchy of credential databases
  - Used to authenticate enterprises to one another so that they can share keys
  - They employ the keys to secure “queries”
Recap?

- We’ve suggested that scalable trust comes down to “emulation” of a trusted single-node rule enforcement service by a distributed service
- And that service needs to deal with dynamics such as a changing actor set, object set, rule set, and group membership
Recap?

Concerns that any single node
- Would be politically unworkable
- Would impose a maximum capacity limit
- Won’t be fault-tolerant
… pushed for a decentralized alternative
- We needed to make a decentralized service emulate a centralized one
Recap?

- This led us to recognize that our problem is an instance of an older problem: replication of a state machine or an abstract data type
- The problem reduces to consensus… and hence is impossible
- … but we chose to accept “Mission Impossible: V”
… Impossible? Who cares!

- We decided that the impossibility results were irrelevant to real systems
- Federation is addressed by building a hierarchy of GMS services
  - Each supported by a group of servers
  - Each GMS owns a category of global events
- Now we can create pub/sub topics for the various forms of information used in our decentralized policy database
- … enabling decentralized policy enforcement
QS2: A work in progress

We’re building Quicksilver, V2 (aka QS2)
- Under development by Krzys Ostrowski at Cornell, with help from Ken Birman and Danny Dolev (HUJI)
Some parts already exist and can be downloaded now:
- Quicksilver Scalable Multicast (QSM)
- Its focus is on reliable and scalable message delivery even with huge numbers of groups or severe stress on the system
Quicksilver Architecture

Our solution:
- Assumes low latencies, IP multicast
- A layered platform, native hosting on .NET
[Layer diagram of our platform: applications (any language) over the Quicksilver pub-sub API and strongly-typed .NET group endpoints; a Properties Framework endows groups with stronger properties; Quicksilver Scalable Multicast (C# / .NET) and the GMS sit underneath]
Quicksilver: Major ideas

- Maps overlapping groups down to “regions”
  - Engineering challenge: an application may belong to thousands of groups; efficiency of the mapping is key
- Multicast is done by IP multicast, per region
- Discovers failures using circulating tokens
  - Local repair avoids overloading the sender
  - Eventually will support a strong reliability model too
- Novel rate-limited sending scheme
[Diagram: nodes signed up to 100 groups from each of Groups A1..A100, B1..B100, and C1..C100 (300 groups in all); members of a region have “similar” group membership. In traditional group multicast systems, groups run independently (Protocol 1, Protocol 2, Protocol 3), even for a node sending messages in multiple groups. QSM instead runs protocols over regions (A, B, C, AB, AC, BC, ABC) that aggregate groups, improving scalability; hierarchical aggregation, with intra-region and inter-region protocols, handles groups that span multiple regions]
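
A much-simplified sketch of the group-to-region mapping idea (the regions_from_membership function is invented; QSM’s actual mapping and protocols are more sophisticated): members with identical group membership collapse into one region, so per-region protocols replace per-group ones.

```python
# Sketch: collapse members with identical group membership into "regions",
# so reliability protocols run per region instead of per group.
from collections import defaultdict

def regions_from_membership(membership):
    """membership: node -> set of group names.  Returns region -> nodes."""
    regions = defaultdict(list)
    for node, groups in membership.items():
        region_key = frozenset(groups)        # nodes with same groups share a region
        regions[region_key].append(node)
    return regions

membership = {
    "n1": {"A", "B"}, "n2": {"A", "B"},       # same groups -> same region
    "n3": {"A", "C"}, "n4": {"C"},
}
for region, nodes in regions_from_membership(membership).items():
    print(sorted(region), "->", nodes)
# Three regions (AB, AC, C) rather than one independent protocol per group:
# messages for overlapping groups are aggregated per region.
```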
[Performance charts:
- Throughput (1 group, 1000-byte messages): throughput in packets/s vs. number of nodes, comparing QSM (1 sender), QSM (2 senders), and JGroups; QSM’s network bandwidth utilization is about 80% with 1 sender and 90% with 2 senders.
- Reaction to a 10-second freeze (110 nodes, rate 7500/s): number of packets sent, received (undisturbed), received (disturbed), and completed vs. time; one node freezes, the sender detects delays and suspends sending, the node resumes and catches up, the sender resumes, and all nodes end up back in sync.
- Throughput (1 sender, 1000-byte messages): QSM throughput (110 nodes) and QSM memory usage (megabytes) vs. number of groups, up to several thousand groups; JGroups (8 nodes) is shown for comparison and crashed with 512 groups.
- QSM latency (110 nodes), cumulative distribution: send-to-receive latency average ≈ 19 ms, median ≈ 17 ms, maximum ≈ 340 ms; send-to-ack latency average ≈ 2.6 s, median ≈ 2.3 s, maximum ≈ 6.9 s]
Connections to type theory

We’re developing a new high-level language for endowing groups with “types”
- Such as security or reliability properties
- Internally, QS2 will “compile” from this language down to protocols that amortize costs across groups
- Externally, we are integrating QS2 types with types in the operating system / runtime environment (right now, Windows .NET)
- Many challenging research topics in this area!

http://www.cs.cornell.edu/projects/quicksilver/
Open questions?

- Not all policy databases are amenable to decentralized enforcement
  - We must have “enough” information at the point of enforcement to construct proofs
  - Is this problem tractable? What is its complexity?
- More research is needed on the question of federation of policy databases with “minimal disclosure”
Open questions?

We lack a constructive logic of distributed, fault-tolerant systems
- Part of the issue is exemplified by the FLP problem: logic has yet to deal with the pragmatics of real-world systems
- Part of the problem resides in type theory: we lack true “distributed” type mechanisms