The Insider Threat in Scalable Distributed Systems: Algorithms, Metrics, Gaps
Yair Amir
Distributed Systems and Networks lab
Johns Hopkins University
www.dsn.jhu.edu
ACM STC'07, 2 Nov 07
Acknowledgement
• Johns Hopkins University
– Claudiu Danilov, Jonathan Kirsch, John Lane
• Purdue University
– Cristina Nita-Rotaru, Josh Olsen, Dave Zage
• Hebrew University
– Danny Dolev
• Telcordia Technologies
– Brian Coan
This Talk in Context
Scalable Information Access & Communication
• High availability (80s – 90s)
– Benign faults, accidental errors, crashes, recoveries, network
partitions, merges.
– Fault tolerant replication as an important tool.
– Challenges – consistency, scalability and performance.
• Security (90s – 00s)
– Attackers are external.
– Securing the distributed system and the network.
– Crypto+ as an important tool.
• Survivability ( 00s – …)
– Millions of compromised computers: there is always a chance the
system will be compromised.
– Let's start the game when parts of the system are already compromised.
– Can the system still achieve its goal, and under what assumptions?
– Challenges – assumptions, scalability, performance.
Trends:
Information Access & Communication
• Networks become one
– From one’s network to one Internet.
– Therefore, the environment inherently becomes increasingly hostile.
• Stronger adversaries => weaker models.
– Benign faults - mean time to failure, fault independence
• Fail-stop, crash-recovery, network partitions-merges.
• Goals: high availability, consistency (safety, liveness).
– External attacks – us versus them
• Eavesdropping, replay attacks, resource-consumption DoS.
• Goals: keep them out. Authentication, Integrity, Confidentiality.
– Insider attacks – the enemy is us
• Byzantine behavior
• Goals: safety, liveness, (performance?)
The Insider Threat
• Networks are already hostile!
– 250,000 new zombie nodes per day.
– Very likely that some of them are part of critical
systems.
– Insider attacks are a real threat, even for well-protected systems.
• Challenges:
– Service level: Can we provide “correct” service?
– Network level: Can we “move” the bits?
– Client level: Can we handle “bad” input?
The Insider Threat in Scalable Systems
• Service level: Byzantine Replication
– Hybrid approach: few trusted components, everything
else can be compromised.
– Symmetric approach: No trusted component,
compromise up to some threshold.
• Network level: Byzantine Routing
– Flooding “solves” the problem.
– “Stable” networks - some limited solutions,
good starting point [Awerbuch et al. 02]
– “Dynamic” networks – open problem.
• Client level: ?
– Input replication – not feasible in most cases.
– Recovery after the fact – Intrusion detection,
tracking and backtracking [Chen et al. 03].
– Open question – is there a better approach?
Outline
• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
  – A bit on how it works
  – Correctness
  – Performance
  – Tradeoffs
• Composable architecture
  – A bit on how it works
  – BLink – Byzantine link protocol
  – Performance and optimization
• Theory hits reality
  – Limitation of existing correctness criteria
  – Proposed model and metrics
• Summary
Service Level: Problem Formulation
[Figure: a site containing clients and server replicas 1, 2, 3, …, N]
• Servers are distributed in sites, over a Wide Area Network.
• Clients issue requests to servers, then get back answers.
• Some servers can act maliciously.
• Wide area connectivity is limited and unstable.
• How to get good performance and guarantee correctness?
• What is correctness?
Relevant Prior Work
• Byzantine Agreement
– Byzantine generals [Lamport et al. 82], [Dolev 83]
• Replication with benign faults
– 2-phase commit [Eswaran, Gray et al. 76]
– 3-phase commit [Skeen, Stonebraker 82]
– Paxos [Lamport 98]
• Hybrid architectures
– Hybrid Byzantine tolerant systems [Correia, Verissimo et al. 04]
• Symmetric approaches for Byzantine-tolerant replication
  – BFT [Castro, Liskov 99]
  – Separating agreement from execution [Yin, Alvisi et al. 03]
  – Fast Byzantine consensus [Martin, Alvisi 05]
  – Byzantine-tolerant storage using erasure codes [Goodson, Reiter et al. 04]
Background: Paxos and BFT
[Figure: normal-case message flow. Paxos: client request, proposal, accept, reply among replicas 0-2. BFT: client request, pre-prepare, prepare, commit, reply among replicas 0-3.]
• Paxos [Lamport 98]
– Ordering coordinated by an
elected leader.
– Two rounds among servers
during normal case
(Proposal and Accept).
– Requires 2f+1 servers to
tolerate f benign faults.
• BFT [Castro, Liskov 99]
– Extends Paxos into the Byzantine
environment.
– One additional round of
communication, crypto.
– Requires 3f+1 servers to tolerate f
Byzantine servers.
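
As background for the replica counts (a standard counting argument, not spelled out on the slide): any two quorums of size q must intersect in at least f+1 servers, so that a correct server witnesses both decisions, while liveness requires a quorum to exist even with f servers silent. In LaTeX notation:

    2q - n \ge f + 1, \qquad q \le n - f \quad\Longrightarrow\quad n \ge 3f + 1, \;\; q = 2f + 1

For benign faults a one-server intersection suffices, giving n \ge 2f + 1.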
Background: Threshold Crypto
• Practical Threshold Signatures [Shoup 2000]
– Each participant receives a secret share.
– Each participant signs a certain message with its share, and
sends the signed message to a combiner.
– Out of k valid signed shares, the combiner creates a
(k, n) threshold signature.
• A (k, n) threshold signature
– Guarantees that at least k participants signed the same
message with their share.
– Can be verified with simple RSA operations.
– Combining the shares is fairly expensive.
– Signature verification is fast.
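
To make the flow concrete, here is a minimal toy sketch of the collect-and-combine pattern in Python. It only models the protocol shape: share_sign and combine are hypothetical stand-ins, not Shoup's actual RSA share arithmetic (which combines shares via interpolation in the exponent and verifies with a plain RSA operation).

    # Toy sketch of the (k, n) threshold-signing flow; NOT real cryptography.
    import hashlib

    def share_sign(share_id, secret_share, message):
        # Each participant "signs" the message with its secret share.
        digest = hashlib.sha256(message + secret_share).hexdigest()
        return (share_id, digest)

    def combine(signed_shares, k):
        # The combiner needs k valid signed shares to produce one
        # (k, n) threshold signature; combining is the expensive step.
        if len(signed_shares) < k:
            raise ValueError("need at least k valid signed shares")
        return b"|".join(d.encode() for _, d in sorted(signed_shares)[:k])

    # Usage: k = 2f+1 = 5 of n = 3f+1 = 7 servers certify one message.
    shares = {i: bytes([i]) * 16 for i in range(7)}   # toy secret shares
    msg = b"proposal: seq=42"
    signed = [share_sign(i, shares[i], msg) for i in range(5)]
    threshold_sig = combine(signed, k=5)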
Steward: First Byzantine Replication
Scalable to Wide Area Networks
[DSN 2006]
[Figure: a site containing clients and server replicas 1, 2, 3, …, N]
• Each site acts as a trusted unit that can crash or partition.
• Within each site: Byzantine-tolerant agreement (similar to BFT)…
  – Masks f malicious faults in each site.
  – Threshold signatures prove agreement to other sites.
• …that is optimally intertwined with…
• Between sites: a light-weight, fault-tolerant protocol (similar to Paxos).
• There is no free lunch: we pay with more hardware.
  – 3f+1 servers in each site.
Outline
• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
  – A bit on how it works
  – Correctness
  – Performance
  – Tradeoffs
• Composable architecture
  – A bit on how it works
  – BLink – Byzantine link protocol
  – Performance and optimization
• Theory hits reality
  – Limitation of existing correctness criteria
  – Proposed model and metrics
• Summary
Main Idea 1:
Common Case Operation
• A client sends an update to a
server at its local site.
• The update is forwarded to the
leader site.
• The representative of the
leader site assigns order
in agreement and issues
a threshold signed
proposal.
• Each site issues a
threshold signed accept.
• Upon receiving a majority of
accepts, servers in each site
“order” the update.
• The original server sends a
response to the client.
[Figure: common-case message flow with Byzantine ordering at the leader site, a threshold-signed proposal (2f+1), and threshold-signed accepts (2f+1).]
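
A minimal runnable sketch of this common case (our simplification, not Steward's code: locally_order and threshold_sign stand in for the intra-site Byzantine agreement and the (2f+1)-of-(3f+1) threshold signing, and message passing is collapsed into function calls):

    NUM_SITES = 5
    MAJORITY = NUM_SITES // 2 + 1

    class Site:
        def __init__(self, sid):
            self.sid, self.next_seq, self.ordered = sid, 0, {}

        def locally_order(self, update):
            # Leader site: one Byzantine agreement assigns the sequence number.
            seq, self.next_seq = self.next_seq, self.next_seq + 1
            return seq

        def threshold_sign(self, payload):
            # Proves that 2f+1 servers in this site agreed on the payload.
            return ("tsig", self.sid, payload)

    def common_case(sites, leader, update):
        seq = leader.locally_order(update)
        proposal = leader.threshold_sign(("proposal", seq, update))
        accepts = [s.threshold_sign(("accept", seq)) for s in sites if s is not leader]
        for s in sites:
            # Counting the leader's proposal, a majority of sites agreed.
            if len(accepts) + 1 >= MAJORITY:
                s.ordered[seq] = update   # the update is globally ordered

    sites = [Site(i) for i in range(NUM_SITES)]
    common_case(sites, sites[0], "write x=1")
    assert all(s.ordered[0] == "write x=1" for s in sites)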
Steward Hierarchy Benefits
• Reduces the number of messages sent on the wide
area network.
– O(N²) → O(S²) – helps both in throughput and latency.
• Reduces the number of wide area crossings.
– BFT-based protocols require 3 wide area crossings.
– Paxos-based protocols require 2 wide area crossings.
• Optimizes the number of local Byzantine agreements.
– A single agreement per update at leader site.
– Potential for excellent performance.
• Increases system availability
– (2/3 of total servers + 1) → (a majority of sites).
– Read-only queries can be answered locally.
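
For a sense of scale, illustrative arithmetic (our numbers, using the experimental configuration that appears later in the talk: S = 5 sites of 3f+1 = 16 replicas):

    N = S(3f+1) = 5 \cdot 16 = 80, \qquad N^2 = 6400 \;\text{(flat all-to-all)} \quad \text{vs.} \quad S^2 = 25 \;\text{(hierarchical)}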
Steward Hierarchy Challenges
• Each site has a representative that:
– Coordinates the Byzantine protocol inside the site.
– Forwards packets in and out of the site.
• One of the sites acts as the leader in the wide area protocol.
  – The representative of the leading site is the one assigning sequence numbers to updates.
• Messages coming out of a site during leader election are based on communication between 2f+1 (out of 3f+1) servers inside the site.
  – There can be multiple sets of 2f+1 servers.
  – In some instances, multiple correct but different site messages can be issued by a malicious representative.
  – It is sometimes impossible to completely isolate malicious server behavior inside its own site.
• How do we select and change representatives in agreement?
• How do we select and change the leader site in agreement?
• How do we transition safely when we need to change them?
Main Idea 2: View Changes
• Sites change their local representatives based on timeouts.
• The leader-site representative has a larger timeout.
  – This allows it to contact at least one correct representative at other sites.
• After changing enough leader-site representatives, servers at all sites stop participating in the protocol and elect a different leading site.
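
One way to read the timeout relationship (our formulation; the exact constants come from the Steward protocol, not this slide): if each non-leader site rotates its representative every T_rep, the leader-site representative's timeout T_lead should satisfy roughly

    T_{\mathrm{lead}} \;>\; (f+1)\, T_{\mathrm{rep}}

so that, while one leader-site representative is in place, every other site cycles through at least f+1 representatives, at least one of which must be correct.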
Correctness Criteria
• Safety:
– If two correct servers order an update with the
same sequence i, then these updates are
identical.
• Liveness:
– If there exists a set of a majority of sites, each
consisting of at least 2f+1 correct, connected
servers, and a time after which all sites in the set
are connected, then if a client connected to a site
in the set proposes an update, some correct
server at a site in the set eventually orders the
update.
Intuition Behind a Proof
• Safety:
  – Any agreement (ordering or view change) involves a majority of sites, and 2f+1 servers in each.
  – Any two majorities intersect in at least one site.
  – Any two sets of 2f+1 servers in that site intersect in at least f+1 servers (which means at least one correct server).
  – That correct server will not agree to order two different updates with the same sequence.
• Liveness:
  – A correct representative or leader site cannot be changed by f local servers.
  – The selection of different timeouts ensures that a correct representative of the leader site has enough time to contact correct representatives at other sites.
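
The counting behind the safety intersection steps, in LaTeX:

    |Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; (2f+1) + (2f+1) - (3f+1) \;=\; f+1

Since at most f servers per site are faulty, at least one server in that intersection is correct; the same pigeonhole argument yields a common site for any two majorities of sites.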
Testing Environment
• Platform: dual Intel Xeon 3.2 GHz CPUs, 64 bits, 1 GByte RAM, Linux Fedora Core 4.
• Library relies on OpenSSL:
  – Used OpenSSL 0.9.7a, Feb 2003.
• Baseline operations:
  – 1024-bit RSA sign: 1.3 ms; verify: 0.07 ms.
  – 1024-bit modular exponentiation: ~1 ms.
  – Generating a 1024-bit RSA key: ~55 ms.
Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites with equal bandwidth/latency links.
• One fully deployed site of 16 replicas; the other sites are emulated by one computer each.
• Total: 80 replicas in the system, emulated by 20 computers.
• 50 ms wide area links between sites.
• Varied wide area bandwidth and the number of clients.
Write Update Performance
• Symmetric network, 5 sites.
• BFT: 16 replicas total; 4 replicas in one site, 3 replicas in each other site; up to 5 faults total.
• Steward: 16 replicas per site, for a total of 80 replicas (four sites are emulated; 20 actual computers); up to 5 faults in each site.
• Update-only performance (no disk writes).
[Figure: Update Throughput (updates/sec) and Update Latency (ms) vs. number of clients (0-30), for Steward and BFT at 10, 5, and 2.5 Mbps.]
Read-only Query Performance
• 10 Mbps wide area links.
• 10 clients inject mixes of read-only queries and write updates.
• Neither system was limited by bandwidth.
• Performance improves by between a factor of two and more than an order of magnitude.
• Availability: queries can be answered locally, within each site.
[Figure: Query Mix Throughput (actions/sec) and Query Mix Latency (ms) vs. update ratio (0-100%), for Steward and BFT.]
Wide-Area Scalability
• Selected 5 PlanetLab sites, on 5 different continents: US, Brazil, Sweden, Korea, and Australia.
• Measured bandwidth and latency between every pair of sites.
• Emulated the network on our cluster, both for Steward and BFT.
• 3-fold latency improvement even when bandwidth is not limited. (How come?)
[Figure: PlanetLab Update Throughput (updates/sec) and Update Latency (ms) vs. number of clients (0-30), for Steward and BFT.]
Non-Byzantine Comparison
[Figure: the CAIRN topology: Boston (MITPC), Delaware (UDELPC), Virginia (ISEPC, ISEPC3), San Jose (TISWPC), and Los Angeles (ISIPC4, ISIPC); wide-area links range from 1.42 to 9.81 Mbits/sec and from 1.4 to 38.8 ms; local links are 100 Mb/s, <1 ms.]
• Based on a real experimental network (CAIRN).
• Several years ago we benchmarked benign replication on this network.
• Modeled on our cluster, emulating bandwidth and latency constraints, both for Steward and BFT.
CAIRN Emulation Performance
• Steward is limited by bandwidth at 51 updates per second.
• 1.8 Mbps can barely accommodate 2 updates per second for BFT.
• Earlier experimentation with benign-fault 2-phase commit protocols achieved up to 76 updates per second [Amir et al. 02].
[Figure: CAIRN Update Throughput (updates/sec) and Update Latency (ms) vs. number of clients (0-30), for Steward and BFT.]
Steward: Approach Tradeoffs
• Excellent performance
– Optimized based on intertwined knowledge among global
and local protocols.
• Highly complex
– Complex correctness proof.
– Complex implementation.
• Limited model does not translate well to wide area
environment needs
– Global benign protocol over local Byzantine.
– “What if the whole site is compromised?”
– Partially addressed by implementing 4 different protocols:
Byzantine/Benign, Byzantine/Byzantine, Benign/Benign,
Benign/Byzantine (Steward).
– “Different sites have different security profiles…”
A Composable Approach
[SRDS 2007]
• Use clean two-level hierarchy to maintain scalability.
– Clean separation of the local and global protocols.
– Message complexity remains O(Sites²).
• Use state machine based logical machines to
achieve a customizable architecture.
– Free substitution of the fault tolerance method used in each
site and among the sites.
• Use efficient wide-area communication to achieve
high performance.
– Byzantine Link (BLink) protocol for inter-logical-machine communication.
Outline
• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
  – A bit on how it works
  – Correctness
  – Performance
  – Tradeoffs
• Composable architecture
  – A bit on how it works
  – BLink – Byzantine link protocol
  – Performance and optimization
• Theory hits reality
  – Limitation of existing correctness criteria
  – Proposed model and metrics
• Summary
Building a Logical Machine
[Figure: Site A → Logical Machine A and Site B → Logical Machine B; in each site the wide-area protocol runs on top of BLink, which runs on top of the local-area protocol.]
• A single instance of the wide-area replication protocol runs among a group of logical machines (LMs), one in each site.
  – Logical machines behave like single physical machines with respect to the wide-area protocol.
  – Logical machines send threshold-signed wide-area messages via BLink.
• Each logical machine is implemented by a separate instance of a local state machine replication protocol.
  – Physical machines in each site locally order all wide-area protocol events:
    • Wide-area message reception events.
    • Wide-area protocol timeout events.
• Each logical machine executes a single stream of wide-area protocol events.
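
A runnable sketch of this pattern (our assumed structure, not the actual implementation: LocalOrder stands in for the site's BFT or Paxos instance, and the wide-area handler is a stub):

    # Every wide-area event is locally ordered before execution, so all
    # replicas in a site execute one identical event stream.
    from collections import deque

    class LocalOrder:                     # stand-in for intra-site agreement
        def __init__(self):
            self.queue = deque()
        def submit(self, event):          # in reality: agreed on by 2f+1 replicas
            self.queue.append(event)
        def deliver(self):
            while self.queue:
                yield self.queue.popleft()

    class LogicalMachine:
        def __init__(self, wide_area_handler):
            self.order = LocalOrder()
            self.handle = wide_area_handler    # wide-area protocol state machine

        def on_wide_area_message(self, msg):
            self.order.submit(("recv", msg))   # receptions are locally ordered...

        def on_timeout(self, timer_id):
            self.order.submit(("timeout", timer_id))   # ...and so are timeouts

        def run(self):
            # One deterministic stream: the site acts as a single machine.
            return [self.handle(ev) for ev in self.order.deliver()]

    lm = LogicalMachine(lambda ev: ("threshold-signed reply to", ev))
    lm.on_wide_area_message("proposal 7")
    lm.on_timeout("leader")
    print(lm.run())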
A Composable Architecture
• Clean separation and free substitution
– We can choose the local-area protocol deployed in each
site, and the wide-area protocol deployed among sites.
– Trade performance for fault tolerance
• Protocol compositions: wide area / local area
– Paxos on the wide area: Paxos/Paxos, Paxos/BFT
– BFT on the wide area: BFT/Paxos, BFT/BFT
An Example: Paxos/BFT
[Figure: five logical machines (LM1-LM5) connected over a wide-area network by BLink logical links; LM1 is the leader site; each logical machine is a set of physical machines; a client attaches to one site.]
Paxos/BFT in Action
1. Update initiation from the client.
2. Local ordering of the update; threshold signing of the update.
3. Forwarding of the update to the leader LM via BLink.
4. Local ordering of the update; threshold signing of the Proposal.
5. Dissemination of the Proposal via BLink.
6. Local ordering of the Proposal; threshold signing of the Accept.
7. Dissemination of the Accepts via BLink.
8. Local ordering of the Accepts; global ordering of the Proposal.
9. Reply to the client.
The BLink Protocol
• Faulty servers can block communication into and out of logical
machines.
• Redundant message sending is not feasible in wide-area
environments.
• Our approach: BLink protocol
– Outgoing wide-area messages are normally sent only once.
– Four sub-protocols, depending on fault tolerance method in sending
and receiving logical machines:
• (Byzantine, Byzantine), (Byzantine, benign)
• (benign, Byzantine), (benign, benign)
– This talk: (Byzantine, Byzantine)
Constructing Logical Links
[Figure: a BLink logical link between a sending logical machine and a receiving logical machine, composed of multiple virtual links.]
• Logical links are constructed from sets of virtual links.
• Each virtual link contains:
– Forwarder from the sending logical machine
– Peer from the receiving logical machine.
• Virtual links are constructed via a mapping function.
• At a given time, the LM delegates wide-area communication
responsibility to one virtual link on each logical link.
• Virtual links suspected of being faulty are replaced according to
a selection order.
Intuition: A Simple Mapping
• F = 2, N = 3F+1 = 7.
• Servers 0 and 1 from the sending LM and servers 2 and 3 from the receiving LM are faulty.
• Mapping function:
  – Virtual link i consists of the servers with id ≡ i mod N.
• Selection order:
  – Cycle through the virtual links in sequence (1, 2, 3, …).
[Figure: sending LM servers 0-6 (0 and 1 faulty) paired with receiving LM servers 0-6 (2 and 3 faulty).]
• Two important metrics:
– Ratio of correct to faulty virtual links
– Worst-case number of consecutive faulty virtual links
• With the simple mapping:
– At least 1/3 of the virtual links are correct.
– The adversary can block at most 2F consecutive virtual links.
• With a more sophisticated mapping:
– At least 4/9 of the virtual links are correct.
– The adversary can block at most 2F consecutive virtual links.
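
A runnable sketch of the simple mapping and both metrics, reproducing the slide's example numbers (our code; F = 2, N = 7, senders 0 and 1 and receivers 2 and 3 faulty):

    F = 2
    N = 3 * F + 1
    faulty_senders = {0, 1}
    faulty_receivers = {2, 3}

    def virtual_link(i):
        # Simple mapping: virtual link i pairs the forwarder and the peer
        # whose ids are congruent to i mod N.
        return (i % N, i % N)

    def is_correct(i):
        fwd, peer = virtual_link(i)
        return fwd not in faulty_senders and peer not in faulty_receivers

    links = [is_correct(i) for i in range(N)]   # one full cycle of virtual links
    ratio = sum(links) / N                      # >= (F+1)/(3F+1), about 1/3

    # Worst run of consecutive faulty links in the cyclic selection order:
    doubled = links + links                     # unroll the cycle once
    runs = "".join("." if ok else "F" for ok in doubled).split(".")
    worst = max(len(r) for r in runs)

    print(ratio, worst)   # 3/7 of links correct; 4 = 2F consecutive faulty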
Architectural Comparison
Expensive cryptographic operations:
  Protocol      RSA Sign   Threshold RSA Sign
  Steward       1          3
  Paxos/Paxos   0          2+(S-1)
  BFT/Paxos     0          3+2(S-1)
  Paxos/BFT     1          3+2(S-1)
  BFT/BFT       2          4+4(S-1)
• Protocols were CPU-limited.
• Relative maximum throughput corresponds to the number of expensive cryptographic operations.
[Figure: Update Throughput (updates/sec) and Update Latency (s) vs. number of clients (0-40), 50 ms diameter, 10 Mbps links, for Steward, Paxos/Paxos, Paxos/BFT, BFT/Paxos, and BFT/BFT.]
Architectural Comparison
• Paxos/BFT vs. Steward:
  – Same level of fault tolerance.
  – Paxos/BFT locally orders all wide-area protocol events; Steward orders events only when necessary.
  – Paxos/BFT achieves about 2.5 times lower throughput than Steward.
  – The difference is the cost of providing customizability!
• Protocols were CPU-limited.
• Relative maximum throughput corresponds to the number of expensive cryptographic operations.
[Figure: same Update Throughput and Update Latency charts as above.]
Performance Optimizations
• Computational Bottlenecks:
– 1. Ordering all message reception events.
– 2. Threshold signing outgoing messages.
• Solutions:
– Aggregate local ordering: batching
– Aggregate threshold signing: Merkle trees
• Use a single threshold signature for many outgoing
messages.
• Outgoing messages contain additional information needed
to verify the threshold signature.
Merkle Hash Trees
• Use a single threshold signature for many outgoing wide-area messages.
• Each leaf contains the digest of a message to be sent.
• Each interior node contains the digest of the concatenation of its two children.
• The threshold signature is computed on the root hash.
[Figure: an 8-leaf Merkle tree over messages m1-m8: leaves Ni = D(mi); interior nodes N1-2 = D(N1 || N2), N3-4 = D(N3 || N4), N5-6 = D(N5 || N6), N7-8 = D(N7 || N8), N1-4 = D(N1-2 || N3-4), N5-8 = D(N5-6 || N7-8); the root N1-8 = D(N1-4 || N5-8) is threshold signed.]
Example: Sending Message m4
• The outgoing message contains additional information needed to verify the signature:
  – The message itself.
  – The siblings of the nodes on the path from m4 to the root hash.
  – The signature on the root hash.
• To verify, use the digests to reconstruct the root hash, then verify the threshold signature.
[Figure: send m4 || N3 || N1-2 || N5-8; the receiver computes N4 = D(m4), N3-4 = D(N3 || N4), N1-4 = D(N1-2 || N3-4), and the root N1-8 = D(N1-4 || N5-8), then checks the threshold signature on the root.]
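
A runnable sketch of the batching (our code, under stated assumptions: SHA-256 stands in for the digest D, the message count is a power of two, and the threshold signature on the root is elided):

    import hashlib

    def D(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_tree(messages):
        level = [D(m) for m in messages]        # leaves: digests of messages
        tree = [level]
        while len(level) > 1:
            level = [D(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            tree.append(level)                  # interior: digest of concatenation
        return tree                             # tree[-1][0] is the root hash

    def auth_path(tree, index):
        # Siblings along the path from leaf `index` to the root.
        path = []
        for level in tree[:-1]:
            sibling = index ^ 1
            path.append((level[sibling], sibling < index))
            index //= 2
        return path

    def verify(message, path, root):
        h = D(message)
        for sibling, sibling_is_left in path:
            h = D(sibling + h) if sibling_is_left else D(h + sibling)
        return h == root

    # Usage: threshold-sign the root once, then ship each message with its
    # authentication path and that single signature.
    msgs = [f"m{i}".encode() for i in range(1, 9)]
    tree = build_tree(msgs)
    root = tree[-1][0]
    assert verify(msgs[3], auth_path(tree, 3), root)   # verifies m4 against root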
Performance of Optimized Systems
Protocol rounds:
  Protocol      Wide Area   Local Area   Total
  Steward       2           4            6
  Paxos/Paxos   2           6            8
  BFT/Paxos     3           8            11
  Paxos/BFT     2           11           13
  BFT/BFT       3           15           18
• Maximum throughput is limited by wide-area bandwidth and impacted by the number of wide-area rounds.
• Optimizations effectively eliminate the computational bottleneck associated with local ordering.
[Figure: Update Throughput (updates/sec) and Update Latency (s) vs. number of clients (0-150), 50 ms diameter, 10 Mbps links, for Optimized Steward, Paxos/Paxos, Paxos/BFT, BFT/Paxos, and BFT/BFT.]
Performance of Optimized Systems
• Paxos/BFT vs. Steward:
  – Paxos/BFT and Steward achieve almost identical maximum throughput.
• BFT/BFT vs. Paxos/BFT:
  – BFT/BFT offers stronger fault tolerance properties than Paxos/BFT and achieves roughly 75% of its throughput.
• Maximum throughput is limited by wide-area bandwidth and impacted by the number of wide-area rounds.
• Optimizations effectively eliminate the computational bottleneck associated with local ordering.
[Figure: same Update Throughput and Update Latency charts as above.]
Outline
• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
  – A bit on how it works
  – Correctness
  – Performance
  – Tradeoffs
• Composable architecture
  – A bit on how it works
  – BLink – Byzantine link protocol
  – Performance and optimization
• Theory hits reality
  – Limitation of existing correctness criteria
  – Proposed model and metrics
• Summary
Red Team Attack
• Steward under attack
– Five sites, 4 replicas each.
– Red team had full control (root)
over five replicas, one in each
site. Full access to source code.
– Both representative and stand-by
replicas were attacked.
– Compromised replicas were injecting:
  • Loss (up to 20% each)
  • Delay (up to 200 ms)
  • Packet reordering
  • Fragmentation (up to 100 bytes)
  • Replay attacks
– Compromised replicas were running modified servers that contained malicious code.
[Figure: five sites (1-5), with one compromised replica in each.]
Red Team Results
• The system was NOT compromised!
– Safety and liveness guarantees were preserved.
– The system continued to run correctly under all attacks.
• Most of the attacks did not affect the performance.
• The system was slowed down when the representative of the
leading site was attacked.
– Speed of update ordering was slowed down by a factor of 5.
• Big problem:
– A better attack could slow the system down by a factor of 100.
– Still OK in terms of the liveness criterion.
• Main lesson:
– Correctness criteria used by the community are not good
enough for scalable systems over wide area networks.
New Attack Model and Metrics
• In addition to existing safety and liveness.
• Performance attack:
– Once the adversary cannot compromise safety and cannot
stop the system, the next best thing is to slow it down below
usefulness.
• Performance metric:
  – Can we guarantee a certain average fraction of the “clean” performance while under attack?
  – Assumptions: correct nodes can freely communicate (no resource-consumption denial of service); “clean” performance is defined as the performance of the “best” algorithm.
• Response metric:
  – How fast can we get to the above average fraction?
Can we design algorithms that achieve these metrics?
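
One possible formalization of these metrics (our notation, not from the talk):

    \liminf_{T \to \infty} \frac{1}{T} \int_{t_0}^{t_0+T} \frac{\mathrm{perf}_{\mathrm{attack}}(t)}{\mathrm{perf}_{\mathrm{clean}}(t)} \, dt \;\ge\; \alpha

The performance metric asks for the largest guaranteed fraction \alpha; the response metric asks for the smallest t_0, measured from the onset of the attack, for which the bound holds.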
Summary
• Insider threat problem is important on several
levels.
• For the service level
– Algorithmic engines for scalable solutions seem on
the right track
– But still a gap between algorithmic engines and
practical systems (e.g. management).
• Solutions for the network and client levels are less mature.
• What fits small scale systems does not
necessarily fit large scale systems,
especially on the wide area.
– New attack models
– New metrics
– New algorithmic approaches