
Using Client Puzzles to Protect TLS
Drew Dean
Xerox PARC
ddean@parc.xerox.com
Adam Stubblefield†
Rice University
astubble@rice.edu
Abstract
Client puzzles are commonly proposed as a solution to
denial-of-service attacks. However, very few implementations of the idea actually exist, and there are a number of subtle details in the implementation. In this paper, we describe our implementation of a simple and
backwards compatible client puzzle extension to TLS.
We also present measurements of CPU load and latency
when our modified library is used to protect a secure
webserver. These measurements show that client puzzles are a viable method for protecting SSL servers from
SSL based denial-of-service attacks.
1 Introduction
Denial-of-service attacks have become a major problem on the Internet. Major web sites have been taken
down for several hours at a time by distributed denial-of-service (DDoS) attacks. The attackers have shown an interesting combination of skill and ignorance. They are able
to break into tens or hundreds of machines and install
their tool of choice. They then use these “zombie” machines to actually launch the DDoS attack. Some of
the tools even use encrypted communications between
the attacker and zombie machines. The tools forge the
source IP address on the traffic they generate in order to make determining the zombie machine somewhat
harder. They will pick IP addresses that are on the same
subnet, in order to overcome egress filtering.
This work was supported in part by DARPA grant N66001-00-18921.
† This work was completed while the author was an intern at Xerox PARC.
However, the tools work via brute force: they just generate random traffic (perhaps with a political message)
aimed at a particular machine. While generating a gigabyte per second of traffic aimed at a single machine will
bring most websites to their knees, the sheer volume of traffic stands out to anyone doing network monitoring. For ecommerce sites, the attacker could easily arrange an attack such that the website remains available,
but web surfers are unable to complete any purchases.
Such an attack is based on going after the secure server
that processes credit card payments¹. The SSL/TLS protocol, as it stands, allows the client to request the server
to perform an RSA decryption without first having done
any work. RSA decryption is an expensive operation;
the largest secure site we are aware of can process 4000
RSA decryptions per second. If we assume that a partial SSL handshake takes 200 bytes, then 800 KB/s is
sufficient to paralyze an ecommerce site. Such a small
amount of traffic is much easier to hide.
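The bandwidth estimate above is a single multiplication, sketched here so the assumptions are explicit. The function name is illustrative; the two inputs (4000 decryptions per second and 200 bytes per partial handshake) are the figures quoted in the text.

```python
def attack_bandwidth_bytes_per_s(decryptions_per_s, handshake_bytes):
    """Traffic needed to keep a server busy with RSA decryptions.

    Assumes every partial handshake costs the server exactly one
    RSA decryption, as in the scenario described in the text.
    """
    return decryptions_per_s * handshake_bytes


rate = attack_bandwidth_bytes_per_s(4000, 200)
print(rate / 1000, "KB/s")  # 800.0 KB/s
```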
The rest of this paper describes our design and implementation of a modification to the TLS protocol to overcome this attack. We use the idea of client puzzles
to slow down the attacker sufficiently that a denial-of-service is no longer possible. The rest of the paper is
organized as follows: Section 2 discusses related work,
Section 3 presents our approach, Section 4 contains an
analysis of our proposed scheme, Section 5 discusses future work, and Section 6 concludes.
¹ Most ecommerce sites only use a secure server for transmitting payment information.
2 Related Work
The original idea of cryptographic puzzles is due to
Merkle [8]. However, Merkle used puzzles for key
agreement, rather than access control. Client puzzles
have been applied to TCP SYN flooding by Juels and
Brainard [7], who mention that SSL has a similar problem. Aura, Nikander, and Leiwo apply client puzzles to
authentication protocols in general [2]; we share a number of techniques with them. However, we concentrate
specifically on TLS. Dwork and Naor presented client
puzzles as a general solution to controlling resource usage, and specifically for regulating junk email [3]. Their
schemes develop along a different axis, primarily motivated by the desire for the puzzles to have shortcuts if
a piece of secret information is known. Franklin and
Malkhi use iterated application of a strong hash function as a forgery-resistant timer for web usage monitoring [6]. Again, this is slightly different, as we are not interested in counting time, rather in providing a rate limiting step for new TLS connections. Rivest, Shamir, and
Wagner posed the related problem of time-lock cryptography in their 1996 paper [9]. Our goal is much
more limited than theirs; we seek only to prevent a denial-of-service attack on TLS. This implies that we need
not concern ourselves with making our puzzles inherently sequential; an attacker capable of solving puzzles
in parallel could just as easily launch multiple attacks in
parallel.
3 Design and Implementation
It is always critical to be explicit about the goals and
assumptions of security work. Our goals are as follows:
1. To prevent denial of service attacks on secure web
servers.
2. To remain as backwards compatible as possible
with TLS.
3. To minimize additional server load added by implementing these measures.
These goals are listed in order of importance. Clearly,
solving the problem we are considering is the first priority. Remaining compatible with existing TLS implementations is much more important than minimizing impact
on server performance, because purchasing small additional amounts of CPU power is generally inexpensive.
We make the following assumptions about the environment:
1. That the server being protected has ample network
bandwidth. We cannot prevent the brute force
DDoS attack; we only aim to prevent an attack
against TLS.
2. That legitimate clients, seeking access to a heavily
loaded server, are willing to perform a computation
that takes no more than a few seconds, and often
much less.
These assumptions acknowledge fundamental tradeoffs
in availability. We are working at the TLS layer, on top
of TCP. If there is insufficient bandwidth for TCP to operate, there is nothing we can do about it here. We require each client to make a small sacrifice in peak performance to make average performance better. This is
almost always a good tradeoff to make.
3.1 Design
The TLS protocol breaks up the underlying TCP stream
into a record oriented protocol. The unshaded portions
of Figure 1 diagram the opening TLS handshake. The
TLS specification states that record types unknown to a particular
implementation shall be ignored. Therefore, we use a new record type for the puzzle messages.
This allows us to remain backwards compatible with
old TLS implementations that do not support puzzles.
Though such implementations may time out a connection if they do not reply to a puzzle, they will not notice
any protocol violations. This technique is only applicable to TLS and does not work for SSLv3 as SSLv3 does
not discard unknown record types. When the server is
not under attack, no changes in the TLS protocol are required.
In order to prevent the denial-of-service attack against
TLS, we need to add a new message after the Server
Hello message and before the Server Done message. See
the shaded portions of Figure 1. This message contains
a cryptographic puzzle and is only sent when the server
is under load. The server will then wait on a response
message before continuing with the handshake protocol.
3.1.1 The Client Puzzles
To be useful as a client puzzle, a puzzle needs to be solvable in a predictable amount of time. The puzzle generally should not take too long to solve (e.g., no more than
a second or so on a relatively slow machine), but at the
same time, there should be no known shortcut to solving the puzzle. In addition, the server needs to be able
to generate puzzles while doing much less work than the
client solving them. Of course, the server also needs an
efficient method of verifying the correctness of a proposed solution.

[Figure 1 diagram: the message sequence Client Hello, Server Hello, Certificate, Puzzle Request, Puzzle Reply, Server Done.]
Figure 1: The TLS handshake protocol. The shaded portions are our additions.

struct {
    uint8 PuzzleLength;
    uint8 PuzzlePreimage[64];
    uint8 PuzzleHash[16];
} PuzzleChallenge;

struct {
    uint8 PuzzleSolution[64];
} PuzzleResponse;

Figure 2: The client puzzle messages
For $h$, a preimage resistant hash function, a client
puzzle is the triple $(n, x', h(x))$, where $x'$ is $x$ with its $n$
lowest bits set to 0. Both MD5 and SHA-1 are conjectured to be preimage resistant. The solution to the puzzle
is the full value of $x$. Because $h$ is preimage resistant,
the best way for a client to recover $x$ is to try values for
the $n$ unknown low bits of $x'$ (at most $2^n$ candidates) until a match is found.
This should take, on average, $2^{n-1}$ calculations of $h$.
The server, on the other hand, needs to generate a random block (for MD5 and SHA-1, 512 bits) of data, and
evaluate the hash function twice. Note that we generate
an entire random block rather than just the unknown bits
to prevent an attacker from effectively precomputing all
possible puzzles.
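The construction above can be sketched in a few lines. This is a minimal illustration, not the authors' OpenSSL code: the function names are hypothetical, MD5 and the 64-byte block size follow the text, and the block is treated as a big-endian integer so that the "lowest bits" are well defined (the source does not state a byte order).

```python
import hashlib
import os

BLOCK_BYTES = 64  # one 512-bit block, as in the text


def make_puzzle(n_bits):
    """Server side: return (n, x', h(x)) for a fresh random x."""
    x = int.from_bytes(os.urandom(BLOCK_BYTES), "big")
    mask = (1 << n_bits) - 1
    x_prime = x & ~mask  # x with its n lowest bits set to 0
    target = hashlib.md5(x.to_bytes(BLOCK_BYTES, "big")).digest()
    return n_bits, x_prime.to_bytes(BLOCK_BYTES, "big"), target


def solve_puzzle(n_bits, preimage, target):
    """Client side: brute-force the n unknown low bits."""
    base = int.from_bytes(preimage, "big")
    for guess in range(1 << n_bits):
        candidate = (base | guess).to_bytes(BLOCK_BYTES, "big")
        if hashlib.md5(candidate).digest() == target:
            return candidate
    return None  # unreachable for an honestly generated puzzle


def verify_solution(solution, target):
    """Server side: a single hash evaluation checks the answer."""
    return len(solution) == BLOCK_BYTES and hashlib.md5(solution).digest() == target
```

With a small `n_bits` such as 10 the search finishes almost instantly, which makes the sketch convenient for experimenting before moving to realistic puzzle lengths.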
3.1.2 The Message Format
The client puzzle message format we use is described
using the TLS presentation language in Figure 2. The
value PuzzleLength is the number of unknown bits in
PuzzlePreimage. PuzzleHash is the target value and is
computed by taking the MD5 hash of the correct puzzle solution. In the reply message, PuzzleSolution is
the client’s solution to the puzzle. The server checks
to make sure that MD5(PuzzleSolution) = PuzzleHash.
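As an illustration of the message bodies in Figure 2, the sketch below serializes them with Python's `struct` module. It assumes the usual TLS presentation-language encoding of fixed-length `uint8` vectors (raw bytes with no length prefix); the record header and the record type value are not given in the text and are left out. The helper names are hypothetical.

```python
import struct

# PuzzleLength (1 byte), PuzzlePreimage[64], PuzzleHash[16]
CHALLENGE_FMT = "!B64s16s"
# PuzzleSolution[64]
RESPONSE_FMT = "!64s"


def pack_challenge(n_bits, preimage, digest):
    # struct pads or truncates "s" fields silently, so check lengths first.
    if not 0 <= n_bits <= 255:
        raise ValueError("PuzzleLength must fit in a uint8")
    if len(preimage) != 64 or len(digest) != 16:
        raise ValueError("unexpected field length")
    return struct.pack(CHALLENGE_FMT, n_bits, preimage, digest)


def unpack_challenge(data):
    # Raises struct.error if data is not exactly 81 bytes long.
    return struct.unpack(CHALLENGE_FMT, data)


def pack_response(solution):
    if len(solution) != 64:
        raise ValueError("PuzzleSolution must be 64 bytes")
    return struct.pack(RESPONSE_FMT, solution)


def unpack_response(data):
    return struct.unpack(RESPONSE_FMT, data)[0]
```

Under these assumptions a challenge body is 81 bytes and a response body is 64 bytes.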
3.1.3 Determining Server Load
We desire only to send puzzles when the server is overloaded for two reasons:
1. While the server is not overloaded, we would like
all TLS clients, not just those with our modifications, to be able to communicate with the server.
2. There is no point to adding more latency to TLS
connection setup when the server is not overloaded.
Determining when a server is overloaded due to incoming TLS connections is one of the interesting problems
we faced. There is no obviously correct metric: the machine’s load is due to many factors, and the rate of new
TLS connections may be quite high, but still manageable, e.g., if the machine has a hardware cryptographic
accelerator. Note that the computationally intensive section of the TLS handshake protocol occurs after the
client sends the ClientKeyExchange message². At that
point, the server must perform a public key operation.
In the most common case, this operation is an RSA private key decryption. This is the operation that we seek to
protect the server from having to complete for malicious
users.
The metric we choose to use counts the number of RSA
decryptions that we have committed to performing. We
increment this count when either we decide not to send
a client puzzle or when the correct solution to a client
puzzle is submitted. The count is decremented after the
corresponding RSA decryption has completed or if the
connection is closed before the RSA operation is begun. This measures exactly what we care about: whether
there is a backlog of public key operations to be performed in the near future.
² There are other computationally intensive sections of the protocol
if the server agrees to an anonymous or export cipher suite. These are
beyond the scope of this paper.
[Figure 3 diagram: the states SSL3_ST_SW_CERT_REQ, SSL3_ST_SW_PUZZLE, SSL3_ST_SR_PUZZLE, and SSL3_ST_SW_SRVR_DONE, with an "Under Attack?" decision (Yes/No).]
Figure 3: TLS states in OpenSSL
By detecting whether this value is above a specified
limit, the server can determine whether or not to send a
client puzzle. More complicated schemes based on this
metric, such as a state machine with distinct levels for entering
and exiting client puzzle mode, or statistical regressions, are also possible.
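The committed-operations metric with a single limit can be sketched as a small counter. The class and method names are hypothetical and the sketch is single-process only; it only mirrors the increment, decrement, and threshold rules described above.

```python
class CommittedRsaCounter:
    """Tracks RSA decryptions the server has committed to perform."""

    def __init__(self, limit):
        self.limit = limit
        self.committed = 0

    def commit(self):
        # Called when no puzzle is sent, or when a correct solution arrives.
        self.committed += 1

    def release(self):
        # Called when the decryption completes, or the connection closes
        # before the RSA operation has begun.
        if self.committed > 0:
            self.committed -= 1

    def should_send_puzzle(self):
        # Send a puzzle while the backlog is above the specified limit.
        return self.committed > self.limit
```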
3.2 Implementation
In order to measure the effectiveness of our solution,
we modified the OpenSSL library to support servers
and clients that understood our puzzle protocol. On the
server side, hooks were added to the mod_ssl Apache
module; as a client, the TLS-enhanced version of lynx
was used.
3.2.1 The OpenSSL Library
OpenSSL is an open-source library that includes support
for the SSL and TLS protocols as well as the underlying
cryptographic operations needed by SSL and TLS [5].
OpenSSL handles connections on a per-socket basis and
does not keep any global process state. This prevents a
clean separation between our modified OpenSSL library
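and server applications because we need to measure the
global server load. On the client side, however, the application never needs to be aware whether the puzzle protocol took place. Clients can trivially support the protocol
just by relinking with the modified library.
[Figure 4 diagram: Apache and OpenSSL (ssl3_accept) linked by the callbacks puzzlebits_cb (send puzzle?), inc_rsa_cb (increment RSA count), and dec_rsa_cb (decrement RSA count).]
Figure 4: Control flow in OpenSSL with client puzzles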
In OpenSSL, the TLS handshake is implemented as a
state machine representing the current location in the
protocol. To add support for puzzles on the server, a new
state was added after the server certificate request state.
In this state the server either sends a puzzle request and
switches to a state waiting to receive the puzzle reply
or immediately switches to the server done state. The
puzzle reply state will wait to receive a puzzle solution
before switching to the server done state. If a puzzle
solution is never received the connection will time out.
This is diagrammed in Figure 3.
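The server-side transitions just described can be written as a small transition function. This is a sketch in Python rather than the C state machine in OpenSSL; the enum members are shortened from the state labels in Figure 3, and the handling of an invalid solution (remaining in the reply state until the connection times out) is an assumption, since the text only covers the case where no solution arrives.

```python
from enum import Enum, auto


class State(Enum):
    SW_CERT_REQ = auto()   # server certificate request state
    SW_PUZZLE = auto()     # new state: decide whether to send a puzzle
    SR_PUZZLE = auto()     # new state: wait for the puzzle reply
    SW_SRVR_DONE = auto()  # server done state


def next_state(state, under_attack=False, solution_received=False):
    """Return the state that follows `state` in the modified handshake."""
    if state is State.SW_CERT_REQ:
        return State.SW_PUZZLE
    if state is State.SW_PUZZLE:
        # Send a puzzle only when the server considers itself under attack.
        return State.SR_PUZZLE if under_attack else State.SW_SRVR_DONE
    if state is State.SR_PUZZLE:
        # Without a solution the server keeps waiting (and eventually times out).
        return State.SW_SRVR_DONE if solution_received else State.SR_PUZZLE
    return state
```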
On the client side, we treat the reception of a puzzle as
an “unexpected event” in the incoming message handler, because the client is expecting a handshake record.
The puzzle solution is then computed and returned to the
server before the handshake processing continues.
The biggest challenge is deciding whether a server
should send a client puzzle. Because OpenSSL has no
notion of application or system wide state, it has no way
to count the number of RSA operations a server has committed to. To remedy this problem, we provide callbacks
to alert the application whenever we commit to or finish
an RSA private decryption. We also add a callback that
allows the server to decide whether to send a client puzzle on the current connection, and if so, how many bits
the puzzle should be. This control flow is shown in Figure 4.
[Figure 5 plot: RSA operations committed to versus time, marking the "Enter Puzzle Mode Level", the "Exit Puzzle Mode Level", and the interval spent "In Puzzle Mode".]
Figure 5: Puzzle states
3.2.2 The Server
Our server implementation was based on the Apache
webserver [1] with the mod_ssl module [4]. In Apache,
every connection is handled by a different UNIX process. This makes sharing information between connections rather difficult. We needed to count the total number of RSA private key operations that all of the Apache
processes had committed to in order to correctly determine when to send client puzzles. We used a page of
shared memory, protected by file-based mutexes, to store
the count of committed operations, and also whether a
puzzle was sent on the last connection. This provided us
with a reasonably simple and efficient implementation.
The server uses two user specified values to tell
OpenSSL whether to send a client puzzle. One value is
the maximum number of private key operations to commit to before sending puzzles. Once it starts sending
puzzles it continues until the number of committed operations drops below the other specified value. We use
separate thresholds for turning on and off puzzles to provide some hysteresis so that the server does not oscillate
in and out of puzzle mode too rapidly. This process is
outlined in Figure 5. The selection of these values is
described in Section 4.
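The two-threshold behaviour can be sketched as a tiny controller. The class name and the exact boundary conditions (enter at or above the upper level, leave strictly below the lower level) are assumptions; the text only says that puzzles start once the maximum is reached and continue until the count drops below the second value.

```python
class PuzzleMode:
    """Hysteresis between an enter level and a lower exit level."""

    def __init__(self, enter_level, exit_level):
        if exit_level > enter_level:
            raise ValueError("exit_level must not exceed enter_level")
        self.enter_level = enter_level
        self.exit_level = exit_level
        self.active = False

    def update(self, committed):
        """Feed the current committed-operation count; return the mode."""
        if not self.active and committed >= self.enter_level:
            self.active = True
        elif self.active and committed < self.exit_level:
            self.active = False
        return self.active
```

With levels such as the 40 and 10 used in Section 4.2, a count that hovers between the two values leaves the mode unchanged, which is what keeps the server from oscillating.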
[Figure 6 plot: committed RSA operations (y-axis, 0 to 120) versus requests (x-axis, 0 to 2000).]
Figure 6: Committed RSA operations at each request
without puzzles (during attack)
3.2.3 The Client
Integrating this scheme on the client side was surprisingly simple. It can be added to any client that supports
TLS through OpenSSL simply by relinking with the new
library. Initially we had planned to add a status message
to alert the user when a puzzle was being completed, but
the time needed to complete a puzzle is so short that this
did not end up being necessary.
4 Analysis
In this section we analyze the effectiveness of our client
puzzle protocol in protecting a webserver against a TLS
based denial of service attack. To benchmark the server,
we used a modified version of Dan Boneh’s multithreaded TLS benchmark. Our test server was a 750
MHz Athlon with 256 MB of RAM running FreeBSD
4.0. Using OpenSSL’s RSA implementation and benchmarks, the server was able to complete 148 1024-bit private key operations a second.
4.1 Performance Without Client Puzzles
We first ran our benchmark on a copy of Apache version 1.3.12 with mod_ssl version 2.6.5 without support for the puzzle protocol. We did add some minimal profiling support to this build. Using one client and
less than 550 Kbps of traffic, we were able to completely
load the server. As shown in Figure 6, the number of
pending RSA operations was continually increasing for
the first 100 TLS connections made to the server during a simulated attack. At this point, there were no
more Apache processes available to handle additional requests, so the number of pending requests falls as RSA
operations complete with no new operations being committed to, as clients are unable to make new connections
to the server. Figure 7 shows the latency experienced by
a legitimate user trying to connect to the server during
this period in a representative benchmark run. By
using two attacking computers, we were able to double
the latency experienced by the legitimate user. These
simulated attacks can be continued indefinitely by the
attacking computers. These results show that an unprotected TLS server is indeed vulnerable to these attacks.
[Figure 7 plot: latency in seconds (y-axis, 0 to 5.5) versus request number (x-axis, 0 to 30).]
Figure 7: Latency for a legitimate client without puzzles
(during and after attack).
4.2 Performance With Client Puzzles
The situation when using the client puzzle enabled version of Apache is much better. During the non-client
puzzle tests, we observed that the loaded server was able
to complete approximately 60 requests per second. We
therefore decided to set the upper bound on committed
RSA operations to 40. That way, a client would have
to wait no more than a second to connect. After that
bound was reached, we decided to leave the system in
puzzle mode until the number of committed RSA operations dropped below 10. This prevented the server from
switching into and out of puzzle mode too often during a continuous attack. After testing different values
for the length of the client puzzles to send, we settled
on 20-bit puzzles, as a value that stopped even multiple
attackers but did not disrupt client operations. A possible improvement to this scheme would be to increase
the puzzle length depending on the length of time that
an attack has been taking place. This would slow down
attackers even further, but would also cause legitimate
client wait times to increase.
[Figure 8 plot: committed RSA operations (y-axis, 0 to 80) versus requests (x-axis, 0 to 2000).]
Figure 8: Committed RSA operations at each request
with puzzles (during attack). Note the different scale
from Figure 6.
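To make the cost of a puzzle length concrete, and to illustrate the suggested improvement of lengthening puzzles as an attack persists, here is a hedged sketch. The 20-bit base comes from the text; the step interval, the cap, and both function names are hypothetical parameters chosen only for illustration.

```python
def expected_hashes(n_bits):
    """Average hash evaluations to solve an n-bit puzzle: 2^(n-1)."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    return 2 ** (n_bits - 1)


def puzzle_bits(attack_seconds, base_bits=20, step_seconds=60, max_bits=30):
    """Grow the puzzle by one bit per step_seconds of ongoing attack.

    step_seconds and max_bits are illustrative values, not taken
    from the measurements in this paper.
    """
    extra = int(attack_seconds // step_seconds)
    return min(base_bits + extra, max_bits)


print(expected_hashes(20))  # 524288
```

Each extra bit doubles the expected work for attackers and legitimate clients alike, which is the tradeoff noted above.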
Figure 9 shows the latency for a legitimate client during
a denial of service attack waged against the modified
server using multiple attackers. The number of pending RSA operations is shown in Figure 8. The server
console remained usable throughout the attacks and its
processor was never completely loaded. The extremely
low, and near constant, latency is the key metric. These
results indicate that the puzzle protocol was effective in
preventing the TLS based denial of service attack.
4.3 Security Considerations
Because the whole point of using TLS is to provide a secure connection between hosts, we look now to the security implications of our client puzzle protocol. We begin
by noting that there are no shared keys between the client
puzzle protocol and the handshake protocol. The client
puzzle protocol merely “stops” the TLS handshake protocol until it completes. Because the handshake protocol
is not time dependent (with the exception of a timestamp sent at the beginning of the protocol that does not
even need to be correct), this does not seem to negatively affect security. The only possible problem is that
the puzzle protocol is generating random data from the
same pool as the handshake protocol. However, as long
as the random number generator used in the TLS implementation is not poly-time distinguishable from true
randomness, the client puzzle protocol should have no
effect on the security of TLS. Of course, a TLS implementation that uses an insecure random number generator has much more serious security problems that are
beyond the scope of this work.
[Figure 9 plot: latency in seconds (y-axis, 0 to 1) versus request number (x-axis, 0 to 30).]
Figure 9: Latency for a legitimate client with puzzles
(during and after attack). Note the different scale as
compared to Figure 7.
5 Future Work
We have built a functional prototype and examined
its behavior. While the system behaves well, further
improvements could be made with more sophisticated
strategies for determining when to start and stop the puzzle requests. A further degree of control is available by
dynamically adjusting puzzle length depending on conditions at that moment on the server. Of course, a security proof for this scheme would be desirable.
6 Conclusions
Client puzzles are an effective means of countering a
denial-of-service attack against TLS servers. We have
presented an implementation that remains fully compatible with existing TLS clients when the server is not under attack. This is the best possible compatibility. We
have shown that puzzle sizes can be chosen that keep the
server available, even under duress, while adding latency
below the humanly perceptible threshold for interactive
response.
Acknowledgments
We thank Matt Franklin and Dan Boneh for useful discussion about this work. We thank Diana Smetters for
helpful comments that improved the presentation of this
paper.
References
[1] The Apache HTTP server project. http://www.apache.org/httpd.html.
[2] Tuomas Aura, Pekka Nikander, and Jussipekka
Leiwo. DoS-resistant authentication with client puzzles. In Proceedings of the Cambridge Security
Protocols Workshop 2000, LNCS, Cambridge, UK,
April 2000. Springer-Verlag.
[3] Cynthia Dwork and Moni Naor. Pricing via processing or combatting junk mail. In Ernest F. Brickell,
editor, Proc. CRYPTO 92, pages 139–147. Springer-Verlag, 1992. Lecture Notes in Computer Science
No. 740.
[4] Ralf S. Engelschall. mod_ssl: The Apache interface
to OpenSSL. http://www.modssl.org/.
[5] Ralf S. Engelschall. OpenSSL: The open source
toolkit for SSL/TLS. http://www.openssl.org/.
[6] Matthew K. Franklin and Dahlia Malkhi. Auditable
metering with lightweight security. Journal of Computer Security, 6(4):237–255, 1998.
[7] Ari Juels and John Brainard. Client puzzles: A cryptographic defense against connection depletion attacks. In S. Kent, editor, Proceedings of NDSS ’99,
pages 151–165, 1999.
[8] R. C. Merkle. Secure communications over insecure
channels. Communications of the ACM, 21:294–
299, April 1978.
[9] Ronald L. Rivest, Adi Shamir, and David A. Wagner. Time-lock puzzles and timed-release cryptography. (Preliminary version posted on the web by
Rivest.).