Encryption Key Recovery Using Coupons 2

Cloud Based Recoverable Encryption through Noised Secret with
Discounts
Sushil Jajodia, Witold Litwin, Thomas Schwarz
April 12, 2013
Extended Abstract
Abstract. The danger of key loss is the Achilles’ heel of cryptography. Backup key copies
held by escrows help, but favor disclosures. Schemes for recoverable encryption through
noised secret alleviate the dilemma. The escrow’s backup is specifically encrypted. The
recovery must use a cloud, possibly thousands of nodes large. The cloud cost, the money
trail, etc. may be expected to make illegal attempts rare. We now add to the known schemes
the concept of a discount. The recovery requestor optionally provides the discount with
the request. The discount contains a code that lowers the recovery complexity, easily
by orders of magnitude. A smaller cloud may then suffice for the same recovery timing.
Alternatively, the same cloud may provide a faster recovery, etc. We define the concept of a
discount and adapt the known schemes for recoverable encryption through noised secret.
We analyze various properties of the new schemes.
1 Introduction
Key recovery is a classical goal. The key escrow idea, where a copy of the key is backed up
with some trusted service, was proposed as a basis for solutions. The approach did not become
popular. Key disclosure creates temptations that an escrow may not resist, as is now
well known. The recoverable encryption concept was meant as a basis for more realistic
solutions. The backup is itself encrypted in a novel way. This encryption should make (1)
the brute-force recovery, i.e., from the backup alone, always feasible, although
computationally hard, unlike for a plain key copy. The recovery complexity (hardness) should
be (2) arbitrarily fixed by the key owner, depending on the trust in the escrow. As a result,
the recovery decryption on the escrow’s site (node) alone should become (3) impractical,
e.g., it could usually last dozens of days at least.
Nevertheless, (4), the maximal recovery time should decrease over a cloud, with a possibly
linear speed-up, i.e., O (M / N) time. Here M is a key-owner defined integer providing the
O (M) complexity and N is the cloud size in nodes. The recovery requestor is expected to
provide this time. A practical timing, e.g., in minutes, should (5) imply N in the thousands.
The hiring of such a large cloud may be expected to be usually noticeable, as well as somewhat
costly and traceable, e.g., through the money trail. All these properties should predictably
make illegal recovery attempts, e.g., by an escrow-side insider, arbitrarily less tempting
than up to now.
The class of schemes termed recoverable encryption through noised secret, RENS in
short, proved the concept feasible over the current cloud infrastructure, [ ]. We now extend
these schemes with the concept of a discount. As the name suggests, a discount contains a
code that lowers the recovery cost with respect to the brute-force one. Technically, it lowers
the calculation complexity, easily by orders of magnitude. We then speak below of the
discounted recovery. The discounted recovery time, whether the worst-case or the average
one, is then accordingly smaller, for the same N. A discount of, say, 50% halves it.
Alternatively, the discounted recovery uses a smaller cloud for the same timing, e.g., one
half the size.
The requestor sends the discount to the escrow within the recovery request. The escrow is
not aware of any discount code otherwise. The code amends the otherwise brute-force RENS
request sent to the cloud. If the requestor is unable to provide a discount, the brute-force
recovery is always possible. In this sense, a key recoverable according to an RENS scheme is
never lost.
For any given key, only specific discount codes lead to the recovery. Any discount provided
nevertheless triggers a recovery attempt with the associated lower cost. An unsuccessful
recovery also respects the requestor’s timeline. It doubles, however, the cost of the
successful one. The full-cost brute-force recovery always remains an option. Finally, every
discount is guaranteed successful for one key only. For any different key, it basically acts
as a random guess of what the valid code could be.
The key owner may send out discounts to other potential requestors. Different discounts for
the same backup are possible. A greater discount may reflect a higher trust that the selected
user performs the legitimate recovery only.
Below, we first define and analyze the RENS recovery with discounts, using the simplest
code expressions, which we will define, and schemes based on the so-called 2-share noised
secret. We reuse for this purpose the schemes in []. We define successively the backup
creation, then the discounted recovery calculation. Next, we analyze the correctness, the
complexity and the safety of the resulting schemes. Next, we discuss other useful discount
code specifications. In Section _, we similarly address the schemes using more noised shares.
We then discuss the related work and we finally conclude.
2 Backup Creation
Let S be the key to back up, e.g., a 256-bit AES key. The key owner or its agent, i.e., a
client program, say C, running on the owner’s site, first creates a usual 2-share secret, with
shares, say, s0 and s1 = S XOR s0. It is common knowledge that S = s0 XOR s1. Next, the owner
chooses some time D, e.g., 70 days. D is the presumed recovery time of S at the escrow’s site
alone, assuming a 1-node (core) site.
After that, C defines the hint h = H (s0), using some one-way hash function H, e.g., H = SHA256.
We recall that in practice (i) h is unique for any s0 and (ii) it is infeasible, for a good H
such as the one mentioned, to find s0 as H^-1 (h). Next, C determines the number M of match
attempts H (s) =? h that a 1-node site could perform in time D, where s is an integer that
could be s0. Then, C chooses a random noise n that is an integer in the noise space
I = [0, M[. The integer f = s0 – n becomes the base noise share, while we call s0 the noised
one. The name comes from the backup representation of s0, that is P = (f, h, M). The backup
sent to the escrow is the couple (s1, P).
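The backup creation above can be sketched as follows. This is only an illustration under stated assumptions: the function name create_backup, the 32-byte big-endian encoding of a share before hashing, and the use of Python's secrets module as the randomness source are choices made for this sketch, not part of the scheme's specification.

```python
import hashlib
import secrets


def create_backup(S: int, M: int, bits: int = 256):
    """Return the escrow backup (s1, P), with P = (f, h, M), for the key S.

    Assumption of this sketch: shares are hashed as big-endian byte strings.
    """
    s0 = secrets.randbits(bits)   # random share; it becomes the noised share
    s1 = S ^ s0                   # second share, so that S = s0 XOR s1
    h = hashlib.sha256(s0.to_bytes(bits // 8, "big")).digest()  # hint h = H(s0)
    n = secrets.randbelow(M)      # random noise in I = [0, M[
    f = s0 - n                    # base noise share (s0 = f + n)
    return s1, (f, h, M)
```

In practice M would be derived from the chosen time D and the measured hashing throughput of a single node; here it is simply passed in.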
3 Discount
We recall that in [ ], the brute-force recovery, i.e., one using exclusively the backup as
defined there and above, was the only capability of the RENS schemes. One may
nevertheless observe that the requestor could in fact have some prior useful knowledge of
s0. The analysis in Section_ shows that the recovery calculation can effectively make good
use of such knowledge. Transmitted with the recovery request, it may lower the recovery
computation complexity. For instance, the key owner could note that s0 is an odd integer.
This would lower the complexity by 50%, as we show. We say that any such knowledge
defines a discount, of 50% in this case.
More precisely, the key owner defines the discount for a given backup according to an RENS
scheme, through some discount code. At its simplest, the code is an m-bit suffix of the
noised share s0; m = 0, 1…. We implicitly focus on such codes, until we say otherwise. The
value m = 0 tacitly means the no-discount request, i.e., for the brute-force recovery.
Otherwise, we expect m = 8 or m = 16 at most, in practice. We call discount value the
reduction, with respect to the brute-force recovery complexity, that the recovery with the
code provided offers. The analysis later on shows that the value of any m-bit long code is
2^m, for both the worst-case and the average complexity. We thus expect a complexity
reduction of up to 2^8 = 256 and 2^16 = 64K times, respectively. Codes that long may appear
as one or two ASCII digits. One may expect them easy to keep safely, e.g., on a smartphone,
or just in memory. Recall that Europeans routinely keep in mind the 4-digit credit card codes.
As a result, these codes may lower the recovery time 2^8 to 2^32 times with respect to the
brute-force figures. Alternatively, they may reduce the cloud size for the same timing. In
the end, they discount the cloud cost accordingly. Obviously, the necessary condition for
a successful match attempt is that the noise share embeds the code. The discounted
recovery calculation we define below attempts matches only for such shares.
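As a small illustration of the suffix codes just described, the sketch below extracts an m-bit code from a share, tests whether a candidate share embeds a given code, and returns the discount value 2^m. The function names are hypothetical and serve this sketch only.

```python
def discount_code(s0: int, m: int) -> int:
    """Return the m-bit suffix of the share s0 (m = 0 yields the empty code 0)."""
    return s0 & ((1 << m) - 1)


def embeds(share: int, d: int, m: int) -> bool:
    """True if the m-bit suffix of the candidate share equals the code d."""
    return discount_code(share, m) == d


def discount_value(m: int) -> int:
    """Reduction factor offered by an m-bit suffix code."""
    return 1 << m
```

With m = 1 and d = 1 this is the "s0 is odd" example: half of the candidate shares embed the code, so the value is 2.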
4 Recovery
4.1 Recovery Request
The escrow performs the recovery upon a legitimate request for it. How the escrow knows
which request is legitimate is out of scope here. The recovery schemes with discount discussed
below reuse the schemes for the brute-force recovery only, defined in [_]. The recovery
request has in particular the same form, augmented however with the discount code. It is thus
formally the tuple Pd = (P, R, d). As for the brute-force recovery alone, R designates here
the desired bound on the recovery time, e.g., 10 min.
4.2 Brute-Force Recovery
If the request has no discount, i.e., m = 0 in d, the escrow proceeds with the brute-force
recovery decryption. The schemes in [] then apply as is. The escrow thus forwards Pd,
without s1, to some cloud node, called the coordinator. In this way no cloud insider
can disclose the recovered key.
With respect to the actual execution on the cloud, managed by the coordinator, we recall
that there were basically two schemes: with so-called static or scalable partitioning. The
former was proposed for a homogeneous cloud, the latter for a heterogeneous one. Their
common characteristic is that the recovery calculations attempt the matches over different
noise shares f + n, until the successful match. This one must occur, but the attempts may
possibly explore even every n in I. Both schemes partition the attempts over N nodes, with
the linear speed-up O (N). The choice of the value of N depends on the scheme. In both cases,
it makes the recovery computation at each node fit the time bound provided by the
requestor, e.g., 10 min. As a result, the whole calculation fits this bound. Typically, N
should possibly be in the thousands, as we discussed.
The cloud delivers the noised share s0 found to the escrow. The escrow XORs it with s1 and,
finally, delivers the key to the requestor.
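The brute-force search can be sketched as follows, with the N nodes simulated one after another on a single machine. The sketch assumes a static partitioning in which node i handles the noises n with n mod N = i, and a 32-byte big-endian encoding of shares before hashing; the function names are hypothetical. The escrow, not the cloud, performs the final XOR with s1.

```python
import hashlib


def H(s: int) -> bytes:
    """One-way hash of a share (encoding is an assumption of this sketch)."""
    return hashlib.sha256(s.to_bytes(32, "big")).digest()


def node_search(f: int, h: bytes, M: int, N: int, node: int):
    """Match attempts of one node: noises n = node, node + N, ... below M."""
    for n in range(node, M, N):
        if H(f + n) == h:
            return f + n          # the noised share s0
    return None


def cloud_search(P, N: int):
    """Simulate the N nodes sequentially; on a real cloud they run in parallel."""
    f, h, M = P
    for node in range(N):
        s0 = node_search(f, h, M, N, node)
        if s0 is not None:
            return s0
    return None
```

Given the share s0 returned by cloud_search, the escrow recovers the key as s0 ^ s1.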
4.3 Discounted Recovery
The discounted recovery request differs from the brute-force one only by the additional
presence of the discount code with m > 0. The cloud uses either scheme defined for the
brute-force recovery, with the following modifications.
1. The coordinator calculates M' = M \ 2^m.
2. It initiates the static or the scalable scheme with M' instead of M. We recall that this
step determines N, as a function of M' and R.
3. It determines the smallest noise share with suffix d that is greater than or equal to f.
It may calculate it as follows:
d' = f – f \ 2^m * 2^m ;           /* d' is the m-bit suffix of f */
If d' = d then f' = f ; else
f' = f \ 2^m * 2^m + d ;           /* f' is an integer ending with d */
If f' < f then f' := f' + 2^m ;    /* f' is the smallest noise share ending with d */
4. It delivers (1) the “usual” brute-force request for match attempts and (2) d in addition,
to each of the N nodes.
5. Using M' instead of M, every node calculates, one after another, each value of the noise n
for which it should generate the noise share s for the match attempt H (s) =? h.
6. For each n, the node calculates s as s = f' + n * 2^m.
7. As for the brute-force recovery, except for M' instead of M, the node starts the match
attempts with some n, usually n = 0. If successful, the node reports the result to the
coordinator, unless the node is the coordinator itself. Otherwise, the node tries out
some next n. The node continues until the last n < M' or until it receives the message
from the coordinator requesting to stop the attempts.
8. Assuming the cloud finds the noised share s0, it returns it to the escrow. The escrow
XORs it with s1 and returns the recovered key to the requestor.
If the cloud finds s0, we say that the discount was valid. Otherwise, the cloud got an invalid
one. The legitimate requestor perhaps made some error, or the discount came from an
intruder. The cloud acts in the same way for a valid or an invalid d. There is no way for the
coordinator to distinguish between the two upfront. Notice that the brute-force recovery
(normally) always terminates with a successful search.
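The discounted recovery steps can be sketched as below, again with the N nodes simulated sequentially. The sketch assumes that 2^m divides M, that node i handles the noises n with n mod N = i, and that shares are hashed as 32-byte big-endian strings; the function names are hypothetical. It returns None for an invalid discount code.

```python
import hashlib


def H(s: int) -> bytes:
    """One-way hash of a share (encoding is an assumption of this sketch)."""
    return hashlib.sha256(s.to_bytes(32, "big")).digest()


def smallest_share_with_suffix(f: int, d: int, m: int) -> int:
    """Smallest integer >= f whose m-bit suffix is d (step 3)."""
    f_prime = ((f >> m) << m) + d     # f \ 2^m * 2^m + d
    if f_prime < f:
        f_prime += 1 << m             # next integer ending with d
    return f_prime


def discounted_search(P, d: int, m: int, N: int):
    """Return the noised share s0, or None if the code d is invalid."""
    f, h, M = P
    M_prime = M >> m                  # step 1: M' = M \ 2^m
    f_prime = smallest_share_with_suffix(f, d, m)
    for node in range(N):             # nodes run in parallel on a real cloud
        for n in range(node, M_prime, N):
            s = f_prime + (n << m)    # step 6: s = f' + n * 2^m
            if H(s) == h:
                return s
    return None
```

Only shares ending with d are ever hashed, so at most M' match attempts are made whether or not the code is valid.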
Ex. 1. Consider M = 2^50, which should be rather typical. We suppose the noise shares 256 bits
long, as an AES key. Next, let f = ‘10….1011’ and m = 2. We further suppose d = ‘01’. The
coordinator calculates M' = 2^48. Let us suppose then that the static scheme is used and that
after the calculations for M' as in [ ] for M, we have N = 1K, with the nodes numbered
0, 1…N – 1. The coordinator calculates first d' = ‘11’. Since d' is not d, the next
calculation yields f' = ‘10….1001’. We have f' < f and since f is the minimal noise share,
f' cannot be one. The smallest noise share with suffix d is therefore
f' = f' + 2^2 = ‘10….1101’. Node 0 attempts the matches for noises k = 0, 1024,
2048…, i.e., with each successive k such that k mod N = 0 and till the largest such k < M'.
Each k is multiplied by 2^2 then added to f', then node 0 attempts the match of the resulting
noise share, etc. Likewise, node 1 attempts the matches for noises k = 1, 1025, 2049…, i.e.,
where k mod N = 1 and also till the largest such k < M'. Etc. In general, every node n attempts
in this way the matches for each and only k that yields k mod N = n.
The integer division ‘\’ by 2^m denotes in fact an m-bit right shift. Likewise, the
multiplication by 2^m denotes an m-bit left shift. Dedicated functions, when available to the
compiler, may be faster than the arithmetic calculations. There are thus various ways to
implement the algorithm, which we do not address further here.
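The equivalence of the arithmetic and the shift formulations can be checked with a short sketch. The two hypothetical functions below compute the smallest integer not under f that ends with d, once with integer division and multiplication and once with shifts; the 6-bit value of f is an arbitrary small stand-in for a 256-bit share.

```python
def f_prime_arith(f: int, d: int, m: int) -> int:
    """Arithmetic form: f \\ 2^m * 2^m + d, adjusted upward if under f."""
    fp = f // 2**m * 2**m + d
    return fp if fp >= f else fp + 2**m


def f_prime_shift(f: int, d: int, m: int) -> int:
    """Shift form: right shift, left shift, then set the suffix bits to d."""
    fp = ((f >> m) << m) | d
    return fp if fp >= f else fp + (1 << m)


f = 0b101011                      # a share ending with '1011'
print(bin(f_prime_arith(f, 0b01, 2)), bin(f_prime_shift(f, 0b01, 2)))
```

Both calls print 0b101101: the suffix becomes '01' and the result is the first such value at or above f.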
5 Algorithm Analysis
5.1 Correctness
For every N, every f and every d, each RENS scheme under consideration generates every
noise share ending with d. No such share is generated twice.
The proof is rather easy to see from the example. First, we start with determining the
smallest noise share, here f', that ends with d. The loop that attempts the matches at
each node starts with f' or with the smallest value above f' among those handled by the node.
The value of f' has to be greater than or equal to f. By definition of f, a smaller integer
simply cannot be a noise share. The algorithm nevertheless first calculates f \ 2^m * 2^m,
whose last m bits are ‘0..0’. This value can be under f. The algorithm then adds d to it, to
have a value f' with the right suffix. The example showed one case where this value is smaller
than f. That is why it was then adjusted to, obviously, the smallest possible value over f
ending with d. Next, observe that for d = ‘11’ instead, we would have d' = d, hence f' = f.
Finally, for f = ‘10….1001’ and d = ‘11’, the calculation f' = f \ 2^m * 2^m + d would
directly yield f' > f. Obviously, there cannot be other cases for f' being the smallest noise
share to begin with.
Next, it should be clear from the example that whatever N and the discount code are, all M'
noises are possibly explored, once per noise. A similar analysis holds for any partitioning
that could be generated by a scalable scheme. We skip here the tedious details, referring the
reader to [].
5.2 Complexity
An m-bit long d decreases the recovery calculation complexity (hardness) 2^m times in
practice. Respectively, we have O (M / 2^m) for the worst case and O (M / 2^(m + 1)) on the
average.
Proof. For the brute-force recovery, the complexity could be measured basically by the
number of noises to try out: at most or on the average. Each noise may indeed trigger a
match attempt. The computational cost of SHA256, as well as of any other known good 1-way
hash function, dominates the additional operations required at the start-up, termination,
etc. of the algorithms. We had thus basically the complexity of O (M) in the worst case, for
both the static and the scalable schemes. The algorithm for the discounted recovery has M'
noises to try out at most, in both cases also. This is 2^m times fewer. On the other hand, the
discounted recovery algorithm requires an additional initial calculation of M'. Next, it
requires the calculation of f'. Finally, at each attempt, there is an additional
multiplication by 2^m. However, it is common knowledge that the cumulated computational cost
of a few such operations is again negligible with respect to that of SHA256 or of another good
1-way hash calculation. Hence, we have basically the O (M') worst-case complexity, i.e., the
O (M / 2^m) one.
For the average case, we had under similar assumptions O (M / 2) for the brute-force
recovery. The reason was that both schemes enumerated all attempts till the successful one,
while every noise, hence every noise share tried out, was equally likely to succeed,
provided a good 1-way hash, as we supposed. For the discounted recovery, every
attempt uses again a different noise and at worst all M' noises are explored. The
discount code is (pseudo)random, hence every code is equally likely. Also, the rest of s0,
beyond the discount code, is (pseudo)random. Hence, every noise share generated is again
equally likely to be the noised one, under the same good 1-way hash assumption. We thus
have on the average the O (M' / 2) complexity, hence O (M / 2^(m + 1)).
Ex. 2. Consider the running example in [_], where the encryption complexity is set up so that
the 1-node recovery would require up to a prohibitive 70 days, and 35 days on the average. To
recover the key in 10 min at most instead, using the brute force, a 10K-node wide cloud may
do. The actual cost could be $200. Consider that the owner retained some 8-bit discount code.
Now, a 40-node cloud may suffice for the same timing. Alternatively, the same 10K-node cloud
delivers the discounted recovery in up to a couple of seconds. In both cases, the cost
theoretically drops to less than $1. A 16-bit discount would discount these figures
further, by the same factor. The requestor could even recover the key on her or his own,
presumably single, node in about 2 min.
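The figures of Ex. 2 can be reproduced with a few lines of arithmetic. The sketch assumes a perfectly linear speed-up and a cloud cost proportional to the total computation; the constant and function names are hypothetical.

```python
import math

D = 70 * 24 * 3600     # worst-case 1-node recovery time, in seconds (70 days)
R = 10 * 60            # requested bound on the recovery time, in seconds
COST = 200.0           # brute-force cloud cost quoted in the example, in dollars


def nodes_needed(m: int) -> int:
    """Cloud size meeting the bound R with an m-bit discount code."""
    return math.ceil(D / (R * 2**m))


def worst_time(m: int, N: int) -> float:
    """Worst-case recovery time, in seconds, on N nodes with an m-bit code."""
    return D / (N * 2**m)


print(nodes_needed(0))            # 10080 nodes, i.e., about 10K
print(nodes_needed(8))            # 40 nodes
print(worst_time(8, 10080))       # about 2.3 s on the same cloud
print(COST / 2**8)                # about 0.78 dollars
print(worst_time(16, 1) / 60)     # about 1.5 min on a single node
```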
5.3 Safety
1. Knowledge of a discount code cannot lower the complexity of the requested backup
under the values O (M') at worst and O (M'/2) on the average, provided by our algorithm
(see below).
Proof. Our algorithm enumerates all attempts till the successful one (if any). Every attempt
uses a different noise among M' and, at worst, all M' noises must be explored. The
rest of s0, beyond the discount code, is (pseudo)random and thus independent of the
discount code value. Also, for a good 1-way hash as we suppose, each such value is equally
likely to generate the matching f'. Hence, whatever a given discount code is, one cannot
calculate, from it or otherwise, any f' that would be more or less likely than any other
possible one. No method exists that would allow attacking the requested backup from its given
discount so as to lower the complexity under that of our algorithm.
2. Guessing a discount code does not lower the complexity of any backup under O (M).
Proof sketch. An attacker A considers m = 1 and guesses the value of the 1-bit code. The
success probability is 0.5. If A succeeds, there are up to M / 2 attempts, and M / 4 on the
average. If A misses, there are M / 2 attempts, all unsuccessful. They have to be followed by
up to M / 2 ones until the successful one. On the average there are then M / 4 such attempts.
In total, the maximal complexity remains M (and O (M) more generally). The average one is
(M/4 + (M/2 + M/4)) / 2 = M/2, i.e., that of the brute-force recovery. The guessing did not
help. It is easy, though tedious, to see that the end result is the same for any m > 1.
3. A discount code d for backup B does not lower the complexity of any discounted
recovery using d for a different backup B'. The latter remains O (M), characterizing the
brute-force recovery of B'.
Proof. The discount codes being (pseudo)random, it would indeed be like guessing in (2).
Property 3 means that the knowledge by the escrow of a discount code for a backup does not
threaten any different backup in the escrow’s possession. Or, in other words, a discount code
once used by the escrow is of no further utility.
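The counting argument of Property 2 for m = 1 can be verified exactly on a small noise space. The sketch below is a hypothetical single-node model: it averages the number of match attempts over every position of the noised share and over both guesses of the 1-bit code, where a wrong guess is followed by the exploration of the remaining shares.

```python
from fractions import Fraction


def brute_force_average(M: int) -> Fraction:
    """Average number of attempts when the M shares are tried in order."""
    return Fraction(sum(n0 + 1 for n0 in range(M)), M)


def guessed_bit_average(f: int, M: int) -> Fraction:
    """Average attempts when a 1-bit suffix is guessed first (m = 1)."""
    shares = range(f, f + M)
    total = 0
    for s0 in shares:                       # every possible noised share
        for guess in (0, 1):                # both guesses are equally likely
            first = [s for s in shares if s % 2 == guess]
            rest = [s for s in shares if s % 2 != guess]
            total += (first + rest).index(s0) + 1
    return Fraction(total, 2 * M)


print(brute_force_average(64), guessed_bit_average(1001, 64))
```

Both averages come out as 65/2 for M = 64, i.e., about M/2: guessing the bit does not beat the brute-force average.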
6 Key-owner defined discount codes
The owner chooses for a key a convenient discount code, i.e., one easy to store or remember.
Possibly then, one chooses a code that an outsider cannot know. It could be, e.g., the first
two letters of the name of a childhood pet. This way of proceeding is acceptable when a single
backup is used. An attacker getting the knowledge of any such code within the recovery request
cannot lower the complexity further, since all the other bits of s0 are random. It may in
contrast be risky for the safety of multiple backups. Especially, using the same code for
multiple backups is an obvious invitation to disaster.
One discount code production rule could then be that the owner derives the discount from
the ID of the encrypted dataset concatenated with a secret convoluted passphrase, in practice
impossible to guess, e.g., Dali’srabbit’sSwatch*shows17:15. The derivation may then use
a 1-way hash, e.g., our SHA256, finally truncated to the desired length. The rule seems
safe as long as the secret data remain undisclosed. Any dictionary attack etc. appears indeed
to have impractical complexity.
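A minimal sketch of such a production rule follows. It assumes that the dataset ID and the passphrase are concatenated as UTF-8 strings, that the truncation keeps the m low-order bits of the SHA256 digest, and that the owner then forces the chosen code into the suffix of a random s0; none of these details is fixed by the text above, and the function names are hypothetical.

```python
import hashlib
import secrets


def derive_discount_code(dataset_id: str, passphrase: str, m: int) -> int:
    """Hash the ID concatenated with the passphrase; keep m bits (assumption)."""
    digest = hashlib.sha256((dataset_id + passphrase).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << m) - 1)


def share_with_code(d: int, m: int, bits: int = 256) -> int:
    """Random share s0 whose m-bit suffix is the owner-defined code d."""
    s0 = secrets.randbits(bits)
    return ((s0 >> m) << m) | d


d = derive_discount_code("dataset-42", "a long secret passphrase", 16)
s0 = share_with_code(d, 16)
```

The owner only needs to remember the passphrase: the code for each backup can be recomputed from the dataset ID at recovery time.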
6.1 Partly defined discounts
Ex. The discount code has m = 8 and it is a number 0…9 in ASCII code. Such a discount
specification potentially reduces the recovery complexity 25.6 times.
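The factor in this example follows from counting the admissible suffixes: 10 of the 2^8 possible 8-bit values are ASCII digits. A short sketch, with hypothetical names, makes the count explicit.

```python
# The 10 admissible 8-bit suffixes: ASCII codes of '0' to '9' (48 to 57).
ASCII_DIGITS = {ord(c) for c in "0123456789"}


def admissible(share: int) -> bool:
    """True if the last byte of the candidate share is an ASCII digit."""
    return (share & 0xFF) in ASCII_DIGITS


def reduction_factor(m: int = 8, n_codes: int = len(ASCII_DIGITS)) -> float:
    """Fraction of shares skipped, expressed as a complexity reduction."""
    return 2**m / n_codes


print(reduction_factor())   # 25.6
```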
TBC