NIST Special Publication 800-XXX

The Customizable Shake Function (Cshake)

John Kelsey
Computer Security Division
Information Technology Laboratory

http://dx.doi.org/10.6028/NIST.SP.XXX

Month and Year of Publication

U.S. Department of Commerce
Penny Pritzker, Secretary

National Institute of Standards and Technology
Willie May, Under Secretary of Commerce for Standards and Technology and Director

Authority

This publication has been developed by NIST to further its statutory responsibilities under the Federal Information Security Management Act (FISMA), Public Law (P.L.) 107-347. NIST is responsible for developing information security standards and guidelines, including minimum requirements for Federal information systems, but such standards and guidelines shall not apply to national security systems without the express approval of appropriate Federal officials exercising policy authority over such systems. This guideline is consistent with the requirements of the Office of Management and Budget (OMB) Circular A-130, Section 8b(3), Securing Agency Information Systems, as analyzed in Circular A-130, Appendix IV: Analysis of Key Sections. Supplemental information is provided in Circular A-130, Appendix III, Security of Federal Automated Information Resources.

Nothing in this publication should be taken to contradict the standards and guidelines made mandatory and binding on Federal agencies by the Secretary of Commerce under statutory authority. Nor should these guidelines be interpreted as altering or superseding the existing authorities of the Secretary of Commerce, Director of the OMB, or any other Federal official. This publication may be used by nongovernmental organizations on a voluntary basis and is not subject to copyright in the United States. Attribution would, however, be appreciated by NIST.

National Institute of Standards and Technology Special Publication 800-XXX
Natl. Inst. Stand. Technol. Spec. Publ. 800-XXX, NNN pages (Month YYYY)
http://dx.doi.org/10.6028/NIST.SP.XXX
CODEN: NSPUE2

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

There may be references in this publication to other publications currently under development by NIST in accordance with its assigned statutory responsibilities. The information in this publication, including concepts and methodologies, may be used by Federal agencies even before the completion of such companion publications. Thus, until each publication is completed, current requirements, guidelines, and procedures, where they exist, remain operative. For planning and transition purposes, Federal agencies may wish to closely follow the development of these new publications by NIST.

Organizations are encouraged to review all draft publications during public comment periods and provide feedback to NIST. All NIST Computer Security Division publications, other than the ones noted above, are available at http://csrc.nist.gov/publications.

Public comment period: Month Day, YYYY through Month Day, YYYY

National Institute of Standards and Technology
Attn: Computer Security Division, Information Technology Laboratory
100 Bureau Drive (Mail Stop 8930) Gaithersburg, MD 20899-8930
Email: internal-hash@nist.gov

Reports on Computer Systems Technology

The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) promotes the U.S. economy and public welfare by providing technical leadership for the Nation’s measurement and standards infrastructure.
ITL develops tests, test methods, reference data, proof of concept implementations, and technical analyses to advance the development and productive use of information technology. ITL’s responsibilities include the development of management, administrative, technical, and physical standards and guidelines for the cost-effective security and privacy of other than national security-related information in Federal information systems. The Special Publication 800-series reports on ITL’s research, guidelines, and outreach efforts in information system security, and its collaborative activities with industry, government, and academic organizations.

Abstract

This Recommendation specifies Cshake, a customizable variant of the Shake128 and Shake256 functions defined in FIPS 202. Cshake provides rich functionality for customizing the behavior of the Shake functions, which may be used both directly by users and by NIST in defining additional named functions.

Keywords

hash function; cryptography; information security; integrity; KECCAK; pseudorandom function; SHA-3.

Acknowledgements

The author thanks the KECCAK team members: Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche.

Table of Contents

1. INTRODUCTION
2. GLOSSARY
    2.1 TERMS AND ACRONYMS
    2.2 BASIC OPERATIONS AND FUNCTIONS
3. PRELIMINARY FUNCTIONS AND CONSTANTS
    3.1 OVERVIEW
    3.2 ENCODING STRINGS
    3.3 PADDING
4. CSHAKE
    4.1 OVERVIEW
    4.2 PARAMETERS
    4.3 CSHAKE DEFINITION BASED ON KECCAK
5 SECURITY PROPERTIES
    5.1 EQUIVALENT SECURITY TO SHAKE FOR ANY LEGAL S, N
    5.2 ANY CHANGE TO S, N, OR BOTH LEADS TO COMPLETELY UNRELATED FUNCTIONS
    5.3 SEPARATION BETWEEN N AND S
6 USING THE CUSTOMIZATION STRING
6 USING THE NAME TO DEFINE ADDITIONAL SHA3-DERIVED FUNCTIONS
7 PERFORMANCE ISSUES
REFERENCES
APPENDIX A: INTEGER TO BYTE STRING ENCODING

1. Introduction

FIPS 202 introduces a new kind of cryptographic primitive, called an XOF (extendable-output function). The specific XOFs defined in FIPS 202 are called Shake128 and Shake256. Unlike earlier hash functions, which are named for their output length, the Shakes are named for their expected security level.
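The extendable-output property can be illustrated with a short sketch. It assumes Python's standard hashlib module, which is used here purely for illustration and is not part of this Recommendation. Note that hashlib takes the output length in bytes, whereas this document gives lengths in bits. The sample input is arbitrary.

```python
import hashlib

# Illustrative input only; any byte string would do.
message = b"example input"

# Request 128 bits (16 bytes) and 256 bits (32 bytes) of output for the same input.
short_output = hashlib.shake_128(message).digest(16)
long_output = hashlib.shake_128(message).digest(32)

# An XOF output can be extended: the shorter output is a prefix of the longer one.
print(short_output.hex())
print(long_output.hex())
print(long_output.startswith(short_output))
```

The prefix relationship is what distinguishes an XOF from a fixed-length hash function: asking for more output extends the earlier output rather than replacing it.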
FIPS 202 also provides a flexible scheme for domain separation between different functions derived from Keccak. This scheme ensures that different named functions (such as SHA3-512 and Shake128) give unrelated outputs. However, the domain separation also makes it possible, with some additional work, to offer users the ability to customize their use of these and other functions. Allowing a user to customize a particular use of a function is analogous to strong typing in a programming language: it makes it virtually certain that computing that function with two different customizations will not give the same answer, so that, for example, a key fingerprint and an email signature cannot be confused with each other.

In this document, we define two new functions: Cshake128 and Cshake256. Each function is based on Keccak as defined in FIPS 202, and provides a customizable version of the Shake functions from that document. These functions have the following properties:

a. Cshake128 provides a 128-bit security level; Cshake256 provides a 256-bit security level.

b. Both Cshake functions take four parameters:
    • An input string, X
    • An output length in bits, L
    • An optional customization string, S, a byte string which may be empty. (An empty string should be considered the "default value" for S.)
    • An optional function name string, N, a byte string which may be empty. (An empty string should be considered the "default value" for N.)

c. When S and N are both empty strings, Cshake128 behaves exactly like Shake128, and Cshake256 behaves exactly like Shake256. Thus, Cshake provides a kind of backward-compatibility with Shake as defined in FIPS 202.

d. By convention, S is an optional user-selected customization string, useful for naming a particular use of a function.

e. By convention, N is an optional string describing the name of some function defined by NIST in terms of Cshake, to provide some additional useful functionality beyond what SHA3 and Shake provide. Only NIST-defined name strings should be used, but an implementation of Cshake should usually not try to enforce this, as it would complicate the definition and use of additional NIST-defined functions derived from Cshake.

f. An implementation of Cshake may reasonably support only byte-oriented output lengths (that is, values of L that are multiples of 8); if so, a request for a non-byte-oriented output length would result in an error.

2. Glossary

In this document, bits are indicated in the Courier New font. Bytes are typically written as two-digit hexadecimal numbers from the ASCII characters 0 through 9 and A through F, preceded by the prefix “0x”. In binary representation, bytes are written low order bit first, while in hexadecimal representation, bytes are written with the high order digit first. E.g., 0x01 = 10000000 and 0x80 = 00000001. These bit-ordering conventions follow the conventions established in Sec. B.1 of [5].

2.1 Terms and Acronyms

Bit
    A binary digit: 0 or 1.
Capacity
    In the sponge construction, the width of the underlying function minus the rate.
Domain Separation
    For a function, a partitioning of the inputs to different application domains so that no input is assigned to more than one domain.
Extendable-Output Function (XOF)
    A function on bit strings in which the output can be extended to any desired length.
FIPS
    Federal Information Processing Standard.
FISMA
    Federal Information Security Management Act.
Hash Function
    A function on bit strings in which the length of the output is fixed. The output often serves as a condensed representation of the input.
HMAC
    Keyed-Hash Message Authentication Code.
KDF
    Key Derivation Function.
KECCAK
    The family of all sponge functions with a KECCAK-f permutation as the underlying function and multi-rate padding as the padding rule. KECCAK is standardized in [5] and was originally specified in [7].
KMAC
    KECCAK Message Authentication Code.
Length
    For a given bit string, the number of bits in the string.
MAC
    Message Authentication Code.
NIST
    National Institute of Standards and Technology.
PRF
    See Pseudorandom Function.
Pseudorandom Function (PRF)
    A function that can be used to generate output from a random seed and a data variable, such that the output is computationally indistinguishable from truly random output.
Rate
    In the sponge construction, the number of input bits processed or output bits generated per invocation of the underlying function.
SHA-3
    Secure Hash Algorithm-3.
Sponge Construction
    The method originally specified in [6] for defining a function from the following: 1) an underlying function on bit strings of a fixed length, 2) a padding rule, and 3) a rate. Both the input and the output of the resulting function are bit strings that can be arbitrarily long.
Sponge Function
    A function that is defined according to the sponge construction, possibly specialized to a fixed output length.
String
    A sequence of bits.
XOF
    See Extendable-Output Function.

2.2 Basic Operations and Functions

[T]_2
    An integer T represented as a binary string (denoted by the subscript “2”) with a length specified by the function, an algorithm, or a protocol that uses T as an input.
⌈x⌉
    For a real number x, ⌈x⌉ is the least integer that is not strictly less than x. For example, ⌈3.2⌉ = 4, ⌈−3.2⌉ = −3, and ⌈6⌉ = 6.
0^s
    For a positive integer s, 0^s is the string that consists of s consecutive 0s.
enc8(i)
    For an integer i ranging from 0 to 255, enc8(i) is the byte encoding of i, with bit 0 being the low order bit of the byte.
len(X)
    For a bit string X, len(X) is the length of X in bits.
X || Y
    For strings X and Y, X || Y is the concatenation of X and Y. For example, 11001 || 010 = 11001010.
left_encode(n)
    A function for encoding an integer n as a string, so that the string may be unambiguously parsed from the beginning. The definition of left_encode appears in Appendix A.
right_encode(n)
    A function for encoding an integer n as a string, so that the string may be unambiguously parsed from the end. The definition of right_encode appears in Appendix A.

3. Preliminary Functions and Constants

3.1 Overview

The following internal functions are used in the definition of Cshake in the remainder of this Recommendation.

3.2 Encoding Strings

The encode_string function is used to encode strings in a way that may be parsed unambiguously from the beginning of the string. The function is defined as follows:

    encode_string(S):
        if len(S) is not divisible by 8:
            raise an error condition
        return left_encode(len(S)/8) || S

3.3 Padding

The bytepad(X, w) function encodes an input string X in a way that can be parsed unambiguously from the beginning of the string, and that also takes up an integer multiple of w bytes. The definition of bytepad() is as follows:

    bytepad(X, w):
        if len(X) is not divisible by 8:
            raise an error condition
        if w < 1:
            raise an error condition
        z = X
        while (len(z)/8) mod w != 0:
            z = z || 0x00
        return z

4. Cshake

4.1 Overview

Cshake128(X, L, S, N) and Cshake256(X, L, S, N) are defined in terms of the Shake and Keccak functions, both of which appear in FIPS 202.

4.2 Parameters

The parameters of Cshake are:

    • X = the input string, which must be a byte string
    • L = the output length requested, in bits
    • S = the customization string, with a default value of "" (empty string)
    • N = the function name, with a default value of "" (empty string)

When S and N are both set to the empty string, Cshake(X, L, S, N) works exactly like Shake as defined in FIPS 202. Thus

    Cshake128(X, L, "", "") = Shake128(X, L)
    Cshake256(X, L, "", "") = Shake256(X, L)

Cshake is designed so that for any two instances

    Cshake(X1, L1, S1, N1)
    Cshake(X2, L2, S2, N2)

unless S1==S2 and N1==N2, the two instances are completely unrelated; knowledge of Cshake(X1, L1, S1, N1) gives no information about the value of Cshake(X2, L2, S2, N2) for any choice of the inputs such that S1<>S2 or N1<>N2. Note that this includes the case where S1=="" and N1=="".
That is, Cshake with any customization is domain-separated from ordinary Shake. Cshake itself is defined in terms of Keccak, as specified in FIPS 202.

4.3 Cshake Definition Based on Keccak

Cshake either returns the result of a call to Shake (if S and N are both empty strings), or a call to Keccak with a padded encoding of S and N prepended to the input X.

    Cshake128(X, L, S, N):
        if S=="" and N=="":
            return Shake128(X, L)
        else:
            return Keccak[256](bytepad(encode_string(S) || encode_string(N), 168) || X || 00, L)

    Cshake256(X, L, S, N):
        if S=="" and N=="":
            return Shake256(X, L)
        else:
            return Keccak[512](bytepad(encode_string(S) || encode_string(N), 136) || X || 00, L)

5 Security Properties

5.1 Equivalent Security to Shake for Any Legal S, N

For a given choice of S and N, Cshake(X, L, S, N) has exactly the same security properties as Shake(X, L). Specifically, Cshake128() claims a security level of 128 bits, and Cshake256() claims a security level of 256 bits.

When Cshake128() is called with an output of at least 256 bits, the function provides 128 bits of collision resistance; that is, an attacker seeking to find a pair of inputs X1, X2 such that

    Cshake128(X1, L, S, N) == Cshake128(X2, L, S, N)

expects to need at least 2^{128} operations to find such a pair.

When Cshake128() is called with an output of at least 128 bits, the function provides 128 bits of preimage resistance; that is, an attacker given a target value T and seeking to find some input X such that

    Cshake128(X, L, S, N) == T

expects to need at least 2^{128} operations to find such a value.

Similarly, Cshake256(), when called with an output of at least 512 bits, provides 256 bits of collision resistance, and when called with an output of at least 256 bits, provides 256 bits of preimage resistance.

5.2 Any Change to S, N, or Both Leads to Completely Unrelated Functions

Suppose that either S1 <> S2, or N1 <> N2, or both.
Then, f1(X, L) = Cshake(X, L, S1, N1) and f2(X, L) = Cshake(X, L, S2, N2) are entirely unrelated functions. Specifically, knowing the value of Cshake(X, L, S1, N1) gives an attacker no information at all about the value of Cshake(X', L', S2, N2) for any X', L'.

5.3 Separation Between N and S

The padding scheme used to define Cshake encodes the two strings in a way that can be parsed unambiguously from the beginning of the string input to Keccak. N and S are separated in the padding, and so there is no ambiguity introduced between the contents of N and S.

6 Using the Customization String

Cshake provides an input string intended to allow users to customize their use of the function. For example, someone using Cshake128 to compute a key fingerprint (the hash of a public key) might use:

    KF = Cshake128(public_key, 256, "key fingerprint", "")

Later, the same user might decide to customize a different Cshake computation used for signing an email:

    H = Cshake128(email_contents, 256, "email signature", "")

The power of the customization string is that there is now essentially no chance of a collision between these two values: an attacker cannot feasibly use one computation (the email signature) to get the result of the other computation (the key fingerprint).

Conceptually, this is like strong typing in a programming language. The results of computing Cshake128() for a key fingerprint and for an email signature are different "types," and so they should not give the same result. Thus

    KF == H

has a negligible probability of being true.

S may be any legal sequence of bytes. However, implementations may restrict the length of S they will accept.

6 Using the Name to Define Additional SHA3-Derived Functions

Cshake also includes a name input (N). This is intended for use by NIST in defining additional SHA3-derived functions, and should only be set to values defined by NIST for named Keccak-derived functions.
This provides a level of domain separation by function name. Users of Cshake should not make up their own names; that kind of customization is the purpose of the customization string S.

In order to define a new SHA3-derived function, N is set to a new reserved value that isn't the empty string, and typically some additional operations are done to construct the inputs to Cshake. For example, a not-very-useful function to generate a single bit from the date could be defined as follows:

    bit_from_time(year, month, day, S):
        X = year as a 4-digit decimal character string
        X = X || month as a 2-digit decimal character string
        X = X || day as a 2-digit decimal character string
        N = "bit_from_time"
        L = 1
        return Cshake128(X, L, S, N)

Note that this example function is customizable: any string contents in S will customize the function, so that bit_from_time(2000,01,01,"hello, world") is unrelated to bit_from_time(2000,01,01,"happy new year!").

7 Performance Issues

Cshake is defined to fill one entire call to the underlying Keccak-f function with the padded S and N. However, an efficient implementation can precompute the state that results from processing this padded block, and so will suffer no performance penalty when reusing the same choices of S and N multiple times.

References

1. Federal Information Processing Standards Publication 180-4, Secure Hash Standard (SHS), Information Technology Laboratory, National Institute of Standards and Technology, March 2012, http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf.
2. R. Merkle, “One way hash functions and DES,” Advances in Cryptology CRYPTO '89 Proceedings, Lecture Notes in Computer Science, Vol. 435, G. Brassard, ed., Springer-Verlag, 1989, pp. 428-446.
3. I. Damgård, “A Design Principle for Hash Functions,” Advances in Cryptology CRYPTO '89 Proceedings, Lecture Notes in Computer Science, Vol. 435, G. Brassard, ed., Springer-Verlag, 1989, pp. 416-427.
4. Federal Information Processing Standards Publication 198-1, The Keyed-Hash Message Authentication Code (HMAC), Information Technology Laboratory, National Institute of Standards and Technology, July 2008, http://csrc.nist.gov/publications/fips/fips198-1/FIPS-198-1_final.pdf.
5. Federal Information Processing Standards Publication 202, SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions, Information Technology Laboratory, National Institute of Standards and Technology, August 2015, http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf.
6. G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, “Cryptographic sponge functions,” January 2011, http://sponge.noekeon.org/CSF-0.1.pdf.
7. G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, “The KECCAK reference, version 3.0,” January 2011, http://keccak.noekeon.org/Keccak-reference-3.0.pdf.
8. G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, “CAESAR submission: KETJE v1,” March 2014, http://competitions.cr.yp.to/round1/ketjev1.pdf.

Appendix A: Integer to Byte String Encoding

This Recommendation uses two internal functions for encoding integers as strings. Both functions are capable of encoding integers up to an extremely large maximum. The largest integer that may be encoded (max_integer) is also a constant used in the remainder of this document. Because the digit count n is itself encoded with enc8, n is at most 255, so max_integer = 2^{2040} − 1.

left_encode(n) encodes the integer n as a string in a way that can be unambiguously parsed from the beginning of the string.

right_encode(n) encodes the integer n as a string in a way that can be unambiguously parsed from the end of the string.

[[ Note: I'm more than happy to take someone else's standard encoding scheme here, I just want one that parses from the left and one that parses from the right. --JMK]]

The definitions (using enc8() to encode individual bytes) are as follows:

    right_encode(x):
        1. Let n be the smallest integer for which 2^{8n} > x.
        2. Let x_1, x_2, …, x_n be the base-256 digits of x satisfying:
               x = ∑ 2^{8(n-i)} x_i, for i = 1 to n.
        3. Let O_i = enc8(x_i), for i = 1 to n.
        4. Let O_{n+1} = enc8(n).
        5. Return O = O_1 || O_2 || … || O_n || O_{n+1}.

    left_encode(x):
        1. Let n be the smallest integer for which 2^{8n} > x.
        2. Let x_1, x_2, …, x_n be the base-256 digits of x satisfying:
               x = ∑ 2^{8(n-i)} x_i, for i = 1 to n.
        3. Let O_i = enc8(x_i), for i = 1 to n.
        4. Let O_0 = enc8(n).
        5. Return O = O_0 || O_1 || … || O_{n-1} || O_n.
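The encodings above, together with the string encoding of Section 3.2 and the padding of Section 3.3, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: strings are represented as Python bytes objects (so every length is already a multiple of 8 bits), and n is taken to be at least 1, so that x = 0 is encoded with a single zero digit. The text above does not settle the x = 0 case; the choice made here is an assumption of this sketch.

```python
def enc8(i: int) -> bytes:
    """Encode an integer in the range 0..255 as a single byte."""
    if not 0 <= i <= 255:
        raise ValueError("enc8 is only defined for integers 0..255")
    return bytes([i])


def _base256_digits(x: int) -> bytes:
    """Return the base-256 digits x_1..x_n of x, most significant digit first."""
    if x < 0:
        raise ValueError("only non-negative integers can be encoded")
    # Smallest n with 2^(8n) > x; ASSUMPTION: n >= 1, so 0 gets one zero digit.
    n = max(1, (x.bit_length() + 7) // 8)
    return x.to_bytes(n, "big")


def left_encode(x: int) -> bytes:
    """Digit count first, then the digits: parseable from the beginning."""
    digits = _base256_digits(x)
    return enc8(len(digits)) + digits  # enc8 rejects n > 255


def right_encode(x: int) -> bytes:
    """Digits first, then the digit count: parseable from the end."""
    digits = _base256_digits(x)
    return digits + enc8(len(digits))


def encode_string(s: bytes) -> bytes:
    """Prefix a byte string with left_encode of its length in bytes (Section 3.2)."""
    return left_encode(len(s)) + s


def bytepad(x: bytes, w: int) -> bytes:
    """Append zero bytes until the length is a multiple of w bytes (Section 3.3)."""
    if w < 1:
        raise ValueError("w must be at least 1")
    return x + b"\x00" * (-len(x) % w)


# The padded prefix that the Cshake128 definition in Section 4.3 places before X,
# for the illustrative choice S = "email signature" and N = "".
prefix = bytepad(encode_string(b"email signature") + encode_string(b""), 168)
print(len(prefix))
print(left_encode(256).hex(), right_encode(256).hex())
```

Because the prefix depends only on S and N, an implementation can compute it once and reuse it for every input X that shares the same customization.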