2. the encryption scheme - University of Colorado Colorado Springs

Analysis of an HMAC Based Database Encryption Scheme
Brad Baker
7/24/2009
CS592 through CS960 independent study
University of Colorado, Colorado Springs
1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918
bbaker@uccs.edu

ABSTRACT
Encryption in database systems is an important topic for research, because secure and efficient algorithms are needed that allow querying over encrypted data and optimized encryption and decryption of data. Values in a database must be encrypted and decrypted separately for insertion or update, so traditional cipher chaining methods for symmetric encryption are not ideal. This paper presents an analysis of one proposed database encryption scheme for integer values, which is based on the keyed Hash Message Authentication Code (keyed HMAC) operation over numeric bucket and remainder values. This database encryption scheme was presented in “How to Construct a New Encryption Scheme Supporting Range Queries on Encrypted Database” by Dong Hyeok Lee, You Jin Song, Sung Min Lee, Taek Yong Nam and Jong Su Jang at the IEEE 2007 International Conference on Convergence Information Technology. This analysis includes an implementation and test of the proposed algorithm and identifies potential areas for future research on this topic.

Categories and Subject Descriptors
C. [Data]: Data Encryption – Public key cryptosystems

General Terms
Experimentation, Security

Keywords
Database Encryption, keyed HMAC

1. INTRODUCTION
Database encryption is an important topic for continuing research, as databases are commonly used for storing and processing large amounts of sensitive data in various industries. Goals include efficient and secure algorithms that allow some forms of querying over encrypted data, simple insertion or update of encrypted values, and selective decryption of data. In the context of most database systems, symmetric algorithms are preferred for efficiency, and the separate public and private keys of asymmetric algorithms are not needed. Typically the database administrator can use a secret key that is shared among authorized users. Traditional methods to strengthen symmetric algorithms, such as cipher block chaining or cipher feedback, are problematic in databases because ciphertext values must be independent of one another.

This project studied and implemented a proposed database encryption algorithm that employs symmetric, hash based encryption. This algorithm was presented in “How to Construct a New Encryption Scheme Supporting Range Queries on Encrypted Database” by Dong Hyeok Lee, You Jin Song, Sung Min Lee, Taek Yong Nam and Jong Su Jang at the IEEE 2007 International Conference on Convergence Information Technology. The referenced paper reviews several database encryption algorithms and presents a new algorithm for numeric encryption using a bucket and keyed HMAC based process. This algorithm was implemented and tested for this project in C using the SHA-1 hash algorithm.

The HMAC and bucket based encryption scheme analyzed for this project has several advantages. The encryption does not preserve plaintext ordering, it protects against inference attacks, and the strength of the algorithm can be improved by using a different hash algorithm for the HMAC operation. Additionally, individual values can be securely encrypted and decrypted independently without decreasing security, so block chaining and feedback are not needed. The scheme also presents several performance and configuration challenges, which are discussed together with the test results.

2. THE ENCRYPTION SCHEME
2.1 Summary of HMAC
Keyed HMAC, or Hash Message Authentication Code, is a process that uses a secret key and a hash algorithm such as MD5 or SHA-1 to generate a message authentication code. The process is symmetric, so two parties communicating with HMAC must share the same secret key. Combining the hash algorithm with a key prevents an unauthorized user from modifying the message or the digest without being detected. This protects the message against tampering, for example by a man-in-the-middle attacker, but HMAC is not designed to encrypt the message itself; it only protects the message from unauthorized modification.

HMAC can be defined as a function that takes a key and a plaintext message as input. Any hash algorithm can be used, including MD5, SHA-1, SHA-256, etc. The HMAC algorithm defines two padding constants, the inner pad and the outer pad, with values (0x3636…) and (0x5c5c…) respectively, each expanded to the block size of the hash algorithm. The key is likewise zero-padded to the block size (a key longer than one block is hashed first). To calculate the HMAC, first the exclusive-or of the key and the inner pad is found. This result is prepended to the message, and the combination is hashed with the chosen hash algorithm, producing an intermediate digest. In the next step, the exclusive-or of the key and the outer pad is found and prepended to the intermediate digest. The result is hashed again, producing the final message authentication code. This operation is summarized in Figure 1, where {⊕} denotes exclusive-or, {||} denotes concatenation, {K} is the secret key, {m} is the plaintext message, and {H} is the hash function.

$\mathrm{HMAC}_K(m) = H\big((K \oplus \mathit{opad}) \,\|\, H((K \oplus \mathit{ipad}) \,\|\, m)\big)$
Figure 1 - HMAC operation

Each calculation of the HMAC digest requires running the underlying hash function twice. The output of HMAC is a binary code equal in length to the hash function digest. This code can only be reproduced with the same key and message, and its cryptographic strength depends on the strength of the hash algorithm, which can be changed if required.
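To make Figure 1 concrete, the following is a compact, self-contained C sketch of SHA-1 and HMAC-SHA1 that follows the steps described above: the key is zero-padded (or first hashed, if it is longer than the 64-byte SHA-1 block), XORed with the 0x36 inner pad and the 0x5c outer pad, and the hash is applied twice. It is an illustration only, not the GPL code used in this project, and the names `sha1_ctx` and `hmac_sha1` are hypothetical; a production system should rely on a vetted cryptographic library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal SHA-1 state: chaining value, total length, pending partial block. */
typedef struct { uint32_t h[5]; uint64_t len; uint8_t buf[64]; size_t n; } sha1_ctx;

static uint32_t rol32(uint32_t x, int s) { return (x << s) | (x >> (32 - s)); }

/* Compress one 64-byte block into the chaining value (80 rounds). */
static void sha1_block(sha1_ctx *c, const uint8_t *p) {
    uint32_t w[80], a, b, cc, d, e, f, k, t;
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    a = c->h[0]; b = c->h[1]; cc = c->h[2]; d = c->h[3]; e = c->h[4];
    for (int i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & cc) | (~b & d);           k = 0x5A827999; }
        else if (i < 40) { f = b ^ cc ^ d;                    k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & cc) | (b & d) | (cc & d); k = 0x8F1BBCDC; }
        else             { f = b ^ cc ^ d;                    k = 0xCA62C1D6; }
        t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = cc; cc = rol32(b, 30); b = a; a = t;
    }
    c->h[0] += a; c->h[1] += b; c->h[2] += cc; c->h[3] += d; c->h[4] += e;
}

static void sha1_init(sha1_ctx *c) {
    static const uint32_t iv[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    memcpy(c->h, iv, sizeof iv);
    c->len = 0;
    c->n = 0;
}

static void sha1_update(sha1_ctx *c, const uint8_t *p, size_t n) {
    c->len += n;
    while (n--) {
        c->buf[c->n++] = *p++;
        if (c->n == 64) { sha1_block(c, c->buf); c->n = 0; }
    }
}

/* Pad with 0x80, zeros and the 64-bit big-endian bit length; emit 20 bytes. */
static void sha1_final(sha1_ctx *c, uint8_t out[20]) {
    uint64_t bits = c->len * 8;
    uint8_t one = 0x80, zero = 0, lenbuf[8];
    sha1_update(c, &one, 1);
    while (c->n != 56) sha1_update(c, &zero, 1);
    for (int i = 0; i < 8; i++) lenbuf[i] = (uint8_t)(bits >> (56 - 8 * i));
    sha1_update(c, lenbuf, 8);
    for (int i = 0; i < 20; i++) out[i] = (uint8_t)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* HMAC-SHA1 as in Figure 1: H((K xor opad) || H((K xor ipad) || m)). */
void hmac_sha1(const uint8_t *key, size_t klen, const uint8_t *msg, size_t mlen,
               uint8_t out[20]) {
    uint8_t k0[64] = {0}, ipad[64], opad[64], inner[20];
    sha1_ctx c;
    assert(key != NULL || klen == 0);
    if (klen > 64) {                      /* long keys are hashed first */
        sha1_init(&c); sha1_update(&c, key, klen); sha1_final(&c, k0);
    } else if (klen > 0) {
        memcpy(k0, key, klen);            /* short keys are zero-padded */
    }
    for (int i = 0; i < 64; i++) {
        ipad[i] = (uint8_t)(k0[i] ^ 0x36);
        opad[i] = (uint8_t)(k0[i] ^ 0x5c);
    }
    sha1_init(&c); sha1_update(&c, ipad, 64); sha1_update(&c, msg, mlen); sha1_final(&c, inner);
    sha1_init(&c); sha1_update(&c, opad, 64); sha1_update(&c, inner, 20); sha1_final(&c, out);
}
```

For example, `hmac_sha1((const uint8_t *)"Jefe", 4, (const uint8_t *)"what do ya want for nothing?", 28, out)` should reproduce the published RFC 2202 test vector for HMAC-SHA1, which is a quick way to check any HMAC implementation before building on it.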
2.2 Summary of the Encryption Scheme
The data encryption scheme studied for this project applies the HMAC operation recursively to encrypt and decrypt integer data. The scheme works primarily for non-negative integer values; it can be extended to real numbers by scaling each value by a power of 10 to convert it to an integer. Negative numbers cannot be stored with the algorithm as designed because of its use of modular arithmetic. Real and negative numbers could be encrypted if a method were designed to store the scale factor and sign of the plaintext data. On the encryption side the algorithm performs a pre-processing step, a transformation step, and storage of the encrypted data in a database. On the decryption side the algorithm uses an inverse transformation and post-processing to reproduce the plaintext. The encryption and decryption processing steps are shown in Figure 2.

Figure 2 - Processing steps in encryption/decryption

3. PROCESSING DETAILS
3.1 Pre-Processing Step
The pre-processing step performs modular arithmetic on the plaintext integer, calculating the remainder or residual {r} with the formula {r = m mod Sb}, where {m} is the plaintext and {Sb} is a predefined bucket size. After calculating the residual, the bucket ID {Ib} is found using the formula {Ib = (m - r) / Sb}. For example, when processing the integer 485,321 with a bucket size of 20,000, the residual is 5,321 and the bucket ID is 24. The bucket ID and the residual are encrypted separately in the transformation phase. The choice of bucket size is an important factor in applying this encryption scheme; an incorrectly chosen bucket size affects both the efficiency and the validity of the encryption process. The modulus operation and the calculation of the bucket ID also make it difficult to encrypt negative integers.
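The pre-processing formulas can be sketched in C as follows; the names `bucket_pair`, `preprocess` and `postprocess` are illustrative. Unsigned 64-bit integers are assumed, which matches the scheme's restriction to non-negative plaintexts, and the inverse {m = Ib * Sb + r} used later in post-processing is included so that the split can be checked as a round trip.

```c
#include <assert.h>
#include <stdint.h>

/* Bucket ID and residual produced by the pre-processing step. */
typedef struct { uint64_t bucket_id; uint64_t residual; } bucket_pair;

/* r = m mod Sb, Ib = (m - r) / Sb, for a non-negative integer plaintext m. */
bucket_pair preprocess(uint64_t m, uint64_t bucket_size) {
    assert(bucket_size > 0);
    bucket_pair p;
    p.residual = m % bucket_size;
    p.bucket_id = (m - p.residual) / bucket_size;
    return p;
}

/* Inverse used in post-processing: m = Ib * Sb + r. */
uint64_t postprocess(bucket_pair p, uint64_t bucket_size) {
    return p.bucket_id * bucket_size + p.residual;
}
```

With the example above, `preprocess(485321, 20000)` yields bucket ID 24 and residual 5,321, and `postprocess` restores 485,321.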
3.2 Encryption Transformation
The next phase of the algorithm is the transformation step, which provides the primary encryption operation. The inputs for this step are a secret key, a seed value, the plaintext bucket ID and the residual. Keyed HMAC is used recursively to encrypt the bucket ID and the residual independently. The encrypted bucket ID is found by calculating the HMAC repeatedly N times, where N is equal to the bucket ID. On the first iteration, the secret key and a predefined seed value are used as input to the HMAC operation. On each successive iteration, the output of the previous HMAC is used as the message, along with the secret key. This is repeated until the number of iterations equals the bucket ID. For example, for a bucket ID of 24, HMAC is executed recursively 24 times, starting from the predefined seed value as the initial message. The result is labeled {T(Ib)K}, designating the transformation of bucket ID {Ib} under key {K}. In this way the bucket ID itself is never used as HMAC input; instead, it determines how many times HMAC is applied.

The encrypted value for the residual is found by a similar operation, differing only in the secret key that is used. Each bucket ID and residual form a pair from the decomposition of the plaintext. When encrypting the residual value, the corresponding bucket ID is prepended to the secret key to form a new key. With the new key, the recursive HMAC operation is the same: beginning with the seed, the digest is calculated N times, where N is equal to the value of the residual. This result is labeled {T(r)Ib||K}, designating the transformation of residual {r} under the composite key {Ib||K}.

As an example, consider the encryption of the integer 336,789 with a bucket size of 1,000. The bucket ID is 336 and the residual is 789.
If the SHA-1 hash algorithm is used with a key of “999” and a seed value of “test”, HMAC is executed recursively 336 times for the bucket ID and 789 times for the residual. Both recursions use “test” as the initial HMAC message, but the bucket ID uses key {K} and the residual uses key {Ib||K}. The resulting encrypted values, in base64 encoding, are {T(Ib)K} = “2CI0b3pNB8KbiCIUbKkOd2ciRAc=” and {T(r)Ib||K} = “PynDpvSFSSUZCqk3yVY8J2g3Ks4=”. The output in this case is two 28-character base64 strings, because the SHA-1 hash used with HMAC in this project produces a 160-bit (20-byte) digest. The pseudocode for the encryption transformation is presented in Figure 3.

Procedure Transformation T(x)K
Begin
  t := seed
  For j = 1 to x
    t := HMAC(t)K
  Endfor
  return t
End.
Figure 3 – Pseudocode for encryption transformation

The transformation is used twice, once for the bucket ID and once for the residual. The resulting value {t} is the encrypted data. The values {T(Ib)K} and {T(r)Ib||K} calculated by the encryption transformation form the ciphertext and are stored in the database. Note that a single integer data field is replaced with two data fields holding a larger amount of data. A typical 4-byte integer value is replaced with 56 bytes of data when the ciphertext is base64 encoded, a 14-fold increase in stored data.

3.3 Decryption Transformation
To reproduce the plaintext from the ciphertext, an inverse transformation is defined. Because the algorithm uses a hash as the basis of its encryption, a direct inverse cannot be calculated. Instead, the inverse transformation must search through the potential bucket ID and residual values, using the set of possible bucket IDs as the range for the search. This requires that the range of possible bucket IDs be known beforehand, for example from domain knowledge or from the data type being encrypted.
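The forward transformation that this search has to replay (Figure 3), including the composite key {Ib||K} used for the residual, can be sketched in C as below. To keep the example self-contained, `toy_keyed_hash` is a hypothetical, deliberately simple 64-bit stand-in for HMAC(t)K; it is not HMAC, is not secure, and would have to be replaced with HMAC-SHA1 (as used in this project) before any real use. The seed is assumed to be a 64-bit integer rather than a string such as “test”, the composite key is assumed to be the decimal bucket ID written in front of the key text, and `transform`, `cipher_pair` and `encrypt_value` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for HMAC(t)_K so the sketch is self-contained.
   It is NOT HMAC and NOT cryptographically secure; swap in HMAC-SHA1. */
static uint64_t toy_keyed_hash(const char *key, uint64_t t) {
    uint64_t h = 0xcbf29ce484222325ULL;              /* FNV-1a over the key text */
    for (const char *p = key; *p; p++) { h ^= (uint8_t)*p; h *= 0x100000001b3ULL; }
    h ^= t;                                          /* mix in the message */
    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;        /* MurmurHash3 fmix64 finalizer */
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

/* Figure 3: t := seed, then x times t := HMAC(t)_K. */
uint64_t transform(const char *key, uint64_t seed, uint64_t x) {
    uint64_t t = seed;
    for (uint64_t j = 1; j <= x; j++) t = toy_keyed_hash(key, t);
    return t;
}

typedef struct { uint64_t enc_bucket, enc_residual; } cipher_pair;

/* Encrypt one value: bucket ID under K, residual under the composite key Ib || K. */
cipher_pair encrypt_value(const char *key, uint64_t seed, uint64_t m, uint64_t bucket_size) {
    uint64_t r = m % bucket_size, ib = (m - r) / bucket_size;
    char composite[128];
    int n = snprintf(composite, sizeof composite, "%llu%s", (unsigned long long)ib, key);
    assert(n > 0 && (size_t)n < sizeof composite);  /* key must fit the buffer */
    cipher_pair c = { transform(key, seed, ib), transform(composite, seed, r) };
    return c;
}
```

Because the transformation is deterministic, an equality predicate can be evaluated by encrypting the search value in the same way and comparing ciphertexts. Note also that, exactly as in Figure 3, a bucket ID or residual of 0 leaves the seed unchanged, so {T(0)K} equals the seed under every key.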
Because the set of all bucket IDs is processed even to decrypt a single value, it is more efficient on a cost-per-record basis to decrypt all records at the same time. The first step of the decryption transformation is finding the bucket ID of each ciphertext element, that is, reproducing the value of {Ib} from the ciphertext {T(Ib)K}. The same seed and key are used in the HMAC operation, which is executed N times, where N is the number of possible buckets in the domain. For example, with a bucket size of 2,000 in a domain whose maximum data value is 1,000,000, there are 500 possible bucket values and HMAC is executed 500 times. The upper limit of allowable data values must therefore be known in order to bound the HMAC search loop. While the N iterations of HMAC are calculated, the input for each calculation is the output of the previous iteration. After each iteration, the resulting value is compared against all encrypted bucket IDs; when a match is found, the plaintext bucket ID is equal to the number of iterations executed so far.

Once a bucket ID is found, a similar search is made for the residual value. A new key is again constructed by prepending the decrypted bucket ID to the secret key, and HMAC is calculated N times, where N is equal to the bucket size, since the bucket size defines all possible residual values. Once a match is found between the HMAC output and the encrypted residual value, the plaintext residual is equal to the number of iterations executed. The pseudocode for the decryption transformation is shown in Figure 4.

Procedure Inverse Transformation T^-1(T)K
Begin
  u := s
  For i = 0 to MAX(T)
    If i > 0 Then u := HMAC(u)K Endif     ** i = 0 compares the seed itself
    If i >= MIN(T) Then
      find Ti = u ?
      If any Ti is found Then ITi := i Endif
    Endif
  Endfor
  return IT
End.
Figure 4 – Pseudocode for decryption transformation

The inverse transformation is executed twice for each plaintext value, once to find the bucket ID and once to find the residual. In the pseudocode, MAX(T) is the number of possible buckets when searching for {T^-1(Ib)K}, and the bucket size when searching for {T^-1(r)Ib||K}; MIN(T) is the smallest value that needs to be tested. {s} is the seed, {Ti} represents the encrypted values, and {ITi} represents the resulting plaintext. If an encrypted value matches the HMAC output, the number of loop iterations is the plaintext value, for either the bucket ID or the residual.

3.4 Post-Processing Step
The post-processing step reverses the modulus operation of the pre-processing step to regenerate the original plaintext from the decrypted bucket ID and residual. The plaintext value {m} is found using {m = Ib * Sb + r}, where {Ib} is the decrypted bucket ID, {r} is the decrypted residual, and {Sb} is the bucket size. Any scaling of real-number plaintext or encoding of negative values performed in the pre-processing step is also reversed in the post-processing step.

4. EXPERIMENTATION
4.1 Implementation
In this project, the HMAC based database encryption scheme was implemented in C, using the SHA-1 hash algorithm and base64 data encoding. The HMAC and SHA-1 algorithms were implemented using existing source code under the GPL license, distributed by the Free Software Foundation. Existing source was used to avoid errors in implementing SHA-1 and HMAC, and to focus resources on the proposed encryption scheme and testing. Several challenges were encountered in the implementation related to data handling in C, including memory management, null byte processing, and data encoding such as base64. For the purposes of this project, configuration parameters such as the bucket size, maximum number of buckets, and maximum number of encrypted records were defined as compiled constant values.
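The search of Figure 4 and the post-processing step can be combined into a short batch-decryption sketch. In keeping with the compiled constants described above, the sketch fixes the bucket size, the number of possible buckets and the maximum number of records with `#define`s; the values chosen here (1,000, 1,000 and 64) are illustrative assumptions, as are the function names. `toy_keyed_hash` is again a hypothetical stand-in for HMAC, not HMAC itself. One walk of the bucket chain is shared by every record, while the residual chain, whose key depends on {Ib||K}, is walked once per record. The search also compares the seed itself (iteration 0), so values with a bucket ID or residual of 0 are recovered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BUCKET_SIZE 1000u   /* Sb: residual search range (compiled constant) */
#define MAX_BUCKETS 1000u   /* possible bucket IDs (compiled constant) */
#define MAX_RECORDS 64u     /* maximum records decrypted in one batch */

/* HYPOTHETICAL stand-in for HMAC(t)_K; NOT HMAC and NOT secure. */
static uint64_t toy_keyed_hash(const char *key, uint64_t t) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (const char *p = key; *p; p++) { h ^= (uint8_t)*p; h *= 0x100000001b3ULL; }
    h ^= t;
    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

/* Forward transformation T(x)_K (Figure 3), used to build ciphertexts. */
uint64_t chain(const char *key, uint64_t seed, uint64_t x) {
    uint64_t t = seed;
    while (x--) t = toy_keyed_hash(key, t);
    return t;
}

/* Figure 4: walk u_0 = seed, u_i = HMAC(u_{i-1})_K for i < limit, and record
   i for every still-unmatched ciphertext equal to u_i (found[k] stays -1 otherwise). */
static void inverse_search(const char *key, uint64_t seed, uint64_t limit,
                           const uint64_t *ct, int64_t *found, size_t n) {
    uint64_t u = seed;
    for (uint64_t i = 0; i < limit; i++) {
        if (i > 0) u = toy_keyed_hash(key, u);
        for (size_t k = 0; k < n; k++)
            if (found[k] < 0 && ct[k] == u) found[k] = (int64_t)i;
    }
}

/* Decrypt n ciphertext pairs; returns 0, or -1 if any value is outside the domain. */
int decrypt_all(const char *key, uint64_t seed, const uint64_t *enc_bucket,
                const uint64_t *enc_residual, size_t n, uint64_t *out) {
    int64_t ib[MAX_RECORDS];
    assert(n <= MAX_RECORDS);
    for (size_t k = 0; k < n; k++) ib[k] = -1;
    /* Step 1: a single walk of the bucket chain serves every record. */
    inverse_search(key, seed, MAX_BUCKETS, enc_bucket, ib, n);
    for (size_t k = 0; k < n; k++) {
        char composite[128];
        int64_t r = -1;
        if (ib[k] < 0) return -1;                 /* bucket outside compiled domain */
        /* Step 2: the residual chain depends on Ib || K, so it is walked per record. */
        snprintf(composite, sizeof composite, "%lld%s", (long long)ib[k], key);
        inverse_search(composite, seed, BUCKET_SIZE, &enc_residual[k], &r, 1);
        if (r < 0) return -1;
        out[k] = (uint64_t)ib[k] * BUCKET_SIZE + (uint64_t)r;   /* post-processing */
    }
    return 0;
}
```

A ciphertext whose bucket ID lies outside the compiled domain is simply never matched, so `decrypt_all` reports failure, much like the “no rows match” outcome of the 5 million supported value configuration in the large data set tests.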
A production version of the algorithm should allow dynamic specification of these parameters.

4.2 Testing
The test phase of the project focused on three aspects: data validity, program efficiency, and ideal bucket size. The testing strategy used two input data sets, each with 2,000 random integer values. One data set contained very large integers, ranging from 1,000,000 to 999,000,000; the other contained small integers, ranging from 1 to 999. These data sets were run through the encryption and decryption transformations in five program configurations. Four of the configurations supported a maximum integer value of 2,500,000,000, and the fifth supported a maximum integer value of 5,000,000. The maximum supported integer is the bucket size multiplied by the number of possible buckets; this product bounds the values that the algorithm can encrypt safely. The five configurations used for testing were:
- 500,000 bucket size, 5,000 possible buckets (2.5 billion supported values)
- 50,000 bucket size, 50,000 possible buckets (2.5 billion supported values)
- 5,000 bucket size, 500,000 possible buckets (2.5 billion supported values)
- 500 bucket size, 5,000,000 possible buckets (2.5 billion supported values)
- 500 bucket size, 10,000 possible buckets (5 million supported values)

Distinct differences in performance between the bucket configurations were observed. A relationship between the distribution of plaintext values and the efficiency of the program under different configurations was also discovered. The results of the five program configurations for encryption and decryption are presented in Table 1 and Table 2. Elapsed times are given in minutes and seconds for the encryption or decryption operation over the 2,000 integer values. In the small data set results presented in Table 1, the encryption operation was much faster than the decryption operation.
This is due to the small or zero bucket ID values and the small residual values: the recursive HMAC operation required only a small number of iterations to calculate the encrypted values.

Table 1 - Test results for small data set
Test  Data set  Configuration  Mode     Time elapsed (mm:ss.s)  Data validity
1     small     500K bucket    encrypt  00:02.2                 match
2     small     500K bucket    decrypt  29:36.4                 match
3     small     50K bucket     encrypt  00:01.8                 match
4     small     50K bucket     decrypt  03:05.2                 match
5     small     5K bucket      encrypt  00:01.8                 match
6     small     5K bucket      decrypt  01:27.1                 match
7     small     500 bucket     encrypt  00:00.9                 match
8     small     500 bucket     decrypt  11:35.8                 match
9     small     5M max int     encrypt  00:00.9                 match
10    small     5M max int     decrypt  00:03.2                 match

The decryption operation is less efficient in all tests because of the large number of residual values that must be searched to find the plaintext. As the bucket size decreases across the tests, the number of residuals also decreases and decryption becomes faster. At the 500 bucket size, decryption slows again because, with 5,000,000 possible buckets, the bucket ID search becomes expensive in addition to the residual search.

An important point in the small data set results is the improved performance in the 5 million supported value test (500 bucket size, 10,000 possible buckets). This is due to the smaller search domain in that test relative to the original integer data. For maximum performance, the bucket size multiplied by the number of possible buckets should be as small as possible while still able to represent all possible data values.
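The sizing rules above reduce to a few lines of arithmetic. The sketch below, with hypothetical names, computes the number of supported values {Sb * Pb} for a configuration, checks whether a plaintext fits, and counts the worst-case HMAC calls needed to decrypt a single value (one per possible bucket plus one per possible residual). It assumes bucket IDs run from 0 to Pb - 1 and residuals from 0 to Sb - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A (bucket size Sb, possible buckets Pb) configuration as in Section 4.2. */
typedef struct { uint64_t bucket_size; uint64_t possible_buckets; } config;

/* Exclusive upper bound on plaintexts: Ib in 0..Pb-1 and r in 0..Sb-1. */
uint64_t supported_values(config c) { return c.bucket_size * c.possible_buckets; }

/* True if m can be encrypted and later found by the bounded decryption search. */
bool fits(config c, uint64_t m) { return m < supported_values(c); }

/* Worst-case HMAC calls to decrypt one value: full bucket search + residual search. */
uint64_t worst_case_hmacs_one_value(config c) {
    return c.possible_buckets + c.bucket_size;
}
```

For the five test configurations, the first four give 2,500,000,000 supported values and the fifth 5,000,000, so the large data set (up to 999,000,000) fits every configuration except the fifth. For a fixed product, the single-value cost {Pb + Sb} is smallest when the two are balanced, as in the 50K configuration.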
Table 2 - Test results for large data set
Test  Data set  Configuration  Mode     Time elapsed (mm:ss.s)  Data validity
11    large     500K bucket    encrypt  14:28.3                 match
12    large     500K bucket    decrypt  29:49.1                 match
13    large     50K bucket     encrypt  02:01.9                 match
14    large     50K bucket     decrypt  03:06.6                 match
15    large     5K bucket      encrypt  05:50.7                 match
16    large     5K bucket      decrypt  01:27.5                 match
17    large     500 bucket     encrypt  56:56.4                 match
18    large     500 bucket     decrypt  11:31.6                 match
19    large     5M max int     encrypt  59:01.2                 no rows match
20    large     5M max int     decrypt  00:03.9                 no rows match

In the large data set tests presented in Table 2, encryption was again faster than decryption for the 500K and 50K bucket sizes, but slower in the other three configurations. In the 500 bucket size test there were 5 million possible buckets, and the large plaintext values produced bucket IDs of up to 1,998,000, so encryption required a very long HMAC recursion for each value. Decryption was faster with the 500 bucket size because the bucket ID search was done once and yielded all plaintext bucket values, and the residual search for each bucket ID was fast because of the small bucket size. In the 5 million supported value test, encryption was slow for the same reason as in the 500 bucket size test: the bucket IDs were very large, and the values exceeded the maximum possible number of buckets. Decryption was extremely fast only because it could find no matching rows within the 5 million supported values. This demonstrates that the bucket size multiplied by the possible number of buckets must cover all integer values found in the domain; a calculated bucket ID cannot exceed the number of possible buckets.

Another important point in the large data set results is the sequence from the 500K bucket size, to the 50K bucket size, to the 5K bucket size. With the 500K bucket size, both encryption and decryption times increased because of the large amount of residual recursion and searching.
The 50K bucket size balances the work evenly between bucket IDs and residuals, resulting in improved encryption and decryption times. With the 5K bucket size, decryption is much faster due to the small number of residual values to search, but more time is spent in the encryption step. From these results, it appears that a smaller bucket size improves decryption performance, and a moderate bucket size improves encryption performance.

4.3 Analysis
Several points to guide configuration and use of the algorithm were obtained from testing:
- (Bucket size * possible buckets) should be as small as possible
- (Bucket size * possible buckets) must be greater than the maximum desired integer value
- Smaller bucket sizes improve decryption performance
- Moderate bucket sizes improve encryption performance

The algorithm is clearly computationally intensive for large integers because of the large number of recursive hash calculations and the exhaustive search strategy used for decryption. For encryption, if {N} is the number of plaintext records, {Pb} is the number of possible buckets, and {Sb} is the bucket size, each record requires up to {Pb} HMAC iterations for its bucket ID and up to {Sb} iterations for its residual, so encryption could require up to {N*(Pb*2 + Sb*2)} hash operations, because each HMAC operation includes two hash calculations. For a data set of one million records, with a 50K bucket size and 50K possible buckets, the maximum number of hash operations would be 2x10^11. The actual number of operations would be lower if the plaintext values were small relative to the range of {Pb * Sb} integers.

For decryption, the bucket recursion is calculated once for all plaintext values, but the residual recursion is calculated for each plaintext value. The decryption process therefore requires at most {(Pb*2) + N*(Sb*2)} hash operations. Using the same data set parameters as above, this is about 1x10^11 hash operations, roughly half the encryption bound but still an intensive operation. The proposed encryption scheme could be improved to share the efficiency seen in the decryption process: if encryption iterated over the possible bucket IDs once for all records, its performance should be similar to that of decryption.

The discussion of performance shows a clear benefit to encrypting or decrypting a large number of values at one time. While the algorithm can support encryption and decryption of individual values, the performance per record is worse in that scenario. In the example above, decrypting a single record costs up to 200,000 hash operations, while decrypting one million records costs about 100,000 hash operations per plaintext value on average. These results assume the full range of possible integer values, from 1 to {Pb * Sb}. Performance could be improved by narrowing the maximum and minimum integer values to fit a limited problem domain; for example, if only values from 10,000 to 100,000 are required, the search over residual values will be more efficient. However, this gives an attacker additional information about the data being encrypted.

In further analysis of the ciphertext, it is not clear how range queries would be executed over encrypted data. The encryption scheme was presented as supporting range queries and some aggregation queries such as MIN, MAX and COUNT. Equality queries or list criteria could easily be implemented by encrypting the criteria values with the same seed and secret key and comparing them to the stored ciphertext. Because the ciphertext is unordered hash output with no relation to the plaintext data, range and aggregation operations are not possible directly over the encrypted data. Executing such a query would require decrypting all ciphertext data in the table. Based on the average of 100,000 hash operations per value, this could be a computationally intensive process for large databases.

As noted previously, the encryption output contains two base64 strings, for a total of 56 bytes of character data stored in the database. In many systems, 4 bytes are required to store a typical integer value up to 2.5 billion. This is a 14-fold increase in data storage requirements, from 10 MB to 140 MB for a given integer table. For very large databases, an increase of this proportion can be difficult to implement.

5. FUTURE WORK
5.1 Algorithm Improvements
There are several ways to improve the algorithm as presented. Because this encryption scheme uses a hash method to create ciphertext, it is not possible to incorporate data ordering into the output. This is good because it prevents inference attacks and deduction based on known plaintext/ciphertext pairs. The best way to support queries with a method like this is to improve efficiency, primarily on the decryption side.

It appears that recursive HMAC was included in the encryption scheme to support the exhaustive search process. The recursion does not improve the security of the algorithm and does not provide querying support. An alternative that can be considered in future research is a non-recursive HMAC based algorithm. Rather than starting from a seed value, the process would calculate the HMAC of the bucket ID and residual directly, still using the secret key for the bucket ID and the key concatenated with the bucket ID for the residual. Encryption performance would improve in this situation; however, decryption would still require an exhaustive search across the bucket ID and residual numeric domains to identify plaintext values.

5.2 Potential Multiple Bucket Solution
The encryption and decryption processes of this algorithm could be improved further by decomposing the original number into a larger number of smaller values and applying HMAC to each of the smaller values.
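Decomposing a value into smaller pieces can be done by writing it in base 1,000, so that every piece lies in the range 0 to 999. The sketch below, with hypothetical function names, splits a value into a fixed number of groups, most significant first, with the last group acting as the residual. Padding to a fixed group count is an assumption made here so that every value produces the same number of ciphertext components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split m into `count` base-1000 groups, most significant first;
   groups[count-1] is the residual and the others are buckets.
   Returns 0, or -1 if m needs more than `count` groups. */
int split_base1000(uint64_t m, uint16_t *groups, size_t count) {
    assert(groups != NULL);
    for (size_t j = count; j-- > 0; ) {
        groups[j] = (uint16_t)(m % 1000);
        m /= 1000;
    }
    return m == 0 ? 0 : -1;
}

/* Reassemble m = B_d*1000^d + ... + B_1*1000 + r. */
uint64_t join_base1000(const uint16_t *groups, size_t count) {
    uint64_t m = 0;
    for (size_t j = 0; j < count; j++) m = m * 1000 + groups[j];
    return m;
}
```

With four groups, 122,344,566,788 splits into 122, 344, 566 and 788, while 1,000,005 splits into 0, 1, 0 and 5, which shows that a group can legitimately be 0.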
One model is to use powers of 1,000 to decompose the plaintext into multiple buckets. For example, when encrypting a value of 122,344,566,788, four values could be run through HMAC: three buckets with values 122, 344 and 566, and a residual of 788. Each of these four values lies in the range 0 through 999, giving a small search range to identify each plaintext value during decryption. This potential solution maximizes the benefit of small search ranges, but it also worsens the data storage problem. In this scenario, four base64 encoded values are stored, for a total of 112 bytes of ciphertext per plaintext value. This is a 28-fold increase in data storage requirements, which is excessive; additional research is needed to determine whether this drawback can be reduced.

In the multiple bucket solution described above, the maximum number of hash operations for encryption would be {N*log1000(m)*2 + N*2}, where log1000(m), rounded down, is the number of buckets encrypted for each value and the second term covers the residual. This would require eight million hash operations for one million records with four encrypted values each, an average of eight hash operations per record, which is an acceptable cost. Pseudocode for the encryption process is shown in Figure 5. This solution defines the encryption transformation as Tx(m)K, with the pre-processing and encryption steps presented together to illustrate the concept.

Procedure Transformation Tx(m)K
Begin
  d := floor(log1000(m))
  t := m
  For j = d downto 1            ** find the bucket for each power of 1000
    r  := t mod 1000^j
    Bj := (t - r) / 1000^j
    t  := r
  Endfor
  R := t                        ** Bd to B1 are buckets; R is the residual, < 1000
  ** encrypt the buckets in d HMAC operations; the key is modified on each loop
  For j = d downto 1
    Ej := HMAC(Bj)K
    K  := Bj || K
  Endfor
  ER := HMAC(R)K                ** encrypt the residual value
  return Ed, Ed-1, ..., E1, ER
End.
Figure 5 – Pseudocode for multiple bucket encryption

In the encryption process presented in Figure 5, the number of buckets of size 1,000 is found using log1000(m).
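Figure 5, together with a search-based inverse in the style of Figure 6, can be sketched in C as follows. As before, `toy_keyed_hash` is a hypothetical stand-in for HMAC and is not secure; each group is hashed once under the current key, and the key is then extended as {Bj || K}. The group count is fixed at four (values below 10^12) instead of being derived per value from log1000(m); this is an assumption that keeps the number of stored components constant and tells decryption how many components to search. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GROUPS 4      /* fixed components per value: 3 buckets + residual */
#define KEY_CAP 256   /* capacity of the growing key string */

/* HYPOTHETICAL stand-in for HMAC(x)_K; NOT HMAC and NOT secure. */
static uint64_t toy_keyed_hash(const char *key, uint64_t x) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (const char *p = key; *p; p++) { h ^= (uint8_t)*p; h *= 0x100000001b3ULL; }
    h ^= x;
    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

/* K := B || K: prepend the decimal digits of group value b to the key string. */
static void prepend_group(char key[KEY_CAP], unsigned b) {
    char tmp[KEY_CAP];
    int n = snprintf(tmp, sizeof tmp, "%u%s", b, key);
    assert(n > 0 && n < KEY_CAP);
    memcpy(key, tmp, (size_t)n + 1);
}

/* Figure 5 with a fixed group count: E_j = HMAC(B_j)_K, then K := B_j || K. */
int encrypt_groups(const char *key, uint64_t m, uint64_t ct[GROUPS]) {
    char k[KEY_CAP];
    unsigned g[GROUPS];
    assert(strlen(key) < 200);
    snprintf(k, sizeof k, "%s", key);
    for (int j = GROUPS - 1; j >= 0; j--) { g[j] = (unsigned)(m % 1000); m /= 1000; }
    if (m != 0) return -1;                     /* needs more than GROUPS groups */
    for (int j = 0; j < GROUPS; j++) {
        ct[j] = toy_keyed_hash(k, g[j]);       /* one keyed hash per group */
        if (j < GROUPS - 1) prepend_group(k, g[j]);
    }
    return 0;
}

/* Inverse: for each component, try the candidates 0..999 under the current key. */
int decrypt_groups(const char *key, const uint64_t ct[GROUPS], uint64_t *m) {
    char k[KEY_CAP];
    uint64_t acc = 0;
    assert(strlen(key) < 200);
    snprintf(k, sizeof k, "%s", key);
    for (int j = 0; j < GROUPS; j++) {
        int found = -1;
        for (unsigned i = 0; i < 1000 && found < 0; i++)
            if (toy_keyed_hash(k, i) == ct[j]) found = (int)i;
        if (found < 0) return -1;              /* wrong key or out-of-range value */
        acc = acc * 1000 + (uint64_t)found;
        if (j < GROUPS - 1) prepend_group(k, (unsigned)found);
    }
    *m = acc;
    return 0;
}
```

Decryption in this sketch needs at most 1,000 candidate hashes per component, independent of the size of the plaintext domain.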
The number of encrypted values produced is log1000(m)+1, including the residual. The total number of HMAC operations used for encrypting N plaintext values is {N*log1000(m)+N}, making it very efficient. Because the primary cryptographic strength comes from the hash algorithm and a strong secret key, this process is not expected to decrease cryptographic security. The decryption process is shown in pseudocode in Figure 6.

Procedure Inverse Transformation Tx^-1(c)K
Begin
  ** c holds the encrypted values c(d-1), ..., c1, c0, where c0 is the residual
  d := MAX(log1000(domain))
  For j = d-1 downto 0          ** find the bucket for each power of 1000
    For i = 0 to 999            ** test possible bucket values against the cipher
      e := HMAC(i)K
      If e = cj Then
        Bj := i
        K  := Bj || K
        exit inner loop
      Endif
    Endfor
  Endfor
  m := B(d-1)*1000^(d-1) + ... + B1*1000 + B0     ** B0 is the residual
  return m
End.
Figure 6 – Pseudocode for multiple bucket decryption

In the decryption process presented in Figure 6, the only domain based information required is MAX(log1000(domain)), the largest power of 1,000 that will be supported, which is also the number of encrypted values searched per plaintext. This assumes that every value is encrypted with the same number of components, using buckets of 0 for leading positions where needed. For example, in a domain of one billion, MAX(log1000(domain)) is three; in a domain of one trillion, it is four. The rest of the algorithm steps through the powers of 1,000 from most significant to least significant, collecting the plaintext bucket values and building the composite key along the way. In the post-processing step, all bucket values and the residual are combined into the plaintext. Decryption requires more hash operations than encryption, because searching is still performed. The maximum number of hash operations for decryption would be {N*MAX(log1000(domain))*1000*2}: two hash operations per HMAC, 1,000 possible values per bucket, MAX(log1000(domain)) buckets, and N ciphertext values. For one million ciphertext values in a domain of one billion, decryption would require 6x10^9 hash operations, or 6,000 operations per record on average.
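Counting directly from the loops of Figures 3 to 6, with two hash calls per HMAC, gives the worst-case totals below; the function names are hypothetical. In the recursive scheme each record walks at most {Pb} steps for its bucket ID and {Sb} steps for its residual during encryption, while decryption shares one bucket walk across all records. In the multiple bucket scheme each component costs one HMAC to encrypt and up to 1,000 HMACs to decrypt.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case hash-function calls; every HMAC costs two hash calls. */

/* Recursive scheme, encryption: per record, up to Pb bucket steps and Sb residual steps. */
uint64_t recursive_encrypt_hashes(uint64_t n, uint64_t pb, uint64_t sb) {
    return n * (2 * pb + 2 * sb);
}

/* Recursive scheme, decryption: one shared bucket walk, one residual walk per record. */
uint64_t recursive_decrypt_hashes(uint64_t n, uint64_t pb, uint64_t sb) {
    return 2 * pb + n * 2 * sb;
}

/* Multiple bucket scheme, encryption: one HMAC per stored component. */
uint64_t multi_encrypt_hashes(uint64_t n, uint64_t components) {
    return n * components * 2;
}

/* Multiple bucket scheme, decryption: up to 1,000 candidates per component. */
uint64_t multi_decrypt_hashes(uint64_t n, uint64_t components) {
    return n * components * 1000 * 2;
}
```

With one million records and {Sb} = {Pb} = 50,000, the recursive scheme needs at most 2x10^11 hash calls to encrypt and about 1x10^11 to decrypt, while the multiple bucket scheme needs 8x10^6 to encrypt four-component values and 6x10^9 to decrypt three-component values, matching the multiple bucket figures above.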
With the proposed modified algorithm, the processing efficiency of encryption and decryption could be greatly improved. The major drawback of the multiple bucket method is the greatly increased ciphertext storage requirement, a 28-fold increase in stored text. There could also be unforeseen drawbacks to the multiple bucket method beyond the large ciphertext output; further analysis and improvement of this algorithm is a potential topic for future research.

6. CONCLUSIONS
The database encryption scheme researched for this project provides an interesting use of the keyed Hash Message Authentication Code algorithm in conjunction with an underlying hash algorithm such as SHA-1. The proposed process uses information about the problem domain to encrypt and decrypt integer values using modular arithmetic, buckets and residual values. The encryption and decryption processes use secret keys and strong hash algorithms to ensure the security of encrypted data. The HMAC operation is used recursively on both the encryption and decryption sides, which creates performance problems for large integer values. On the decryption side, an exhaustive search is performed to determine the correct plaintext from the ciphertext and secret key. This exhaustive search requires between 100,000 and 200,000 hash operations on average to identify a plaintext value, making the processing of large amounts of data infeasible.

The efficiency problem can be reduced in certain problem domains by defining the minimum and maximum possible plaintext values and picking a small bucket size. In some domains, such as personally identifiable social security numbers (SSNs), a plaintext range of 100,000,000 to 999,000,000 could be defined, with a 1,000 bucket size. With this configuration the efficiency should be manageable. When the encrypted data is stored in a database, an attacker will most likely know that the field stores SSN data rather than other numeric data, but will be unable to distinguish patterns or plaintext values from the ciphertext.

Another drawback of the proposed method is the greatly increased storage requirement for ciphertext data. One four-byte input value produces two 28-byte ciphertext outputs, a 14-fold increase in stored data.

The proposed algorithm has several strengths: it protects against inference attacks, does not preserve plaintext ordering, and supports single-record encryption and decryption. Because strong hash algorithms are used, individually encrypted values will not reveal patterns that can be exploited to find the key. This means that traditional defenses for symmetric ciphertext, such as cipher block chaining and cipher feedback, are not needed, and because these chaining methods are not used, each ciphertext value is independent of the other ciphertext values.

There is an opportunity for future research motivated by this encryption scheme, in order to improve the processing efficiency of the algorithm. One potential area for research is a multiple bucket based HMAC encryption scheme, although challenges remain with the quantity of ciphertext data produced relative to the plaintext data values.

7. REFERENCES
[1] Dong Hyeok Lee; You Jin Song; Sung Min Lee; Taek Yong Nam; Jong Su Jang, "How to Construct a New Encryption Scheme Supporting Range Queries on Encrypted Database," International Conference on Convergence Information Technology, 2007, pp. 1402-1407, 21-23 Nov. 2007. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4420452&isnumber=4420217
[2] Forouzan, Behrouz A. 2008. Cryptography and Network Security. McGraw Hill Higher Education. ISBN 978-0-07287022-0
[3] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu, "Order Preserving Encryption for Numeric Data," Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2007. URL: http://doi.acm.org/10.1145/1007568.1007632
[4] Tingjian Ge; Zdonik, S., "Fast, Secure Encryption for Indexing in a Column-Oriented DBMS," IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp. 676-685, 15-20 April 2007. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4221716&isnumber=4221635
[5] Wikipedia, July 2009. HMAC reference material. URL: http://en.wikipedia.org/wiki/Hmac
[6] Wikipedia, July 2009. SHA-1 reference material. URL: http://en.wikipedia.org/wiki/SHA-1
[7] Simon Josefsson, 2006. GPL implementation of HMAC-SHA1. URL: http://www.koders.com/c/fidF9A73606BEE357A031F14689D03C089777847EFE.aspx
[8] Scott G. Miller, 2006. GPL implementation of SHA-1 hash. URL: http://www.koders.com/c/fid716FD533B2D3ED4F230292A6F9617821C8FDD3D4.aspx
[9] Bob Trower, August 2001. Open source base64 encoding implementation, adapted for test program. URL: http://base64.sourceforge.net/b64.c
