DBMS_OBFUSCATION_TOOLKIT DBMS_OBFUSCATION_TOOLKIT In this section, we’ll take a look at encryption of data. We will discuss the supplied DBMS_OBFUSCATION_TOOLKIT package for Oracle 8.1.6 and 8.1.7. We will look at another implementation (a wrapper) that could be placed on top of this package, increasing its functionality. We will discuss some of the caveats with regards to using this package and perhaps most importantly, we’ll touch on the subject of key management. In Oracle 8.1.6, database packages for encryption were introduced. These packages have been enhanced for Oracle 8.1.7 to include support wider ranging encryption key sizes and MD5 hashing. In Oracle 8.1.6, support for single DES (Data Encryption Standard) encryption using a 56-bit key is provided. In Oracle 8.1.7, support for both single and triple DES encryption is provided, allowing for the use of 56, 112, or 168-bit keys. DES is implemented using a symmetric key cipher. What this means simply, is that the same key used to encrypt data, is used to decrypt data. DES encrypts data in 64-bit (8 byte) blocks using a 56-bit key. We’ll see below how this 8-byte fact affects us when using the encryption routines. The DES algorithm ignores 8 bits of the 64-bit key that is supplied. However, developers must supply a 64-bit (8 byte) key to the algorithm. Triple DES (3DES) is a far stronger cipher than DES. The resulting encrypted data is much harder to break using an exhaustive search. It would take 2**112 attempts using two-key (16 byte key) 3DES, or 2**168 attempts using three-key (24 byte key) 3DES, as opposed to 2**56 attempts with single-key DES. To quote the executive summary of rfc1321 (for a full description of rfc1321, the website is http://www.ietf.org/rfc.html), the new MD5: ... takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be “compressed” in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA. 1123 5254AppAIcmp2.pdf 1 2/28/2005 6:49:47 PM Appendix A In essence, MD5 is a way to verify data integrity, and is much more reliable than checksum and many other commonly used methods. In order to run the DES3 and MD5 examples below, you will need access to an Oracle 8.1.7 database. The DES examples require the use of Oracle 8.1.6, or higher. The encryption and MD5 routines have the following constraints that make them a bit unwieldy to use ‘out of the box’ in the DBMS_OBFUSCATION_TOOLKIT package. They are: ❑ The data being encrypted must have a length that is evenly divisible by 8. A 9-byte VARCHAR2 field for example, must be padded out to 16 bytes. Any attempt to encrypt or decrypt a piece of data that does not have a length evenly divisible by 8, will fail with an error. ❑ The key used to encrypt the data must be 8 bytes long for DESEncrypt, and either 16 or 24 bytes for DES3Decrypt. ❑ There are different routines to be called depending on whether you are using 56 bit encryption (DESENCRYPT and DESDECRYPT) versus 112/168bit encryption (DES3ENCRYPT and DES3DECRYPT). I personally find it nicer to have one set of routines for all three. ❑ The encryption routines in Oracle 8.1.6 are procedures and therefore, they are not callable from SQL (procedures cannot be called from SQL). ❑ The ‘out of the box’ encryption routines support encryption of up to 32 KB of data. They do not encrypt/decrypt LOBs. ❑ The encryption routines in Oracle 8.1.7 include functions. However, these functions are overloaded in such a way (see the example below) that also makes them not callable from SQL. ❑ The MD5 routines are likewise overloaded in such a way as to make them not callable from SQL. I find the first constraint, that the data length must be a multiple of 8, to be the hardest to satisfy in an application. Typically, I just have some data, like a salary amount or some other sensitive data, and want to encrypt it. I do not really want to be bothered with ensuring it is a multiple of 8 bytes in length. Fortunately, we can easily implement our own encryption wrapper package to hide this, and most all of the other issues. The fact that the key must be 8, 16 or, 24 bytes in length is something that you must do yourself, however. What I intend to do here is to create a wrapper package, that is installable in 8.1.6 and later versions, which provides support for all of the encryption functionality, and adds support for: ❑ Calling the functions from SQL ❑ Single function calls regardless of the key length ❑ Encryption/decryption of LOBs callable from both PL/SQL and SQL ❑ Installing successfully regardless of which version of the database (8.1.6 or 8.1.7) you are using. That is, it is not dependent on DES3Encrypt/Decrypt, and MD5 support. 1124 5254AppAIcmp2.pdf 2 2/28/2005 6:49:47 PM DBMS_OBFUSCATION_TOOLKIT The Wrapper We’ll start with the package specification. Here we will define an API that provides functions to encrypt and decrypt VARCHAR, RAW, BLOB, and CLOB data. The algorithm used (single key DES or 3DES with 16 or 24 byte keys) will be decided, based on the key length. In our API, the length of the key you send in implies the algorithm. The API is set up so that the key may be passed in with each call, or it may optionally be set for the package by calling SETKEY. The advantage of using SETKEY is that a certain amount of work must be performed to look at the key length and figure out which algorithm to use. If you set the key once and call the routines many times, we can avoid performing this iterative work over and over. Another detail about the key we must use is that if you are working with RAW or BLOB data, you must use a RAW key. If you want to use a VARCHAR2 as the key for RAW/BLOB data, you must cast it to be a RAW using the UTL_RAW package discussed in a later section of this appendix. On the other hand, if you are working with VARCHAR2 and CLOB data, the key must be a VARCHAR2. In addition to providing a layer on top of encryption, this package provides access to the MD5 CHECKSUM routines if installed (version 8.1.7 and up). This wrapper package adds a couple of new possible errors to the documented set of DBMS_OBFUSCATION_TOOLKIT errors (which this package will simply propagate). The following ‘new’ errors will only occur when using version 8.1.6: ❑ PLS-00302: component ‘MD5’ must be declared ❑ PLS-00302: component ‘DES3ENCRYPT’ must be declared ❑ PLS-00302: component ‘THREEKEYMODE’ must be declared You will get these errors if you attempt to use the 8.1.7 functionality of DES3 encryption or MD5 hashing, in the 8.1.6 database. Here is our suggested wrapper package specification. Explanation of the procedures and functions listed below follow the code: create or replace package crypt_pkg as function encryptString( p_data p_key function decryptString( p_data p_key in in in in varchar2, varchar2 default NULL ) return varchar2; varchar2, varchar2 default NULL ) return varchar2; function encryptRaw( p_data in raw, p_key in raw default NULL ) return raw; function decryptRaw( p_data in raw, p_key in raw default NULL ) return raw; function encryptLob( p_data p_key function encryptLob( p_data p_key function decryptLob( p_data p_key function decryptLob( p_data in in in in in in in clob, varchar2 default NULL ) return clob; blob, raw default NULL ) return blob; clob, varchar2 default NULL ) return clob; blob, 1125 5254AppAIcmp2.pdf 3 2/28/2005 6:49:47 PM Appendix A p_key in raw default NULL ) return blob; subtype checksum_str is varchar2(16); subtype checksum_raw is raw(16); function function function function md5str( md5raw( md5lob( md5lob( p_data p_data p_data p_data in in in in varchar2 ) return checksum_str; raw ) return checksum_raw; clob ) return checksum_str; blob ) return checksum_raw; procedure setKey( p_key in varchar2 ); end; / The functions ENCRYPTSTRING and DECRYPTSTRING are used to encrypt/decrypt any STRING, DATE, or NUMBER data up to 32 KB in size. 32 KB is the maximum size of a PL/SQL variable, and is considerably larger than the maximum size that can be stored in a database table, where the limit is 4,000 bytes. These functions are callable from SQL directly so you’ll be able to encrypt data in the database using an INSERT or UPDATE statement, and retrieve decrypted data using a simple SELECT. The KEY parameter is optional. If you have set a key via the SETKEY procedure, we do not need to pass it with each and every call. Next we have ENCRYPTRAW and DECRYPTRAW. These functions are to the RAW data type what the previous two functions are to VARCHAR2s. Note how we purposely avoided function overloading the encrypt/decrypt routines for RAW and VARCHAR2 data by naming them differently. We did this because of the following issue: tkyte@TKYTE816> create or replace package overloaded 2 as 3 function foo( x in varchar2 ) return number; 4 function foo( x in raw ) return number; 5 end; 6 / Package created. tkyte@TKYTE816> select overloaded.foo( ‘hello’ ) from dual; select overloaded.foo( ‘hello’ ) from dual * ERROR at line 1: ORA-06553: PLS-307: too many declarations of ‘FOO’ match this call tkyte@TKYTE816> select overloaded.foo( hextoraw( ‘aa’ ) ) from dual; select overloaded.foo( hextoraw( ‘aa’ ) ) from dual * ERROR at line 1: ORA-06553: PLS-307: too many declarations of ‘FOO’ match this call The database does not distinguish between RAW and VARCHAR2 in the signature of an overloaded function. We would have no way to call these functions from SQL. Even if we used different parameter names for the inputs to these routines (as DBMS_OBFUSCATION_TOOLKIT currently does), they cannot be called from SQL because named parameter notation cannot be used in SQL. The only viable solution to this conundrum is to use functions with unique names to identify the routine we really want. 1126 5254AppAIcmp2.pdf 4 2/28/2005 6:49:47 PM DBMS_OBFUSCATION_TOOLKIT Next we have the ENCRYPTLOB and DECRYPTLOB functions. These are overloaded functions designed to work with either CLOBs or BLOBs. Oracle is able to successfully overload, based on these types, so we’ll take advantage of this fact. Since we are limited to encrypting at most 32 KB of data by the DBMS_OBFUSCATION_TOOLKIT routine, these wrapper APIs will implement an algorithm that encrypts 32 KB chunks of a LOB. The resulting LOB will be a series of encrypted 32 KB chunks of data. The decrypt wrapper we implement understands how the data was packed by the LOB encryption routines, and will decrypt the pieces for us and re-assemble them back into our original LOB. Next, we have the routines for the MD5 CHECKSUMs. To better define what these routines return, we’ve set up the subtypes: subtype checksum_str is varchar2(16); subtype checksum_raw is raw(16); and defined our routines to return these types. You can declare variables of this type yourself: tkyte@TKYTE816> declare 2 checksum_variable 3 begin 4 null; 5 end; 6 / crypt_pkg.checksum_str; PL/SQL procedure successfully completed. This saves you from having to guess how big the checksum return types are. We’ve provided for four different CHECKSUM routines, one each for VARCHAR2 (includes the DATE and NUMBER types), RAW, CLOB, and BLOB data. It should be noted that the MD5 checksum will only be computed on the first 32 KB of the CLOB or BLOB data, as this is the largest variable PL/SQL can work with. The implementation of the package below will not only give us an easier to use encryption package, but it also shows a couple of useful concepts. Firstly, it shows how you can easily create your own wrapper layer to provide for a more customized interface to the database packages. In this case, we are working around some perceived limitations of the DBMS_OBFUSCATION_TOOLKIT package. Secondly, it shows one method for developing a package that is protected from enhancements in the supplied database packages over time. We would like to provide a single wrapper package that works both in 8.1.6 and 8.1.7, but which provides total access to the 8.1.7 functionality. If we used static SQL to access the DESENCRYPT, DES3DECRYPT, and MD5 routines, we would need a different package for 8.1.6 because MD5 and the DES3 functions do not exist in 8.1.6. The dynamic invocation we’ll utilize below allows us to develop a package that can be used by both versions of the database. It also reduces the amount of code has to be written. Here is the implementation of the CRYPT_PKG with explanations of what is taking place intermixed with the code: create or replace package body crypt_pkg as -- package globals g_charkey varchar2(48); g_stringFunction varchar2(1); g_rawFunction varchar2(1); g_stringWhich varchar2(75); g_rawWhich varchar2(75); g_chunkSize CONSTANT number default 32000; 1127 5254AppAIcmp2.pdf 5 2/28/2005 6:49:47 PM Appendix A The package begins with a few global variables. They are: ❑ G_CHARKEY – This stores the RAW or VARCHAR2 key for use by the encryption routines. It is 48 bytes long to support holding a 24 byte RAW key (which will be doubled in length due to the hexadecimal conversion caused by placing a RAW string in a VARCHAR2 variable). ❑ G_STRINGFUNCTION and G_RAWFUNCTION – Contains either Null or the string ‘3’ after a call to SETKEY. We will dynamically add this string to the routine name at run-time so we either run DESENCRYPT or DES3ENCRYPT, depending on the key size. In short, this is used to construct the appropriate function name we need to call. ❑ G_STRINGWHICH and G_RAWWHICH – Used with DES3EN/DECRYPT only. Adds the fourth optional parameter to force 3 key mode, when doing 3DES in 3 key mode. Whereas the string function variables above tell us whether to call DESENCRYPT or DES3ENCRYPT, this tells us what value we need to pass in for the key mode (two key or three key). ❑ G_CHUNKSIZE – A constant that controls the size of the LOB chunk we’ll encrypt/decrypt. It also controls the maximum amount of data sent to the MD5 checksum routines when working with LOBs. It is crucial that this number is a multiple of 8 – the implementation below counts on it! Continuing on, we have six small ‘private’ routines to implement. These are helper functions used by the other routines in the package: function padstr( p_str in varchar2 ) return varchar2 as l_len number default length(p_str); begin return to_char(l_len,’fm00000009’) || rpad(p_str, ( trunc(l_len/8)+sign(mod(l_len,8)) )*8, chr(0)); end; function padraw( p_raw in raw ) return raw as l_len number default utl_raw.length(p_raw); begin return utl_raw.concat( utl_raw.cast_to_raw(to_char(l_len,’fm00000009’)), p_raw, utl_raw.cast_to_raw( rpad(chr(0), (8-mod(l_len,8))*sign(mod(l_len,8)), chr(0)))); end; It you recall from the description of the DES encryption algorithm, it was stated that ‘DES encrypts data in 64-bit (8 byte) blocks...’ A side effect of this is that the DBMS_OBFUSCATION_TOOLKIT package works only on data whose length is a multiple of 8. If you have a string that is 7 bytes long, it must be padded to 8 bytes. A 9 byte string must be padded out to 16 bytes. The above two routines encode and pad out strings and RAW data. They encode the string or RAW by placing the original length into the string/RAW itself. Then they pad out the string with binary zeros (CHR(0)) to make it a multiple of 8 bytes in length. For example, the string Hello World will be encoded as follows: 1128 5254AppAIcmp2.pdf 6 2/28/2005 6:49:47 PM DBMS_OBFUSCATION_TOOLKIT tkyte@TKYTE816> 2 from 3 ( select 4 5 6 7 from 8 9 10 11 ) 12 / select length(padstr), padstr, dump(padstr) dump to_char(l_len,’fm00000009’) || rpad(p_str, (trunc(l_len/8)+sign(mod(l_len,8)) )*8, chr(0)) padstr ( select length( ‘Hello World’ ) l_len, ‘Hello World’ p_str from dual ) LENGTH(PADSTR) PADSTR DUMP -------------- ------------------------- -----------------------------24 00000011Hello World Typ=1 Len=24: 48,48,48,48,48,4 8,49,49,72,101,108,108,111,32, 87,111,114,108,100,0,0,0,0,0 The final length of the encoded string is 24 bytes (LENGTH(PADSDTR)), and the original length was 11 (this is visible in the first 8 characters of the PADSTR column). Looking at the DUMP column, which shows the ASCII values of the bytes in the string, we can see it ends with 5 binary zeroes. We needed to pad out 5 bytes to make the 11 byte Hello World string a multiple of 8. Next, we have the routines that ‘undo’ the padding we did above: function unpadstr( p_str in varchar2 ) return varchar2 is begin return substr( p_str, 9, to_number(substr(p_str,1,8)) ); end; function unpadraw( p_raw in raw ) return raw is begin return utl_raw.substr( p_raw, 9, to_number( utl_raw.cast_to_varchar2(utl_raw.substr(p_raw,1,8)) ) ); end; They are straightforward enough. They assume the first 8 bytes of the string or RAW is the original length of the string, and return the SUBSTR of this encoded data appropriately. Continuing on, we have the last of our internal helper routines: procedure wa( p_clob in out clob, p_buffer in varchar2 ) is begin dbms_lob.writeappend(p_clob,length(p_buffer),p_buffer); end; procedure wa( p_blob in out blob, p_buffer in raw ) is begin dbms_lob.writeappend(p_blob,utl_raw.length(p_buffer),p_buffer); end; 1129 5254AppAIcmp2.pdf 7 2/28/2005 6:49:48 PM Appendix A These simply make it easier to call DBMS_LOB.WRITEAPPEND, by shortening the name to WA, and by passing in the length of the buffer to write for us, which in our case is always the current length of the buffer. Now we hit our first externally callable routine SETKEY: procedure setKey( p_key in varchar2 ) as begin if ( g_charkey = p_key OR p_key is NULL ) then return; end if; g_charkey := p_key; if ( length(g_charkey) not in ( 8, 16, 24, 16, 32, 48 ) ) then raise_application_error( -20001, ‘Key must be 8, 16, or 24 bytes’ ); end if; select decode(length(g_charkey),8,’’,’3’), decode(length(g_charkey),8,’’,16,’’, 24,’, which=>dbms_obfuscation_toolkit.ThreeKeyMode’), decode(length(g_charkey),16,’’,’3’), decode(length(g_charkey),16,’’,32,’’, 48,’, which=>dbms_obfuscation_toolkit.ThreeKeyMode’) into g_stringFunction, g_stringWhich, g_rawFunction, g_rawWhich from dual; end; This routine is used whether you call it or not. The remaining externally callable routines below will call SETKEY regardless of whether you do or not. This routine will compare your key P_KEY to the one in the global variable G_CHARKEY. If they compare, or no key was provided, this routine simply returns. It has no work to perform. If the P_KEY is different from G_CHARKEY however, this routine will continue. The first thing it does is a sanity check to verify that the key is a valid multiple of 8. The key must be 8, 16 or 24 bytes in length. Since this routine may be passed RAW data, which causes each byte to be expanded into a 2 byte hexadecimal code, 16, 32, and 48 are valid lengths as well. This check does not fully guarantee the key will work however. For example, you could send a 4 byte RAW key that will appear as 8 bytes to us. You will get a run-time error from DBMS_OBFUSCATION_TOOLKIT later in that case. The SELECT with a DECODE is a used to set up the remaining global variables. Since we cannot tell the difference between a RAW and VARCHAR2 string at this point, we set up all four possible variables. The key thing to note about this piece of code is that if the length of the key is 8 bytes (16 bytes when RAW), then the FUNCTION variable will be set to a Null string. If the key length is 16 or 24 bytes (32 or 48 bytes when RAW), the FUNCTION variable will be set to the string ‘3’. This is what will cause us to call DESENCRYPT or DES3Encrypt later. The other thing to notice here is the setting of the WHICH global variable. This is used to set the optional parameter to the DES3ENCRYPT routine. If the key length is 8 or 16 bytes (16 or 32 bytes RAW), we set this string to Null – we do not pass a parameter. If the key length is 24 bytes (48 bytes RAW), we set this string to pass THREEKEYMODE to the ENCRYPT/DECRYPT routines to instruct them to use this larger key. 1130 5254AppAIcmp2.pdf 8 2/28/2005 6:49:48 PM DBMS_OBFUSCATION_TOOLKIT Now we are ready to see the functions that do the actual work for us: function encryptString( p_data in varchar2, p_key in varchar2 default NULL ) return varchar2 as l_encrypted long; begin setkey(p_key); execute immediate ‘begin dbms_obfuscation_toolkit.des’ || g_StringFunction || ‘encrypt ( input_string => :1, key_string => :2, encrypted_string => :3’ || g_stringWhich || ‘ ); end;’ using IN padstr(p_data), IN g_charkey, IN OUT l_encrypted; return l_encrypted; end; function encryptRaw( p_data in raw, p_key in raw default NULL ) return raw as l_encrypted long raw; begin setkey(p_key); execute immediate ‘begin dbms_obfuscation_toolkit.des’ || g_RawFunction || ‘encrypt ( input => :1, key => :2, encrypted_data => :3’ || g_rawWhich || ‘ ); end;’ using IN padraw( p_data ), IN hextoraw(g_charkey), IN OUT l_encrypted; return l_encrypted; end; The ENCRYPTSTRING and ENCRYPTRAW functions work in a similar manner to each other. They both dynamically call either DESENCRYPT or DES3ENCRYPT. This dynamic call not only reduces the amount of code we have to write as it avoids the IF THEN ELSE we would have use to statically call either routine, but it also makes it so the package can be installed in 8.1.6 or 8.1.7, without change. Since we do not statically reference DBMS_OBFUSCATION_TOOLKIT, we can compile against either version. This dynamic invocation is a technique that is useful any time you are not certain what might or might not be installed in the database. I’ve used it in the past when writing utility routines that needed to be installed in 7.3, 8.0, and 8.1. Over time, additional functionality was added to the core packages, and when the code was running in 8.1, we wanted to take advantage of it. When we were running in 7.3, the code would still function; it just wouldn’t be able to benefit from the newer functionality. The same concept applies here. When installed in an 8.1.7 database, the above code can, and will call DES3ENCRYPT. When installed in 8.1.6, any attempt to invoke the DES3ENCRYPT will result in a run-time error (instead of preventing you from installing this package). The calls to DESENCRYPT will function as expected in 8.1.6. These functions work simply by creating a dynamic string using the FUNCTION and WHICH we set in the SETKEY routine. We will either add the number 3 to the procedure name, or not. We will add the optional fourth parameter to DES3ENCRYPT when we want three key mode. Then we execute the string, 1131 5254AppAIcmp2.pdf 9 2/28/2005 6:49:48 PM Appendix A send the data and key to be encrypted, and receive the encrypted data as output. Notice how we bind the PADSTR or PADRAW of the original data. The data that is encrypted is the encoded string, which is padded out to the proper length. Now for the inverse of the above two functions: function decryptString( p_data in varchar2, p_key in varchar2 default NULL ) return varchar2 as l_string long; begin setkey(p_key); execute immediate ‘begin dbms_obfuscation_toolkit.des’ || g_StringFunction || ‘decrypt ( input_string => :1, key_string => :2, decrypted_string => :3’ || g_stringWhich || ‘ ); end;’ using IN p_data, IN g_charkey, IN OUT l_string; return unpadstr( l_string ); end; function decryptRaw( p_data in raw, p_key in raw default NULL ) return raw as l_string long raw; begin setkey(p_key); execute immediate ‘begin dbms_obfuscation_toolkit.des’ || g_RawFunction || ‘decrypt ( input => :1, key => :2, decrypted_data => :3 ‘ || g_rawWhich || ‘ ); end;’ using IN p_data, IN hextoraw(g_charkey), IN OUT l_string; return unpadraw( l_string ); end; DECRYPTSTRING and DECRYPTRAW work in a similar manner as the ENCRYPT routines above functionally. The only difference is they call DECRYPT instead of ENCRYPT in the DBMS_OBFUSCATION_TOOLKIT package, and call UNPAD to decode the string or RAW data. Now onto the routines for encrypting LOBs: 1132 5254AppAIcmp2.pdf 10 2/28/2005 6:49:48 PM DBMS_OBFUSCATION_TOOLKIT function encryptLob( p_data in clob, p_key in varchar2 ) return clob as l_clob clob; l_offset number default 1; l_len number default dbms_lob.getlength(p_data); begin setkey(p_key); dbms_lob.createtemporary( l_clob, TRUE ); while ( l_offset <= l_len ) loop wa( l_clob, encryptString( dbms_lob.substr( p_data, g_chunkSize, l_offset ) ) ); l_offset := l_offset + g_chunksize; end loop; return l_clob; end; function encryptLob( p_data in blob, p_key in raw ) return blob as l_blob blob; l_offset number default 1; l_len number default dbms_lob.getlength(p_data); begin setkey(p_key); dbms_lob.createtemporary( l_blob, TRUE ); while ( l_offset <= l_len ) loop wa( l_blob, encryptRaw( dbms_lob.substr( p_data, g_chunkSize, l_offset ) ) ); l_offset := l_offset + g_chunksize; end loop; return l_blob; end; These are overloaded procedures for BLOBs and CLOBs. They work by creating a temporary LOB to write the encrypted data into. Since we change the length of a string/RAW data when encrypted to preserve its original length and pad it out, doing this ‘in place’ using the existing LOB would not be possible. For example, if you had a 64 KB LOB, we would take the first 32 KB, and make it ‘larger’ than 32 KB. Now we would need to slide over the last 32 KB of the existing LOB to make room for this larger chunk of data. Also, it would make it not possible to call these functions from SQL, since the LOB locator would have to be IN/OUT, and IN/OUT parameters would preclude this from being called from SQL. So, we simply copy the encrypted data into a new LOB which the caller can use anywhere, even in an INSERT or UPDATE statement. The algorithm used to encrypt and encode the LOB data is as follows. We start at byte 1 (L_OFFSET) and encrypt G_CHUNKSIZE bytes of data. This is appended to the temporary LOB we created. We add G_CHUNKSIZE to the offset, and continue looping until we have processed the entire LOB. At the end, we return the temporary LOB to the caller. 1133 5254AppAIcmp2.pdf 11 2/28/2005 6:49:48 PM Appendix A Next for the decryption routines for LOB data: function decryptLob( p_data in clob, p_key in varchar2 default NULL ) return clob as l_clob clob; l_offset number default 1; l_len number default dbms_lob.getlength(p_data); begin setkey(p_key); dbms_lob.createtemporary( l_clob, TRUE ); loop exit when l_offset > l_len; wa( l_clob, decryptString( dbms_lob.substr( p_data, g_chunksize+8, l_offset ) ) ); l_offset := l_offset + 8 + g_chunksize; end loop; return l_clob; end; function decryptLob( p_data in blob, p_key in raw default NULL ) return blob as l_blob blob; l_offset number default 1; l_len number default dbms_lob.getlength(p_data); begin setkey(p_key); dbms_lob.createtemporary( l_blob, TRUE ); loop exit when l_offset > l_len; wa( l_blob, decryptRaw( dbms_lob.substr( p_data, g_chunksize+8, l_offset ) ) ); l_offset := l_offset + 8 + g_chunksize; end loop; return l_blob; end; Once again, for the same reasons as before, we utilize a temporary LOB to perform the decryption. This time however, there is one additional reason for the temporary LOB. If we did not use a temporary LOB to decrypt the data into, we would actually be decrypting the data in the DATABASE. Subsequent SELECTs would see already decrypted data if we didn’t copy it into a new LOB! Here, the temporary LOB usage is even more important than before. The logic employed here is to loop over the chunks in the LOB as before. We start at offset 1 (the first byte) in the LOB and SUBSTR off G_CHUNKSIZE+8 bytes. The additional 8 bytes caters for the 8 bytes the PADSTR/PADRAW functions added to the data when we encoded it. So, all we do is walk through the LOB G_CHUNKSIZE+8 bytes at a time, decrypting the data, and appending it to the temporary LOB. This is what gets returned to the client. And now for the last part of the CRYPT_PKG, the interface to the MD5 routines: 1134 5254AppAIcmp2.pdf 12 2/28/2005 6:49:48 PM DBMS_OBFUSCATION_TOOLKIT function md5str( p_data in varchar2 ) return checksum_str is l_checksum_str checksum_str; begin execute immediate ‘begin :x := dbms_obfuscation_toolkit.md5( input_string => :y ); end;’ using OUT l_checksum_str, IN p_data; return l_checksum_str; end; function md5raw( p_data in raw ) return checksum_raw is l_checksum_raw checksum_raw; begin execute immediate ‘begin :x := dbms_obfuscation_toolkit.md5( input => :y ); end;’ using OUT l_checksum_raw, IN p_data; return l_checksum_raw; end; function md5lob( p_data in clob ) return checksum_str is l_checksum_str checksum_str; begin execute immediate ‘begin :x := dbms_obfuscation_toolkit.md5( input_string => :y ); end;’ using OUT l_checksum_str, IN dbms_lob.substr(p_data,g_chunksize,1); return l_checksum_str; end; function md5lob( p_data in blob ) return checksum_raw is l_checksum_raw checksum_raw; begin execute immediate ‘begin :x := dbms_obfuscation_toolkit.md5( input => :y ); end;’ using OUT l_checksum_raw, IN dbms_lob.substr(p_data,g_chunksize,1); return l_checksum_raw; end; end; / The MD5 routines act as a passthrough to the native DBMS_OBFUSCATION_TOOLKIT routines. The one thing they do differently is that they are not overloaded, allowing them to be called directly from SQL. You should note that the MD5 LOB routines only compute the CHECKSUM based on the first G_CHUNKSIZE bytes of data. This is due to the limitation of PL/SQL in regards to variable sizes. Now we will briefly test out and demonstrate the functionality of this package. The following examples were executed in an Oracle 8.1.7 database. If executed in 8.1.6, you should expect the DES3 and MD5 examples to fail at run-time: 1135 5254AppAIcmp2.pdf 13 2/28/2005 6:49:48 PM Appendix A tkyte@DEV817> declare 2 l_str_data varchar2(25) := ‘hello world’; 3 l_str_enc varchar2(50); 4 l_str_decoded varchar2(25); 5 6 l_raw_data raw(25) := utl_raw.cast_to_raw(‘Goodbye’); 7 l_raw_enc raw(50); 8 l_raw_decoded raw(25); 9 10 begin 11 crypt_pkg.setkey( ‘MagicKey’ ); 12 13 l_str_enc := crypt_pkg.encryptString( l_str_data ); 14 l_str_decoded := crypt_pkg.decryptString( l_str_enc ); 15 16 dbms_output.put_line( ‘Encoded In hex = ‘ || 17 utl_raw.cast_to_raw(l_str_enc) ); 18 dbms_output.put_line( ‘Decoded = ‘ || l_str_decoded ); 19 20 crypt_pkg.setkey( utl_raw.cast_to_raw(‘MagicKey’) ); 21 22 l_raw_enc := crypt_pkg.encryptRaw( l_raw_data ); 23 l_raw_decoded := crypt_pkg.decryptRaw( l_raw_enc ); 24 25 dbms_output.put_line( ‘Encoded = ‘ || l_raw_enc ); 26 dbms_output.put_line( ‘Decoded = ‘ || 27 utl_raw.cast_to_varchar2(l_raw_decoded) ); 28 end; 29 / Encoded In hex = 7004DB310AC6A8F210F8467278518CF988DF554B299B35EF Decoded = hello world Encoded = E3CC4E04EF3951178DEB9AFAE9C99096 Decoded = Goodbye PL/SQL procedure successfully completed. This shows the basic functionality of the ENCRYPT and DECRYPT routines. Here, I am calling them procedurally – below we will do it in SQL. I test against both string and RAW data in this example. On line 11 of the code, I call SETKEY to set the encryption key, to be used for encoding and decoding of the VARCHAR2 data elements, to the string MAGICKEY. This saves me from having to repeatedly pass this string into these routines. Then I encrypt the string into L_STR_ENC. I then decrypt this string just to make sure everything is working as expected. On lines 16-18 I print out the results. Since the encrypted data can contain various characters that drive terminal emulators crazy, I print out the encrypted string using UTL_RAW.CAST_TO_RAW on line 17. This has the effect of just changing the data type of the VARCHAR2 into a RAW, as mentioned previously. The underlying data is not changed at all. Since RAW data is implicitly converted to a string of hexadecimal digits, we can use this as a convenient way to dump data to the screen in hexadecimal. On lines 20 through 27 I do the same thing to RAW data. I must call SETKEY once again, this time with 8 bytes of RAW data. For convenience sake, I use UTL_RAW.CAST_TO_RAW to change a VARCHAR2 key into a RAW key. I could have used HEXTORAW and passed a string of hexadecimal characters as well. I then encrypt the data, and decrypt the encrypted data. When I print it out, I just print the encrypted data (it’ll display in hexadecimal), and I cast the data I decrypted back to VARCHAR2 so we can see that it worked. The output confirms the package functions. 1136 5254AppAIcmp2.pdf 14 2/28/2005 6:49:48 PM DBMS_OBFUSCATION_TOOLKIT Next, we’ll look at how this might work in SQL. We’ll test out the triple DES encryption in two key mode this time: tkyte@DEV817> drop table t; Table dropped. tkyte@DEV817> create table t 2 ( id int primary key, data varchar2(255) ); Table created. tkyte@DEV817> insert into t values 2 ( 1, crypt_pkg.encryptString( ‘This is row 1’, ‘MagicKeyIsLonger’ ) ); 1 row created. tkyte@DEV817> insert into t values 2 ( 2, crypt_pkg.encryptString( ‘This is row 2’, ‘MagicKeyIsLonger’ ) ); 1 row created. tkyte@DEV817> select utl_raw.cast_to_raw(data) encrypted_in_hex, 2 crypt_pkg.decryptString(data,’MagicKeyIsLonger’) decrypted 3 from t 4 / ENCRYPTED_IN_HEX -----------------------------------------------0B9A809515519FA6A34F150941B318DA441FBB0C790E9481 0B9A809515519FA6A34F150941B318DA20A936F9848ADC13 DECRYPTED ------------This is row 1 This is row 2 So, simply by using a 16 byte key as input to the CRYPT_PKG.ENCRYPTSTRING routine, we automatically switched over to the DES3ENCRYPT routine, within the DBMS_OBFUSCATION_TOOLKIT package. This example shows how easy it is to use the CRYPT_PKG in SQL. All of the functions are callable from SQL and can be used anywhere where SUBSTR, for example, could be used. The CRYPT_PKG could be used in the SET clause of an UPDATE, the VALUES clause of an INSERT, the SELECT clause of a SQL query, and even in the WHERE clause of any statement if you like. Now we will look at how this package may be used on LOBs and demonstrate the MD5 routines as well. We’ll use a 50 KB CLOB as our test case. First we must load the LOB into the database: tkyte@DEV817> create table demo ( id int, theClob clob ); Table created. tkyte@DEV817> create or replace directory my_files as 2 ‘/d01/home/tkyte’; Directory created. tkyte@DEV817> declare 2 l_clob clob; 3 l_bfile bfile; 1137 5254AppAIcmp2.pdf 15 2/28/2005 6:49:49 PM Appendix A 4 5 6 7 8 9 10 11 12 13 14 15 16 begin insert into demo values ( 1, empty_clob() ) returning theclob into l_clob; l_bfile := bfilename( ‘MY_FILES’, ‘htp.sql’ ); dbms_lob.fileopen( l_bfile ); dbms_lob.loadfromfile( l_clob, l_bfile, dbms_lob.getlength( l_bfile ) ); dbms_lob.fileclose( l_bfile ); end; / PL/SQL procedure successfully completed. The above procedure has loaded some data into the CLOB. Now we would like to perform some operations on it. Again, we will use SQL, as this is a fairly natural way to interact with the data. We’ll start by computing a CHECKSUM based on the first 32 KB of the CLOB: tkyte@DEV817> select dbms_lob.getlength(theclob) lob_len, 2 utl_raw.cast_to_raw( crypt_pkg.md5lob(theclob) ) md5_checksum 3 from demo; LOB_LEN MD5_CHECKSUM ---------- -------------------------------50601 307D19748889C2DEAD879F89AD45D1BA Again, we use the UTL_RAW.CAST_TO_RAW to convert the VARCHAR2 returned from the MD5 routines into a hexadecimal string for display. The VARCHAR2 string will most likely contain data that is ‘unprintable’ on your terminal, or it may contain embedded newlines, tabs, and other control characters. The above code shows how easy it is to use the MD5 routines – just send it some data and it will compute the CHECKSUM. Next, we want to see how one might encrypt and decrypt a LOB. We’ll do it with a simple UPDATE. Notice that this time our encryption key is 24 bytes long. We will be using the DES3ENCRYPT routine with the optional which => ThreeKeyMode parameter set. This gives us 3 key, triple DES encryption: tkyte@DEV817> update demo 2 set theClob = crypt_pkg.encryptLob( theClob, 3 ‘MagicKeyIsLongerEvenMore’ ) 4 where id = 1; 1 row updated. tkyte@DEV817> select dbms_lob.getlength(theclob) lob_len, 2 utl_raw.cast_to_raw( crypt_pkg.md5lob(theclob) ) md5_checksum 3 from demo; LOB_LEN MD5_CHECKSUM ---------- -------------------------------50624 FCBD33DA2336C83685B1A62956CA2D16 1138 5254AppAIcmp2.pdf 16 2/28/2005 6:49:49 PM DBMS_OBFUSCATION_TOOLKIT Here we can see by the fact that the length has changed from 50,601 to 50,624 bytes, and that the MD5 CHECKSUM is different, that we have in fact modified the data. What we did, if you recall from the algorithm above, is to take the first 32,000 bytes of the CLOB, added 8 bytes onto the front of this 32,000 bytes as part of the string encryption, and encrypted it. Then, we retrieved the remaining 18,601 bytes. We padded this out to 18,608 bytes (divisible by 8 evenly), and added 8 bytes to remember the original length. This gives us our expanded length of 50,624 bytes. Lastly, we will look at how to retrieve the CLOB decrypted from the database: tkyte@DEV817> select dbms_lob.substr( 2 crypt_pkg.decryptLob(theClob), 100, 1 ) data 3 from demo 4 where id = 1; DATA ---------------------------------------set define off create or replace package htp as /* STRUCTURE tags */ procedure htmlOpen; procedure An interesting thing to note here is that I did not pass in the encryption key. Since we save this key in the package state, it is not necessary here. The package will remember it from call to call, but not session to session. I could send the key, but I do not need to. The key is stored in a package body global variable, so it is not visible to anything other than the functions in the package body, and cannot be seen by other sessions. Caveats Currently, there exists a situation with the DBMS_OBFUSCATION_TOOLKIT whereby data encrypted on a ‘little endian’ system cannot be decrypted using the same key on a ‘big endian’ system. ‘Endian’ has to do with he ordering of bytes in a multi-byte number. Intel platforms (NT, many Linuxes, and Solaris x86 run on Intel) have a little endian byte ordering. Sparc and Risc typically have a big endian. Data that is encrypted on Windows NT using a key of ‘12345678’ cannot be decrypted on Sparc Solaris using the same key. The following example demonstrates the issue (and shows the workaround). On Windows NT: tkyte@TKYTE816> create table anothert ( encrypted_data varchar2(25) ); Table created. tkyte@TKYTE816> insert into anothert values 2 ( crypt_pkg.encryptString( ‘hello world’, ‘12345678’ ) ); 1 row created. tkyte@TKYTE816> select crypt_pkg.decryptstring( encrypted_data ) from anothert; CRYPT_PKG.DECRYPTSTRING(ENCRYPTED_DATA) ------------------------------------------------------------------------hello world tkyte@TKYTE816> host exp userid=tom/kyte tables=anothert 1139 5254AppAIcmp2.pdf 17 2/28/2005 6:49:49 PM Appendix A I ftp this EXPDAT.DMP file to my Sparc Solaris machine, and load the data into it. Then, when I attempt to query it out I receive: ops$tkyte@DEV816> select 2 crypt_pkg.decryptstring( encrypted_data, ‘12345678’ ) 3 from t; crypt_pkg.decryptstring( encrypted_data, ‘12345678’ ) * ERROR at line 2: ORA-06502: PL/SQL: numeric or value error: character to number conversion ORA-06512: at “OPS$TKYTE.CRYPT_PKG”, line 84 ORA-06512: at “OPS$TKYTE.CRYPT_PKG”, line 215 ORA-06512: at line 1 ops$tkyte@DEV816> select 2 crypt_pkg.decryptstring( encrypted_data, ‘43218765’ ) 3 from t; CRYPT_PKG.DECRYPTSTRING(ENCRYPTED_DATA,’43218765’) ------------------------------------------------------------------------hello world The error above is coming from my wrapper package. I am taking the first 8 bytes of data in the string, and assuming that it is a number. Since the key could not successfully decrypt the data, the first 8 bytes is in fact not my length field – it is some garbage set of characters. Apparently, our 8 byte (or 16 or 24 byte) key is addressed internally as a series of 4 byte integers. We must reverse the bytes in every 4 byte group in our key, in order to decrypt data on one system that was encrypted on another with a different byte order. Therefore, if I use the key ‘12345678’ on Windows NT (Intel), I must use the key ‘43218765’ on Sparc Solaris. We take the first 4 bytes and reverse them, then take the next 4 bytes and reverse them (and so on for larger keys). This is an important fact to keep in mind if you either move the data from NT to Sparc Solaris for example, or use database links to query the data out. You must be prepared to physically reorder the bytes to decrypt successfully. This particular issue is corrected in the patch release of Oracle 8.1.7.1 and up, and the need for the byte swapping is removed. Key Management I would like to briefly talk about key management here for a moment. Encryption is only part of the solution to making data ‘secure’. The primary reason people quote for encrypting data in the database is to make it so the DBA, who can query any table, cannot make sense of the data in the table. For example, you are an online web site taking customer orders. People are giving you their credit card numbers. You are storing them in the database. You would like to assure that neither the DBA, who must be able to backup your database, nor the malicious hacker that breaks into your database, could read this highly sensitive information. If you stored it in clear text, it would be easy for someone to see it, if they gained DBA access to your database. If it is stored encrypted this would not be the case. The encrypted data is only as secure as the key you use to encrypt the data. The key that is used is the magic component here. If the key is discovered, the data might as well not be encrypted (as evidenced by the fact that we can simply select to decrypt data if we were given the key). 1140 5254AppAIcmp2.pdf 18 2/28/2005 6:49:49 PM DBMS_OBFUSCATION_TOOLKIT Therefore key generation and key protection are two things you must give a lot of thought to. There are many paths you could follow. What follows are some ideas, and concepts that you can use, but each has its own specific issues. The Client Application Manages and Stores Keys One approach is to keep the keys out of the database, perhaps even on another machine (just don’t lose them – it’ll take you a couple hundred CPU years to guess what they might be!). In this scenario, the client application, be it software in a middle tier application server, or a client-server application, manages the keys on its system. The client software determines if the user accessing the data is permitted to decrypt it or not, and sends the key to the database. If you choose to do this, to transmit the key over the network, we must add yet one more layer of encryption and this is data stream encryption for the Net8 traffic. Bind variables and string literals are transmitted unencrypted by default. In this situation, since the keys are so critical, you would need to make use of a technology such as the ASO (Advanced Security Option). This Net8 option provides full data stream encryption so that no one can ‘sniff’ your keys from the network. As long as the key is securely maintained by the client application (this is up to you to ensure), and you use ASO, this would be a viable solution. Store the Keys in the Same Database Here, you would store the keys in the database itself with the data. This is not a perfect solution, as you now know that a DBA with enough time on their hands (or a hacker who has gotten into a DBA account) could possibly discover your keys and the data that was encrypted by them. What you need to do in a case like this, is to make it as hard as possible to put the keys together with the data. This is a hard thing to do because both are in the same database. One approach is to never directly relate the key table to the data table. For example, you have a table with CUSTOMER_ID, CREDIT_CARD, and other data. The CUSTOMER_ID is immutable; it is the primary key (and we all know you never update a primary key). You might set up another table: ID number primary key, DATA varchar2(255) This is the table in which we will store our keys, one key per customer ID. We will provide a packaged function that will return the key if and only if the appropriate user in the appropriate environment is executing it (similar to the concept behind FGAC; you can only get to the data if you have set up the correct application context). This package would provide two basic functions: ❑ Function 1: The addition of a new customer – In this case, the function would perform some operation on the customer ID to ‘scramble’ it (convert it into another string). This function would be deterministic so that given the same customer ID we would always get the same output. Some ideas for how to mangle this customer ID, or any string in fact, follow below. It would also generate a random key for this customer. Some ideas on how to generate that key follow. It would then use dynamic SQL to insert the row into the key table (which is not going to be named KEY_TABLE or anything obvious like that). 1141 5254AppAIcmp2.pdf 19 2/28/2005 6:49:49 PM Appendix A ❑ Function 2: The retrieval of a key for a customer – This function would take a customer ID, run it through the same deterministic function as above, and then using dynamic SQL, it would look up the key for this customer, and return it. It will only perform these functions if the current user is executing in the appropriate environment. The reason for the use of dynamic SQL is that it may become obvious to people that this package is the one performing the key management. A user may be able to query ALL_DEPENDENCIES to find all of the tables this package statically references. By using dynamic SQL, there will be no correlation between this package, and the key table. We are not preventing a really smart person from deducing the key table here, just making it as hard as we possibly can. As for how to scramble the customer ID (or any set of immutable data related to this row, the primary key is a good candidate as long as you never update it), we can use many algorithms. If I was using Oracle 8.1.7, I may send this piece of information concatenated with some constant value (commonly referred to as a ‘salt’) to the MD5 routines, to generate a 16 byte CHECKSUM. I would use this as my key. Using Oracle 8.1.6, I might do the same sort of operation, but use DBMS_UTILITY.GET_HASH_VALUE with a very large hash table size. I could use an XOR algorithm after reversing the bytes in the CUSTOMER_ID. Any algorithm that would be hard to guess given the resulting output data would suffice. By now, you should be saying, ‘Ahh but the DBA can just read the code out, see the algorithm, and figure this all out.’ Not if you wrap the code. Wrapping PL/SQL code is very straightforward (see the WRAP utility that is documented in the PL/SQL User’s Guide and Reference). This will take your source code and ‘obfuscate’ it. You will load the wrapped version of the code into the database. Now no one can read your code. There are no ‘reverse wrap’ tools available. Just make sure to maintain a good copy of your algorithm somewhere safe. Since there are no reverse wrap, tools you will not be able to resurrect your code from the database if you need to. Now, to generate a key for this customer, we need a randomizer of sorts. There are many ways we could do this. We could use the same basic routines we used to obfuscate the CUSTOMER_ID. We could use a random number generator (such as DBMS_RANDOM or one we develop ourselves). The goal is to generate something that cannot be ‘guessed’, based on anything else. Personally speaking, this would be my preferred choice, storing the keys in the database. If I let the client application manage them, there is the risk of the client application ‘losing’ the keys due to a media failure, or some other system catastrophe. If I use the next method, storing them in the file system, I stand the same sort of risk. Only if I store the keys in the database can I be ensured that the encrypted data can in fact be decrypted – my database is always in ‘sync’, and backup and recovery is assured. Store the Keys in the File System with the Database You could also store the keys used to encrypt the data in files within the file system and, using a C external procedure, access them. I suggest using a C external procedure because the goal here is to prevent the DBA from ‘seeing’ them, and quite often the DBA has access to the Oracle software account. UTL_FILE, BFILES, and Java stored procedures doing I/O, run as the Oracle software account user. If the DBA can become the Oracle software owner, they can see the files. If they can see the files, they can see the keys. Using an external procedure written in C, the EXTPROC service can run under a wholly different account by running the listener for the EXTPROC service as another user. In this fashion, the Oracle account cannot ‘see’ the keys. Only via the EXTPROC listener can I gain access to them. This just adds a layer of assuredness to your solution. See Chapter 18 on C-Based External Procedures for more information regarding this approach. 1142 5254AppAIcmp2.pdf 20 2/28/2005 6:49:49 PM DBMS_OBFUSCATION_TOOLKIT Summary We have spent quite a bit of time looking at the DBMS_OBFUSCATION_TOOLKIT package in this section. Here, we have seen how to effectively create a wrapper package that provides the functionality in the way we would like to (feel free to write your own wrappers if you don’t like my implementation). We’ve learned also how to use dynamic SQL to create packages that can be installed into databases with different capabilities (8.1.6 versus 8.1.7 encryption capabilities in this case). We investigated a crossplatform issue that accompanies the DMBS_OBFUSCATION_TOOLKIT package, that of byte ordering in the keys. We learned how to solve this issue, if it occurs, by rearranging the bytes in the key itself. An interesting enhancement to the CRYPT_PKG above would be to have it automatically detect if you were on a little or big endian system, and swap the bytes in the key on one of them for you, so you would not even be aware of these differences. This would be an excellent idea given that in version 8.1.7.1, the need to reverse the bytes goes away and you could just remove the code that does this, and be fully functional again. Lastly, we looked at the very important area of key management. I spent a lot of time working on a nice wrapper package to make encryption and decryption look easy. The hard part is still left up to you – how to secure the keys. You have to consider the fact that if you believe your keys have been compromised, you would have to come up with a new set of keys, and need to decrypt/encrypt all your existing data to protect it. Planning ahead will avoid you from having to do such things. 1143 5254AppAIcmp2.pdf 21 2/28/2005 6:49:49 PM