20012_IFIPS_CMS_Vertical_Data_Security_Using_pTrees

Knowledge Discovery in Protected Vertical Information
Dr. William Perrizo
University Distinguished Professor of Computer Science
North Dakota State University,
Fargo, ND 58108
Abstract
The predicate-Tree (pTree) is a lossless compression of vertical data structures. The pTree is specifically
designed for efficient data mining of a bit-level vertical database. A very desirable feature of any data store is the
ability to secure the data from unauthorized access and manipulation. In fact, it is a requirement in many cases (consider the HIPAA requirements for medical records and the FERPA requirements for academic records). In this paper, we develop two approaches which contribute to this end. The first is a method of summarization of very large
data sets as they are captured so that the high level (and small volume) information can be used for data mining and
analysis purposes, but the data itself remains protected. The second is a storage architecture that anonymizes and
pads the individual vertical structures to secure the information they contain. The two methods can be used separately or combined for enhanced information availability and security.
Vertical Data and pTrees
A file or dataset, as represented in a computer, comprises one or more records, each record comprises one
or more fields, and each field comprises some number of bits that represent a piece of data. Traditionally, the data
set is viewed horizontally as a set of records, and the data are processed vertically, record by record. Database management systems (DBMS) based on such a traditional data representation are called row-store architectures.
In contrast to a data set "as a set of rows", a data set may also be viewed vertically. The data set may comprise columns of data fields (attributes), each stored contiguously as such. The DBMS designs that are based on a
data set "as a set of columns" are called column-store architectures [1]. As part of the column-store architecture, the
stored columns are dense-packed, and this simple method of compression provides significant algorithm and coding
advantages compared to a row-store design. As an integral part of column-store DBMS, its operations are applied to
the compressed representation whenever possible [1].
At a next level within the column-store architecture, a column-based data set in the database may be structured as a set of vertical data bit-vectors. In this column-store architecture, a data set may be viewed as an ordered
set of sets of bits, such that each set of bits is a list of bit values found in a pre-determined position within the binary
representation of the field value that comprises the data column. This list of bit values is called a bit-vector and one
or more of these bit-vectors, when stored in a manner that mirrors the order of their position in the field's binary
representation, then encodes the column as a vertical data object.
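The decomposition of a column into ordered bit-vectors can be illustrated with a minimal sketch. The function names and the fixed bit width are assumptions made for this illustration; they are not taken from an existing pTree implementation.

```python
def to_bit_vectors(column, width):
    """Split a column of non-negative integers into `width` bit-vectors.

    Bit-vector 0 holds the most significant bit of every value, so the list
    order mirrors the bit positions of the field's binary representation.
    """
    return [[(value >> (width - 1 - pos)) & 1 for value in column]
            for pos in range(width)]


def from_bit_vectors(bit_vectors):
    """Rebuild the column; this only works when the bit-vector order is known."""
    width = len(bit_vectors)
    n_rows = len(bit_vectors[0])
    return [sum(bit_vectors[pos][row] << (width - 1 - pos) for pos in range(width))
            for row in range(n_rows)]


column = [5, 2, 7, 0]                 # binary 101, 010, 111, 000
vectors = to_bit_vectors(column, 3)   # three bit-vectors of four bits each
```

Reconstruction depends entirely on knowing which bit-vector belongs to which bit position, which is the property the anonymizing scheme described later relies on.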
Once structured in this way, the data may then be processed horizontally, bit by bit, using bit-level operations. Thus, the order of the bit-vectors stored in the database is an essential component of each data set. Importantly,
the structure of a data set also depends on the structure of the column-store of the data as well, when conceptualized
as a multi-dimensional array. The number of dimensions and the dimension sizes are important order relations to the
structure of the vertical data object. Together with the binary bit-order, these additional structures placed on the column-store format are the overall order substructure of the database when structured as a bit-based vertical database.
Each vertical data structure, as an n-dimensional set of ordered bit sets (representing a vector, matrix, or
tensor) has a unique pTree representation [2] [25]. A pTree representation is a tree structure having a zero bit value
for each internal node and a zero or one bit value for each of its leaves (external nodes). Based on the dimension of
the data structure, the vertical bit data are Peano-order stored into a compressed pTree representation such that its 2^n
internal branch-nodes, together with the leaves, encode the complete bit placement for the data. During the Peano-ordering compression, the pTree's leaves that all match at a lower branched level in the tree are rolled up to the next
branched (parental) level. This roll-up process continues upwards in the process of constructing the pTree until there
are no leaves at a level that all match.
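The roll-up process can be sketched for the one-dimensional case, where each branch node has two children. This is a simplified illustration under the assumption that the bit-vector length is a power of two; the nested-tuple representation and the function names are chosen here and are not the production pTree format.

```python
def build_ptree(bits):
    """Return 0 or 1 for a pure run, otherwise a (left, right) pair of subtrees.

    Assumes len(bits) is a power of two (one-dimensional case, fan-out 2).
    """
    if all(b == bits[0] for b in bits):
        return bits[0]                # all leaves match: roll up into one leaf
    half = len(bits) // 2
    return (build_ptree(bits[:half]), build_ptree(bits[half:]))


def expand_ptree(tree, length):
    """Invert build_ptree, showing that the compression is lossless."""
    if isinstance(tree, int):
        return [tree] * length
    left, right = tree
    half = length // 2
    return expand_ptree(left, half) + expand_ptree(right, half)


bits = [1, 1, 1, 1, 0, 0, 1, 0]
tree = build_ptree(bits)              # the pure left half collapses to a single 1
```

In this example the first four bits roll up into one leaf, and only the mixed right half keeps its lower levels.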
Advantages to using the Peano-order (also called the Z-order), compared to a standard raster ordering, for
example, are its good clustering and compression capabilities. Vertical data structures, when represented by pTrees,
are particularly adapted to fundamental data mining operations applied to spatial data [3].
There are three basic pTree operations: AND, OR, and complement. These operations are defined bit-wise
against pTree-based vertical data structures. The pTree-based vertical structures provide a format that permits an
efficient methodology to mine data relationships [2],[4].
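As a small illustration of how the three operations answer a query, the sketch below counts the rows whose 3-bit attribute equals 5 (binary 101). For readability it works on uncompressed bit-vectors held as Python lists; the helper names are assumptions of this sketch, and an actual implementation would apply the same logic to the compressed pTrees.

```python
def p_and(a, b):
    """Bit-wise AND of two equal-length bit-vectors."""
    return [x & y for x, y in zip(a, b)]


def p_or(a, b):
    """Bit-wise OR of two equal-length bit-vectors."""
    return [x | y for x, y in zip(a, b)]


def p_not(a):
    """Bit-wise complement of a bit-vector."""
    return [1 - x for x in a]


# Bit-vectors of the column [5, 2, 7, 0, 5], most significant bit first.
p0 = [1, 0, 1, 0, 1]
p1 = [0, 1, 1, 0, 0]
p2 = [1, 0, 1, 0, 1]

# A row equals 5 (binary 101) when bit 0 is set, bit 1 is clear, and bit 2 is set.
equals_five = p_and(p_and(p0, p_not(p1)), p2)
count = sum(equals_five)
```

The query never touches a horizontal record; it only combines vertical bit-vectors.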
A pTree is thus a lossless and compressed representation of a vertical data object as a bit-based structure.
Fast algorithms for pTree generation and the associated operations have been implemented, demonstrating small
space and time costs when compared to using the original data structures for data mining [4]. These important features of pTree-compressed vertical objects may add significantly to the already high performance obtained by using
column-store architectures for data mining [5],[1].
Anonymizing Vertical Data to provide Security
Just as a data set can be structured vertically, an entire database can be structured as a large collection of
pTrees (compressed or uncompressed bit-vectors) stored in a very specific order. The database, as such, is structured
for efficient storage and for processing by data mining operations. Knowing the order in which the vertical data
structures are stored is paramount to working with the data. Once structured in such a way, vertically defined bit-based operations can be applied to these vectors to extract answers to queries that are applied to the database.
Many DBMS have the ability to secure the data within the database. Given a vertically structured database,
how might the data be made secure without compromising the efficiencies inherent in the data structure
design? How can such a database be made secure in such a way that the pTree-based vertical data can be processed
without an expensive pair of encrypting and decrypting steps?
A solution was proposed in [23]: suppose there are N data objects, with N ≥ 1. The data objects have an
(implicit) essential order that is directly related to their use. By repositioning the objects, they can effectively appear
to be nonsensical and meaningless. If one is unable to identify which vertical slice or vertical map it is, then the data
is useless. A simple way to provide this anonymization, as a security measure, is to generate a random permutation
of size N, and use this bijective function to permute the position of the data objects [24]. By doing so, we would
implicitly know the correct positions for the data objects by using the permutation as a set of indexes. However, an
adversary would find it infeasible to rearrange the objects in hope of finding the correct order.
In general, a permutation of order n is an arrangement of n symbols. The set of permutations is known as
the symmetric group on n elements. There are n! distinct permutations of length n and they form a group with the
group operation of composition. There is the identity permutation (the permutation that does not rearrange any of the
n symbols) and each permutation has an inverse. Thus, a permutation is a bijective function defined on a finite set
(the set of symbols) [6]. A cyclic permutation is a permutation that comprises a single cycle that contains all the
listed elements from 1 to N [6].
Therefore, a first level of security for the database may be a secret rearrangement of the order of the vertical data pieces. If N is the number of vertical data vectors then a natural bijective function, such as a permutation,
will reorder the N data objects. The inverse of the permutation will place the N data objects into their original order.
The rearrangement of the vertical data order is called anonymizing and the permutation that is constructed to reorder
the vertical data objects is called the anonymizing permutation.
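A minimal sketch of anonymizing and restoring a list of vertical data objects is given below. The function names and the use of Python's seeded `random.Random` are assumptions made for illustration only; that generator is not cryptographically secure, and it stands in for whichever secure generator an implementation would use.

```python
import random


def make_permutation(n, seed):
    """Deterministically rebuild the same permutation of 0..n-1 from a seed."""
    rng = random.Random(seed)         # illustration only, not a secure RNG
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def anonymize(objects, perm):
    """Store the object with logical index i at physical position perm[i]."""
    stored = [None] * len(objects)
    for i, obj in enumerate(objects):
        stored[perm[i]] = obj
    return stored


def restore(stored, perm):
    """Use the permutation as a list of pointers to read the original order."""
    return [stored[perm[i]] for i in range(len(stored))]


objects = ["P0", "P1", "P2", "P3", "P4"]
perm = make_permutation(len(objects), seed=2011)
stored = anonymize(objects, perm)
```

Only the seed has to be kept secret and stored; the permutation itself is regenerated whenever the data are accessed.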
This same method can also be applied to the bit-vectors themselves to provide a second level of security in
the form of random bit padding. Padding is applied to the database vectors to add another level of protection against
adversaries and their ability to break the reordering scheme. Padding will randomly place the true starting position
of the data in the vertical structures somewhere in the middle of the perceived structure by filling the front (and
back) of the structure with a random length of random bits of data.
A second permutation would serve as a method to pad each bit-vector in the database front and back with a
random string of bits (0/1) prior to the database reordering process. The padding would be unique to each vertical
data structure. The padding would also account for the fact that bit-vectors in the database are not required to have
the same lengths. This second permutation is called the padding permutation.
Why Use Permutations
There are two important reasons that permutations are proposed as a solution to the problem of securing a
bit-based vertical database.
The first reason is based on the theoretical concern to provide a cryptographically strong reordering of the
vertical data objects in the database. [ It can be shown that pseudorandom permutations are cryptographically strong
(as discussed in the section titled X) when constructed with the novel algorithms presented in this report. ] Therefore, using random permutations can provide superior security against adversaries attempting to determine the original and meaningful order of the database objects.
The second reason is based on the practical concern of how to securely store and then use the secret indexes
with which the database was reordered. Regardless of the method for reordering the database, the bijection between
the original order and the new secure order must be readily available to the database each time it is called upon to
perform a data operation. Therefore, this bijection, as a list of pointers, must be stored securely, meaning it must be
encrypted and a database user must have the decryption half of the key. Depending on the size of the database, this
bijection can be large, and the decryption process, even as a one time occurrence for each user, may be time-expensive.
As a reordering bijection, a random permutation can be deterministically generated with a single random
number between 1 and N, where N represents the total number of vertical data objects (bit-vectors) in the database.
The value of N will almost certainly require a binary representation with no more than 128 bits. Therefore, using
random permutations can provide a significantly compressed version of the required order-reorder bijection that can
be repeatedly reconstructed in O(n) time.
To apply permutations to the security of a vertical database, a user (or the DBMS) would be required to securely know three numbers. These numbers are: (i) the (likely 32-bit) key to generate the anonymizing permutation,
(ii) the (same-sized) key to generate the padding permutation, and (iii) the length M of the padding permutation
if the user has chosen to pad with a length less than N (as explained in the section titled X). Therefore, a user can
secure the data in the database by using three 32-bit integers.
In fact, these numbers can reside within the DBMS as part of its small encrypted parameters, which can
then be manipulated by the security operations in the database for such things as periodic administrated re-securing
of the database. Therefore, a user would need a single secret key, used to gain secure access to the database through
the DBMS, thus allowing the DBMS to decrypt the database security keys, reconstruct the permutations, and
then have them available to perform requested operations with the data.
Defining The Permutations
In this report, we present a set of novel permutation generating algorithms to securely pad and then reorder
the structures that make up a vertical database. Each algorithm constructs a permutation that will serve as a set of
indexes to the reordered vertical data vectors. Each algorithm constructs a permutation in a manner that makes it
infeasible for adversaries to reconstruct the permutation without knowing how it was constructed. By padding and
ordering the bit-vectors according to the padding and anonymizing permutations, the data become secure from unwanted scrutiny.
We therefore wish to construct a permutation such that: (1) the permutation is a random permutation, (2) it
is easily reconstructed whenever needed, and (3) it comprises a single full-length cycle that permutes the positions
of all the vertical data objects in the database.
Given such a permutation, the vertical data objects are then rearranged accordingly, using the permutation
as a list of pointers (exactly equal to the full-length cyclic permutation that it is). To apply a vertical data-based query to the database, the DBMS reconstructs the permutation (the secret reordering), and then applies it as a list of
pointers (implemented as indirect addressing), to query the vertical database using vertical data-based bit operations.
The Anonymizing Algorithms
The anonymizing algorithms are designed to securely reorder vertical data. The algorithms are the SOSEED,
the SORANK, and the SORAND algorithms. The three algorithms are illustrated in Figures 1, 2, and 3.
These three algorithms are based on four high-level operations. These operations are: (1) computing a random permutation of length N [7],[8],[9], (2) (optionally) determining the permutation rank (its position in the list of
all N-length permutations) [7],[10], (3) restructuring the permutation into a permutation with a full-length cyclic
structure [11], and (4) (optionally) physically reordering the vertical data as specified by the secure permutation.
As can be seen, there are two operations to the anonymizing process specifically designed to provide security: constructing a random permutation (RP), and restructuring the RP with a cyclic permutation.
A subtle part of the algorithms is the use of a full-length cyclic permutation. Using the standard method for
generating a random permutation [9] (see Figure 4), there is no guarantee that the permutation will reposition every
object (bit-vector) that comprises the data set to be secured. That is, there can be parts of the permutation that will
not place a bit-vector in a different position, and the number of these "identity" pieces of the permutation can be
large compared to the size of the permutation itself. That is, because the permutation is truly random, it may randomly not reorder a large number of the data structures, and may therefore be a poor security reordering. This problem can occur regardless of the method used to generate a random permutation, simply by definition.
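The fixed-point problem can be seen directly in a sketch of the Durstenfeld shuffle. The code below is an illustration written for this discussion, not a transcription of Figure 4, and the helper `fixed_points` is a name introduced here.

```python
import random


def durstenfeld_shuffle(n, rng):
    """Return a uniformly random permutation of 0..n-1 (Durstenfeld's method)."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)         # uniform over 0..i, both ends included
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def fixed_points(perm):
    """Positions that the permutation leaves unmoved."""
    return [i for i, p in enumerate(perm) if p == i]


perm = durstenfeld_shuffle(10, random.Random(7))
unmoved = fixed_points(perm)          # may be empty, or may list several positions
```

A uniformly random permutation has one fixed point on average, whatever its length, so a large number of unmoved objects is unlikely but cannot be ruled out for any single draw.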
A well-known theorem in group theory states that, for any permutation $\sigma$, the cycle structures of a permutation $\pi$ and $(\sigma \circ \pi \circ \sigma^{-1})$ are the same. Naor and Reingold [11] have shown that if $\sigma$ is a random permutation, then $(\sigma \circ \pi \circ \sigma^{-1})$ is distributed uniformly among the permutations with the same cycle type as $\pi$. By combining the random permutation with a full-length cyclic permutation in this way, the algorithms produce a random permutation that will move
every data object to a different position.
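The effect of this construction can be checked with a short sketch. Here the random permutation is called `sigma` and the full-length cycle is called `shift`; these names, and the list-based representation of permutations on 0..n-1, are assumptions of the illustration.

```python
def compose(f, g):
    """Return the permutation f after g, so that result[i] = f[g[i]]."""
    return [f[g[i]] for i in range(len(g))]


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv


def conjugate(sigma, pi):
    """Return sigma o pi o sigma^-1, which has the same cycle structure as pi."""
    return compose(compose(sigma, pi), inverse(sigma))


def cycle_lengths(p):
    """Sorted list of the cycle lengths of permutation p."""
    seen = [False] * len(p)
    lengths = []
    for start in range(len(p)):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = p[i]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths)


n = 5
sigma = [2, 0, 1, 3, 4]                   # leaves positions 3 and 4 unmoved
shift = [(i + 1) % n for i in range(n)]   # full cycle 0 -> 1 -> ... -> n-1 -> 0
secure = conjugate(sigma, shift)          # a single cycle of length n
```

Although `sigma` alone leaves two objects in place, the conjugated permutation is one cycle of length five and therefore moves every object.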
The SOSEED Algorithm
The SOSEED algorithm is presented in Figure 1. A concern with the SOSEED algorithm is that the user must
depend on having the same pseudorandom number generator (RNG) that was used to construct the original random
permutation. Over time and across platforms, this concern may constrain the user to specific pseudorandom number
generator algorithms, perhaps even related to a specific platform implementation, to use the securely reordered vertical data.
This concern may be a substantial limitation regarding security, because it relates to relying on a deterministic process to simulate randomness.
However, over the last 10-15 years, efforts have been made to resolve this issue. For example, the
Mersenne Twister and its stream cipher version (CryptMT3) are available as open-source (C++) random number
generators. Using the open-source software, these RNGs have been consistently implemented across platforms
[12],[13].
The SORANK Algorithm
As an alternative, and perhaps a more general algorithm, the SORANK algorithm (Figure 2) does not depend
on an RNG once the reordering permutation is constructed. The user is only required to provide N and the reordering
permutation's rank K. The user's secret key is the rank of the permutation based on its position in the list of ordered
permutations. The additional parameter for the algorithm is the pre-defined rank and unrank functions [7].
A fundamental property of the symmetric group is the order in which the permutations are constructed from
first to last. The construction order for permutations is called the lexicographical order if the permutations are generated as they would appear if they were sorted numerically. When constructing the set of permutations, a permutation's
position in the list is called its rank.
There are many ways to construct the list of permutations, and so a permutation's rank is directly related to
the construction process. The construction process comprises two functions, called the ranking and unranking functions. These two functions are inverses of each other, and are defined as follows: rank(P) is the position of the permutation P in the construction order, and unrank(m,n) determines the permutation P in position m of the n! permutations of n objects.
There are many available rank and unrank functions. Recently, Myrvold and Ruskey [10] have proposed two
sets of rank and unrank functions that are computable in linear time. The first Myrvold and Ruskey rank function is
the rank function proposed for the SORANK algorithm (Figure 2). The Myrvold and Ruskey rank-unrank function
pair is illustrated in Figures 5 and 6.
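For readers without access to the figures, the following is a sketch of a linear-time rank and unrank pair in the style of the first Myrvold and Ruskey method, written iteratively. It is an illustration rather than a transcription of Figures 5 and 6, and it assumes ranks numbered from 0 to n!-1 (the text counts positions from 1) and permutations of the values 0..n-1.

```python
def unrank(m, n):
    """Return the permutation of 0..n-1 that has rank m, with 0 <= m < n!."""
    perm = list(range(n))
    for k in range(n, 0, -1):
        j = m % k
        perm[k - 1], perm[j] = perm[j], perm[k - 1]
        m //= k
    return perm


def rank(perm):
    """Return the rank of perm; this is the inverse of unrank."""
    perm = list(perm)                 # work on a copy, the input is not modified
    n = len(perm)
    inv = [0] * n
    for i, v in enumerate(perm):
        inv[v] = i
    result, multiplier = 0, 1
    for k in range(n, 1, -1):
        s = perm[k - 1]
        j = inv[k - 1]
        perm[k - 1], perm[j] = perm[j], perm[k - 1]   # move value k-1 into place
        inv[s], inv[k - 1] = inv[k - 1], inv[s]
        result += s * multiplier
        multiplier *= k
    return result


perms = [unrank(m, 4) for m in range(24)]   # all 24 permutations of 4 objects
```

Both functions make a single pass over the permutation, which is where the linear running time comes from. The resulting order is not lexicographical; it is simply a fixed, reproducible construction order.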
The SORAND Algorithm
An important property of a rank/unrank function pair is the ability to generate random permutations. If a
number is randomly selected from 1 to n! and then it is unranked, the random number produces a random permutation. This ability provides the rationale for the third algorithm, since randomly choosing a number between 1 and n!
is equivalent to choosing a random permutation. The first Myrvold and Ruskey unrank function (Figure 6) is the
unrank function proposed for the SORAND algorithm.
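The idea of drawing a random rank and unranking it can be sketched as follows. This is an illustration of the principle only and not the SORAND algorithm of Figure 3; the function names are introduced here, ranks are numbered from 0 to n!-1, and Python's `secrets` module is assumed as the source of randomness.

```python
import math
import secrets


def unrank(m, n):
    """Return the permutation of 0..n-1 with rank m (Myrvold-Ruskey style)."""
    perm = list(range(n))
    for k in range(n, 0, -1):
        j = m % k
        perm[k - 1], perm[j] = perm[j], perm[k - 1]
        m //= k
    return perm


def random_permutation_from_rank(n):
    """Draw a secret rank uniformly from 0..n!-1 and unrank it."""
    key = secrets.randbelow(math.factorial(n))
    return key, unrank(key, n)


def rank_key_bits(n):
    """Number of bits needed to store the largest possible rank, n!-1."""
    return (math.factorial(n) - 1).bit_length()


key, perm = random_permutation_from_rank(6)
```

Because every rank corresponds to exactly one permutation, a uniformly drawn rank gives a uniformly drawn permutation. One design consequence is that the size of the rank grows with log2(n!), so for a large number of vertical data objects the rank is a large integer rather than a short key.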
Un-Anonymizing The Vertical Data
Whenever a user wishes to apply operations to the set of secure pTree-based vertical data, the user simply
enters two things: the permutation key and the associated full-length cycle. If the user has originally reordered the
vertical data using a well-known cyclic permutation, then the only important part of the security key is the permutation's rank in the pre-defined construction order. For example, well-known and easy-to-remember cyclic permutations on n objects are: (2, 3, 4, ..., n, 1) and (n, 1, 2, ..., n-1).
Figure 7 illustrates the process of using the security key to perform operations with the pTree-based vertical
data structures. The process reverses the SORAND algorithm. Given a permutation, any indirect addressing scheme
will implement a (direct) application of bit operations. The advantage of this method is the ability to unanonymize
the data bit-vectors without additional work.
The Padding Algorithm
Even though each of the bit-vectors has been repositioned to secure the database, it may be possible that
an adversary can rearrange the bit-vectors such that parts of some data files make sense. If each of the bit-vectors were padded at the beginning of the vector with a random string of 0/1 values, this random front-end padding would significantly hinder this possibility. For this reason, a padding algorithm was developed to place unique,
random, length-varying strings of 0/1 at the front-end and at the back end of each bit-vector. With a high probability,
no two bit-vectors will have the same length and random strings.
Therefore, a second level of security may be applied to the vertically structured database by applying one
of the SO algorithms a second time, producing a single random permutation that is used to pad each of the bit-vectors.
Under normal circumstances, the size, or length, of a pTree structure will vary with the type of data set that
it represents. Different data files in a database normally comprise different numbers and structures of records, depending on the purpose of the data file. The padding algorithm adds a random number of random bits (0/1) to the
front and back of a bit-vector, regardless of its original length.
The random padding scheme uses one of the SO algorithms and a stream cipher, such as the publicly available stream cipher called CryptMT3 [13]. The random padding scheme would be applied to each vertical data structure separately, thereby padding each data structure uniquely. Other cryptographically strong stream ciphers are
available as well [14].
To begin, the user selects one of the SO algorithms. The user provides the algorithm with a key K and the
length N (where N is the number of database bit-vectors), and the SO algorithm produces a pseudorandom permutation called the padding permutation. Each bit-vector in the database is then padded as follows: (1) the ith bit-vector is
assigned an amount of padding in front of the bit-vector equal to the ith value of the padding permutation, (2) the
stream-cipher is used to place a random string of bits (0/1) at the front end of the bit-vector with its padding length
determined in the first step, (3) the ith bit-vector is next assigned an amount of padding in back of the bit-vector
equal to the ith value of the inverse of the padding permutation, and (4) the stream-cipher is used to place a random
string of bits (0/1) at the back end of the bit-vector with its padding length determined in the third step.
Iterating these four steps through the database replaces each of the original bit-vectors with its front-back
padded bit-vector version.
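The four padding steps can be sketched as follows. This is a minimal illustration under stated assumptions: the padding permutation is given as a list of the values 0..N-1 (the text numbers positions from 1), `secrets.randbits` stands in for the stream cipher, and the function names are introduced here.

```python
import secrets


def inverse(perm):
    """Return the inverse of a permutation of 0..n-1."""
    inv = [0] * len(perm)
    for i, v in enumerate(perm):
        inv[v] = i
    return inv


def random_bits(length):
    """Random 0/1 string; a stand-in for the stream cipher output."""
    return [secrets.randbits(1) for _ in range(length)]


def pad_vectors(vectors, pad_perm):
    """Pad the i-th vector with pad_perm[i] bits in front and inverse[i] behind."""
    inv = inverse(pad_perm)
    padded = []
    for i, vec in enumerate(vectors):
        front = random_bits(pad_perm[i])      # steps (1) and (2)
        back = random_bits(inv[i])            # steps (3) and (4)
        padded.append(front + vec + back)
    return padded


def unpad_vector(padded_vec, i, pad_perm):
    """Recover the true bits of the i-th vector from its padded version."""
    inv = inverse(pad_perm)
    return padded_vec[pad_perm[i]:len(padded_vec) - inv[i]]


vectors = [[1, 0, 1], [0, 0], [1, 1, 1, 1]]
pad_perm = [2, 0, 1]
padded = pad_vectors(vectors, pad_perm)
```

The random padding bits never need to be regenerated. Knowing the padding permutation is enough to find the true starting and ending bit of each vector.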
The issue with this straightforward algorithm relates to the size of the database itself. Because the front and back padding lengths are each drawn from a permutation of 1 to N, every bit-vector receives, on average, about N bits of padding in total (front plus back), where N is the number of bit-vectors in the database. If N is large, this amount of padding may therefore be significantly more than the amount of padding needed and may significantly increase the overall size of the database.
Therefore, the padding algorithm can be modified slightly to allow the user to present it with a number M
that is smaller than N. Given the number M, the algorithm would generate a random permutation of length M, and
would then use this padding permutation repeatedly in the algorithm until all bit-vectors are padded.
Assuming M << N, let L1 = floor(N/M) and L2 = N - M*L1 (that is, L2 = N mod M). The algorithm would use the M-length padding permutation L1 or L1+1 times, depending on whether M divides N evenly. If M does not divide N evenly, then
the algorithm would use the padding permutation one final time, through to the L2 position only.
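The reuse of a short padding permutation can be sketched as below. The function name and the 0-based indexing are assumptions of this illustration; `full_passes` and `remainder` play the roles of L1 and L2 in the text.

```python
def pad_lengths(n_vectors, pad_perm):
    """Assign a front-padding length to each of n_vectors bit-vectors.

    The M-length permutation pad_perm is reused cyclically: full_passes
    complete passes, followed by a partial pass of `remainder` entries.
    """
    m = len(pad_perm)
    full_passes, remainder = divmod(n_vectors, m)
    lengths = [pad_perm[i % m] for i in range(n_vectors)]
    return full_passes, remainder, lengths


full_passes, remainder, lengths = pad_lengths(7, [2, 0, 1])
```

With N = 7 and M = 3, the permutation is used twice in full and then through its first position only. No padding length exceeds M-1, which bounds the growth of the database.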
Random Permutations Are Sufficient
Applying random permutations provides a significant level of security to the database without being burdened with encryption and decryption during the course of using the database for data-mining activities. An important question to ask is whether using random permutations is a secure method for anonymizing and padding the
pTree-based vertical data structures.
Permutations, as bijective functions between finite sets, form the basis for the fundamental building block
in cryptography, called the block cipher [15],[16]. The block cipher is a symmetric key cipher that operates on a
fixed-length group of bits, called a block. As input, these bits are called plaintext, and, as output, these bits are called
ciphertext. Importantly, these bits form the finite set to which a permutation is applied. The permutation and its associated application to the plaintext is controlled by the secret key K.
Clearly, then, a database management system may apply a secure block cipher to encrypt the reordering of
the vertical data structure. Using format-preserving encryption and a secret key, the correct order of the data structures would serve as the plaintext and the corresponding ciphertext would serve as the newly reordered (and encrypted) positions. Hence, the ciphertext output by a secure block cipher is the reordering random permutation.
One issue with a block cipher in general, as it relates to the problem discussed in this report, is that its ciphertext block is typically much longer than the small index values to be encrypted, so encrypting an index directly expands the data, sometimes significantly. Another issue, again as it relates to our problem, is that the ciphertext is not formatted to match the input. These two issues together are called the format-preserving problem, and block ciphers that solve them are called format-preserving encryption [17].
Black and Rogaway [18] developed a block cipher algorithm called the prefix algorithm that is format-preserving encryption. The algorithm is easy to understand and easy to implement. The method effectively produces
a random permutation (as an ideal block cipher) and then uses it to encrypt the data. The algorithm encrypts each of
the input values using a typical block cipher, such as AES [19], and records the output along with the input. The
algorithm next sorts this list based on the encrypted value. The rearranged input values, from 1 to N, are then a permutation that can be applied to input values from 1 to N, thereby producing a format-preserved output.
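The structure of the prefix construction can be sketched in a few lines. As a loudly stated substitution, the sketch uses HMAC-SHA256 from the Python standard library as the keyed function in place of AES, because the standard library does not include AES; the function name, the key, and the 0-based inputs are likewise assumptions of this illustration.

```python
import hashlib
import hmac


def prefix_permutation(key, n):
    """Build a keyed permutation of 0..n-1 by sorting on a keyed pseudorandom value.

    HMAC-SHA256 stands in here for the block cipher (for example AES) of the
    original prefix construction.
    """
    def keyed_value(i):
        return hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()

    # Sorting the inputs by their keyed values rearranges 0..n-1 into a permutation.
    return sorted(range(n), key=keyed_value)


perm = prefix_permutation(b"example secret key", 10)
```

The same key always rebuilds the same permutation, and the output stays within 0..n-1, which is the format-preserving property. The cost is dominated by the sort, as noted in the next paragraph.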
The prefix cipher produces a format-preserving random permutation. The prefix cipher applies a standard
encryption algorithm (with O(1) time) and a sorting algorithm (with O(n log n) time). This algorithm was modified
slightly by Spies [20] by adding "cipher tweaking" [21] so as to increase the efficiency of altering the ciphertext
permutation without having to rebuild the table. The Spies modified prefix cipher makes it an excellent alternative to
the SO algorithms.
The permutation derived from the prefix cipher is the same type of permutation produced by the SOSEED,
SORANK, and SORAND algorithms presented in this report. The SO algorithms apply random number generation (with
O(1) time) and ranking and unranking functions (with O(n) time) to produce the random permutation. As part of their
reported research, Bellare et al. [17] provide theoretical analysis for general ranking methods applied to format-preserving encryption, which they call the "rank-then-encrypt" approach.
The proof given by Black and Rogaway shows that breaking the prefix cipher reduces to breaking the underlying cipher. In the SO algorithms, if the construction of a pseudorandom permutation is ideal, then it is equally likely
to be any of the n! permutations. It is therefore secure because it is provably infeasible for an adversary to distinguish it from a truly random permutation.
Using the same proof method presented by Black and Rogaway for the prefix cipher (now considered a
standard argument), the SO algorithms are based on RNG and so will remain secure as long as the RNG is secure.
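The Blum RNG [22] and CryptMT3 [13], the stream cipher derived from the Mersenne Twister RNG [12], are two examples of generators designed to be cryptographically secure (generators that
are not predictable in polynomial time). The Mersenne Twister itself is not cryptographically secure, because its internal state can be recovered from a sufficiently long run of its outputs, so it should be used here only through a cipher such as CryptMT3. Implementations of the Mersenne Twister RNG and its stream cipher
derivative are publicly available via the internet.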
Recently, Salmon et al. [14] have reported encouraging research and performance analysis for highly parallelizable and cryptographically strong RNGs. Implementations of these RNGs are publicly available via the internet.
Implementation And Performance
[ The process of constructing and adding two numbers (as address offsets) to the address of a pTree (i) to
find the true position of the pTree in the reordered database and (ii) to find the true starting bit of the pTree within
the padded bit-vector adds a minimal amount of time to vertical data mining operations. [Details To Follow] ]
[ The database will most likely grow and shrink, as it reflects the growth and shrinkage of its data files during expected use and maintenance of such a system. It is easy to understand how the security algorithms presented in
this report can be adapted to expected changes to the data. For example, if data are inserted or deleted from an existing file, the representative set of pTrees will change, but their reordered positions in the database do not change and
the front-back padding does not change. Also, if a new data file is created, then it can be secured into the database at
the next round of periodic administrated re-securing of the database. Also, if a data file is deleted in its entirety, it
can be marked as such and remain in the database until the next round of periodic administrated re-securing. [Details
To Follow] ]
References
[1]
Stonebraker M, Abadi D, Batkin, et al. C-store: A column-oriented DBMS. Proceedings of the 31st Very
Large Database Conference, Trondheim, Norway, 2005.
[2]
Perrizo W, Ding Q, Ding Q, Roy A. Deriving high confidence rules from spatial data using Peano count
trees. Pacific-Asian Data Mining, Conference, Springer-Verlag Lectures In CS, 2001; 2118: 91-102.
[3]
Ding Q, Khan M, Roy A, Perrizo W. p-tree algebra. ACM Symp. on Applied Computing 2002; 426-431
[4]
Perrizo W, Jockheck W, Perera A, Ren D, Wu W, Zhang Y. Multimedia data mining using p-trees. Lecture
Notes In Computer Science 2003; 2797: 100-117.
[5]
Ding Q, Ding Q, Perrizo W. PARM - an efficient algorithm to mine association rules from spatial data.
IEEE Transactions Systems Man Cybernetics- Part B 2008; 38(6): 1513-1524.
[6]
Contributors. Symmetric group. Wikipedia, en.wikipedia.org/wiki/Symmetric_group, 23-September-2011.
[7]
Skiena S. The Algorithm Design Manual, 2nd ed. Springer-Verlag; United Kingdom, 2010, pages 448-451.
[8]
Sedgewick R. Permutation generation methods. ACM Computing Surveys 1977; 9(2): 137-164.
[9]
Durstenfeld R. Algorithm 235, random permutation G6. Communications of the ACM 1964; 7(7): 420.
[10]
Myrvold W, Ruskey F. Ranking and unranking permutations in linear time. Information Processing Letters 2001; 79: 281-283.
[11]
Naor M, Reingold O. Constructing pseudo-random permutations. Journal of Cryptology 2002; 15: 97-102.
[12]
Matsumoto M, Nishimura T. Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Transactions on Modeling and Computer Simulation 1998; 8(1): 3-30.
[13]
Matsumoto M, Saito M, Nishimura T, Hagita M. CryptMT3 stream cipher. Lectures In CS 2008; 4986: 7-19.
[14]
Salmon J, Moraes M, Dror R, Shaw D. Parallel random numbers: As easy as 1, 2, 3. SC11, ACM Conference; Seattle, Washington, Nov 12–18, 2011.
[15]
Contributors. Block cipher. Wikipedia, The Free Encyclopedia; 05-November-2011 (en.wikipedia.org/wiki/Block_cipher).
[16]
Goldwasser S, Bellare M. Lecture Notes on Cryptography. MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA, July 2008.
[17]
Bellare M, Ristenpart T, Rogaway P, Stegers T. Format-preserving encryption. Cryptology ePrint Archive,
Report 2009/251, 2009 (eprint.iacr.org).
[18]
Black J, Rogaway P. Ciphers with arbitrary finite domains. Lectures In Computer Sci. 2002; 2271: 114-130
[19]
Contributors. Advanced encryption standard (en.wikipedia.org/wiki/Advanced_encryption_standard), 04-11-11.
[20]
Spies T. Format preserving encryption: www.voltage.com. Database and Network Journal 2008; 38(6).
[21]
Liskov M, Rivest R, Wagner D. Tweakable block ciphers. Lectures In Computer Science 2002; 2442: 31-46.
[22]
Blum L, Blum M, Shub M. A simple unpredictable pseudorandom number generator. SIAM Journal of
Computing 1986; 15(2): 364-383.
[23]
Perrizo, William. NDSU Notes on pTree-based Pretty Good Protection of Data (pPDP-D), 121-NDSU17358, October, 2011.
[24]
Brewer, James. NDSU Vertical Data Security for pTree-based Data Mining, Term Paper in Introduction to
Database Systems, 121-NDSU-17358, December 15, 2011.
[25]
Treeminer, Inc. The Vertical Data Mining Company, 175 Admiral Cochran Drive, Suite 300, Annapolis,
Maryland, 21401 (240) 389-0750, http://treeminer.com/
Figure 1. The SOSEED algorithm for secure vertical data reordering.
Figure 2. The SORANK algorithm for secure vertical data reordering.
Figure 3. The SORAND algorithm for secure vertical data reordering.
Figure 4. Durstenfeld's random permutation algorithm.
Figure 5. The Myrvold & Ruskey ranking permutation algorithm [10].
Figure 6. The Myrvold & Ruskey unranking permutation algorithm [10].
Figure 7. Process for using securely ordered pTree-based vertical data.