2. RELATED WORKS

In this section we present the most advanced encryption-based techniques devised so far to enforce access control, and we show their limitations.

Super encryption

In [MS04], Miklau and Suciu rely on XML encryption, which keeps the structure of the original XML document while subparts of it are encrypted in place. When receiving the document, the user decrypts the subparts of the document depending on the decryption keys in his possession. The decrypted data can in turn be a subtree containing encrypted parts, and the user can continue the decryption process recursively as long as he holds the corresponding keys. One of the most interesting aspects of their work is that they compile into the encryption the logical conditions required to access the data, e.g., access to a subtree may be granted to all users in possession of the keys K1 and K2, or simply of K3. This is achieved by encrypting the subtree P with an inner key X3 and by adding extra nodes next to P as follows: a node containing a key X1 encrypted with K1, another containing a key X2 encrypted with K2, and a last one containing the key X3 encrypted with K3, such that X3 = X1 ⊕ X2, where ⊕ denotes the XOR operator. A user holding the key K3 gets direct access to X3 and thus to P. In the same way, owning the keys K1 and K2 gives access to X1 and X2, and thus to P after computing X3 = X1 ⊕ X2. While this solution provides an elegant way to reduce the number of keys distributed to the users (in the extreme case, only one key is needed per user), it suffers from several limitations. First, it does not solve the problem of the dynamicity of rights. Indeed, revoking a right from a user requires re-encrypting, with a different key, the parts of the document he was previously authorized to see, a process which is particularly complex under super-encryption. Second, the decryption cost incurred by recursive encryption and by the use of inner keys adds to the cryptographic initialization process, making the scheme inappropriate for devices with low processing capacities. Finally, as no compression is considered, the overhead incurred by XML encryption and inner keys can be significant.

Well-formed encryption

The previous solution does not perform well when a user is interested in a rather small subset of the document: it offers no indexation structure allowing to converge quickly towards the relevant parts. The idea developed in [Carminati] is to rely on well-formed encryption, which encrypts tags, attributes and values in place in the document depending on the access control policies. A query on the structure can then be evaluated directly on the encrypted document by encrypting, in place in the XPath expression, the tags, attributes and values (e.g., /a/b[c=5] can become /eZ/r5[er=53]). To support selections on values, they rely on [Hacigumus] and consider index partitioning, which consists of appending an index to each encrypted value. For numerical values, the index tells to which interval the value belongs (e.g., values between [1, 100] have an index value of 1). In this scheme, the relative order is preserved: values having a greater index have a greater value, which enables selections using inequalities. These indexes are appended to the document in the form of extra elements.
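As a concrete illustration of this index partitioning, the following sketch shows one possible bucketization of numerical values; the bucket width of 100 and the function names are illustrative assumptions, not taken from [Hacigumus].

    BUCKET_WIDTH = 100   # e.g. values in [1, 100] get index 1, [101, 200] get index 2, ...

    def bucket_index(value: int) -> int:
        # Order-preserving mapping from a clear value to its interval index.
        return (value - 1) // BUCKET_WIDTH + 1

    def rewrite_range_predicate(low: int, high: int) -> tuple:
        # A clear-text predicate "low <= v <= high" becomes a predicate on indexes.
        # The server returns a superset of the answer; false positives are
        # filtered out by the client after decryption.
        return bucket_index(low), bucket_index(high)

    print(bucket_index(950))                   # 10
    print(rewrite_range_predicate(950, 2000))  # (10, 20)

Because greater values always get greater (or equal) indexes, inequality predicates can be rewritten on the indexes at the cost of some false positives filtered on the client side.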
Finally, the remaining problem when dealing with queries is to guarantee completeness. This is achieved by extending the Merkle Hash Tree (reference) to an XML tree: each internal XML node of the tree is associated with a hash computed as the hash of the concatenation of its tag name, its content and the hashes of all its child nodes. While this solution combines many existing techniques to secure access control, it suffers from several weaknesses. First, well-formed encryption is subject to inference attacks (e.g., statistics on the number of occurrences and inference on the structure). Second, the encoding scheme induces a dramatic overhead: with a secure encryption function such as 3DES or AES (which produces 64-bit or 128-bit blocks), tags and values have to be padded accordingly, which can drastically increase the size of the document. Moreover, the index information and the schema information (which basically tells which key was used for encryption), added as extra elements, contribute to the space overhead. Third, extending the Merkle Hash Tree, which originally operates on binary trees, also leads to a dramatic overhead: when requesting an element having n siblings, their n hashes (SHA-1 produces 20-byte hashes) are sent along with the answer. Finally, this model does not support updates well: when access rights are updated, data needs to be re-encrypted accordingly.

In the context of XML filtering [expedite] and XML routing [Suciu with SIX], the authors devised a streaming index which consists of appending to each subtree its size, giving the possibility to skip it. However, no information about the content of the subtree is provided, which makes its use very limited (e.g., for the query /a/b only the siblings of b can be skipped, whereas with // no skip can be done).

Delivery

When the elements are delivered to the terminal, we use a simple representation of the data based on tag compression. Basically, a starting tag is represented by an id and an ending tag by the null byte; characters are output in place, as is. Tag encodings are prefixed with a bit set to one, and characters with a zero bit. The terminal then replaces the tag ids by the proper tag names using a tag dictionary (a small encoding sketch is given below, after the pending-delivery discussion). In case a positive rule is nested in a negative rule, an orphan subtree can be output. In this case, the tags (found in the tag stack) linking the subtree to the last output element are appended to the orphan subtree in order to keep the document structure consistent.

Pending delivery

The pending parts are externalized to the terminal in an encrypted form using a temporary encryption key. If, later in the parsing, the pending part is found to be authorized, the temporary key is delivered; otherwise it is discarded. A different temporary key is generated for every pending part depending on different predicates. In the following, we call output block either a contiguous output encrypted with the same key or a contiguous clear-text output. When issuing an output block, we have to consider that some of the previously issued output blocks may be discarded, and thus we must find a way to connect the block to the last authorized (possibly pending) output block, which may not be known in advance. To this end, we append to each output block the following information: (i) the last tag of the output block is marked with a random number which serves as an anchor (this mark is also appended to the tag stack); (ii) the list of tags connecting the block to the last authorized output block (either clear-text output or an encrypted block whose key has been delivered), each tag in the list being accompanied by its mark if present.
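The following sketch illustrates one possible byte-level reading of the tag-compression delivery format described above (tag ids carrying their high bit set to one, plain ASCII characters for text, and a null byte closing the current element); the event representation and function names are our own assumptions.

    END_TAG = 0x00

    def encode(events, tag_ids):
        # events: a SAX-like list of ('open', tag) / ('text', s) / ('close',) items.
        out = bytearray()
        for ev in events:
            if ev[0] == 'open':
                out.append(0x80 | tag_ids[ev[1]])   # tag id with its high bit set
            elif ev[0] == 'text':
                out.extend(ev[1].encode('ascii'))   # characters output in place, as is
            else:
                out.append(END_TAG)                 # null byte closes the element
        return bytes(out)

    def decode(data, id_to_tag):
        # The terminal rebuilds the fragment using its tag dictionary.
        result, stack, text = [], [], []
        def flush():
            if text:
                result.append(''.join(text))
                text.clear()
        for b in data:
            if b == END_TAG:
                flush()
                result.append('</%s>' % stack.pop())
            elif b & 0x80:
                flush()
                tag = id_to_tag[b & 0x7F]
                stack.append(tag)
                result.append('<%s>' % tag)
            else:
                text.append(chr(b))
        return ''.join(result)

    tags = {'doc': 1, 'salary': 2}
    stream = encode([('open', 'doc'), ('open', 'salary'), ('text', '980'),
                     ('close',), ('close',)], tags)
    print(decode(stream, {v: k for k, v in tags.items()}))   # <doc><salary>980</salary></doc>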
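To make the block-connection mechanism more concrete, here is a deliberately simplified sketch (our own, with illustrative field names and closing tags omitted) of how a terminal could stitch the authorized output blocks back together using the anchors and connecting tags described above.

    class Block:
        def __init__(self, anchor, attach_to, connect_tags, content):
            self.anchor = anchor              # random mark put on the block's last tag
            self.attach_to = attach_to        # anchor of the block it was appended after
            self.connect_tags = connect_tags  # tags linking it to the last authorized block
            self.content = content            # clear-text or decrypted fragment

    def reassemble(blocks, authorized_anchors):
        # Keep only the blocks whose key was delivered (or which are clear text);
        # when the block they were attached to has been discarded, re-open the
        # connecting tags so that the document structure stays consistent.
        out, seen = [], set()
        for b in blocks:
            if b.anchor not in authorized_anchors:
                continue                        # key discarded: drop the block
            if b.attach_to is not None and b.attach_to not in seen:
                out.extend('<%s>' % t for t in b.connect_tags)
            out.append(b.content)
            seen.add(b.anchor)
        return ''.join(out)

    blocks = [
        Block('a1', None, [], '<review><rate>4</rate>'),
        Block('a2', 'a1', ['review'], '<comment>ok</comment>'),
    ]
    # If a1's key is never delivered, a2 re-opens <review> to stay correctly attached.
    print(reassemble(blocks, authorized_anchors={'a2'}))   # <review><comment>ok</comment>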
This way, the terminal can easily reconstruct the document. The attentive reader may notice that a user can infer from discarded output blocks information about their possible values (e.g., one can learn that a salary is less than $1000 simply because a rule conditioned by the salary is known to be defined). To tackle this problem, fake discarded blocks may be issued randomly.

Coping with multiple pending predicates

When coping with multiple pending predicates, the problem that arises is how to manage the different buffers efficiently, because the delivery of a subtree may depend on complex logical expressions over predicates. As the number of pending predicates is likely to be small (less than a dozen), we can model the logical expressions using bit arrays representing their truth tables. For instance, the logical expression a ∧ b which conditions the delivery of a subtree can be represented as in Figure 1(a). Only V and B need to be encoded and stored in memory; the rest of the table is implicit. We thus consider two vectors: V = {(p, d)}, the list of pending predicates identified by their predicate id and the depth at which they occur, and B, the column of truth table results representing the logical expression. The blocks which share the same logical expression E are grouped into classes, and each class is associated with an encryption key which serves to encrypt its blocks. In the following, we describe how the logical expressions are constructed incrementally and how they are evaluated. For ease of understanding, we consider here that only one pending predicate is associated with each rule; the extension to multiple predicates per rule is straightforward.

When considering a subtree whose delivery is conditioned by an expression P, and an inner subtree on which a pending rule R having a pending predicate F applies, the resulting logical expression is E = P ∧ F if R is a positive rule and E = P ∧ ¬F if it is a negative one. The new vectors V' and B' of E can be constructed from the vectors V and B of P as follows. First, V is copied to V' and the new predicate p is inserted in V' in (predicate id, depth) order; let index(p) be its position in V' and w = 2^(|V'| - index(p)). B is first expanded by duplicating every set of w bits (in Figure 1, table (b) is constructed from table (a) by duplicating each row and adding the c column to build the new result column). Then, for the inserted column, we consider an alternation of w-bit blocks of 0 and 1 (here an alternation of one-bit blocks of 0 and 1). Finally, the resulting bits of B' are computed with a bitwise AND between the expanded B and this column (or its complement for a negative rule). V' and B' are then compared against the logical expressions of the other classes; if the same expression is found in another class, the two classes are merged.

(a) Class 1: expression a ∧ b
  (a, 1)  (b, 2)  |  result
    0       0     |    0
    0       1     |    0
    1       0     |    0
    1       1     |    1

(b) Class 2: expression a ∧ b ∧ ¬c
  (a, 1)  (b, 2)  (c, 2)  |  result
    0       0       0     |    0
    0       0       1     |    0
    0       1       0     |    0
    0       1       1     |    0
    1       0       0     |    0
    1       0       1     |    0
    1       1       0     |    1
    1       1       1     |    0

(c) Class 1-2 (after c is found false)
  (a, 1)  (b, 2)  |  result
    0       0     |    0
    0       1     |    0
    1       0     |    0
    1       1     |    1

(d) Class 1-2 (after b is found false)
  (a, 1)  |  result
    0     |    0
    1     |    0

Figure 1. Multiple pending predicate management

When a predicate p is found to be false, the rows for which p is true are removed: numbering the rows of B from 0 and letting w = 2^(|V| - index(p)), all the rows in the intervals [2kw + w, 2kw + 2w - 1] are removed and the remaining rows are shifted backward to remove the gaps, k varying from 0 to 2^(index(p)-1) - 1. Conversely, if a predicate p is found to be true, all the rows in the intervals [2kw, 2kw + w - 1] are removed. In both cases, p is removed from V. Finally, when all the bits of B are set to one, we can conclude that the associated buffers are to be delivered; if they are all set to zero, they must be discarded.
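The following sketch gives one possible implementation of these bit-array manipulations (the class and method names are ours); it reproduces the scenario of Figure 1.

    class PendingClass:
        """Truth table of a class: V is the ordered list of (predicate id, depth)
        pairs, B holds the 2^|V| result bits of the logical expression."""
        def __init__(self, preds, bits):
            self.V = list(preds)
            self.B = list(bits)

        def insert_predicate(self, pred, positive=True):
            # Add a pending predicate F: expand B, build its column, then AND.
            V2 = sorted(self.V + [pred])                 # (predicate id, depth) order
            w = 2 ** (len(V2) - (V2.index(pred) + 1))    # block width for this position
            expanded = []
            for k in range(0, len(self.B), w):           # duplicate every set of w bits
                expanded += self.B[k:k + w] * 2
            column = ([0] * w + [1] * w) * (len(expanded) // (2 * w))
            if not positive:                             # negative rule: AND with NOT F
                column = [1 - c for c in column]
            self.V, self.B = V2, [x & c for x, c in zip(expanded, column)]

        def resolve(self, pred, value):
            # A predicate has been evaluated: keep only the rows consistent with it.
            w = 2 ** (len(self.V) - (self.V.index(pred) + 1))
            kept = []
            for k in range(0, len(self.B), 2 * w):
                kept += self.B[k + w:k + 2 * w] if value else self.B[k:k + w]
            self.V.remove(pred)
            self.B = kept

        def status(self):
            if all(self.B):
                return 'deliver'      # keys of the class can be released
            if not any(self.B):
                return 'discard'      # keys (and buffers) are dropped
            return 'pending'

    # Scenario of Figure 1: class 2 starts from a AND b, a negative rule adds c.
    cls2 = PendingClass([('a', 1), ('b', 2)], [0, 0, 0, 1])   # table (a)
    cls2.insert_predicate(('c', 2), positive=False)           # table (b)
    cls2.resolve(('c', 2), False)                             # table (c): B = [0, 0, 0, 1]
    cls2.resolve(('b', 2), False)                             # table (d): B = [0, 0]
    print(cls2.status())                                      # discard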
In Figure 1(c), the predicate c has been found to be false and the truth table of class 2 has been simplified into the array shown in (c). As one can notice, this truth table is the same as the one of class 1 (in (a)), so the two classes are merged into class 1-2. When the predicate b is then found to be false, the table is simplified as in table (d). The result is made up only of 0 bits, so the logical expression is false and all the elements of this class can be discarded. Each class is in charge of a decryption key, and when its logical expression is resolved (true or false) the key is delivered or discarded accordingly. When two classes merge, the key of the first class serves to encrypt the output to come.

6. ACCESS RIGHT MANAGEMENT

As access rights can evolve, we design a process to refresh the access rights stored in the smartcard from the unsecured server. The access rights are stored encrypted on the server and can only be decrypted by the smartcards. As access rights are defined incrementally, we identify each of them with a timestamp which is incremented for every new access right, so that a missing access right can be detected. For each of them, we store the document timestamp, the user id and the encrypted rule definition, as sketched in Figure XXX; note that only the rules and the signatures are encrypted. Finally, the first data block of the document contains the document timestamp (incremented every time the document is updated) and the timestamp of the access rights at the time the document was last modified. To refresh its access rights, the smartcard requests all the access rights with a timestamp greater than the one of its last connection.

  ts  doc. ts  user  rule      signature
  1   TD1      John  E(rules)  sig
  2   TD1      Mary  E(rules)  sig
  3   TD2      John  E(rules)  sig

Figure XXX. Access rights stored encrypted on the server

When considering updates on the document, several situations have to be considered. Suppose that a document has been stored on the server with timestamp TD1 together with its associated access rights TR1. Now suppose that the owner updates the document (which becomes TD2) and/or the access rights (which become TR2, i.e., the new access rights defined since the last connection). When the client requests the document, the server may not be trustworthy and four situations may occur:
- the server sends TD2 and TR2. The data and the access rights are consistent and up-to-date.
- the server sends TD1 and TR1. The data and the access rights are consistent. The user may miss newly granted or denied accesses; however, as the user was authorized to see the now-denied parts at a prior date, this case does not give access to extra information and does not violate the confidentiality constraints.
- the server sends TD1 and TR2. Access rights granting extra access may give access to parts of TD1 (which have since been modified) that the user was not allowed to see. This leads to a confidentiality leak. This situation is detected thanks to the document timestamp appended to each access right.
- the server sends TD2 and TR1. Access rights removing access to subparts of TD2 could have been defined in TR2; in this situation, the user keeps extra access to these subparts, which leads to a confidentiality leak. This situation is detected thanks to the last access-right timestamp coming with the document.
If we consider the case where the access rights are updated but not the document, and the server does not reflect the new access rights, the user will in the worst case get access to subparts of the document he previously had access to, which does not violate the confidentiality constraints.
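A minimal sketch of the corresponding freshness check that a smartcard could run is given below; the field names (td, tr_at_update, td_seen) are our own illustrative assumptions, not part of the protocol description above.

    def check_consistency(doc, rights):
        """doc: the first data block, carrying 'td' (document timestamp) and
        'tr_at_update' (access-right timestamp when the document was last modified).
        rights: the fetched access rights, each with 'tr' (its own timestamp) and
        'td_seen' (document timestamp recorded when the rule was defined)."""
        latest_tr = max(r['tr'] for r in rights)
        # Case TD2 + TR1: the document was modified under newer access rights
        # than those returned by the server -> the rights are stale.
        if doc['tr_at_update'] > latest_tr:
            return 'stale access rights'
        # Case TD1 + TR2: an access right was defined against a newer document
        # version than the one returned -> the document is stale.
        if any(r['td_seen'] > doc['td'] for r in rights):
            return 'stale document'
        return 'consistent'

    # Example: the server returns the old document TD1 with a rule defined on TD2.
    print(check_consistency({'td': 1, 'tr_at_update': 1},
                            [{'tr': 2, 'td_seen': 2}]))   # stale document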
Optimization issues

In order to reduce the number of access rights to be fetched, that is, to skip the access rights of the other users, we append to each access right extra information giving the last timestamps of the other users. This overhead can be reduced by hash-partitioning the users and giving only the last timestamp of each partition. In this situation, a user fetches the latest access right and converges towards his own access rights by following these timestamps.
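As an illustration, the sketch below shows how such a partition chain could be followed; the partition count, the per-entry field last_ts_per_partition (assumed to point to the strictly older latest entry of each partition) and the function names are our own assumptions.

    import zlib

    NB_PARTITIONS = 4

    def partition(user_id: str) -> int:
        # Any stable hash computable on the smartcard would do; crc32 is just an example.
        return zlib.crc32(user_id.encode()) % NB_PARTITIONS

    def rights_to_fetch(entries, user_id, last_seen_ts):
        """entries: {ts: {'user': ..., 'last_ts_per_partition': {partition: ts}}}.
        Walks backward from the newest entry, following only the chain of the
        requesting user's partition, down to the last refresh point."""
        p, ts = partition(user_id), max(entries)
        chain = []
        while ts is not None and ts > last_seen_ts:
            entry = entries[ts]
            if entry['user'] == user_id:
                chain.append(ts)                           # an access right of this user
            ts = entry['last_ts_per_partition'].get(p)     # jump within the partition
        return list(reversed(chain))

With this scheme, the card skips all access rights stored between two entries of its own partition instead of scanning every entry newer than its last connection.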