Abstract – This work considers a secured proximity querying of quantitative data from an un-trusted server.
The data is to be revealed only to trusted user and not to anyone else. The need for security may be due to the data being sensitive, valuable or otherwise confidential. Given this setting, this work involves techniques for authentication, encryption, transformation and query processing to provide trade-offs between query cost and accuracy. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of proximity queries.
Keywords- Query Processing, Information Security,
Information Retrieval..
I.
Jeevitha. G & Rosaline Nirmala. J.
NTRODUCTION interesting phenomena. In this scenario, time series can be represented as vectors of values in chronological order . At query time, a user specifies an example time series q and wishes to obtain those time series most similar to q; the system then retrieves the time series p in the database with the minimum distance to q.
Many applications in science and business rely on similarity search of metric data other than time series and vector data. Computer-aided gene sequencing uses the similarity between an unknown sequence from one species and a known sequence from a closely related species to predict the former’s function . In drug design, pharmacists search for the most similar graph structures to their quest for a suitable molecule. In general, the above diverse scenarios have the following common characteristics: valuable data in a metric space are searched based on a similarity measure. When this data is outsourced, they must be secured against leaks or attacks.
In digital measurement and engineering technologies enable the capture of massive amounts of data in fields such as astronomy, medicine, and seismology. The effort of data collection and processing as well as its potential utility for research or business, create value for the data owner. He wishes to store them and allow access by himself, colleagues, and other
(trusted) customers. This can be supported by outsourced servers that offer low storage costs for large databases.) For instance, outsourcing based on cloud computing is becoming increasingly attractive, as it promises pay-as-you-go, low storage costs as well as easy data access. However, care needs to be taken to safeguard data that are valuable or sensitive against unauthorized access. In this context, we call any item in a data collection an object, individuals with authorized access query users, and the entity offering the storage service the service provider.
To analyze the data, authorized scientists may search for similar patterns in collected time series, such as certain daily or hourly subsequences that indicate
A. OVERVIEW
The popularity of Cloud Computing and
Outsourced database rapidly increasing Digital measurement and engineering technologies enable the capture of massive amounts of data field.
Searching is a fundamental problem in computer science, present in virtually every computer application.
Simple applications pose simple search problems, whereas a more complex application will require, in general, a more sophisticated form of searching. The search operation traditionally has been applied to
“structured data. That is, a search query is given and the number or string that is exactly equal to the search query is retrieved. Traditional databases are built around the concept of exact searching: the database is divided into records, each record having a fully comparable key.
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013
58
International Journal on Advanced Computer Theory and Engineering (IJACTE)
Queries to the database return all the records whose keys match the search key. More sophisticated searches such as range queries on numerical keys or prefix searching on alphabetical keys still rely on the concept that two keys are or are not equal, or that there is a total linear order on the keys.
Even in recent years, when databases have included the ability to store new data types such as images, the search has still been done on a predetermined number of keys of numerical or alphabetical types. With the evolution of information and communication technologies, unstructured repositories of information have emerged. Not only new data types such as free text, images, audio, and video have to be queried, but also it is no longer possible to structure the information in keys and records. Such structuring is very difficult
(either manually or computationally) and restricts beforehand the types of queries that can be posed later.
Even when a classical structuring is possible, new applications such as data mining require accessing the database by any field, not only those marked as “keys.”
Hence, new models for searching in unstructured repositories are needed.
In general, the process of outsourcing is the following: In the construction phase, the data owner creates the MS objects from the original raw data, sends these MS objects to a similarity cloud for indexing and the raw data to data storage. In the search phase, any authorized client can query the similarity cloud to obtain
IDs of the relevant objects referring to original data objects that can be subsequently retrieved from the raw data storage of outsourced secure similarity search.
Resource demanding process (the search itself) should be performed on the server-side as much as possible (clients querying the server might be simple devices without big computational power).
Communication cost between the client and the server should be as low as possible (in optimal case, client sends only initial search request and then receives result from the server).
Data should be stored on the server in a secure way so that a potential attacker can gain as little information about the data as possible.
Levels of Privacy Intuitively, the security requirement goes against the efficiency objective. If most of the computations should be performed on server side, the server has to have enough information about the data to process such task efficiently.
Hence, the right balance between the security and efficiency should be found for each specific application setting.
In this section, several approaches to (secure) outsourcing similarity search are described. We can map lots of databases to metric spaces or similarity spaces, which are finite sets with either a distance measure (a metric) or a similarity function. We can create a data structure that, given some new data set, enables us to compute its nearest neighbor which is the element in our database with minimal distance or maximal similarity to the new data set. This is called nearest neighbor search it has its roots in computational geometry but is used in lots of other application areas like data mining.
Our contributions are as follows: We present three transformation techniques that satisfy the above requirements. They represent various trade-offs among data privacy and query cost and accuracy.
In our first solution, we propose an encrypted index-based technique with perfect privacy, but multiple communication rounds. This technique flexibly reduces round trip latency at the expense of data transfer.
For our second solution, our private anchor-based indexing guarantees the correct answer within only 2 rounds of communication. Retrieval is accelerated by bounding the range of potential nearest neighbors (NN) in the first phase.
Our third solution limits communication to a single round, and also returns a constant-sized candidate set by computing a close approximation of the query result.
II. SYSTEM DESIGN
Figure depicts our scenario for outsourcing data. It consists of three entities: a data owner, a trusted query user, and an untrusted server. On the one hand, the data
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013
59
International Journal on Advanced Computer Theory and Engineering (IJACTE) owner wishes to upload his data to the server so that users are able to execute queries on those data. On the other hand, the data owner trusts only the users, and nobody else (including the server). The data owner has a set of original objects (e.g., actual time series, graphs, strings), and a key to be used for transformation. First, the data owner applies a transformation function (with a key) to convert original objects into a set of transformed objects, and uploads to the server .
The server builds an index structure on the sets in order to facilitate efficient search. In addition, the data owner applies a standard encryption method (e.g., AES) on the set of original objects; the resulting encrypted objects (with their IDs) are uploaded to the server and stored in a relational table (or in the file system). Next, the data owner informs every user of the transformation key. In the future, the data owner is allowed to perform incremental insertion/deletion of objects .
At query time, a trusted user applies the transformation function (with a key) to the query and then sends the transformed query to the server . Then, the server processes the query, and reports the results back to the user. Eventually, the user decodes the retrieved results back into the actual results. Observe that these results contain only the IDs of the actual objects. The user may optionally request the server to return the actual objects that correspond to the above result set.
III.
YSTEM
ODULE
A. Authentication
Authentication process helps the users of the service to protect his data from illegal access. Here we provide authentication for two entity
1)Consumer Authentication: Helps the user to view the data uploaded by his owner.
2)Owner Authentication: Helps owner to protect his process.
There are three types of techniques for doing this.
The first type of authentication is accepting proof of identity given by a credible person who has evidence on the said identity, or on the originator and the object under assessment as the originator's artifact respectively.
The second type of authentication is comparing the attributes of the object itself to what is known about objects of that origin. The third type of authentication relies on documentation or other external affirmations.
B. Encryption Technique
In cryptography, encryption is the process of encoding messages (or information) in such a way that eavesdroppers or hackers cannot read it, but that authorized parties . In an encryption scheme, the message or information (referred to as plaintext) is encrypted using an encryption algorithm, turning it into an unreadable ciphertext. This is usually done with the use of an encryption key, which specifies how the message is to be encoded. Any adversary that can see the ciphertext should not be able to determine anything about the original message. An authorized party, however, is able to decode the ciphertext using a decryption algorithm that usually requires a secret decryption key that adversaries do not have access to.
For technical reasons, an encryption scheme usually needs a key-generation algorithm, to randomly produce keys.
1. Triple DES (Data Encryption Standard)
It is the common name for the Triple Data
Encryption Algorithm (TDEA or Triple DEA) block cipher, which applies the Data Encryption Standard
(DES) cipher algorithm three times to each data block.
The original DES cipher's key size of 56 bits was generally sufficient when that algorithm was designed, but the availability of increasing computational power made brute-force attacks feasible. Triple DES provides a relatively simple method of increasing the key size of
DES to protect against such attacks, without the need to design a completely new block cipher algorithm.
2. Algorithm
Triple DES uses a "key bundle" which comprises three DESkeys K
1
, K
2
and K
3
, each of 56 bits
The encryption algorithm is: ciphertext =
E
K3
(D
K2
(E
K1
(plaintext))) i.e., DES encrypt with K
1
, DES decrypt with K
2
, then
DES encrypt with K
3
.
Decryption is the reverse: plaintext =
D
K1
(E
K2
(D
K3
(ciphertext))) i.e., decrypt with K
3
, encrypt with K
2
, then decrypt with
K
1
.
Each triple encryption encrypts one block of 64 bits of data.
In each case the middle operation is the reverse of the first and last. This improves the strength of the algorithm when using keying option 2, and provides backward compatibility with DES with keying option 3.
The standards define three keying options:
Keying option 1: All three keys are independent.
Keying option 2: K
1
and K
2
are independent, and K
3
= K
1
.
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013
60
International Journal on Advanced Computer Theory and Engineering (IJACTE)
Keying option 3: All three keys are identical, i.e. K
1
= K
2
= K
3
.
C. Transformation Function
The data to be uploaded by the owner has to be transformed using the transformation function. Our method supports any disk-based hierarchical index, provided that they permit the computation. To construct the structure, we first build a disk-based tree index on the data set P. Then, for each tree node, we encrypt its content, and send the encrypted node with its disk block
ID to the server. At the end, we send the disk block ID of the root node to the server
1) BASE64:
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. Base64 encoding schemes are commonly used when there is a need to encode binary data that need to be stored and transferred over media that are designed to deal with textual data.
2) Algorithm
Consider the following example to explain this algorithm:
Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and
01101110. These three values are joined together into a
24-bit string, producing 010011010110000101101110.
Groups of 6 bits are converted into individual numbers from the base64 index table, which are then converted into their corresponding Base64 character values.
D. Hashing Index Search
Hash index is only used to locate data records in the table and not to return data. A covering index is a special case where the index itself contains the required data field(s) and can return the data.
Eg. To find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the
Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.
E. Query Processing
When the consumer wants to access the specific data stored in the server by his owner, he uses the hierarchal index value to get the key and the query for the data. Based on the query access to data at specific region at the server is permitted and visible to user.
Then the encrypted data is decrypted and retransformed and original data is viewed by the Consumer.
V
YSTEM
ESULT
NALYSIS
ND
ERFORMANCE
VALUATION
100%
80%
60%
40%
20%
0%
1st session
2nd session
3rd session
Authenticati on
Transformat ion
Encryption
4rd session
The figure shows the comparison between users provided with authentication, encryption and transformation along with its security level. In data retrieval, the security level of data should be checked with its time. In this instance a year is divided into four sessions. With encryption the data retrieval and security is higher in 1 st session and later on decreases. So, the encrypted data may not be secured after the use of time.
With encryption and transformation the data retrieval and security is higher in 1 st session and in 2 nd session but in 3 rd session it decreases. So that the encrypted and transformed data may not be secured and it varies along with time. The above condition is deviated when we provide authentication to the user every time he logs in to the server. Here the data retrieval and security are higher in all the session and security of the data is maintained throughout the year.
V.
ONCLUSION
Test variations of the grid file. For instance, the idea of using MPTs within separate cells (as is used in the buddy tree) may prove effective in a static grid file where many cells will be on the very outer boundary of numerous range searches.
If the range search intersects the cell, but not the MBR, all atoms in that MBR can be rejected, and this may result in performance improvements
Search technique for sensitive data metric e.g.
Bioinformatics data
Existing solution either offer query efficiency or complete privacy
MPT stores relative distance information at the server side
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013
61
International Journal on Advanced Computer Theory and Engineering (IJACTE)
guarantees correctness of the final search result with two round communication
FDH method finished in just a single round communication
Metric Preserving Transformation stores relative distance information at the server with respect to a private set of anchor objects
VI. R EFERENCES
[1] M.L. Yiu, I. Assent, C.S. Jensen, and P. Kalnis,
“OutsourcedSimilarity Search on Metric Data
Assets,” DB Technical ReportTR-28, Aalborg
Univ., 2010.
[2] [31] M.L. Yiu, G. Ghinita, C.S. Jensen, and P.
Kalnis, “OutsourcingSearch Services on Private
Spatial Data,” Proc. IEEE 25th Int’l Conf.Data
Eng. (ICDE), pp. 1140-1143, 2009
[3] W.K. Wong, D.W. Cheung, B. Kao, and N.
Mamoulis, “Secure kNN Computation on
Encrypted Databases,” Proc. 35th ACMSIGMOD
Int’l Conf. Management of Data, pp. 139-152,
2009.
[4] L. Sweeney, “k-Anonymity: A Model for
Protecting Privacy,” Int’lJ. Uncertainty,
Fuzziness and Knowledge-Based Systems, vol.
10, no. 5,pp. 557-570, 2002.
[5] E. Damiani, S.D.C. Vimercati, S.Jajodia, S.
Paraboschi, and P.Samarati, “Balancing
Confidentiality and Efficiency in
UntrustedRelational DBMSs,” Proc. 10th ACM
Conf. Computer and Comm.Security (CCS), pp.
93-102, 2003.
[6] M. Dunham, Data Mining: Introductory and
Advanced Topics.Prentice Hall, 2002.
[7] C. Faloutsos and K.-I. Lin, “FastMap: A Fast
Algorithm forIndexing, Data-Mining and
Visualization of Traditional andMultimedia Data
Sets,” Proc. ACM SIGMOD Int’l Conf.
Management of Data, pp. 163-174, 1995.
[8] G. Ghinita, P. Kalnis, A. Khoshgozaran, C.
Shahabi, and K.L. Tan,“Private Queries in
Location Based Services: Anonymizers Are Not
Necessary,” Proc. ACM SIGMOD Int’l Conf.
Management of Data, pp. 121-132, 2008.
[9] A. Gionis, P. Indyk, and R. Motwani, “Similarity
Search in HighDimensions via Hashing,” Proc.
25th Int’l Conf. Very LargeDatabases (VLDB), pp. 518-529, 1999.
[10] H. Hacigu¨mu¨ s, B.R. Iyer, C. Li, and S.
Mehrotra, “Executing SQLover Encrypted Data in the Database-Service-Provider Model,”Proc.
ACM SIGMOD Int’l Conf. Management of Data, pp. 216-227,2002.
[11] H. Hacigu¨mu¨ s, S. Mehrotra, and B.R. Iyer,
“Providing Database asa Service,” Proc. 18th
Int’l Conf. Data Eng. (ICDE), pp. 29-40, 2002.
A.
Hinneburg, C.C. Aggarwal, and D.A. Keim,
“What Is theNearest Neighbor in High
Dimensional Spaces?,” Proc. 26th Int’lConf.
Very Large Data Bases (VLDB), pp. 506-515,
2000.
[12] G.R. Hjaltason and H. Samet, “Index-Driven
Similarity Search inMetric Spaces,” ACM Trans.
Database Systems, vol. 28, no. 4,pp. 517-580,
2003.
[13] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and
R.Z. 0003,“iDistance: An Adaptive Bþ-Tree
Based Indexing Method forNearest Neighbor
Search,” ACM Trans. Database Systems, vol.
30,no. 2, pp. 364-397, 2005.
[14] C.T. Jr, A.J.M. Traina, B. Seeger, and C.
Faloutsos, “Slim-Trees:High Performance Metric
Trees Minimizing Overlap between Nodes,”
Proc. Seventh Int’l Conf. Extending Database
TechnologyEDBT), pp. 51-65, 2000
[15] G. Aggarwal, T. Feder, K. Kenthapadi, S.
Khuller, R. Panigrahy, D.
[16] Thomas, and A. Zhu, “Achieving Anonymity via
Clustering,”Proc. 25th ACM SIGMOD-SIGACT-
SIGART Symp.Principles of Database Systems
(PODS), pp. 153-162, 2006.
62
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013