Coherence, A NoSQL Revolution
Data Distribution and Replication in Coherence

Renu Kanwar
M-Tech, Department of CS and IT, Govt. Engineering College, Ajmer, Rajasthan, India. renu.khangaroth@gmail.com

Prakriti Trivedi
Asst. Prof., Department of CS and IT, Govt. Engineering College, Ajmer, Rajasthan, India. niyuvidu@rediffmail.com

Abstract: NoSQL, short for "not only Structured Query Language", addresses many of the current problems of data storage. Traditional relational databases (RDBMS) have been used to store data for many years, but as the computing world grows, data storage technologies must grow as well. Today's applications require databases that can store and process big data effectively and that perform well under heavy read and write loads from highly concurrent applications such as search engines. NoSQL databases address the speed and scalability issues of RDBMS because they handle unstructured data easily. They do not guarantee the ACID properties; instead they follow the weaker BASE properties, which focus on eventual consistency. Oracle Coherence is a product that works on NoSQL principles. It divides the data to be stored into two domains, Facts and Dimensions, where Facts comprise the larger data sets and Dimensions the smaller data fragments. Facts are distributed across machines and Dimensions are replicated on every machine, which makes the database easy to scale and greatly improves data access speed.

Keywords: NoSQL, Data bottleneck, Coherence, Facts, Dimensions, Distribution, Replication, Coherence cluster, Cluster nodes

I. INTRODUCTION

Motivation
Big data storage is a core requirement of computing today, and databases that hold terabytes of data and handle them with ease attract wide attention. The most frequently quoted example of big data storage is the Internet, which can be regarded as the biggest database in the world. Most applications with such heavy data traffic have turned towards NoSQL.

Data Bottleneck
Traditional relational databases have long been used for storing data. One major and common problem with RDBMS arises under heavy data traffic: when multiple clients want to access the same data concurrently, a bottleneck occurs at the storage site, which increases the waiting time for each client. Database clustering emerged as a solution, in which data is stored at multiple sites to avoid the bottleneck, but it leads to another problem, distributed locks: whenever an update is in progress at one database site, the other sites are locked against changes to maintain consistency.

Figure 1. Data clustering solving the problem of data bottleneck

Limitations of RDBMS
Apart from data bottlenecks, RDBMS has other issues, notably speed and scalability. In an RDBMS data is stored on disk; when a large amount of data is stored on the disk of a single machine, congestion occurs at the data store and access time increases. Scaling an RDBMS is also not easy, because it requires buying bigger machines to increase processing power and storage space, which is costly.

Certain drawbacks of relational databases make data access problematic:
a) Disk-based input/output, which is very slow compared with accessing data from RAM.
b) Joins between many tables.
c) Relational versus object mismatch: a set of conceptual and technical difficulties often encountered when an RDBMS is used with an object-oriented programming language.

An advantage of RDBMS is that, because all data sets are accessed from a single machine, network latency is much reduced. But when terabytes of data are involved and both high speed and scalability are needed along with low network latency, RDBMS falls short, and a better alternative is needed.

II. NOSQL REVOLUTION

Because RDBMS could not meet the challenges of the rapidly growing computing world, people began looking for better alternatives. This gave rise to the NoSQL movement, which started in 2009 to solve the problems for which RDBMS was not a good fit. An RDBMS provides a variety of features and the rich semantics of the ACID properties, which are more than particular applications and use cases need, so NoSQL was chosen as an alternative that avoids the unneeded complexity. The strict semantics of an RDBMS increase the instruction overhead on the system, while NoSQL eliminates these unnecessary overheads. As Figure 2 shows, only 7% of the instructions perform useful work; the rest perform tasks that not every use case requires.

Figure 2. Database instructional overheads (buffer manager 35%, hand-coded optimization 16%, locking 16%, latching 14%, logging 12%, useful work 7%)

Instead of the strict ACID (atomicity, consistency, isolation, durability) semantics, NoSQL uses the weaker BASE (basically available, soft state, eventual consistency) semantics. Basically available means the system is available all the time, without a bottleneck that forces clients to wait. Soft state means the system can move to the next state without completing the previous one. Eventual consistency means consistency is not enforced at every state, but must hold at the end of the process.

Oracle Coherence
Oracle Coherence is an Oracle product that works on NoSQL principles. As shown in Figure 3, its architecture consists of a Coherence cluster in the middle, which acts as a cache, with a persistent data store behind it. When a client requests a particular piece of data, the request is directed to the node that holds it. Unlike in a data cluster, data in a Coherence cluster is not replicated at multiple sites; it is distributed among the nodes. In layman's terms, if the numbers 1 to 100 are to be stored in the cluster, then 1 to 10 would be on one node, 11 to 20 on another, and so on, whereas in a data cluster the numbers 1 to 100 are replicated at every database site, which consumes storage space. Coherence therefore uses storage space efficiently, without wasting memory on full copies.

Figure 3. Coherence cluster

As shown in Figure 3, whenever a client accesses data, it is served from the cache and later saved in the persistent data store.

Important features of Coherence that make it stand out:

1. Speed: The middle layer of Figure 3 is the Coherence cluster, where the data is stored. This repository resides in memory (RAM), so data is read from RAM, which is much faster than disk access. Because RAM is volatile and data could be lost on a power failure, the data is also written asynchronously to the persistent data store. This write is a slow process and is completed in idle hours when data traffic is light.

2.
Scalability: NoSQL systems scale horizontally, so the system can be scaled with ease. Coherence scales up simply by adding a new machine as a new node of the Coherence cluster, and the system becomes more robust as nodes are added.

Figure 4. Scalability performance

3. Fault Tolerance: Because the data is stored in RAM, an excellent backup and failover mechanism is essential. A key feature of Coherence is its backup structure, in which the backup of each node is kept on another node of the cluster. For example, with 5 nodes numbered 1 to 5, the backup of node 1 would be held by node 4, that of node 2 by node 5, that of node 3 by node 1, and so on, which makes the system robust and fault tolerant.

With these features, Coherence has proved to be a solution for various applications and use cases where RDBMS is not an appropriate choice.

III. COMPARISON AND RESULTS

When we compare the two, i.e. RDBMS and NoSQL (Coherence), several important facts emerge. Consider the data sets of an investment bank, where trades involving terabytes of data are carried out. Storing such big data sets is not feasible with an RDBMS because bigger machines are required, whereas Coherence handles the task with ease: to scale up, it uses multiple machines acting as nodes of the Coherence cluster.

Figure 5. RDBMS schema for an investment bank

An RDBMS schema for such a problem looks like Figure 5, with many joins and relationships, and performance suffers as the data grows. The corresponding Coherence cluster looks like Figure 6.

Figure 6. Coherence cluster for an investment bank

In the Coherence cluster, trade data, which forms the larger data sets, is distributed within the cluster, i.e. among the machines performing different trades. Very small data values, which use very little space, are replicated inside each node, so that no cross joins are needed and the values used by any trade data are held alongside it, which decreases the response time of any application.

Comparing the two, we conclude that the structure formed by Coherence is flexible and that data can be accommodated in it easily without additional overhead. Comparing the response time each spends searching and viewing data gives the result shown in Figure 7.

Figure 7. Response graph of DB and Coherence (response time in seconds for RDBMS and Coherence)

Figure 7 compares the response time of the two technologies when searching for data. Initially, with only a handful of data, the database responds faster than Coherence, because the infrastructure Coherence requires is quite large and accounts for most of the time taken by an operation. As the data grows, Coherence takes much less time than the database, because it works as a cache and access from RAM is much faster than access from disk.

Storing large amounts of data means scaling the data storage as the data grows, with provisions to both store and process it. This requires a bigger machine, i.e. scaling the database. Scaling an RDBMS is not easy, because it works on a single machine and purchasing a bigger machine costs a lot of money. For example, a machine with 4 GB RAM would cost about 50,000 INR and a machine with 40 GB RAM would cost in the crores, whereas joining 10 machines of 4 GB each (known as horizontal scaling) would cost around 5 lakh INR, which is much less. Scaling horizontally is therefore easy and cost effective.

Figure 8. Infrastructural cost for DB and Coherence (cost of machine in $K for RDBMS and Coherence)

Coherence requires some infrastructural cost that is higher than that of a traditional database. For applications that process small data sets, RDBMS wins, because the infrastructural cost of Coherence dominates when there is little data. As the data grows and the storage must be scaled up, Coherence gains the advantage because machines can be added flexibly, which reduces the overall cost of the system.

Comparing the CPU load placed on the database also gives important results.

Figure 9. Load over CPU by DB and Coherence (application performance impact on the database: CPU load in percent for RDBMS and Coherence, annotated with a 40% reduction)

From Figure 9 we can conclude that the CPU load is reduced by 40% when working with Coherence, which is commendable. Overall, the traditional RDBMS is good and should be used while the data is small and easily handled. As the data grows, other factors come into play that traditional systems cannot meet, and newer technologies are needed to meet the demands of current users.

CONCLUSIONS

RDBMS is limiting for many use cases because today's requirements are speed and scalability. Many newer databases are therefore in use, and applications are shifting towards in-memory databases, which improve data access, and towards databases that can be tailored to the use case.

NoSQL is an appropriate solution for use cases where ACID is not considered essential and people are ready to trade ACID for speed and scalability.

It can be concluded from the above that the traditional database is aging as times change, that NoSQL is set to lead the computing world, and that big data storage is the future. Various NoSQL solutions are available in the market today, and the term is popular among Web 2.0 leaders: big names such as Facebook, Google, Digg, Amazon, LinkedIn, and Twitter all use NoSQL in one form or another for different types of applications and use cases. Many multinational companies are also launching NoSQL solutions of their own; Coherence, developed by Oracle, is one example. The most important point to keep in mind is that SQL and NoSQL will coexist, and each will have its place.

ACKNOWLEDGMENT

I extend my thanks to Mrs. Prakriti Trivedi for guiding me throughout my research work; her guidance, teaching and suggestions motivated me in my work.
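
As an illustration of the placement scheme described in Section II, the following minimal Python sketch distributes a large fact set across nodes in contiguous ranges, replicates a small dimension set on every node, and keeps each node's backup on another node. It is not the Oracle Coherence API: the names Node, build_cluster and owner_of and the backup_offset parameter are hypothetical, and the backup rule (each node is backed up on the node three positions ahead, wrapping around) is only an assumption chosen to reproduce the example of node 1 backed up on node 4, node 2 on node 5 and node 3 on node 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One member of a toy in-memory cluster (hypothetical, not the Coherence API)."""
    node_id: int
    facts: dict = field(default_factory=dict)       # this node's share of the large data set
    dimensions: dict = field(default_factory=dict)  # full copy of the small reference data
    backups: dict = field(default_factory=dict)     # copy of another node's facts


def build_cluster(facts, dimensions, node_count, backup_offset=3):
    """Distribute facts in contiguous key ranges, replicate dimensions, assign backups."""
    nodes = [Node(node_id=i + 1) for i in range(node_count)]

    # Distribution: keys 1..10 go to node 1, 11..20 to node 2, and so on.
    keys = sorted(facts)
    chunk = -(-len(keys) // node_count)  # ceiling division
    for index, key in enumerate(keys):
        nodes[index // chunk].facts[key] = facts[key]

    # Replication: every node holds all dimensions, so no cross-node join is needed.
    for node in nodes:
        node.dimensions = dict(dimensions)

    # Backup (assumed rule): node i is backed up on node i + backup_offset, wrapping around.
    for i, node in enumerate(nodes):
        holder = nodes[(i + backup_offset) % node_count]
        holder.backups.update(node.facts)

    return nodes


def owner_of(nodes, key):
    """Return the id of the node that holds the primary copy of a fact key."""
    for node in nodes:
        if key in node.facts:
            return node.node_id
    raise KeyError(key)


if __name__ == "__main__":
    trades = {n: f"trade-{n}" for n in range(1, 101)}
    currencies = {"USD": "US Dollar", "INR": "Indian Rupee"}
    cluster = build_cluster(trades, currencies, node_count=10)
    print(owner_of(cluster, 15))
```

In this sketch a request for a fact is routed to the single node that owns it, while dimension lookups are always local, which mirrors the distinction between distributed Facts and replicated Dimensions drawn in the abstract.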