Comparative Analysis of Relational and Non-relational Databases in the Context of Performance in Web Applications

Conference Paper, April 2017. DOI: 10.1007/978-3-319-58274-0_13

Konrad Fraczek and Malgorzata Plechawska-Wojcik

Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
fraczek.konrad1@gmail.com, m.plechawska@pollub.pl

Abstract. This paper presents a comparative analysis of relational and non-relational databases. For the purposes of this paper, a simple social-media web application was created. The application supports three types of databases: SQL (it was tested with PostgreSQL), MongoDB and Apache Cassandra. For each database the applied data model is described. The aim of the analysis was to compare the performance of the selected databases in the context of reading and writing data. Performance tests showed that MongoDB is the fastest when reading data and PostgreSQL is the fastest when writing. The test application is fully functional; however, the implementation proved to be more challenging for Cassandra.
Keywords: relational databases, NoSQL, MongoDB, Cassandra

1 Introduction

Since 1970, when Edgar Codd published his article [1], relational databases have dominated the database market. At present, relational databases occupy seven of the top ten positions in the ranking of the most popular database systems [2]. However, recent years have seen dynamic growth of the Internet and mobile devices, which has caused an enormous increase in the amount of generated data. Engineers started to look for alternatives to relational databases, which were not designed to cope effectively with such large quantities of data. As a result, NoSQL databases appeared. They offer better performance scalability and a much more flexible data model than relational databases.

The aim of this study is to compare relational databases with selected non-relational databases: a document database (MongoDB) and a column-oriented database (Cassandra). For the purposes of this paper, a simple social-media web application was developed. The data models used in the application and the performance of each database are compared.

The above-mentioned database management systems were selected for the following reasons. MongoDB and Cassandra are flagship NoSQL products: MongoDB is the most popular document-oriented database, whereas Cassandra is the most popular column-oriented one. As typical non-relational databases, MongoDB and Cassandra are open-source. For this reason, among relational databases we chose PostgreSQL, which is open-source and also available commercially.

The paper is a continuation of our previous work [18], in which we analysed data models and conducted some performance tests. As the IT market is developing rapidly, there is a need to verify the available database solutions and their adaptation to different conditions.
The development of non-relational databases and the lack of extensive research on the current state of the art motivated us to continue the topic of NoSQL database applications.

This paper is organised as follows. Section 2 provides a review of related research. Section 3 contains a description of NoSQL databases. Section 4 introduces the implemented social-media web application. Section 5 presents the results of the performance tests. Section 6 is a summary of the paper.

2 Related research

There is little research in the literature on the current state of the art in the area of non-relational databases. In [19] the authors analysed the performance of applications based on non-relational databases. A general comparison of relational and non-relational databases was presented by Jatana and colleagues [20]. The characteristics, background and data models of NoSQL databases were discussed by Han and colleagues [21].

Lourenço et al. [3] reviewed a number of NoSQL databases available on the market, including MongoDB, Cassandra and HBase. They compared them in terms of the consistency and durability of the stored data, as well as with respect to their performance and scalability. They concluded that MongoDB can be the successor of SQL databases, because it provides good stability and consistency of data, and that Cassandra is the best choice when most of the operations are writes to the database.

Chandra [4] reviews the BASE (Basically Available, Soft state, Eventual consistency) properties of NoSQL databases and compares them with the ACID (Atomicity, Consistency, Isolation, Durability) properties. He also examines which databases are the most suitable for specific applications: in financial applications relational databases are reported as the best choice, whereas for data analysis and data mining NoSQL technologies turn out to be better.

Choi et al. [5] compared the performance of Oracle and MongoDB.
They found that the MongoDB database is several times faster than Oracle. The same databases were compared by Boicea et al. [6]. The authors conclude that MongoDB is faster and easier to maintain, whereas Oracle is the better choice when complex relationships between data need to be mapped. Li and Manoharan [7] compared the performance of several NoSQL databases (including MongoDB, Cassandra, Hypertable and Couchbase) and SQL Server Express. They observed that NoSQL databases are not always faster than SQL databases. Lee and Zheng [8] compared the performance of HBase and MySQL. It turned out that, when retrieving the same data, the NoSQL database was faster than the relational one. Truica et al. [9] compared the performance of document databases (MongoDB, CouchDB and Couchbase) and relational databases (Microsoft SQL Server, MySQL and PostgreSQL). CouchDB proved to be the fastest for insertion, modification and deletion of data, and MongoDB for reading.

3 NoSQL databases

The term NoSQL does not refer to a specific technology; it covers all non-relational databases. Almost all of them share the following features:

– lack of support for the SQL language; most NoSQL databases define their own query language, and some of these have a syntax similar to SQL, for example CQL for Cassandra,
– lack of relations between data,
– designed for working in clusters,
– no ACID transactions,
– flexible data model.

One of the biggest problems related to storing data on many servers is ensuring data consistency. The CAP (Consistency, Availability, Partition tolerance) theorem, described by Brewer [10, 11], addresses this issue. It states that a distributed database system can satisfy only two of three conditions at the same time: consistency, availability and partition tolerance.
Systems operating on a single machine are examples of CA systems: they are consistent (as there is no replication) and available. Systems operating on multiple machines are either CP systems (MongoDB, HBase) or AP systems (Cassandra, CouchDB) [3].

Another term connected with NoSQL databases is the BASE properties, the counterpart of the ACID properties known from relational databases [10]:

– Basically Available: if some of the servers fail, the rest of them should continue to respond to requests,
– Soft State: the state of the database can change even if no write operations are being performed at that moment,
– Eventual Consistency: after data are written on a single server, the changes must be propagated to the other machines; during this operation the data are not consistent.

Currently, four types of NoSQL databases are distinguished [10]:

– key-value stores: the features offered by these databases are limited to reading, saving and deleting the value for a specified key,
– document databases: they store data in documents with a dynamic structure, such as JSON or XML,
– column-oriented databases: they store data in column families organised into rows; rows from the same column family can have different columns,
– graph databases: these are based on the mathematical model of a graph; they store data in graph vertices and relations between data in graph edges.

3.1 MongoDB

MongoDB [23] is an open source document-based database written in C++. It is the fourth most popular database and, at the same time, the most popular NoSQL database [2]. MongoDB stores data in BSON documents, which are binary JSON documents. A single document is the equivalent of a row in a relational database. Documents are grouped into collections. In contrast to an RDBMS, documents from the same MongoDB collection do not need to have the same structure. MongoDB does not support ACID transactions. It offers atomic operations on a single document only [12].
The maximum size of a single document is 16 MB. MongoDB supports horizontal scalability through automatic sharding. Replication is implemented in master-slave mode: data are written to the master and then propagated to the slaves [3]. MongoDB offers a rich, JavaScript-based query language. It supports aggregate functions and the MapReduce model [6]. MongoDB allows indexes to be defined to speed up queries [3].

3.2 Apache Cassandra

Cassandra [24] is an open source column-oriented database written in Java. It was originally developed at Facebook [7]. Like relational databases, Cassandra stores data in the form of tables and rows. Each row consists of a primary key and columns. Rows in one table may have different columns. Each column consists of a name, a value and a timestamp of the write in milliseconds [3].

Just like MongoDB, Cassandra supports replication and partitioning. Unlike MongoDB, all servers are equal: there is no concept of master and slaves. Each server can handle write requests and propagate them to the others. As its data access interface Cassandra uses CQL (Cassandra Query Language), which is similar to SQL but offers far fewer features.

4 Test application

For the purposes of this work a social-media web application was built. At any given moment the application uses one of three databases (PostgreSQL, MongoDB or Cassandra), depending on its configuration. It provides the following functionality: sending posts, marking posts with hashtags, adding comments to posts, following other users, viewing the timeline, which contains the posts of followed users in descending date order, and viewing all posts marked with a specific hashtag. One of the requirements was also to implement paging when retrieving messages. Importantly, the pagination was carried out directly in the database and not in the application. We managed to achieve this goal for all selected databases.
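
To make the paging requirement concrete, the following is a minimal sketch and not the application's code. The class `PagingSketch`, its method names and the in-memory list standing in for a table sorted newest first are all assumptions made for illustration. It contrasts the two strategies that the queries in Sections 4.1 to 4.3 rely on: offset paging, which skips `(page - 1) * pageSize` rows, and keyset paging, which continues after the last id shown on the previous page.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: an in-memory stand-in for paging performed by the data store. */
final class PagingSketch {

    /** Offset of the first row of a 1-based page, i.e. (page - 1) * pageSize. */
    static int offsetFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** Offset paging: skip the rows of earlier pages, then take one page. */
    static List<Long> offsetPage(List<Long> idsNewestFirst, int page, int pageSize) {
        int from = Math.min(offsetFor(page, pageSize), idsNewestFirst.size());
        int to = Math.min(from + pageSize, idsNewestFirst.size());
        return new ArrayList<>(idsNewestFirst.subList(from, to));
    }

    /** Keyset paging: take rows whose id is smaller than the last id already shown. */
    static List<Long> keysetPage(List<Long> idsNewestFirst, Long lastSeenId, int pageSize) {
        List<Long> page = new ArrayList<>();
        for (Long id : idsNewestFirst) {
            // A null lastSeenId means "first page": no lower bound has been seen yet.
            if (lastSeenId == null || id < lastSeenId) {
                page.add(id);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    /** Hypothetical sample data: ids count..1, mimicking rows sorted newest first. */
    static List<Long> sampleIds(int count) {
        List<Long> ids = new ArrayList<>();
        for (long id = count; id >= 1; id--) {
            ids.add(id);
        }
        return ids;
    }
}
```

With the sample data, `PagingSketch.offsetPage(ids, 2, 20)` and `PagingSketch.keysetPage(ids, lastIdOfPageOne, 20)` select the same rows. The keyset variant needs only the last id of the previous page, which suits query languages such as CQL that have no OFFSET clause.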
The application was written in Java 8 and JavaScript. The following frameworks and libraries were used:

– Spring Boot [15] makes it possible to create a Java web application in a very simple way. The whole application is a single JAR file with an embedded Tomcat server and can be run like a standard Java console application,
– AngularJS [16] is a JavaScript framework providing features such as automatic data binding between view and model, and dependency injection,
– Spring JDBC [17] makes using a JDBC driver easier by automatically opening and closing connections, result sets and statements, handling SQLException, handling transactions and iterating through result sets.

No ORM (object-relational mapping) tool, such as Hibernate, was used because it could affect the performance of the application.

4.1 SQL implementation

The application was tested with PostgreSQL. The Spring JDBC library was used for SQL data access. Fig. 1 presents the data model for the relational database.

Fig. 1. Relational database data model

One of the most complex queries used in the application is the query that selects a user's timeline. For the SQL database it has the following structure:

    SELECT u.login AS login, su.id AS id, su.date AS date, su.body AS body
    FROM user_status_updates su
    JOIN users u ON u.id = su.userId
    JOIN followers f ON f.followedId = u.id
    WHERE f.followerId = ?
    ORDER BY su.date DESC
    LIMIT 20 OFFSET (CURRENT_PAGE - 1) * 20

Here the question mark is replaced with the id of the current user, and CURRENT_PAGE is the 1-based page number substituted by the application.

4.2 MongoDB implementation

For MongoDB data access the official Java driver was used. As for the relational database, Fig. 2 presents the data model for MongoDB. It contains three document collections (comment documents are nested in status update documents). Nesting data results in a smaller number of data objects than in the relational database.

Fig. 2. MongoDB data model

The query which retrieves a user's timeline looks as follows:

    db.status_updates.
        find({"login": {"$in": ["?", "?"]}}).
        sort({date: -1}).
        skip((CURRENT_PAGE - 1) * 20).
        limit(20);

The question marks are replaced with the logins of the followed users.

4.3 Cassandra implementation

The DataStax driver was used for Cassandra data access. As for the SQL and MongoDB databases, Fig. 3 presents the data model. Yellow keys stand for partition keys and red keys for clustering keys. Arrows indicate the sorting direction of a column, defined during table creation. This data model is based on the model proposed by Brown [13].

Fig. 3. Cassandra data model

In the case of Cassandra, selecting a user's timeline is more complex. For the first page of data the query looks like this:

    SELECT statusUpdateLogin, statusUpdateId,
           toTimestamp(statusUpdateId) AS date, body
    FROM user_status_update_timeline
    WHERE timelineLogin = ?

For every subsequent page we had to add another condition to the WHERE clause:

    SELECT statusUpdateLogin, statusUpdateId,
           toTimestamp(statusUpdateId) AS date, body
    FROM user_status_update_timeline
    WHERE timelineLogin = ? AND statusUpdateId < ?

The second question mark is replaced with the id of the last status update from the previous page. For example, for the second page it is the id of the last (20th) status update from the first page.

4.4 Comparison of models

The data models for the compared databases are entirely different. The SQL data model was designed to avoid redundancy and to use relations between data. Its queries therefore contain many joins, which can be very inefficient for large data sets. The data model for MongoDB is the simplest one. By using features such as nested documents and arrays, it consists of only three collections of documents. The data model for Cassandra is the most complex one. It was designed according to the DataStax guidelines [14], which recommend one table per query pattern in order to avoid reading from multiple partitions.
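
The one-table-per-query-pattern rule can be illustrated by a write that is copied into every table that later serves a read. The sketch below is an illustration under stated assumptions, not the application's code. The class `TimelineFanOut`, its fields and its method names are hypothetical, and in-memory maps keyed by login stand in for Cassandra partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: in-memory maps stand in for two query-specific tables. */
final class TimelineFanOut {
    // Hypothetical stand-in for a "posts by author" table: author login -> post bodies.
    private final Map<String, List<String>> postsByAuthor = new HashMap<>();
    // Hypothetical stand-in for a "timeline" table: reader login -> post bodies.
    private final Map<String, List<String>> timelineByReader = new HashMap<>();
    // Followed login -> logins of the users who follow it.
    private final Map<String, List<String>> followersOf = new HashMap<>();

    void follow(String follower, String followed) {
        followersOf.computeIfAbsent(followed, k -> new ArrayList<>()).add(follower);
    }

    /** Stores one copy for the author and one per follower; returns the copies written. */
    int post(String author, String body) {
        postsByAuthor.computeIfAbsent(author, k -> new ArrayList<>()).add(body);
        int copies = 1;
        for (String follower : followersOf.getOrDefault(author, new ArrayList<>())) {
            timelineByReader.computeIfAbsent(follower, k -> new ArrayList<>()).add(body);
            copies++;
        }
        return copies;
    }

    /** Reading a timeline touches a single key, mirroring a single-partition read. */
    List<String> timeline(String reader) {
        return new ArrayList<>(timelineByReader.getOrDefault(reader, new ArrayList<>()));
    }
}
```

In this sketch a post by an author with two followers is written three times, so the cost of the redundancy is paid on every write, while reading a timeline never has to combine data from several keys.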
This design introduces a lot of redundancy into the data model. For example, one post is stored 1 + (number of followers) times: once in the user status updates table and once per follower in the user status update timeline table. This makes it possible to get a user's timeline by querying only one partition. The table storing hashtags is also more complex than in the other databases. It contains three columns: the prefix column contains the first two characters of a hashtag, the remaining column contains the rest of it and the hashtag column contains the entire hashtag. Cassandra Query Language (CQL) does not support the LIKE operator known from SQL, and such a structure allows a full-text (prefix) search to be performed in Cassandra.

5 Performance tests

All performance tests were performed on a PC with the following specification:

– Intel Core i5-4460 3.2 GHz processor,
– 16 GB DDR3 RAM,
– Western Digital Blue 1 TB SATA 3 7200 rpm disk,
– Windows 10 Home Edition.

The tests used the following databases:

– PostgreSQL 9.5 for Windows x64,
– MongoDB 3.2 for Windows x64,
– Apache Cassandra 3.7.

For maximum reliability, defragmentation was performed before every test. JMeter was chosen as the tool supporting the tests. For MongoDB and Cassandra the write options were set in such a way that a write was reported as successful only after the data had been saved on the physical disk. To maximise the speed of reading, indexes were defined on the columns used in the query conditions.

5.1 Simulating user traffic

The first type of test simulated the use of the application by 100 simultaneous users. Each test lasted 5 minutes. The test plan was as follows:

– log in to the application,
– view the first four pages of posts sent by the current user,
– view the first four pages of posts from the current user's timeline,
– send a new post marked with two hashtags.

Each database was tested on four different data sets.
Each data set contained a different number of users and posts: 1000 users and 1 million posts, 5000 users and 5 million posts, 10 000 users and 10 million posts, and 15 000 users and 15 million posts. In each data set, every user followed 100 other users.

Fig. 4 shows the number of test cycles executed during a 5-minute test. For a small data set PostgreSQL is the fastest database, but its efficiency drops dramatically as the amount of data increases. For the largest data sets the number of executed test cycles is several times smaller than for the other databases. Table 1 shows that the slowest operation in PostgreSQL is reading posts from the timeline. MongoDB is the most efficient for large data sets. Its performance decreased slightly only for the largest data set. Cassandra recorded a significant drop in performance only for 15 000 users.

Fig. 4. Number of cycles performed during the 5-minute test

Table 1. Average execution time of individual operations [ms]

Database    Users [×10^3]  Log in  Read one page  Read one page  Send new post
                                   of posts       of timeline
PostgreSQL   1              30      36             43            122
PostgreSQL   5              49      66             99            214
PostgreSQL  10              36      47            828            196
PostgreSQL  15              24      24           1236            264
MongoDB      1               4       3              6            535
MongoDB      5               3       2              5            552
MongoDB     10               4       3              6            537
MongoDB     15               2       2              5            570
Cassandra    1              29      24             24            421
Cassandra    5              27      23             24            414
Cassandra   10              26      23             25            424
Cassandra   15              31      30             37            520

5.2 Data inserting

Another operation examined was inserting data. A single test consisted of inserting 1000 records into the table or collection that stores posts. Fig. 5 presents the results. It shows that MongoDB is the slowest when adding data. This is the effect of using the journalled write concern, which causes the database to return a success status only after the data have been saved on the physical disk.

Fig. 5. Results of data inserting

Fig. 6. Results of full-text searching

5.3 Full-text search

The last test was searching for hashtags that start with a specified pattern.
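
A starts-with search over the (prefix, remaining) hashtag layout described in Section 4.4 could work along the following lines. This is a sketch under assumptions, since the text does not give the query actually used. The class `HashtagPrefixIndex` and its methods are hypothetical, a sorted in-memory map stands in for rows ordered inside one partition, and the range scan over the remainder is one possible way to replace the missing LIKE operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: prefix search over hashtags split into (prefix, remaining). */
final class HashtagPrefixIndex {
    static final int PREFIX_LENGTH = 2; // first two characters, as in the hashtag table

    // prefix -> (remaining -> full hashtag). The sorted inner map is an assumption
    // that mimics rows kept in order inside a single partition.
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    void add(String hashtag) {
        if (hashtag.length() < PREFIX_LENGTH) {
            throw new IllegalArgumentException("hashtag shorter than the prefix length");
        }
        String prefix = hashtag.substring(0, PREFIX_LENGTH);
        String remaining = hashtag.substring(PREFIX_LENGTH);
        partitions.computeIfAbsent(prefix, k -> new TreeMap<>()).put(remaining, hashtag);
    }

    /** Returns the hashtags that start with the pattern (at least PREFIX_LENGTH characters). */
    List<String> startingWith(String pattern) {
        if (pattern.length() < PREFIX_LENGTH) {
            throw new IllegalArgumentException("pattern shorter than the prefix length");
        }
        TreeMap<String, String> partition = partitions.get(pattern.substring(0, PREFIX_LENGTH));
        if (partition == null) {
            return new ArrayList<>();
        }
        String from = pattern.substring(PREFIX_LENGTH);
        // Range scan: any key in [from, from + '\uffff') necessarily starts with 'from'.
        String to = from + Character.MAX_VALUE;
        return new ArrayList<>(partition.subMap(from, true, to, false).values());
    }
}
```

The exact match on the two-character prefix selects a single group of rows, and the range condition on the remainder then plays the role of `LIKE 'pattern%'`. Patterns shorter than two characters cannot be served by this layout, which the sketch signals with an exception.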
For each database three tests were performed, each for a different number of hashtags. Fig. 6 shows how long it takes to perform 1000 full-text searches. Cassandra turns out to be the slowest: it needs several times more time to perform this task than the other databases. MongoDB is about twice as fast as PostgreSQL.

6 Summary

The aim of this paper was to compare relational and non-relational databases. For the purpose of this work a social-media web application was created. The application was used to examine the performance of the selected databases.

All the databases provide a convenient interface for Java. The implementation of certain functions, such as pagination and full-text search, is more complicated for Cassandra because its query language is not as rich as SQL or the MongoDB data access interface.

For the selected data models the results show the performance advantages of non-relational databases over relational ones. For sufficiently large data sets, the number of operations performed by the relational database is several times smaller. MongoDB was the fastest database in the context of reading. Only in the case of writing data was SQL the fastest.

The status of relational databases on the market is not at risk and it is hard to imagine that this will soon change. NoSQL databases are currently still a new and little-known solution. However, further development of the Internet and mobile devices will force software developers into increasing use of NoSQL databases.

Our future plans cover performance analysis of NoSQL databases applied to Big Data. This area is growing rapidly and recent research [22] shows that this trend is promising.

References

1. Codd, E. F.: A Relational Model of Data for Large Shared Data Banks. In: Commun. ACM 13(6) (1970) 377–387
2. DB-Engines Ranking, http://db-engines.com/en/ranking
3. Lourenço, J. R., Cabral, B., Carreiro, P., Vieira, M., Bernardino, J.: Choosing the right NoSQL database for the job: a quality attribute evaluation. In: Journal of Big Data 2 (2015) 1–26
4. Chandra, D. G.: BASE analysis of NoSQL database. Future Generation Computer Systems 52 (2015) 13–21
5. Choi, Y. L., Jeon, W. S., Yoon, S. H.: Improving Database System Performance by Applying NoSQL. JIPS 10 (2014) 355–364
6. Boicea, A., Radulescu, F., Agapin, L. I.: MongoDB vs Oracle - database comparison. In: Proceedings of Third International Conference on Emerging Intelligent Data and Web Technologies (EIDWT) (2012) 330–335
7. Li, Y., Manoharan, S.: A performance comparison of SQL and NoSQL databases. In: Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2013) 15–19
8. Lee, C. H., Zheng, Y. L.: SQL-to-NoSQL Schema Denormalization and Migration: A Study on Content Management Systems. In: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics (SMC) (2015) 2022–2026
9. Truica, C. O., Radulescu, F., Boicea, A., Bucur, I.: Performance evaluation for CRUD operations in asynchronously replicated document oriented database. In: Proceedings of 20th International Conference on Control Systems and Computer Science (2015) 191–196
10. Sullivan, D.: NoSQL for Mere Mortals. Addison-Wesley (2015)
11. Brewer, E.: CAP twelve years later: How the rules have changed. In: Computer 45(2) (2012) 23–29
12. Li, X., Ma, Z., Chen, H.: QODM: A query-oriented data modeling approach for NoSQL databases. In: Advanced Research and Technology in Industry Applications (WARTIA) (2014) 338–345
13. Brown, M.: Learning Apache Cassandra. Packt Publishing (2015)
14. Hobbs, T.: Basic Rules of Cassandra Data Modeling, http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
15. Spring Boot, https://projects.spring.io/spring-boot/
16. AngularJS, https://angularjs.org
17. Spring JDBC, https://docs.spring.io/spring/docs/current/spring-framework-reference/html/jdbc.html
18. Plechawska-Wojcik, M., Rykowski, D.: Comparison of relational, document and graph databases in the context of the web application development. In: Information Systems Architecture and Technology: Proceedings of 36th International Conference on Information Systems Architecture and Technology ISAT (2016) 3–13
19. Vokorokos, L., Uchnar, M., Lescisin, L.: Performance optimization of applications based on non-relational databases. In: International Conference on Emerging eLearning Technologies and Applications (ICETA) (2016) 371–376
20. Jatana, N., Puri, S., Ahuja, M., Kathuria, I., Gosain, D.: A survey and comparison of relational and non-relational database. International Journal of Engineering Research & Technology 1(6) (2012)
21. Han, J., Haihong, E., Le, G., Du, J.: Survey on NoSQL database. In: 6th International Conference on Pervasive Computing and Applications (ICPCA) (2011) 363–366
22. Gupta, S., Narsimha, G.: Efficient Query Analysis and Performance Evaluation of the Nosql Data Store for BigData. In: Proceedings of the First International Conference on Computational Intelligence and Informatics. Springer Singapore (2017) 549–558
23. Chodorow, K., Dirolf, M.: MongoDB: The Definitive Guide (1st ed.). O'Reilly Media (2010)
24. Hewitt, E.: Cassandra: The Definitive Guide (1st ed.). O'Reilly Media (2010)