Comparative Analysis of Relational and Non-relational Databases in the
Context of Performance in Web Applications
Conference Paper · April 2017
DOI: 10.1007/978-3-319-58274-0_13
Konrad Fraczek and Malgorzata Plechawska-Wojcik
Institute of Computer Science, Lublin University of Technology,
Nadbystrzycka 36B, 20-618 Lublin, Poland
fraczek.konrad1@gmail.com,
m.plechawska@pollub.pl
Abstract. This paper presents a comparative analysis of relational and
non-relational databases. For the purposes of this paper, a simple social-media
web application was created. The application supports three types
of databases: SQL (it was tested with PostgreSQL), MongoDB and Apache
Cassandra. For each database, the applied data model is described. The
aim of the analysis was to compare the performance of the selected
databases in the context of data reading and writing. Performance tests
showed that MongoDB is the fastest when reading data and PostgreSQL
is the fastest when writing. The test application is fully functional; however,
its implementation proved to be more challenging for Cassandra.
Keywords: relational databases, NoSQL, MongoDB, Cassandra
1 Introduction
Since 1970, when Edgar Codd published his article [1], relational databases have
dominated the database market. At present, seven of the top ten positions in the
ranking of the most popular database systems are occupied by relational databases
[2]. However, recent years have seen dynamic growth of the Internet and mobile
devices, which has caused an enormous increase in the amount of generated data.
Engineers started to look for alternatives to relational databases, which are not
designed to cope effectively with such large quantities of data. As a result,
NoSQL databases have appeared. They offer better capabilities for performance
scalability and a much more flexible data model than relational databases.
The aim of this study is to compare relational databases with selected
non-relational databases: a document database (MongoDB) and a column-oriented
database (Cassandra). For the purposes of this paper, a simple social-media web
application was developed. The data models used in the application and the
performance of each database are compared. The reasons why we selected the
above-mentioned database management systems are as follows. MongoDB and Cassandra
are flagship NoSQL products: MongoDB is the most popular document-oriented
database, whereas Cassandra is the most popular column-oriented database. Like
typical non-relational databases, MongoDB and Cassandra are open-source. That is
why, among relational databases, we have chosen PostgreSQL, which is open-source
and also available commercially.
The paper is a continuation of our previous work [18], where we performed
an analysis of data models and conducted some performance tests. As the IT market
is developing rapidly, there is a need to verify the available database solutions and
their adaptation to different conditions. The development of non-relational databases
and a lack of extended research on the current state of the art motivated us to
continue the topic of NoSQL database applications.
This paper is organised as follows. Section 2 provides a review of related
research. Section 3 contains a description of NoSQL databases. Section 4 introduces
the implemented social-media web application. Section 5 presents the
performance test results. Section 6 is a summary of the paper.
2 Related research
There is little research in the literature on the current state of the art in the
area of non-relational databases. In [19], the authors analysed the performance
of applications based on non-relational databases. A general comparison of
relational and non-relational databases was also discussed by Jatana and colleagues
[20]. The characteristics of NoSQL databases, their background and data model were
also discussed by Han and colleagues [21].
Lourenço et al. [3] reviewed a number of NoSQL databases available on
the market, including MongoDB, Cassandra and HBase. They compared them
in terms of the consistency and durability of the stored data as well as with
respect to their performance and scalability. They concluded that the MongoDB
database can be the successor of SQL databases, because it provides good stability
and consistency of data. Cassandra is the best choice in cases when most
of the operations are writes to the database.
Chandra in his publication [4] reviews the properties of BASE (Basically
Available, Soft state, Eventual consistency) in NoSQL databases and compares
them with the ACID (Atomicity, Consistency, Isolation, and Durability) properties. He also examines which databases are the most suitable for specific applications - in financial applications the relational databases are reported as the best
choice. For the purposes of data analysis and data mining NoSQL technologies
turn out to be better.
Choi et al. [5] compared the performance of Oracle and MongoDB. They
found that the MongoDB database is several times faster than Oracle. The same
databases were compared by Boicea et al. [6]. The authors conclude their work
with the claim that MongoDB is faster and easier to maintain. On the other
hand, Oracle is the better choice when there is a need for mapping complex
relationships between data.
Li and Manoharan [7] compared several NoSQL databases (including MongoDB,
Cassandra, Hypertable and Couchbase) and SQL Server Express in the context of
performance. They observed that NoSQL databases are not always faster
than SQL. Lee and Zheng [8] compared the performance of HBase and MySQL.
It turned out that when retrieving the same data, the NoSQL database is faster
than the relational one.
Truica et al. [9] compared the performance of document databases (MongoDB,
CouchDB and Couchbase) and relational databases (Microsoft SQL Server,
MySQL and PostgreSQL). CouchDB proved to be the fastest during insertion,
modification and deletion of data, and MongoDB the fastest when reading.
3 NoSQL databases
The NoSQL term does not apply to a specific technology. It includes all
non-relational databases. Almost all of them have the following common features:
– lack of support for the SQL language; most NoSQL databases define their own
query language, and some of these have a syntax similar to SQL, for example CQL
for Cassandra,
– lack of relations between data,
– designed for working in clusters,
– no ACID transactions,
– flexible data model.
One of the biggest problems related to storing data on many servers is ensuring
data consistency. The CAP (Consistency, Availability, Partition tolerance)
theorem, described by Brewer [10] [11], is related to this issue. It claims that a
distributed database system can maintain only two of three conditions at the
same time: consistency, availability and partition tolerance. Systems operating
on a single machine are examples of CA systems - they are consistent (as there
is no replication) and available. Systems operating on multiple machines are CP
systems (MongoDB, HBase) or AP systems (Cassandra, CouchDB) [3].
Another term connected with NoSQL databases is the BASE properties, which
are the counterpart of the ACID properties known from relational databases [10]:
– Basically Available - if some of the servers fail, the rest of them should
continue to respond to requests,
– Soft State - the state of the database can change even if no write
operations are being performed at that moment,
– Eventual Consistency - after data are written on a single server, the changes must be
propagated to the other machines; during this operation the data are not consistent.
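The Soft State and Eventual Consistency properties can be illustrated with a minimal Java sketch. It is not taken from any of the databases discussed here: the class, its two in-memory "replicas" and the explicit propagate() step are all hypothetical and only mimic the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-replica store; it only illustrates the BASE properties.
final class EventualConsistencyDemo {
    private final Map<String, String> replicaA = new HashMap<>();
    private final Map<String, String> replicaB = new HashMap<>();
    // Writes accepted by replica A that have not yet been copied to replica B.
    private final Map<String, String> pending = new HashMap<>();

    // A client write is acknowledged as soon as replica A has stored it.
    void writeToA(String key, String value) {
        replicaA.put(key, value);
        pending.put(key, value);
    }

    String readFromA(String key) { return replicaA.get(key); }

    String readFromB(String key) { return replicaB.get(key); }

    // Background propagation: replica B changes although no client is writing (soft state).
    void propagate() {
        replicaB.putAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        EventualConsistencyDemo db = new EventualConsistencyDemo();
        db.writeToA("post:1", "hello");
        System.out.println("replica B before propagation: " + db.readFromB("post:1"));
        db.propagate();
        System.out.println("replica B after propagation: " + db.readFromB("post:1"));
    }
}
```

Between the write and the call to propagate(), the two replicas return different answers for the same key, which is the inconsistency window mentioned in the last list item.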
These days there are four types of NoSQL databases [10]:
– key-value stores - the features offered by these databases are limited to reading,
saving and deleting values for a specified key,
– document databases - they store data in documents with a dynamic structure
such as JSON or XML,
– column-oriented databases - they store data in column families organised
into rows; rows from the same column family can have different columns,
– graph databases - these are based on a mathematical model of the graph,
they store data in graph vertices and relations between data in graph edges.
3.1 MongoDB
MongoDB [23] is an open-source document-based database written in C++. It
is the fourth most popular database and, at the same time, the most popular
NoSQL database [2]. MongoDB stores data in BSON documents, which are binary
JSON documents. A single document is equivalent to a row in a relational
database. Documents are grouped into collections. In contrast to
an RDBMS, documents from the same MongoDB collection need not have the
same structure. MongoDB does not support ACID transactions; it offers atomic
operations on a single document only [12]. The maximum size of a single
document is 16 MB. MongoDB supports horizontal scalability through automatic
sharding. Replication is implemented in master-slave mode: data are written to
the master and then propagated to the slaves [3]. MongoDB offers a very functional
query language (which is based on JavaScript). It supports aggregate functions
and the MapReduce model [6]. MongoDB allows indexes to be defined to speed up
queries [3].
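The flexible document structure can be sketched in plain Java. In the example below, ordinary maps stand in for BSON documents and a list stands in for a collection; no MongoDB driver is involved, and the field names (login, body, comments, text) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain maps stand in for BSON documents.
final class DocumentModelDemo {

    // Creates a status-update document (field names are hypothetical).
    static Map<String, Object> statusUpdate(String login, String body) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("login", login);
        doc.put("body", body);
        return doc;
    }

    // Nests a comment inside its parent document instead of storing it separately.
    @SuppressWarnings("unchecked")
    static void addComment(Map<String, Object> update, String login, String text) {
        List<Map<String, Object>> comments = (List<Map<String, Object>>)
                update.computeIfAbsent("comments", k -> new ArrayList<Map<String, Object>>());
        Map<String, Object> comment = new LinkedHashMap<>();
        comment.put("login", login);
        comment.put("text", text);
        comments.add(comment);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> collection = new ArrayList<>();
        Map<String, Object> first = statusUpdate("alice", "Hello #nosql");
        addComment(first, "bob", "Welcome!");
        collection.add(first);
        // The second document has no "comments" field at all.
        collection.add(statusUpdate("bob", "No comments here"));
        for (Map<String, Object> doc : collection) {
            System.out.println(doc);
        }
    }
}
```

The two documents live in the same collection although only one of them carries a nested comments array, which a relational table with a fixed set of columns would not permit.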
3.2 Apache Cassandra
Cassandra [24] is an open-source column-oriented database written in Java. It
was developed by Facebook [7]. Like relational databases, Cassandra stores data
in the form of tables and rows. Each row consists of a primary key and columns.
Rows in one table may have different columns. Each column consists of a name,
a value, and the time of the write in milliseconds [3]. Just like MongoDB,
Cassandra supports mechanisms of replication and partitioning. Unlike MongoDB,
all servers are equal - there is no concept of master and slaves. Each server can
handle write requests and propagate them to the others. As its data access
interface Cassandra uses CQL (Cassandra Query Language), which is similar to SQL
but offers far fewer features.
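The row and column structure described above can be sketched as follows. This is a simplified in-memory model, not Cassandra's storage engine. The class names are hypothetical, and the rule that the newest timestamp wins when the same column is written twice is an assumption added here to show why each column carries a write time.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a Cassandra-style row; all names are illustrative.
final class ColumnDemo {

    // A column as described in the text: name, value and write timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestampMs;

        Column(String name, String value, long timestampMs) {
            this.name = name;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // A row: a primary key plus whichever columns have been written for it.
    static final class Row {
        final String key;
        final Map<String, Column> columns = new HashMap<>();

        Row(String key) {
            this.key = key;
        }

        // Keeps the version with the newest timestamp (assumed last-write-wins rule).
        void apply(Column incoming) {
            columns.merge(incoming.name, incoming,
                    (old, neu) -> neu.timestampMs >= old.timestampMs ? neu : old);
        }
    }

    public static void main(String[] args) {
        Row row = new Row("alice");
        row.apply(new Column("body", "edited text", 200L));
        row.apply(new Column("body", "original text", 100L)); // arrives late, is ignored
        row.apply(new Column("city", "Lublin", 150L));        // column other rows may lack
        System.out.println(row.columns.get("body").value);
    }
}
```

Because every server can accept writes, two versions of the same column may reach a replica in any order; comparing timestamps gives all replicas the same final value.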
4 Test application
For the purposes of this work a social-media web application was created. At any
given moment the application can use one of the three databases - PostgreSQL,
MongoDB or Cassandra - depending on the configuration. It provides
functionalities such as sending posts, marking posts with hashtags, adding
comments to posts, following other users, viewing the timeline, which contains posts of
followed users ordered by date in descending order, and viewing all posts marked
with a specific hashtag. One of the requirements was also the implementation of
paging while retrieving messages. Importantly, the pagination was carried
out directly in the database and not in the application. We managed to achieve
this goal for all selected databases.
The application was written in Java 8 and JavaScript. The following frameworks
and libraries were used:
– Spring Boot [15] - allows a Java web application to be created in a very simple
way. The whole application is a single JAR file with embedded Tomcat; it can be
run like a standard Java console application,
– AngularJS [16] - a JavaScript framework providing functionalities such as
automatic data binding between view and model and dependency injection,
– Spring JDBC [17] - makes using the JDBC driver easier by automatically opening
and closing connections, result sets and statements, handling SQLException,
handling transactions and iterating through result sets.
No ORM (object-relational mapping) tool (like Hibernate) was used because it
could affect the performance of the application.
4.1 SQL implementation
The application was tested with PostgreSQL. The Spring JDBC library was used for
SQL data access. Fig. 1 contains the data model for the relational database.
Fig. 1. Relational database data model
One of the most complex queries used in the application was the query which selects
a user’s timeline. In the case of the SQL database it has the following structure,
where CURRENT_PAGE stands for the 1-based page number:
SELECT u.login AS login, su.id AS id, su.date AS date, su.body AS body
FROM user_status_updates su
JOIN users u ON u.id = su.userId
JOIN followers f ON f.followedId = u.id
WHERE f.followerId = ?
ORDER BY su.date DESC
LIMIT 20 OFFSET (CURRENT_PAGE - 1) * 20
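As a minimal sketch of how the OFFSET value in the query above could be derived, the following Java helper computes it from a 1-based page number. The class and method names are hypothetical and are not taken from the test application; only the page size of 20 follows the query.

```java
// Hypothetical helper for offset-based paging; not part of the original application.
final class OffsetPaging {
    static final int PAGE_SIZE = 20;

    // Number of rows to skip before the requested page (pages are numbered from 1).
    static int offsetFor(int currentPage) {
        if (currentPage < 1) {
            throw new IllegalArgumentException("page numbers start at 1");
        }
        return (currentPage - 1) * PAGE_SIZE;
    }

    // Rows the database has to produce in sorted order to serve the page:
    // the skipped rows plus the rows actually returned.
    static int rowsProduced(int currentPage) {
        return offsetFor(currentPage) + PAGE_SIZE;
    }

    public static void main(String[] args) {
        for (int page = 1; page <= 4; page++) {
            System.out.println("page " + page + " -> OFFSET " + offsetFor(page));
        }
    }
}
```

The computed value would then be bound to the query as an ordinary parameter. With this scheme the skipped rows typically still have to be read and discarded, so deeper pages tend to cost more than the first one.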
4.2 MongoDB implementation
For MongoDB data access the official Java driver was used. As for the relational
database, Fig. 2 contains the data model for the MongoDB database. It contains three
document collections (comment documents are nested in status update documents).
Nesting data results in a smaller number of data objects than in the
relational database.
Fig. 2. MongoDB data model
The query which retrieves a user’s timeline looks as follows:
db.status_updates.
find({"login": {"$in": ["?","?"]}}).
sort({date: -1}).
skip((CURRENT_PAGE - 1) * 20).
limit(20);
where the question marks are replaced with the logins of the followed users.
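The four stages of the query above (find with $in, sort, skip, limit) can be mirrored in plain Java over an in-memory list. This is only an emulation meant to show what each stage contributes; it does not use the MongoDB driver, and the StatusUpdate class with a numeric date is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory emulation of the timeline query; names are illustrative.
final class TimelineQueryDemo {

    static final class StatusUpdate {
        final String login;
        final long date; // e.g. epoch milliseconds
        final String body;

        StatusUpdate(String login, long date, String body) {
            this.login = login;
            this.date = date;
            this.body = body;
        }
    }

    static List<StatusUpdate> timelinePage(List<StatusUpdate> all, Set<String> followed,
                                           int currentPage, int pageSize) {
        return all.stream()
                .filter(u -> followed.contains(u.login))                  // find: login $in followed
                .sorted(Comparator.comparingLong((StatusUpdate u) -> u.date)
                        .reversed())                                      // sort: date descending
                .skip((long) (currentPage - 1) * pageSize)                // skip: earlier pages
                .limit(pageSize)                                          // limit: one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<StatusUpdate> all = new ArrayList<>(Arrays.asList(
                new StatusUpdate("alice", 1L, "first"),
                new StatusUpdate("bob", 2L, "second"),
                new StatusUpdate("carol", 3L, "not followed"),
                new StatusUpdate("alice", 4L, "fourth"),
                new StatusUpdate("bob", 5L, "fifth")));
        Set<String> followed = new HashSet<>(Arrays.asList("alice", "bob"));
        for (StatusUpdate u : timelinePage(all, followed, 1, 2)) {
            System.out.println(u.login + ": " + u.body);
        }
    }
}
```

Unlike the SQL version, no join is needed here because each status update document already carries the author's login.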
4.3 Cassandra implementation
The DataStax driver was used for Cassandra data access. As for the SQL and MongoDB
databases, Fig. 3 contains the data model schema. Yellow keys stand for
partition keys and red keys for clustering keys. Arrows indicate the direction of
sorting for the column, defined during table creation. This data model is based
on the model proposed by Brown [13].
Fig. 3. Cassandra data model
In the case of Cassandra, selecting a user’s timeline is more complex. For the first
page of data the query looks like this:
SELECT statusUpdateLogin, statusUpdateId,
toTimestamp(statusUpdateId) as date, body
FROM user_status_update_timeline WHERE timelineLogin = ?
For every subsequent page we had to add another condition to the WHERE
clause:
SELECT statusUpdateLogin, statusUpdateId,
toTimestamp(statusUpdateId) as date, body
FROM user_status_update_timeline
WHERE timelineLogin = ? and statusUpdateId < ?
where the second question mark (statusUpdateId < ?) is replaced with the id of the
last status update from the previous page; for example, for the second page it is
the id of the last status update from the first page (the 20th status update).
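The paging scheme used for Cassandra (continue from the last id seen rather than skip a number of rows) can be sketched in Java with a sorted map standing in for one timeline partition. Several points are assumptions: a plain long replaces the time-based statusUpdateId, the class and method names are hypothetical, and the page size is passed explicitly because the queries as printed do not show how the result is limited to 20 rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one timeline partition ordered by status update id.
final class KeysetPagingDemo {
    private final NavigableMap<Long, String> partition = new TreeMap<>();

    void insert(long statusUpdateId, String body) {
        partition.put(statusUpdateId, body);
    }

    // lastSeenId == null requests the first page; otherwise only ids < lastSeenId qualify.
    List<Long> page(Long lastSeenId, int pageSize) {
        NavigableMap<Long, String> newestFirst = partition.descendingMap();
        NavigableMap<Long, String> candidates = (lastSeenId == null)
                ? newestFirst
                : newestFirst.tailMap(lastSeenId, false); // entries after lastSeenId, newest first
        List<Long> ids = new ArrayList<>();
        for (Long id : candidates.keySet()) {
            if (ids.size() == pageSize) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        KeysetPagingDemo timeline = new KeysetPagingDemo();
        for (long id = 1; id <= 5; id++) {
            timeline.insert(id, "post " + id);
        }
        List<Long> first = timeline.page(null, 2);
        List<Long> second = timeline.page(first.get(first.size() - 1), 2);
        System.out.println(first + " then " + second);
    }
}
```

The client has to remember the last id of each page, which is the extra complexity noted above, but the database can jump straight to the continuation point instead of counting skipped rows.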
4.4 Comparison of models
The data models for the compared databases are entirely different. The SQL data model
was designed to avoid redundancy and to use relations between data. Therefore
the queries contain many joins, which can be very inefficient for large data sets.
The data model for MongoDB is the simplest one. By using features like
nested documents and arrays, it consists of only three collections of documents.
The data model for Cassandra is the most complex one. It was designed
according to the DataStax document [14], which recommends one table per query pattern
to avoid reading from multiple partitions. Therefore there is a lot of redundancy
in this data model. For example, one post is stored 1 + number of followers
times - once in the user_status_updates table and once per follower
in the user_status_update_timeline table. This makes it possible to get a user’s
timeline by querying only one partition. The table storing hashtags is also more complex
than in the other databases. It contains three columns - the prefix column contains
the first two characters of a hashtag, the remaining column contains the rest of it
and the hashtag column contains the entire hashtag. Cassandra Query Language
(CQL) does not support the LIKE operator known from SQL, and such a structure
makes it possible to perform a full-text search operation in Cassandra.
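The hashtag table layout can be sketched in Java as follows. The split into a two-character prefix and a remaining part follows the description above; everything else is an assumption made for illustration, in particular the class name and the idea that a search is served by a range scan over the sorted remaining parts within a single prefix partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// In-memory model of the hashtag table: prefix -> sorted "remaining" parts.
final class HashtagIndexDemo {
    private final Map<String, TreeSet<String>> partitions = new HashMap<>();

    // Stores a hashtag under its two-character prefix.
    void add(String hashtag) {
        if (hashtag.length() < 2) {
            throw new IllegalArgumentException("hashtag needs at least two characters");
        }
        partitions.computeIfAbsent(hashtag.substring(0, 2), k -> new TreeSet<String>())
                .add(hashtag.substring(2));
    }

    // Returns hashtags starting with the pattern (at least two characters long).
    List<String> startingWith(String pattern) {
        String prefix = pattern.substring(0, 2);
        String rest = pattern.substring(2);
        List<String> result = new ArrayList<>();
        TreeSet<String> remaining = partitions.get(prefix);
        if (remaining == null) {
            return result;
        }
        // Range scan replacing LIKE 'pattern%': rest <= remaining < rest + '\uffff'.
        for (String r : remaining.subSet(rest, true, rest + Character.MAX_VALUE, false)) {
            result.add(prefix + r);
        }
        return result;
    }

    public static void main(String[] args) {
        HashtagIndexDemo index = new HashtagIndexDemo();
        for (String tag : new String[] {"java", "javascript", "jazz", "python"}) {
            index.add(tag);
        }
        System.out.println(index.startingWith("jav"));
    }
}
```

The sketch also shows a limitation of this design: a pattern shorter than two characters cannot be served, because the partition to read cannot be determined.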
5 Performance tests
All performance tests were performed on a PC with the following specification:
– Intel Core i5-4460 3.2 GHz processor,
– 16 GB DDR3 RAM,
– Western Digital Blue 1 TB SATA 3 7200 rpm hard disk,
– Windows 10 Home Edition.
The tests used the following databases:
– PostgreSQL 9.5 for Windows x64,
– MongoDB 3.2 for Windows x64,
– Apache Cassandra 3.7.
For maximum reliability, disk defragmentation was performed before every test.
JMeter was chosen as the tool supporting the tests. For MongoDB and Cassandra
the write options were set in such a way that a write was successful only after
the data had been saved on the physical disk. To maximise the speed of reading data from
the databases, indexes were defined on the columns used in the query conditions.
5.1 Simulating user traffic
The first type of test simulated the use of the application by 100
users simultaneously. Each test lasted for 5 minutes. The test plan was as follows:
– log in to the application,
– view the first four pages of posts sent by the current user,
– view the first four pages of posts from the current user’s timeline,
– send a new post marked with two hashtags.
Each database was tested on four different data sets. Each data set contained a
different number of users and posts: 1000 users and 1 million posts, 5000 users
and 5 million posts, 10 000 users and 10 million posts, and 15 000 users and 15 million
posts. For each data set, every user followed 100 other users.
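To give a feeling for what these data sets mean for the Cassandra model, where a post is stored 1 + number of followers times (Section 4.4), the following Java sketch estimates the total number of stored post copies. It assumes that posts are spread evenly over users (1000 posts per user in every data set); the paper does not state the distribution, so the figures are only indicative. The class and method names are hypothetical.

```java
// Rough estimate of stored post copies in the Cassandra model; assumptions in comments.
final class FanOutEstimate {
    static final int FOLLOWED_PER_USER = 100; // every user follows 100 other users

    // Assumes each user wrote the same number of posts (posts / users).
    static long storedCopies(long users, long posts) {
        long postsPerUser = posts / users;
        // Each "follows" edge gives exactly one user one follower, so the follower
        // counts of all users add up to users * 100.
        long totalFollowers = users * FOLLOWED_PER_USER;
        // Every post is stored once for its author and once per follower of the author.
        return postsPerUser * (users + totalFollowers);
    }

    public static void main(String[] args) {
        long[][] dataSets = {
            {1_000L, 1_000_000L}, {5_000L, 5_000_000L},
            {10_000L, 10_000_000L}, {15_000L, 15_000_000L}};
        for (long[] ds : dataSets) {
            System.out.println(ds[0] + " users: " + storedCopies(ds[0], ds[1]) + " stored copies");
        }
    }
}
```

Under this assumption the stored volume is 101 times the number of posts, regardless of how followers are distributed among users.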
Fig. 4 contains information about the number of test cycles executed during
a 5-minute test. For a small data set PostgreSQL is the fastest database, but
its efficiency drops dramatically with an increasing amount of data. For the largest
data sets the number of executed test cycles is several times smaller than for
the other databases. Table 1 shows that the slowest operation of PostgreSQL is
reading posts from the timeline. MongoDB is the most efficient for large data sets. Its
performance decreased slightly only for the largest data set. Cassandra recorded
a significant drop in performance only for 15 000 users.
Fig. 4. Number of cycles performed during the 5-minute test
Table 1. Average execution time of individual operations [ms]

                                 PostgreSQL                  MongoDB                  Cassandra
Number of users [×10^3]        1     5    10    15       1     5    10    15       1     5    10    15
Log in                        30    49    36    24       4     3     4     2      29    27    26    31
Read one page of posts        36    66    47    24       3     2     3     2      24    23    23    30
Read one page of timeline     43    99   828  1236       6     5     6     5      24    24    25    37
Send new post                122   214   196   264     535   552   537   570     421   414   424   520
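The averages in Table 1 can be combined with the test plan (one login, four pages of the user's own posts, four pages of the timeline, one new post) to estimate the time spent in database-backed operations during a single test cycle. The Java sketch below performs this arithmetic; it is an illustration added here, not a measurement from the paper, and it ignores concurrency and any time spent outside these operations.

```java
// Sums Table 1 averages according to the test plan; a rough per-cycle estimate.
final class CycleTimeEstimate {
    // Rows: log in, read one page of posts, read one page of timeline, send new post.
    // Columns: 1, 5, 10 and 15 thousand users. Values in milliseconds, from Table 1.
    static final int[][] POSTGRESQL = {
        {30, 49, 36, 24}, {36, 66, 47, 24}, {43, 99, 828, 1236}, {122, 214, 196, 264}};
    static final int[][] MONGODB = {
        {4, 3, 4, 2}, {3, 2, 3, 2}, {6, 5, 6, 5}, {535, 552, 537, 570}};
    static final int[][] CASSANDRA = {
        {29, 27, 26, 31}, {24, 23, 23, 30}, {24, 24, 25, 37}, {421, 414, 424, 520}};

    // How many times each operation occurs in one cycle of the test plan.
    static final int[] CALLS_PER_CYCLE = {1, 4, 4, 1};

    static int cycleMs(int[][] table, int dataSetIndex) {
        int total = 0;
        for (int op = 0; op < table.length; op++) {
            total += CALLS_PER_CYCLE[op] * table[op][dataSetIndex];
        }
        return total;
    }

    public static void main(String[] args) {
        String[] sizes = {"1k", "5k", "10k", "15k"};
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(sizes[i] + " users: PostgreSQL " + cycleMs(POSTGRESQL, i)
                    + " ms, MongoDB " + cycleMs(MONGODB, i)
                    + " ms, Cassandra " + cycleMs(CASSANDRA, i) + " ms");
        }
    }
}
```

For the smallest data set the estimate is lowest for PostgreSQL (468 ms against 575 ms for MongoDB and 642 ms for Cassandra), while for 15 000 users it grows to 5328 ms for PostgreSQL against 600 ms and 819 ms, which agrees with the trend described for Fig. 4.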
5.2 Data inserting
Another operation examined was inserting data. A single test consisted of
inserting 1000 records into the table/collection that stores posts. Fig. 5 contains the
results. It shows that MongoDB is the slowest when adding data. This is the
effect of using the journalled write concern, which causes the database to return a
success status only after the data have been saved on the physical disk.
Fig. 5. Results of data inserting
Fig. 6. Results of full-text searching
5.3 Full-text search
The last test involved searching for hashtags that start with a specified pattern. For
each database three tests were performed, each for a different number of hashtags.
Fig. 6 shows how long it takes to perform 1000 full-text searches. It turns out that
Cassandra is the slowest: it needs several times longer to perform this
task than the other databases. MongoDB is about twice as fast as PostgreSQL.
6 Summary
The aim of this paper was to compare relational and non-relational databases.
For the purpose of this work a social-media web application was created. The
application was used to examine the performance of the selected databases.
All the databases provide a convenient interface for Java. Implementation of
certain functions, such as pagination and full-text search, is more complicated
for Cassandra because its query language is not as rich as SQL or the
MongoDB data access interface.
For the selected data models the results show the performance advantages of
non-relational databases over relational ones. For sufficiently large data sets, the
number of operations performed by the relational database is several times smaller.
MongoDB was the fastest database in the context of reading. Only in the case of writing
data was SQL the fastest.
The status of relational databases on the market is not at risk and it is
hard to imagine that this will soon change. NoSQL databases are currently still
a new and little-known solution. However, further development of the Internet
and mobile devices will force software developers into increasing use of NoSQL
databases.
Our future plans cover the performance analysis of NoSQL database applications
in Big Data. This area is growing rapidly and recent research [22] shows that this trend
is promising.
References
1. Codd, E. F.: A Relational Model of Data for Large Shared Data Banks. In: Commun.
ACM 13/6 (1970) 377–387
2. NVidia Corporation: DB engines ranking, http://db-engines.com/en/ranking.
3. Lourenco, J., R., Cabral, B., Carreiro, P., Vieira, M., Bernardino, J.: Choosing the
right NoSQL database for the job: a quality attribute evaluation. In: Journal of Big
Data 2 (2015) 1–26
4. Chandra, D. G.: BASE analysis of NoSQL database. Future Generation Computer
Systems 52 (2015) 13–21
5. Choi, Y. L., Jeon, W. S., Yoon, S. H.: Improving Database System Performance by
Applying NoSQL. JIPS 10 (2014) 355–364
6. Boicea, A., Radulescu, F., Agapin, L. I.: MongoDB vs Oracle - database comparison.
In: Proceedings of Third International Conference on Emerging Intelligent Data and
Web Technologies (EIDWT) (2012) 330–335
7. Li, Y., Manoharan, S.: A performance comparison of SQL and NoSQL databases.
In: Proceedings of IEEE Pacific Rim Conference on Communications, Computers
and Signal Processing (PACRIM) (2013) 15-19
8. Lee, C. H., Zheng, Y. L.: SQL-to-NoSQL Schema Denormalization and Migration:
A Study on Content Management Systems. In: Proceedings of IEEE International
Conference on Systems, Man, and Cybernetics (SMC) (2015) 2022–2026
9. Truica, C. O., Radulescu, F., Boicea, A., Bucur, I.: Performance evaluation for
CRUD operations in asynchronously replicated document oriented database. In:
Proceedings of 20th International Conference on Control Systems and Computer
Science (2015) 191–196
10. Sullivan, D.: NoSQL for Mere Mortals. Addison-Wesley (2015)
11. Brewer, E.: CAP twelve years later: How the rules have changed. In: Computer
45,2 (2012) 23–29
12. Li, X., Ma, Z., Chen, H.: QODM: A query-oriented data modeling approach for
NoSQL databases. In: Advanced Research and Technology in Industry Applications
(WARTIA) (2014) 338–345
13. Brown, M.: Learning Apache Cassandra. Packt Publishing (2015)
14. Hobbs, T.: Basic Rules of Cassandra Data Modeling, http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling.
15. Spring Boot, https://projects.spring.io/spring-boot/
16. AngularJS, https://angularjs.org
17. Spring JDBC, https://docs.spring.io/spring/docs/current/spring-framework-reference/html/jdbc.html
18. Plechawska-Wojcik, M., Rykowski, D.: Comparison of relational, document and
graph databases in the context of the web application development. In: Information
Systems Architecture and Technology: Proceedings of 36th International Conference
on Information Systems Architecture and Technology ISAT (2016) 3–13
19. Vokorokos, L., Uchnar, M., Lescisin, L.: Performance optimization of applications based on non-relational databases. In: International Conference on Emerging
eLearning Technologies and Applications (ICETA) (2016) 371–376
20. Jatana, N., Puri, S., Ahuja, M., Kathuria, I., Gosain, D.: A survey and comparison of relational and non-relational database. International Journal of Engineering
Research & Technology, 1(6) (2012)
21. Han, J., Haihong, E., Le, G., Du, J.: Survey on NoSQL database. In: 6th International Conference on Pervasive Computing and Applications (ICPCA) (2011)
363–366
22. Gupta, S., Narsimha, G.: Efficient Query Analysis and Performance Evaluation of
the Nosql Data Store for BigData. In: Proceedings of the First International Conference on Computational Intelligence and Informatics. Springer Singapore (2017)
549–558
23. Chodorow, K., Dirolf, M.: MongoDB: The Definitive Guide (1st ed.), O’Reilly
Media (2010)
24. Hewitt, E.: Cassandra: The Definitive Guide (1st ed.). O’Reilly Media (2010)