
Performance Benchmarking and Comparison of Cloud-Based Databases MongoDB (NoSQL) Vs MySQL (Relational) Using YCSB

This document summarizes a technical report that benchmarks the performance of MongoDB (a NoSQL database) and MySQL (a relational database) using the YCSB framework. It describes the experimental setup used to perform the tests and evaluates the results. The report analyzes the performance of each database under different workloads (A, F, and C), providing insights into how each database handles CRUD operations and scales under varying loads.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/344047197

Performance Benchmarking and Comparison of Cloud-Based Databases MongoDB (NoSQL) Vs MySQL (Relational) using YCSB

Technical Report · September 2020


DOI: 10.13140/RG.2.2.10789.32484


1 author:

Rachit Pandey
National College of Ireland


All content following this page was uploaded by Rachit Pandey on 02 September 2020.



Performance Benchmarking and Comparison of
Cloud-Based Databases MongoDB (NoSQL) Vs
MySQL (Relational) using YCSB
Rachit Pandey
19175892

Abstract
Databases are the backbone of any business application, and it is of the utmost importance that the database serving an application stands out with respect to performance, availability, scalability, data integrity and security. Recently we have seen a wave of new cloud data-serving databases that cater to cloud OLTP (online transaction processing) applications, although they support ACID (Atomicity, Consistency, Isolation, Durability) transactions only to a limited extent. Examples of such systems are MongoDB, HBase and Cassandra. They are also called NoSQL (schema-less) systems. On the other hand, traditional RDBMS systems support ACID transactions and are widely used for a host of application types. It is becoming extremely important to measure the performance of databases against specific parameters and decide which kind of DBMS (NoSQL or RDBMS) is best suited to the business needs.
In this report we try to replicate low- and high-volume application operations on MongoDB and MySQL databases using the Yahoo! Cloud Serving Benchmark (YCSB) tool, and analyze the performance differences between the two systems using the quantitative output generated by YCSB. The report describes the experimental setup used to perform the tests and the evaluation of the results.
Keywords: NoSQL, YCSB, MySQL, RDBMS, MongoDB

Contents
1 Introduction
1.1 YCSB
2 Key Characteristics of MongoDB and MySQL
2.1 Relational-MySQL
2.2 NoSQL-MongoDB
3 Database Architectures
3.1 Relational-MySQL
3.2 NoSQL MongoDB
4 Comparison of capabilities of the Database Management Systems
4.1 Comparison of Scalability in MySQL and MongoDB
5 Related Work
6 Performance Test Plan
6.0.1 Environment setup for the experiment
6.0.2 Workloads
6.0.3 Test Strategy
6.0.4 DB Setup
7 Evaluation and Result
7.0.1 Workload A Execution and Interpretation
7.0.2 Workload F Execution and Interpretation
7.0.3 Workload C Execution and Interpretation
8 Conclusions
References

1 Introduction
This is the age of the big-data revolution, and with it the demand to manage huge amounts of data efficiently is growing. Traditionally, database management systems were governed by the principles of relational database management systems (RDBMS) based on the Structured Query Language (SQL). Recently, however, a new group of data management solutions known as NoSQL (Not only SQL) has emerged as a strong contender in the DBMS arena. NoSQL is rapidly gaining popularity because of its improved scalability and flexibility over RDBMS. Top global organizations such as Google and Facebook, which deal with huge amounts of online data daily, are adopting NoSQL-style databases.

SQL systems have their own unique characteristics: their adherence to ACID compliance, along with greater structure, a powerful interface and support for complex operations, still makes them the most popular choice for mid-sized and large organizations that predominantly deal with structured data. Both SQL and NoSQL databases have their own advantages, and the right choice depends on where they are going to be used. RDBMS systems are widely adopted by applications and handle limited amounts of data with good performance, but traditional databases fall short when handling large volumes of data, including internet data and multimedia. The term NoSQL was coined by Carlo Strozzi [1] in 1998 to refer to non-relational databases, and was later reintroduced in 2009 by Eric Evans. The primary benefit of a NoSQL database is that, unlike an RDBMS, it can easily handle unstructured data because it uses identification keys to locate data.
Strategies to store data in a NoSQL database:

• Key-Value

• Document

• Column

• Graph-Oriented
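As an illustration of the four storage strategies listed above, the following Python sketch expresses one hypothetical user record in each model using plain data structures. It shows only the shape of the data, not how any particular product stores it; all names and values are invented for the example.

```python
# Illustrative only: one hypothetical "user" record in each NoSQL data model.

# Key-Value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Alice", "city": "Dublin"}'}

# Document: self-describing, possibly nested records; documents in the
# same collection may carry different fields.
documents = [
    {"_id": 42, "name": "Alice", "city": "Dublin", "tags": ["admin"]},
    {"_id": 43, "name": "Bob"},  # no city or tags: no fixed schema required
]

# Column (wide-column): row key -> column family -> column -> value.
column_store = {
    "42": {"profile": {"name": "Alice", "city": "Dublin"}},
}

# Graph-oriented: nodes plus labelled edges between them.
nodes = {42: {"name": "Alice"}, 43: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 43)]


def followers_of(node_id):
    """Return the ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst in edges if label == "FOLLOWS" and dst == node_id]


print(followers_of(43))  # [42]
```

The second document in `documents` simply omits fields, which is the attribute flexibility that the NoSQL model allows and a fixed relational schema does not.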

A NoSQL database provides the flexibility to add or remove attributes. In this paper we focus on benchmarking the performance of an RDBMS (MySQL) and a NoSQL database (MongoDB). The benchmarking is performed using the YCSB framework (Section 1.1), which is one of the best-known and most popular tools for benchmarking and testing the performance of database systems.

1.1 YCSB
In their paper, Cooper et al. [2] describe the importance and the challenges of understanding the performance of modern database systems. Their objective is to create a standard benchmarking framework that assists users in evaluating the performance of databases. They propose a framework called YCSB (Yahoo! Cloud Serving Benchmark), which allows users to select and run different kinds of workloads against database systems, measure key parameters such as throughput and latency, and understand the behaviour of the system. The workloads can be standard (A-F) or custom-designed to suit one's needs, and results can be obtained by running multiple iterations with a variety of test loads, ranging from small to very large sets of operations. Figure 1 below shows a typical architecture of the YCSB framework.

Figure 1: YCSB Architecture Overview

The main features of the YCSB framework are its extensibility and the ability to create custom workloads for testing. YCSB is a Java-based program that generates test loads for the database. The workload executor drives multiple client threads, and each thread executes a sequential series of calls to the DB interface layer, both to load the database (the load phase) and to execute the workload (the transaction phase).
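The load and transaction phases described above can be sketched in Python as follows. This is a hypothetical, simplified model, not YCSB's Java code: `InMemoryDB` stands in for the DB interface layer, and the 50/50 read/update split mirrors workload A's proportions. Keys are chosen uniformly for brevity, whereas workload A uses a Zipfian request distribution.

```python
import random
import threading
import time


class InMemoryDB:
    """Hypothetical stand-in for YCSB's DB interface layer (not a real binding)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def insert(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)

    def update(self, key, value):
        with self._lock:
            self._data[key] = value


def client_thread(db, keys, op_count, read_proportion, rng, counts):
    """Transaction phase for one client: a sequential series of DB calls."""
    for _ in range(op_count):
        key = rng.choice(keys)  # uniform choice; workload A itself uses Zipfian
        if rng.random() < read_proportion:
            db.read(key)
            counts["read"] += 1
        else:
            db.update(key, "updated")
            counts["update"] += 1


def run_workload(record_count=1000, op_count=1000, threads=4,
                 read_proportion=0.5, seed=1):
    db = InMemoryDB()
    keys = [f"user{i}" for i in range(record_count)]

    # Load phase: insert every record before the timed transaction phase.
    for key in keys:
        db.insert(key, "initial")

    # Transaction phase: split the operations across client threads
    # (any remainder of op_count // threads is dropped for simplicity).
    per_thread = op_count // threads
    counts = [{"read": 0, "update": 0} for _ in range(threads)]
    workers = [
        threading.Thread(
            target=client_thread,
            args=(db, keys, per_thread, read_proportion,
                  random.Random(seed + t), counts[t]),  # per-thread RNG and counters
        )
        for t in range(threads)
    ]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start

    reads = sum(c["read"] for c in counts)
    updates = sum(c["update"] for c in counts)
    return {"read": reads, "update": updates,
            "throughput_ops_sec": (reads + updates) / elapsed}
```

For example, `run_workload(threads=8)` returns the read and update counts plus an overall throughput figure, loosely analogous to the `[OVERALL], Throughput(ops/sec)` line that YCSB reports.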

Workloads are executed by passing the workload file (which specifies the type of workload and the operation count) and run-time properties, such as the number of threads and the scan proportion, to the YCSB command.
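As a sketch of how such a command is put together, the hypothetical helper below builds the argument list for a YCSB `load` or `run` invocation. It uses YCSB's documented command-line options (`-P` for a workload file, `-p name=value` for individual properties, `-threads` for the client thread count, `-s` for periodic status output); the helper name and defaults are assumptions for illustration.

```python
def ycsb_command(phase, binding, workload_file, threads=1,
                 record_count=None, operation_count=None, extra_props=None):
    """Build the argv list for a YCSB load or run invocation (hypothetical helper)."""
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    cmd = ["bin/ycsb", phase, binding, "-s", "-P", workload_file,
           "-threads", str(threads)]
    props = dict(extra_props or {})
    if record_count is not None:
        props["recordcount"] = record_count
    if operation_count is not None:
        props["operationcount"] = operation_count
    # Each property given with -p overrides the value in the workload file.
    for name, value in props.items():
        cmd += ["-p", f"{name}={value}"]
    return cmd
```

For example, `ycsb_command("run", "mongodb", "workloads/workloada", threads=16, record_count=50000, operation_count=50000)` yields `bin/ycsb run mongodb -s -P workloads/workloada -threads 16 -p recordcount=50000 -p operationcount=50000`, which could then be executed from the YCSB directory with `subprocess.run(cmd, check=True)`.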
This paper is divided into two main parts: the first covers the setup of the performance evaluation exercise for comparing MongoDB and MySQL databases using the YCSB framework (Section 6), and the second analyses the results obtained from the performance benchmarking (Section 7). The analysis provides insights into the behaviour of the two databases when performing CRUD (Create, Read, Update, Delete) operations. Other sections, such as related work and conclusions, are also included.

2 Key Characteristics of MongoDB and MySQL

2.1 Relational-MySQL
MySQL¹ is one of the world's leading open-source relational database management systems. It is suited to applications that demand high performance, scalability and reliability. MySQL's design works best on data whose fields are structured and finite, which MySQL can search and organize in multiple dimensions; this strategy cannot be applied to unstructured data. Below are some key features of the MySQL RDBMS².

1. ACID model - Robust Transaction Support: MySQL claims to be the most robust transactional database platform, offering row-level locking, server-enforced referential integrity and multi-version transactions, which help it achieve full ACID (Atomicity, Consistency, Isolation, Durability) capabilities.
2. Highly Secure: MySQL has a robust security framework that enforces both client and server verification. It supports secure SSH/SSL connections and an object-level privilege framework to achieve high data security.
3. Replication (Built-in): MySQL provides built-in high-availability options aimed at continuous uptime. These can be implemented by configuring specialized cluster servers and a master-slave replication architecture. Third-party vendors also offer out-of-the-box availability solutions for MySQL.
4. Fully relational: MySQL is a fully relational database management system. It groups and isolates data in logical structures such as tables, columns and views, and allows logical units of data to be linked while the physical data is stored on disk.
5. SQL Compatibility: MySQL supports ANSI-standard SQL, which is used by a large development community. MySQL also provides a range of connectors and drivers, such as ODBC and JDBC, for seamless integration and connectivity with external applications.
6. Multi-Threaded: The MySQL server uses multi-threaded task management, which allows work to be distributed across multiple threads on multiple CPUs.
7. Platform Independent: MySQL is platform independent and runs on a variety of popular operating systems without compatibility issues.
8. Geographic Information System (GIS): MySQL supports GIS functions, so it can process GIS object data. These objects can have spatial attributes such as geographic coordinates.

1 MySQL: http://www.mysql.com
2 MySQL Documentation: http://www.dev.mysql.com

2.2 NoSQL-MongoDB
MongoDB³ is a document-oriented NoSQL database developed in C++. It was designed for building high-performing, scalable systems that store binaries and audio/video files. It has a flexible schema, which allows objects to have no fixed schema or type. Queries can be performed over collections or using map-reduce.

1. High Performance: MongoDB stores data in documents using Binary JSON (BSON), and improves overall performance with sharding, load balancing and replica sets.
2. Load Balancing: MongoDB's scalable architecture uses load balancing to dynamically balance query operations and keeps documents evenly spread over multiple nodes for reads and writes.
3. Aggregation Framework (MapReduce): MongoDB supports map-reduce operations for aggregation and data summarization.
4. Horizontal Scalability (Sharding): MongoDB implements horizontal scaling through sharding, which distributes data over multiple machines and helps deliver high throughput over high volumes of data. With sharding, additional instances can be added to expand the capacity of the database.
5. Schema-less: MongoDB is a schema-less database in which any kind of data can be stored without a predefined structure. This makes data migration simple and efficient.
6. Multiple Storage Engines: MongoDB offers multiple storage engines to suit application requirements, which helps in developing highly robust databases.
7. Capped Collections: MongoDB supports capped collections, fixed-size collections whose maximum size is set by the user.
8. Document Indexing: MongoDB supports single-field, compound, multikey, geospatial and hashed indexes. Documents can be located using indexes without scanning the entire collection.
9. Master-Slave Replication: MongoDB keeps multiple copies of data in multiple locations, which protects the data against loss. In a MongoDB replica set, read operations on the primary node are consistent, and at least one redundant copy is available on the secondary nodes.
3 MongoDB :http://www.mongodb.org
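To make the map-reduce style of summarization in item 3 concrete, here is a plain-Python sketch of the map, group (shuffle) and reduce steps. It runs in-process over a list of dictionaries standing in for documents; it is not MongoDB's `mapReduce` command or aggregation pipeline, and the `orders` data is hypothetical.

```python
from collections import defaultdict


def map_phase(doc):
    """Emit (key, value) pairs for one document: here (category, amount)."""
    yield doc["category"], doc["amount"]


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single summary value."""
    return sum(values)


def map_reduce(collection):
    grouped = defaultdict(list)  # shuffle step: group emitted values by key
    for doc in collection:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return {key: reduce_phase(key, values) for key, values in grouped.items()}


orders = [
    {"category": "books", "amount": 12},
    {"category": "music", "amount": 5},
    {"category": "books", "amount": 8},
]
print(map_reduce(orders))  # {'books': 20, 'music': 5}
```

In a sharded deployment the map step can run on each shard in parallel, which is why this pattern suits data distributed across machines.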

3 Database Architectures

3.1 Relational-MySQL
MySQL uses a networked client-server architecture and is one of the most adaptable and flexible RDBMSs. A typical MySQL architecture has the following components⁴.

MySQL Server
• Also called mysqld; it manages access to the actual database hosted on disk.

• Multithreaded

• Supports concurrent connections

• Supports multiple storage engines, both transactional and non-transactional

• A single host can run multiple mysqld instances.

Client programs
• Utilities that connect to the MySQL server.

MySQL's architecture is a web of task-related functions that work together to fulfil the responsibilities of a database server. Figure 2 below gives an overview of these subsystems, which interact with each other through a well-defined function interface. Each subsystem has its own responsibilities and is independent of the others.

Figure 2: MySQL Architecture Overview

4 MySQL:http://www.mysql.com

The main components of the MySQL server are:

• Parser: Parses the SQL query and generates an ID for each query.

• Optimizer: Creates an efficient execution plan for the query.

• Executor: Executes the parsed query.

Query Cache: An in-memory store in which the MySQL engine looks for query results first; if a result is found, it is returned directly from the cache.
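The flow described so far (query cache first, then parser, optimizer and executor) can be modelled with the toy Python class below. It is a hypothetical sketch, not MySQL code: it parses only one query shape and keys the cache on the exact query text. In MySQL 5.7, the version used in this experiment, cached entries for a table are invalidated whenever that table is modified (the query cache was deprecated in 5.7.20 and removed in 8.0); the toy simply clears the whole cache on any write.

```python
class ToyQueryServer:
    """Hypothetical model of: query cache -> parser -> optimizer -> executor."""

    def __init__(self, table):
        self.table = table      # list of row dicts, standing in for a storage engine
        self.query_cache = {}   # exact query text -> cached result
        self.executions = 0     # how many times the executor actually ran

    def parse(self, sql):
        # Accepts only the shape: SELECT * FROM t WHERE <col> = <int>
        tokens = sql.split()
        return {"column": tokens[5], "value": int(tokens[7])}

    def optimize(self, parsed):
        # A real optimizer would choose indexes and join orders; this is a pass-through.
        return parsed

    def execute(self, plan):
        self.executions += 1
        return [row for row in self.table if row[plan["column"]] == plan["value"]]

    def query(self, sql):
        if sql in self.query_cache:                # 1. look in the query cache first
            return self.query_cache[sql]
        plan = self.optimize(self.parse(sql))      # 2. parse and optimize
        result = self.execute(plan)                # 3. execute against storage
        self.query_cache[sql] = result
        return result

    def write(self, row):
        self.table.append(row)
        self.query_cache.clear()  # any write invalidates cached results in this toy
```

Running the same `SELECT` twice executes it only once; a subsequent `write` forces the next identical query back through the parser, optimizer and executor.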
Storage Engine: This component manages the physical data files and their locations, and is responsible for fetching data from the data files. InnoDB, the default storage engine recommended by Oracle, is the most widely used engine for most MySQL requirements.
The Base Function Library: A common set of functions shared across all MySQL subsystems.
Process, Thread and Resource Management: Facilitates the thread-based client-server architecture.
Cache and Buffer Management: Handles caching and retrieval of the different forms of data used by the threads executing server processes. Caching and buffering reduce I/O time.
Memory Allocation:

• Per session : Dynamic allocation and de-allocation of memory for a specific session.

• Per instance: Allocated once per server instance and shared by all server processes and threads.

MySQL Physical Architecture


The block diagram below depicts the MySQL physical architecture.

Figure 3: MySQL Physical Architecture

3.2 NoSQL MongoDB
MongoDB is a general-purpose, open-source document database. It is fully featured, with capabilities such as MapReduce, geospatial queries, aggregation, text search and rich queries. It is a horizontally scalable database in which capacity can be added as needed, and it automatically balances the load. Figure 4 below depicts the MongoDB architecture.

Figure 4: MongoDB Architecture

The main components of MongoDB physical architecture are described briefly below.

Storage Engines: MongoDB offers several storage engines so that the best one for a workload can be used, and the four predominant storage engines can work together in a single MongoDB replica set. The default storage engine, WiredTiger, is an all-round performer with compression support. The encrypted storage engine protects sensitive data without impacting performance, and the in-memory storage engine provides wide support for real-time analytics applications.
Sharding: MongoDB implements horizontal scaling to overcome the hardware limitations of a single server by distributing data across commodity hardware. This technique is called sharding, and the hardware nodes are called shards. MongoDB manages sharding automatically.
Replica Sets: MongoDB provides native replication, copying data onto multiple hardware units; a group of such copies is called a replica set. All read and write operations are directed to the primary member by default, and the remaining copies are called secondaries. If the primary goes down, one of the secondaries is elected as the new primary. Each shard is typically deployed as a replica set, and because replica sets are self-healing, database downtime is reduced when failures occur.
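The failover behaviour described above can be illustrated with the deliberately simplified Python sketch below. `Member` and `elect_primary` are hypothetical names; MongoDB's real election protocol (Raft-inspired, with terms, heartbeats, votes and priorities) is considerably more involved. The sketch captures only two rules: a strict majority of the set must be reachable, and the most up-to-date eligible member wins.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool
    last_applied_op: int  # position in the replication log this member has applied
    priority: int = 1     # priority 0 members can never become primary


def elect_primary(members):
    """Pick a new primary, or None if no primary can be elected.

    Simplified model: only healthy members vote and can be elected, an election
    needs a strict majority of the whole set, and the winner is the most
    up-to-date candidate (ties broken by priority).
    """
    healthy = [m for m in members if m.healthy]
    if len(healthy) <= len(members) // 2:
        return None  # no majority: no primary, so writes cannot be accepted
    candidates = [m for m in healthy if m.priority > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (m.last_applied_op, m.priority)).name
```

For a three-member set where the old primary `A` is down, `elect_primary([Member("A", False, 100), Member("B", True, 98), Member("C", True, 100)])` selects `C`, the member that has applied the most operations; if two of the three members are down, no primary is elected.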
Query Model: MongoDB uses caching, indexing and query plans to optimize query performance. The MongoDB query model (MQM) has a router that routes each query to the engine regardless of the number of shards. Query optimization is done at run time and can make use of more than one index.

4 Comparison of capabilities of the Database Management Systems

Comparison of Transaction Management Features in MySQL and MongoDB
In this section we compare the transaction management features available in MongoDB and MySQL⁵.
MongoDB provides strong capabilities for transaction management. It follows BASE (Basically Available, Soft state, Eventual consistency) characteristics; earlier versions lacked multi-document atomic transactions, but later versions offer ACID capabilities, which require some configuration changes to be used. Below are a few properties of MongoDB's transaction management capabilities:

• Flexible and rich data modeling with schema validation.
• Joins, powerful search and graph techniques with aggregation and on-demand materialized views.
• Native horizontal scale-out with sophisticated data routing controls.
• Multi-node durability and tunable semantics.
• Analytics and BI readiness, encryption at all levels and tight security integration.
• Mature management tools for automating operations from anywhere.
• Database as a service in every major public cloud.
• Multi-document ACID transactions.

In MongoDB, an operation on a single document is atomic. Single-document atomicity matters because MongoDB can use embedded documents and arrays to capture relationships between data in a single document structure, instead of normalizing it across multiple documents. MongoDB also supports multi-document transactions, and with distributed transactions these can span multiple operations, collections, databases, documents and shards.

• MongoDB supports multi-document transactions on replica sets from version 4.0 onwards.
• In version 4.2, MongoDB introduced distributed transactions, which extended multi-document transactions to sharded clusters.
• In MongoDB a transaction either commits in full or not at all, which keeps data integrity tight, and data changes are not visible outside the transaction until the commit completes.

• When a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards.

• If a transaction is aborted, all changes made in it are discarded without ever becoming visible.

5 MongoDB: https://info-mongodb-com.s3.us-east-1.amazonaws.com/MongoDB_Architecture_Guide.pdf

MySQL is an open-source relational database management system that strictly follows the ACID (Atomicity, Consistency, Isolation, Durability) transaction model, which makes it highly suitable for applications and services that rely heavily on transaction completeness⁶.

The MySQL transaction management system allows users to execute data manipulation (DML) operations while ensuring that the database never contains the results of a partial operation: if one operation in a set fails, the rest are rolled back to restore the earlier state of the database. This is how MySQL supports ACID transactions.
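The all-or-nothing behaviour described above can be demonstrated with a short Python sketch. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the commit/rollback principle is the same, but the `accounts` table and the `transfer` function are hypothetical. With MySQL, the same pattern would use InnoDB tables and a MySQL driver.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move an amount between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)   # succeeds: balances become 70 and 80
transfer(conn, 1, 2, 500)  # fails after the first UPDATE, which is rolled back
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 70), (2, 80)]
```

The failed transfer had already debited account 1 inside the transaction, yet the final balances show no trace of it: the partial operation was rolled back as a unit.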

4.1 Comparison of Scalability in MySQL and MongoDB


MongoDB uses sharding to scale.
In MongoDB, a shard key is used to split data into shards. Data is divided into chunks of 64 MB, the smallest logical block of data that is sharded. One shard can contain multiple chunks, and each shard stores its data as a replica set. MongoDB controls which chunk is stored on which shard; this placement is unknown to users and applications and appears random to them. In this way MongoDB is able to scale horizontally while also protecting the data.
If you look directly into an individual shard to see what it contains, you will see a seemingly random subset of the data. Which data is stored in which shard, or partition, is effectively random to the user and unknown to the application. That is why you (the client) always need to connect to the mongos and let MongoDB find the data you need.
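The chunk-and-router arrangement described above can be sketched as a toy range-sharded collection in Python. This is a hypothetical model, not MongoDB's implementation: chunk boundaries are given as fixed split points rather than being split by size (64 MB) and rebalanced automatically, and `ToyShardedCollection` plays the role of the mongos router.

```python
import bisect


class ToyShardedCollection:
    """Hypothetical range-sharded collection with a mongos-like router.

    The shard-key space is cut into chunks at split points; each chunk lives on
    one shard. The router looks up the owning chunk for a key, so the client
    never needs to know where the data is stored.
    """

    def __init__(self, split_points, chunk_to_shard):
        # Split points [100, 200] define chunks (-inf, 100), [100, 200), [200, +inf).
        if len(chunk_to_shard) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        self.split_points = split_points
        self.chunk_to_shard = chunk_to_shard
        self.shards = {s: {} for s in set(chunk_to_shard)}

    def _shard_for(self, shard_key):
        chunk = bisect.bisect_right(self.split_points, shard_key)
        return self.chunk_to_shard[chunk]

    def insert(self, shard_key, doc):
        self.shards[self._shard_for(shard_key)][shard_key] = doc

    def find(self, shard_key):
        # Targeted query: only the shard that owns the chunk is consulted.
        return self.shards[self._shard_for(shard_key)].get(shard_key)
```

With `ToyShardedCollection([100, 200], ["shardA", "shardB", "shardA"])`, keys 50 and 250 land on `shardA` and key 150 on `shardB`; a client calling `find(150)` never needs to know that.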
MySQL also uses partitioning and sharding to scale writes, but these are not easy to implement manually.
MySQL Cluster implements sharding automatically: tables are partitioned across nodes so that the database can scale horizontally. The data can be accessed via both SQL and NoSQL APIs. Sharding is transparent, and queries can be executed from any node. MySQL Cluster has an active/active, multi-master architecture, so updates can be handled by any node and are instantly available to all other clients using the cluster.
Auto-Sharding: JOIN operations and ACID transactions are still supported in a sharded environment. MySQL Cluster also replicates data across data centers for global scalability. Each cluster can be active and accept updates while maintaining consistency.
6 MySQL: https://dev.mysql.com/doc/refman/8.0/en/mysql-enterprise-security.html
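By default, MySQL Cluster (NDB) partitions rows across data nodes by hashing the primary key, in contrast to the range-based chunks sketched for MongoDB. The Python sketch below shows the idea of hash partitioning only; the function names are hypothetical, and MD5 with a modulo is an assumption chosen for simplicity, not NDB's exact internal hash or partition mapping.

```python
import hashlib


def partition_for(primary_key, num_partitions):
    """Map a primary key to a partition using a stable hash (MD5 here)."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(keys, num_partitions):
    """Group keys by the partition (data node) that would own them."""
    layout = {p: [] for p in range(num_partitions)}
    for key in keys:
        layout[partition_for(key, num_partitions)].append(key)
    return layout
```

Hashing spreads even sequential keys evenly across partitions, which helps balance writes; the trade-off is that a range query over the primary key generally has to visit every partition.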

After reviewing the scalability features of MongoDB and MySQL, we find that MongoDB's sharding works better, making MongoDB more scalable than MySQL; it is also easier to implement, as most of the work is handled by MongoDB itself. MySQL, on the other hand, maintains data integrity even while scaling and sharding, which makes it one of the most popular choices for a high-performing RDBMS.

5 Related Work
In this section we discuss related work and research presented in other papers on NoSQL and relational databases.

MongoDB Vs MySQL - A Comparative Study

In their conference paper, Győrödi et al. [3] compare a NoSQL database (MongoDB) against a relational database (MySQL) and argue that NoSQL databases are better than traditional relational databases. The authors devise a simple method to compare the databases on the four basic database operations: Insert, Read, Update and Delete. They inserted more than 5000 rows and different types of objects into the datasets and compared the time required. Similarly, they performed read, update and delete operations on both databases using a large dataset, recorded the total time required for each operation and compared the results. They concluded that MongoDB is faster for all the operations and hence that NoSQL databases should be the ideal choice for most dynamic applications. However, the authors chose a very basic level of benchmarking, which is not capable of investigating performance at a deeper level.

Hybrid SQL/NoSQL database components


In the journal paper published by Bjeladinovic et al. [4], the authors propose a hybrid database model. They highlight the needs of organizations that generate, and must store, large amounts of structured and unstructured data. The authors describe integration and usability as the two main drivers behind the choice of a database. Figure 5 below describes the proposed architecture for NoSQL/SQL integration and uniform use. The authors created several use cases and executed them over this architecture; the tests were conducted to show the feasibility of executing use cases containing CRUD operations and to show a potential impact on performance.
Figure 5: The graphic display of the architecture for integration and uniform use of
components of hybrid SQL/NoSQL database [4]

While defining the architecture of this hybrid model, the authors try to answer the following questions:

• How to organize and manage data by object type, by the database type it belongs to and by the specific language it uses?

• How to manage constraints in the hybrid SQL/NoSQL database?

• How to translate SQL statements into a hybrid language that the target database can understand?

• How to join data from databases of different types within the hybrid SQL/NoSQL database?

The authors propose developing the following components to answer these questions:

• Key Words Search Component (KWS) – for extracting metadata from the user statement.

• Constraint Controller (CC) – to obtain data on integrity rules.

• Statement Mapper (SM) – to map the user statement to the specific database.

• Integration Controller (IC) – to forward the statements and obtain results.

After executing the use cases with an example business application, the authors concluded that the hybrid database model shows potential and improved performance. This model is worth exploring in future research and development, although the first criterion for future work should be support for executing DDL statements in both database environments, regardless of the language in which they are written.

Performance Evaluation of NoSQL Systems Using YCSB


In their paper, Yusuf et al. [5] highlight the need to use NoSQL databases to store large volumes of data. Since NoSQL databases claim to provide better performance than most relational databases, the authors evaluate a few NoSQL databases using the Yahoo! Cloud Serving Benchmark (YCSB) framework. They selected MongoDB and other NoSQL databases such as Redis and ran workloads A, B and C against each of them with five different throughputs of varying size. The workloads were validated with 110 client threads and 1000, 20000, 40000, 60000, 80000 and 100000 records, and each of these combinations was repeated 20 times. They concluded that Redis performs better than MongoDB for READ and INSERT operations.

How to Choose a NoSQL Solution


In their paper, Khazaei et al. [6] explore some popular NoSQL databases, evaluate their performance and describe the existing literature. The authors describe in detail the characteristics of a NoSQL solution and how NoSQL databases have relaxed the constraints of the CAP theorem, resulting in BASE (Basically Available, Soft state, Eventually consistent) systems. They compared various benchmarking tools such as YCSB, PigMix, GridMix and CALDA, and ultimately chose YCSB for its flexibility for extension and modification. The benchmark tests were executed on OpenStack cloud VMs, and the results were interpreted and plotted graphically. The results of running multiple workloads can be used by anyone who wants to compare and benchmark the performance of NoSQL databases in order to make an informed decision.

MongoDB and HBase Performance Analysis

In their paper, Matallah et al. [7] compare HBase and MongoDB performance using the YCSB benchmarking tool. Apart from the six standard workloads (A-F), the authors used two custom workloads (G, H) to study performance in more depth. The evaluation concluded that HBase performed better than MongoDB for all the workloads (insert, update, delete, read, scan). HBase uses logging and caching to speed up update operations; however, its read execution time is higher because HBase reads all available copies of the data and returns the most recent one, so MongoDB is more efficient for reads. Updates are slower in MongoDB because of its database locking mechanisms. This comparative study could be extended with custom workloads and multi-threaded loads to support decisions on selecting a NoSQL database for business applications.

MongoDB and MySQL comparison for eCommerce data
In their paper, Aboutorabi et al. [8] describe how MySQL is the most widely used RDBMS, while MongoDB is one of the most popular choices among NoSQL solutions. The paper evaluates the performance of MongoDB and MySQL for a large e-commerce application with a huge amount of data. The results show that MongoDB performs better than MySQL in all aspects. However, a study by R. Panda et al. [9] argues that the performance of databases depends on their design, and hence that different types of custom benchmarks should be executed before arriving at any conclusion.

Summary of all related works


The research papers discussed in this section give a good overview of how to evaluate and benchmark database performance. They also provide a good amount of information on the YCSB framework and how to run tests with it. As part of this project, we benchmark the performance of MySQL and MongoDB using workloads run in YCSB.

Table 1: Summary of Related work

Reference | Description
Győrödi et al. [3] | Comparison of MongoDB and MySQL using manual DML operations
Bjeladinovic et al. [4] | Proposed a new hybrid database model using RDBMS and NoSQL
Yusuf et al. [5] | Evaluation of multiple NoSQL databases using YCSB
Khazaei et al. [6] | Comparison of various benchmarking tools such as YCSB, PigMix, GridMix and CALDA
Matallah et al. [7] | Comparison of MongoDB and HBase using YCSB custom workloads
Aboutorabi et al. [8] | Performance evaluation of MongoDB and MySQL for a large e-commerce application

6 Performance Test Plan


To evaluate the performance of the two database systems (relational MySQL and
NoSQL MongoDB), we selected YCSB as our performance testing framework because of
its flexibility in running workloads with various options. We selected Workloads A, C
and F for this exercise and ran them against both databases with record counts and
operation counts of 12500, 50000, 100000, 150000 and 200000. The
experiment was performed in the Amazon Web Services (AWS) cloud environment, and the details
of the experimental setup are provided below.
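The test matrix above can be expanded into individual YCSB invocations. The Python sketch below only illustrates that expansion and is not the NCI script used in this study. It assumes YCSB is launched through its bin/ycsb launcher from the installation directory, that the mongodb and jdbc bindings are used for MongoDB and MySQL respectively, and that the MySQL connection settings live in a db.properties file (an assumed file name). Thread counts per workload follow the workload table in Section 6.0.2.

```python
from itertools import product

# Record/operation counts and per-workload thread counts, as listed in this report.
COUNTS = [12500, 50000, 100000, 150000, 200000]
THREADS = {"a": [4, 8, 16, 32], "c": [1], "f": [4, 8]}

# Assumed binding names and extra arguments: "mongodb" for MongoDB and "jdbc" for
# MySQL; the JDBC connection settings are assumed to live in db.properties.
BINDINGS = {"mongodb": [], "jdbc": ["-P", "db.properties"]}


def ycsb_commands(workload, binding, count, threads):
    """Return the (load, run) command lines for one cell of the test matrix."""
    common = [
        "-P", f"workloads/workload{workload}",
        *BINDINGS[binding],
        "-p", f"recordcount={count}",
        "-p", f"operationcount={count}",
        "-threads", str(threads),
        "-s",  # print periodic status while the run progresses
    ]
    return (["bin/ycsb", "load", binding, *common],
            ["bin/ycsb", "run", binding, *common])


def matrix_commands():
    """Yield (load, run) pairs for every workload, database, count and thread setting."""
    for workload, thread_options in THREADS.items():
        for binding, count, threads in product(BINDINGS, COUNTS, thread_options):
            yield ycsb_commands(workload, binding, count, threads)


if __name__ == "__main__":
    # The target table/collection is assumed to be emptied before each load.
    for load_cmd, run_cmd in matrix_commands():
        print(" ".join(load_cmd))
        print(" ".join(run_cmd))
```

The jdbc binding also needs the MySQL Connector/J JAR on its classpath, which this sketch does not handle.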

6.0.1 Environment setup for the experiment


Local Machine

• Processor: 1.6 GHz Dual-Core Intel Core i5

• RAM: 8 GB

• CPU and cores: 1 CPU, 2 cores

• Operating System: macOS Catalina 10.15.6

Cloud Machine

• Operating System: Ubuntu 18.04 (Bionic)

• RAM: 8 GB

• Disk: 80 GB

• Cloud Host: AWS

• Instance Name: DSM Project -19175892

• Instance type: t2.large

• Instance Id: i-09cbac5ef39647b312

• Public IP Address: 52.3.252.254

MongoDB configuration

• MongoDB Version: MongoDB shell version v4.4.0

• Database used for testing: ycsb

• Collection (table) used for testing: usertable

MySQL configuration

• MySQL Version: mysql Ver 14.14 Distrib 5.7.31, for Linux (x86-64)

• Database used for testing: BenchTest

• Table used for testing: usertable

Benchmarking

• Tool: Yahoo! Cloud Serving Benchmark (YCSB), downloaded from GitHub

• Version: 0.17.0

• Test Setup: Bash scripts provided by NCI for automating the
workload execution

Visualization Tools

• Tool: Tableau Desktop Professional Edition 20.2.4

• Tool: Microsoft Excel for Mac Version 16.37

6.0.2 Workloads
YCSB provides the option to run various workloads against a database to evaluate its
performance. The standard workloads are described below, and the workloads used for
this experiment are marked in the "Used in experiment" column.

Workload | Operations | Record selection | Application example | Used in experiment | Threads
A (Update heavy) | Read: 50%, Update: 50% | Zipfian | Session store recording recent actions in a user session | YES | 4, 8, 16, 32
B (Read heavy) | Read: 95%, Update: 5% | Zipfian | Photo tagging; adding a tag is an update, but most operations are to read tags | NO | -
C (Read only) | Read: 100% | Zipfian | User profile cache, where profiles are constructed elsewhere (e.g., Hadoop) | YES | 1
D (Read latest) | Read: 95%, Insert: 5% | Latest | User status updates; people want to read the latest statuses | NO | -
E (Short ranges) | Scan: 95%, Insert: 5% | Zipfian | Threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id) | NO | -
F (Read-modify-write) | Read: 50%, Read-modify-write: 50% | Zipfian | User database, where user records are read and modified by the user or to record user activity | YES | 4, 8
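Most of the workloads in the table select records with a Zipfian distribution: a small set of popular keys receives most of the requests, while the remaining keys are accessed rarely. The Python sketch below illustrates the idea by drawing keys with probability proportional to 1/rank^s. The exponent 0.99 mirrors YCSB's default Zipfian constant, but this simple sampler is an illustration only and not YCSB's own generator, which for example also scrambles the popular keys across the key space.

```python
import random
from collections import Counter


def zipf_weights(n_keys, s=0.99):
    """Probability of each key rank 1..n_keys, proportional to 1 / rank**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_keys(n_keys, n_requests, s=0.99, seed=42):
    """Draw n_requests key indices (0-based) following the Zipfian weights."""
    rng = random.Random(seed)
    weights = zipf_weights(n_keys, s)
    return rng.choices(range(n_keys), weights=weights, k=n_requests)


if __name__ == "__main__":
    keys = sample_keys(n_keys=12500, n_requests=12500)
    print("five most requested keys:", Counter(keys).most_common(5))
    # Share of all requests that hit the top 1% of keys (indices 0..124)
    print("share on top 1% of keys:", sum(k < 125 for k in keys) / len(keys))
```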

6.0.3 Test Strategy


Performance benchmarking of MySQL and MongoDB started with logging in to the AWS
console and creating a VM (t2.large) with the properties described in Section
6.0.1. After the VM was created, the following software packages were installed using the
standard packages available for the Ubuntu release.

• MongoDB installed following the official MongoDB installation guide (footnote 7)

• MySQL installed from the central repository using apt install

• YCSB 0.17.0 downloaded from GitHub (footnote 8) using curl

• Java installed from the central repository using apt install

• Python installed from the central repository using apt install

6.0.4 DB Setup
After installation, the MySQL database BenchTest was created, and a table called usertable
with 10 fields was created in it. The db.properties file was updated with the database
connection details and the user ID and password of the MySQL root user. Other configuration
steps, such as setting the Java path and copying the required JAR files, were also
completed.
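The MySQL preparation described above can be sketched as follows. The sketch only builds the SQL statement and the properties text; it does not connect to MySQL. The YCSB_KEY and FIELD0 to FIELD9 column layout follows the schema suggested in the YCSB JDBC binding documentation, while the host, port and password are placeholders assumed for illustration, and the driver class name depends on the MySQL Connector/J version actually installed.

```python
FIELD_COUNT = 10  # the report's usertable has 10 data fields


def create_usertable_sql(table="usertable", field_count=FIELD_COUNT):
    """Build a CREATE TABLE statement in the layout used by the YCSB JDBC binding."""
    columns = ["YCSB_KEY VARCHAR(255) PRIMARY KEY"]
    columns += [f"FIELD{i} TEXT" for i in range(field_count)]
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(columns) + "\n);"


def db_properties(database="BenchTest", user="root", password="CHANGE_ME",
                  host="127.0.0.1", port=3306):
    """Build db.properties text for the JDBC binding (host, port, password are placeholders)."""
    lines = [
        # Connector/J 8.x uses com.mysql.cj.jdbc.Driver instead of this class name
        "db.driver=com.mysql.jdbc.Driver",
        f"db.url=jdbc:mysql://{host}:{port}/{database}",
        f"db.user={user}",
        f"db.passwd={password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print("CREATE DATABASE BenchTest;")
    print(create_usertable_sql())
    print(db_properties())
```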
7 Install MongoDB: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
8 YCSB on GitHub: https://github.com/brianfrankcooper/YCSB

For MongoDB, the database is created at runtime by YCSB, so there was no need
to create a database or collections in advance. MongoDB was configured to start
automatically after a system reboot using the systemctl enable command, so the database
did not have to be started manually for each run.
For testing, the automated scripts provided by NCI were used, as they can
execute multiple workloads in one go. Some minor changes were made to the script
to run the workloads in multi-threaded mode, and the -s parameter was added
to display the detailed status of each run on screen. The script was also modified to
generate the output files in CSV format instead of text format.
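Because the report compares run time, throughput and average latency across many runs, it helps to turn YCSB's text summary into CSV rows. The NCI scripts are not reproduced here; the Python sketch below is a hypothetical equivalent that parses summary lines of the form [SECTION], Metric, value (the layout of YCSB's default text output) and writes one CSV row per metric, tagged with illustrative database, workload and thread labels. The sample output values are made up.

```python
import csv
import io

# Made-up sample of YCSB's text summary, used only to exercise the parser.
SAMPLE_OUTPUT = """\
[OVERALL], RunTime(ms), 10000
[OVERALL], Throughput(ops/sec), 1250.0
[READ], Operations, 6250
[READ], AverageLatency(us), 1500.0
[UPDATE], AverageLatency(us), 2000.0
"""


def parse_ycsb_summary(text):
    """Return (section, metric, value) tuples from '[SECTION], metric, value' lines."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue  # skip status lines and anything that is not a summary metric
        try:
            value = float(parts[2])
        except ValueError:
            continue
        rows.append((parts[0].strip("[]"), parts[1], value))
    return rows


def to_csv(rows, database, workload, threads):
    """Render parsed metrics as CSV text, prepending run labels to every row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["database", "workload", "threads", "section", "metric", "value"])
    for section, metric, value in rows:
        writer.writerow([database, workload, threads, section, metric, value])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_ycsb_summary(SAMPLE_OUTPUT), "mongodb", "a", 4))
```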

7 Evaluation and Results


7.0.1 Workload A Execution and Interpretation
Workload A, an update-heavy workload with an equal (50/50) mix of reads and
writes, was executed for both databases, MongoDB and MySQL. The details are below, and
the interpretation of the results follows in the form of graphs plotted for various load and
run output parameters. These parameters are vital inputs that allow a user or organization
to make an informed decision on the choice of database for their business
applications.

• Workload A executed for MySQL with operation and record counts of
12500, 50000, 100000, 150000 and 200000

• Workload A executed for MongoDB with operation and record counts of
12500, 50000, 100000, 150000 and 200000

• The above workloads were executed with 4, 8, 16 and 32 threads.

The run options used are shown in Figure 6.

Figure 6: Different options in Workload A

Run Time (ms) analysis


In this section we analyze the overall run time of both MySQL and MongoDB
for the load and run phases with multiple operation counts and threads.

Overall Run Time Load/Insert
The graph below shows the overall run time of Workload A during the insert (load) operation
for both MongoDB and MySQL for the given operation counts and threads. Although Workload
A is a 50/50 read/write workload, during the load phase YCSB only inserts the number of
records given by the record count parameter. The overall run time is recorded and plotted below.

Figure 7: Overall Run time comparison during Insert

From the chart above we can draw the following conclusions:

• Overall run time increases for both MongoDB and MySQL as the load increases.

• Across loads, the run time of MongoDB is consistently lower than that of
MySQL.

• In multi-threaded mode, the run time decreases when the thread count is increased.

• However, the benefit of multi-threading is limited in a single-node environment.

Overall Run Time Read/Update


The graph below shows the overall run time of Workload A during the read/update (run)
operation for both MongoDB and MySQL for the given operation counts and threads.
Since Workload A is a 50/50 read/write workload, during the run phase YCSB reads and
updates records according to the operation count and record count parameters,
and the overall run time is recorded and plotted below.

Figure 8: Overall Run time comparison during Read/Update

From the chart above we can draw the following conclusions:

• Overall run time increases for both MongoDB and MySQL as the load increases.

• Across loads, the run time of MongoDB is consistently lower than that of
MySQL.

• In multi-threaded mode, the run time for MySQL decreases when the thread count is
increased.

• Multi-threading is not very helpful in a single-node environment, especially
for MongoDB.

Throughput (ops/sec) analysis


In this section we analyze the throughput variation of both MySQL and MongoDB
for the load and run phases with multiple operation counts and threads.

Throughput analysis Load/Insert


The graph below shows the overall throughput of Workload A during the insert (load)
operation for both MongoDB and MySQL for the given operation counts and threads. Each
operation count was loaded with 4, 8, 16 and 32 threads, and the recorded throughput
is shown in the graph below.

Figure 9: Overall Throughput during Load/Insert

From the chart above we can draw the following conclusions:

• As the operation count increases, the throughput for both MongoDB and MySQL
increases.

• As the thread count increases, the throughput for both MongoDB and MySQL
increases.

• MongoDB consistently has better throughput than MySQL, by almost 50%.

• Threads have no significant impact on MongoDB's throughput; in fact, MongoDB's
throughput was higher when the load was run with 4 threads than with 32 threads.
In MySQL, the throughput increased with the thread count.

Throughput analysis Read/Update


The graph below shows the overall throughput of Workload A during the read/update
(run) operation for both MongoDB and MySQL for the given operation counts
and threads. Each operation count was run with 4, 8, 16 and 32 threads, and the
recorded throughput is shown in the graph below.

Figure 10: Overall Throughput during Read/Update

From the chart above we can draw the following conclusions:

• As the operation count increases, the throughput for both MongoDB and MySQL
increases.

• As the thread count increases, the throughput for both MongoDB and MySQL
increases.

• MongoDB consistently has better throughput than MySQL, by almost 50%.

Latency Analysis
In this section we analyze the average latency during insert, read and
update operations for various loads and runs (operation counts of 12500, 50000, 100000,
150000 and 200000) and compare them. We also look at latency versus throughput, which
is a key performance indicator for databases.

Average Latency during Insert for MongoDB and MySQL

Figure 11: Avg Latency for Inserts

Figure 11 shows the average insert latency for various operation counts for
both databases. The average latency for insert operations increases as the operation
count increases. The average latency of MySQL is always higher than that of MongoDB,
which shows that for the same loads and the same number of threads MySQL is slower at
inserting records. In fact, MySQL is almost 3 times slower than MongoDB across operation
counts with 4 and 8 threads, but with 16 and 32 threads the difference falls to about 30%
or lower. This suggests that MySQL benefits more from multi-threading, but overall
MongoDB is the better performer.
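The comparison above mixes two ways of expressing a gap: a ratio ("almost 3 times slower") and a percentage difference ("about 30% or lower"). The small Python helper below shows how each figure is derived from a pair of average latencies; the input values are made up for illustration and are not the measurements behind Figure 11.

```python
def latency_ratio(mysql_us, mongodb_us):
    """How many times higher the MySQL latency is than the MongoDB latency."""
    return mysql_us / mongodb_us


def percent_higher(mysql_us, mongodb_us):
    """Percentage by which the MySQL latency exceeds the MongoDB latency."""
    return (mysql_us - mongodb_us) / mongodb_us * 100.0


if __name__ == "__main__":
    # Illustrative values only, not measurements from this study.
    print(latency_ratio(900.0, 300.0))   # ratio of 3, i.e. "3 times slower"
    print(percent_higher(390.0, 300.0))  # about 30, i.e. "30% higher"
```

Note that "3 times slower" corresponds to a 200% difference, not 300%, which is why the two forms should not be mixed when comparing thread settings.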

Average Latency during Read/Update for MongoDB and MySQL

Figures 12 and 13 show the average latency for read and update operations for
multiple operation counts for MongoDB and MySQL. As Figure 12 shows, the difference in
read latency between MongoDB and MySQL is not very significant, and both perform
similarly for multiple read requests with varying numbers of threads. Multi-threading
does not significantly affect the read latency, and the gap between MongoDB and MySQL
stays roughly constant. With 4 threads, MySQL's read latency is about 20% higher than
MongoDB's, and with 8, 16 and 32 threads it is approximately 12% higher. Here too MongoDB
is the better performer.
As Figure 13 shows, MySQL's update latency is approximately 2.5 times MongoDB's with
4 threads, and with 8, 16 and 32 threads the difference is around 70%, which again shows
that MongoDB performed better than MySQL for update operations.

Figure 12: Average Latency for Read operations

Figure 13: Average Latency for Update operations

Average Latency v/s Throughput during Insert, Read, Update for MongoDB and MySQL

Figure 14: Latency Vs Throughput for Insert operations

Figure 14 shows the insert latency with respect to throughput for both MongoDB and
MySQL. The latency decreases as the throughput increases; MongoDB and MySQL behave
similarly for insert operations, and their latencies follow a similar pattern.
Figures 15a and 15b show latency versus throughput for read and update operations.
Since Workload A is 50% read and 50% update, these charts show how MongoDB's latency
falls sharply as throughput increases, while MySQL's latency stays roughly constant with
respect to throughput. This again suggests that MongoDB performs better when loads are
heavy and reads and updates have to be performed on large amounts of data.
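One way to read these latency versus throughput charts is Little's law, which is our addition here rather than an analysis from the report. Unless a target rate is set, each YCSB client thread issues its next operation as soon as the previous one completes, so with N threads the throughput is roughly N divided by the average latency. At a fixed thread count, higher throughput therefore goes together with lower latency. The Python sketch below applies this approximation, which ignores client-side overhead, to assumed values.

```python
def expected_throughput(threads, avg_latency_us):
    """Little's law for a closed loop with no think time: throughput ~ threads / latency."""
    return threads / (avg_latency_us / 1_000_000)


def implied_latency_us(threads, throughput_ops_sec):
    """Inverse form: average latency (microseconds) implied by a measured throughput."""
    return threads / throughput_ops_sec * 1_000_000


if __name__ == "__main__":
    # Assumed example values, not measurements from this study.
    print(expected_throughput(4, 1000))   # about 4000 ops/sec
    # At a fixed thread count, doubling throughput halves the implied latency.
    print(implied_latency_us(4, 8000))    # about 500 us
```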

(a) Latency Vs Throughput for Read (b) Latency Vs Throughput for Update

Figure 15: Latency V/S Throughput

Throughput Vs Threads
The two graphs below show the throughput during the load and run phases with respect to
the thread count, illustrating how the throughput increases as the thread count
increases.

(a) Throughput vs Thread During load (b) Throughput vs Thread During Run

Figure 16: Thread Vs Throughput Analysis

As the graphs show, the throughput increases as the thread count increases.
MongoDB's throughput is higher for both insert and update operations across the
record counts tested.
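To quantify how much additional threads help, the throughput at n threads can be compared with ideal linear scaling from the smallest thread count tested. The Python sketch below computes this scaling efficiency; the throughput values are hypothetical. On a single node (a t2.large instance provides 2 vCPUs), efficiency would be expected to drop well below 1.0 as threads are added, which is one plausible reading of the limited benefit of multi-threading noted earlier.

```python
def scaling_efficiency(throughput_by_threads):
    """Map thread count -> efficiency relative to linear scaling from the smallest count.

    1.0 means throughput grew in proportion to the thread count; values well
    below 1.0 mean the extra threads added little.
    """
    base_threads = min(throughput_by_threads)
    base_tput = throughput_by_threads[base_threads]
    return {
        n: (tput / base_tput) / (n / base_threads)
        for n, tput in sorted(throughput_by_threads.items())
    }


if __name__ == "__main__":
    # Hypothetical throughputs (ops/sec) for 4, 8, 16 and 32 threads
    example = {4: 2000.0, 8: 3000.0, 16: 3600.0, 32: 3800.0}
    for threads, eff in scaling_efficiency(example).items():
        print(f"{threads:>2} threads: efficiency {eff:.2f}")
```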

7.0.2 Workload F Execution and Interpretation


Workload F, a read-modify-write workload with an equal (50/50) mix of reads and
read-modify-write operations, was executed for both databases, MongoDB and MySQL. The
details are below, and the interpretation of the results follows in the form of graphs
plotted for various load and run output parameters. These parameters are vital inputs that
allow a user or organization to make an informed decision on the choice of database for
their business applications. Workload F is typically used to benchmark databases in which
a large number of records are written and modified on a regular basis, and this is also
why we chose this workload.

• Workload F executed for MySQL with operation and record counts of
12500, 50000, 100000, 150000 and 200000.

• Workload F executed for MongoDB with operation and record counts of
12500, 50000, 100000, 150000 and 200000.

• The above workloads were executed with 4 and 8 threads.

Run Time (ms) analysis


The overall run time for executing Workload F for different operation and
record counts was recorded for both the load (insert) and run (read, update, write) phases.
The data was recorded by running the workloads with 4 and 8 threads. The graphs below
were plotted from this data, and the analysis is presented below.

Overall Run Time Load/Insert

Figure 17: Overall Run time comparison during Insert

From the chart above we can draw the following conclusions:

• Overall run time increases for both MongoDB and MySQL as the load increases.

• Across loads, the run time of MongoDB is consistently lower than that of
MySQL.

• In multi-threaded mode, the run time decreases when the thread count is increased
for MySQL but not for MongoDB.

• The run time for MongoDB is the same with 4 and 8 threads, so multi-threading
does not improve its run time much.

• The run time for MySQL differs between 4 and 8 threads: with 8 threads it drops
by 20% for lower operation counts and by almost half for larger operation counts.

• The run time for MySQL is almost 4 times that of MongoDB for the same load,
consistently across all loads. Hence, for inserting data, MongoDB performs much better.

Overall Run Time Read/Modify


Figure 18: Overall Run time comparison during Read/Modify

From the chart above we can draw the following conclusions:

• Overall run time increases for both MongoDB and MySQL as the load increases.

• Across loads, the run time of MongoDB is consistently lower than that of
MySQL.

• In multi-threaded mode, the run time decreases when the thread count is increased
for MySQL but not for MongoDB.

• For the same load, MongoDB's run time is almost 20% lower than MySQL's. When the
load is increased to 100000 and above, the run time for MySQL almost doubles with
4 threads, whereas with 8 threads the difference in run time is smaller.

Throughput (ops/sec) analysis


In this section we analyze the throughput variation of both MySQL and MongoDB
for the load and run phases with multiple operation counts and threads.

Throughput analysis Load/Insert


The graph below shows the overall throughput of Workload F during insert (load)
operations for both MongoDB and MySQL for various operation counts with 4 and 8
threads.

Figure 19: Overall Throughput During Insert/Load

From the chart above we can draw the following conclusions:

• As the operation count increases, the throughput for both MongoDB and MySQL
increases.

• MongoDB consistently has better throughput than MySQL, almost 4 times higher.

• Threads have no significant impact on the throughput; there is only a small
improvement between the loads run with 4 and 8 threads.

Throughput analysis Run/Read-Modify-Write


The graph below shows the overall throughput of Workload F during read-modify-write
(run) operations for both MongoDB and MySQL for various operation counts with 4 and 8
threads.

Figure 20: Overall Throughput During Run phase

From the chart above we can draw the following conclusions:

• As the operation count increases, the throughput for both MongoDB and MySQL
increases.

• MongoDB consistently has better throughput than MySQL: almost 2 times higher
with 4 threads and 1.5 times higher with 8 threads.

• For read-modify-write operations, increasing the thread count improved the
performance of MySQL.

Latency Analysis
In this section we analyze the average latency during insert, read and
update operations for various loads and runs (operation counts of 12500, 50000, 100000,
150000 and 200000) for Workload F and compare them. We also look at latency versus
throughput, which is a key performance indicator for databases. The loads were
executed with 4 and 8 threads.

Average Latency during Insert for MongoDB and MySQL for Workload F

Figure 21: Workload F Avg Latency for Inserts

Figure 21 shows the average insert latency for MongoDB and MySQL with 4 and 8 threads.
The latency for MySQL is almost 4 times higher than MongoDB's with 4 threads. When
the thread count is changed to 8, the latency for MongoDB increases almost 2 times for
the same operation count, while MySQL's latency follows a similar trend with 4 and
8 threads.

Average Latency during read-modify-write and update operations for MongoDB and MySQL

Figure 22: Average Latency for READ-WRITE-MODIFY operations

Figure 23: Average Latency for Update operations

Figures 22 and 23 show the average latency for read-modify-write and update
operations for multiple operation counts for MongoDB and MySQL in multi-threaded mode
(4 and 8 threads). As Figure 22 shows, the read-modify-write latency for MySQL is almost
2 times that of MongoDB across operation counts with 4 threads, while with 8 threads the
difference is close to 1.5 times.
As Figure 23 shows, MySQL's update latency is approximately 3 times MongoDB's with
both 4 and 8 threads, which again shows that MongoDB performed better than MySQL for
update operations.
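A read-modify-write operation reads a record, changes it on the client and writes it back, and in YCSB the latency recorded for it spans both steps, while the individual read and update are also counted in the READ and UPDATE measurements. The Python sketch below imitates this against an in-memory dictionary, a stand-in for MongoDB or MySQL assumed purely for illustration, to show why read-modify-write latency is expected to be at least the sum of the read and update latencies.

```python
import time

# In-memory stand-in for a database table (an assumption for illustration only).
STORE = {"user1": {"field0": "a", "field1": "b"}}


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed time in microseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1_000_000


def read(key):
    """Return a copy of the stored record, as a client would receive it."""
    return dict(STORE[key])


def update(key, values):
    """Write the given field values back to the stored record."""
    STORE[key].update(values)


def read_modify_write(key, field, new_value):
    """Read a record, change one field on the client, write it back.

    Returns (read_us, update_us, rmw_us); rmw_us spans both steps plus the
    client-side modification in between.
    """
    start = time.perf_counter()
    record, read_us = timed(read, key)
    record[field] = new_value
    _, update_us = timed(update, key, {field: record[field]})
    rmw_us = (time.perf_counter() - start) * 1_000_000
    return read_us, update_us, rmw_us


if __name__ == "__main__":
    read_us, update_us, rmw_us = read_modify_write("user1", "field0", "c")
    print(f"read {read_us:.1f} us + update {update_us:.1f} us, "
          f"read-modify-write {rmw_us:.1f} us")
```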

Average Latency v/s Throughput for MongoDB and MySQL


Latency v/s Throughput for Read

Figure 24: Latency Vs Throughput for Read operations

Figure 24 shows the read latency with respect to throughput for both MongoDB and
MySQL. The latency decreases as the throughput increases; MongoDB and MySQL behave
similarly for read operations, and their latencies follow a similar pattern.
Figures 25a and 25b show latency versus throughput for update and read-modify-write
operations. Since Workload F is 50% read and 50% read-modify-write, these charts show
how the latency of both MongoDB and MySQL falls as the throughput increases. Comparing
the latency parameters, we can conclude that MongoDB performs better than MySQL.

(a) Latency Vs Throughput for Update (b) Latency Vs Throughput for Write-Modify

Figure 25: Latency V/S Throughput

Throughput Vs Threads
The two graphs below show the throughput during the load and run phases with respect
to the thread count. MySQL's throughput increases as the thread count increases, but
for MongoDB the thread count has little impact on throughput.

(a) Throughput vs Thread During Load for Workload F (b) Throughput vs Thread During Run for Workload F

Figure 26: Thread Vs Throughput Analysis

7.0.3 Workload C Execution and Interpretation
Workload C is a 100% read workload and gives a good picture of read latency and read
operations. In this section, parameters such as run time, throughput and average read
latency are compared for MongoDB and MySQL. Workload C was run in single-thread mode
because it is a 100% read workload, and multi-threading on a single node was not
expected to make a substantial difference. Workload C is used to test scenarios where
database operations are read heavy.

• Workload C executed for MySQL with operation and record counts of
12500, 50000, 100000, 150000 and 200000.

• Workload C executed for MongoDB with operation and record counts of
12500, 50000, 100000, 150000 and 200000.

• The above workloads were executed with a single thread.

Run Time (ms) analysis


The overall run time for executing Workload C for different operation and
record counts was recorded for the run phase. The graph below was plotted from this
data and analyzed; the analysis is presented below.

Overall Run Time for Run phase

Figure 27: Run time comparison for Workload C for various ops count

From the chart above we can draw the following conclusions:
• As the workload increases, the run time for MySQL increases.
• For higher loads (100000 and above), MySQL's run time is around 25% more than
MongoDB's.

Throughput (ops/sec) analysis


In this section we analyze the throughput variation of both MySQL and MongoDB
for the run (read) phase.

Throughput analysis Run (read) phase


The graph below shows the overall throughput of Workload C during run (read) operations
for both MongoDB and MySQL for various operation counts.

Figure 28: Overall Throughput During Read operations

From the chart above we can draw the following conclusions:

• MongoDB consistently has better throughput than MySQL.
• Throughput for MongoDB is around 25% higher than that of MySQL for larger workloads
(above 100000 operations).
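The run time and throughput results are two views of the same measurement: YCSB reports throughput as the number of operations completed divided by the overall run time. For the same operation count, a run time that is 25% longer therefore means the other system has 25% higher throughput, which is consistent with the Workload C run time and throughput findings. A short Python sketch with illustrative values (not the measured ones):

```python
def throughput_ops_sec(operations, runtime_ms):
    """Throughput as reported by YCSB: operations per second of overall run time."""
    return 1000.0 * operations / runtime_ms


if __name__ == "__main__":
    ops = 200000
    mongodb_runtime_ms = 100000.0                  # illustrative value
    mysql_runtime_ms = 1.25 * mongodb_runtime_ms   # "around 25% more" run time
    ratio = (throughput_ops_sec(ops, mongodb_runtime_ms)
             / throughput_ops_sec(ops, mysql_runtime_ms))
    print(f"MongoDB/MySQL throughput ratio: {ratio:.2f}")  # 1.25
```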

Latency Analysis
In this section we analyze the average latency during read-phase operations for
various loads and runs (operation counts of 12500, 50000, 100000, 150000 and 200000) for
Workload C and compare them.

Average Latency during Read for MongoDB and MySQL for Workload C

Figure 29: Workload C Avg Latency for Read phase

Figure 29 shows the average read latency for MongoDB and MySQL. For small workloads
the latency for MongoDB is higher than that of MySQL, but for loads of more than 100000
operations the average latency for MongoDB is lower than that of MySQL. This suggests
that MongoDB scales better for large loads.

8 Conclusions
An organization can face many challenges if it wants to switch from a traditional RDBMS
to a NoSQL database. It is not easy to let go of the strict ACID properties of an RDBMS. As
discussed in the report, MongoDB does not offer SQL-style JOIN operations, but there are
workarounds for this, such as the $lookup aggregation stage. The advantages of MongoDB can
be seen from the tests conducted using the YCSB framework, where for each type of workload
(A, C, F) MongoDB performed better than MySQL on almost all measured parameters. MongoDB
stands out especially in terms of latency and throughput at higher numbers of operations.
We also compared the properties of MongoDB and MySQL and found that MongoDB
outperforms MySQL in terms of features such as sharding, security, performance and
availability. Several related works were reviewed, and almost all of them have shown that
NoSQL databases are a better choice for large applications that run in the cloud. YCSB
also proves to be a very suitable tool for running custom and standard workloads on
databases and benchmarking their performance, which is helpful when making decisions
about the choice of a database for business applications.

References
[1] Carlo Strozzi. NoSQL: A relational database management system, 2010.

[2] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, pages 143–154, 2010.

[3] Cornelia Győrödi, Robert Gyorodi, George Pecherle, and Andrada Olah. A comparative study: MongoDB vs. MySQL. June 2015.

[4] Srdja Bjeladinovic, Zoran Marjanovic, and Sladjan Babarogic. A proposal of architecture for integration and uniform use of hybrid SQL/NoSQL database components. Journal of Systems and Software, page 110633, 2020.

[5] Yusuf Abubakar, Thankgod Sani Adeyi, and Ibrahim Gambo Auta. Performance evaluation of NoSQL systems using YCSB in a resource austere environment. Performance Evaluation, 7(8):23–27, 2014.

[6] Hamzeh Khazaei, Marios Fokaefs, Saeed Zareian, Nasim Beigi-Mohammadi, Brian Ramprasad, Mark Shtern, Purwa Gaikwad, and Marin Litoiu. How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey. Big Data & Information Analytics, 1(2&3):185, 2016.

[7] Houcine Matallah, Ghalem Belalem, and Karim Bouamrane. Experimental comparative study of NoSQL databases: HBase versus MongoDB by YCSB. Comput. Syst. Sci. Eng., 32(4):307–317, 2017.

[8] S. H. Aboutorabi, M. Rezapour, M. Moradi, and N. Ghadiri. Performance evaluation of SQL and MongoDB databases for big e-commerce data. In 2015 International Symposium on Computer Science and Software Engineering (CSSE), pages 1–7, 2015.

[9] Reena Panda, Christopher Erb, Michael Lebeane, Jee Ho Ryoo, and Lizy Kurian John. Performance characterization of modern databases on out-of-order CPUs. In 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 114–121. IEEE, 2015.
