The DevOps Guide to Database Backups for MySQL and MariaDB
Table of Contents

1. Introduction
2. Impact of Storage Engine on Backup Procedure
3. Backup Tools
   3.1. mysqldump
      3.1.1. How does it work?
         3.1.1.1. Non-transactional Tables
         3.1.1.2. Transactional Tables
         3.1.1.3. Flush Binary Logs
      3.1.2. Advantages
      3.1.3. Disadvantages
   3.2. Percona Xtrabackup
      3.2.1. How it works?
      3.2.2. Advantages
      3.2.3. Disadvantages
   3.3. Binary Log
      3.3.1. How it works?
      3.3.2. Advantages
      3.3.3. Disadvantages
      3.3.4. Restoring with Binary Logs
         3.3.4.1. Full Restore
         3.3.4.2. Partial Restore
4. Performing Backup Efficiently
5. Backup Management
   5.1. Backup Scheduling
   5.2. Backup Verification and Integrity
      5.2.1. mysqlcheck
      5.2.2. mysqldbcompare
      5.2.3. pt-table-checksum
   5.3. Backup Availability
      5.3.1. Onsite Storage
      5.3.2. Offsite Storage
      5.3.3. Hybrid Storage
   5.4. Backup Housekeeping
   5.5. Backup Failover
6. ClusterControl as Backup Manager
7. Conclusion
8. About Severalnines
Introduction
A key operational aspect of database management is to ensure that backups are
performed, so that a database can be restored if disaster strikes. Data loss can happen
in a number of circumstances: system crash, hardware failure, power failure, human
error (accidental DELETE or DROP) or even natural disaster (flood, earthquake, fire).
Some of these are almost impossible to prevent from happening. The DBA or System
Administrator is usually the responsible party to ensure that the data is protected,
consistent and reliable. Backups are an important part of a recovery strategy for your
data.
There are a number of ways to backup your database, but it is important to review the
RTO and RPO before deciding on a backup strategy. RTO (Recovery Time Objective) is
the time within which the data must be recovered, as it determines the length of your outage. RPO
(Recovery Point Objective) is the allowable data loss - how much data can you afford
to lose? A tighter RTO and RPO means you will need to spend more money on your
infrastructure.
This whitepaper gives an overview of the two most popular backup utilities available for
MySQL and MariaDB, namely mysqldump and Percona XtraBackup. We’ll also see how
database features like binary logging and replication can be leveraged in our backup
strategy. We will discuss some best practices that can be applied to high availability
topologies in order to make our backups reliable, secure and consistent.
Impact of Storage Engine on Backup Procedure
MySQL and MariaDB enable storage engines to be loaded into and unloaded from
a running database server. This modular architecture provides benefits to those who
wish to target a particular application need, such as data warehousing, transaction
processing, or high availability. A storage engine implements the specific set of features
required for a type of workload; this means less system overhead, with the end result
being higher database performance.
[Figure: MySQL server architecture - connectors (Native C API, JDBC, ODBC, .NET, PHP, Python, Perl, Ruby, VB), the MySQL server layer with connection pool (authentication, thread reuse, connection limits, memory checks, caches), and the pluggable storage engine layer]
Since the data is stored inside the storage engine, we need to understand how the
storage engines work to determine the best backup tool. In general, MySQL backup
tools perform a special operation in order to retrieve consistent data - they either lock
the tables or establish a transaction isolation level that guarantees the data read is unchanged.
On MySQL and MariaDB, the following storage engines are enabled by default:
• MyISAM - Default storage engine up until MySQL 5.5.5.
• InnoDB - Default storage engine since MySQL 5.5.5. XtraDB for Percona and
MariaDB.
• MEMORY - Hash based, stored in memory, useful for temporary tables.
• BLACKHOLE - anything you write to it disappears.
• CSV - Stores data in text files using comma-separated values format.
• ARCHIVE - Stores large amounts of unindexed data with a very small overhead.
• Aria - Crash-safe tables with MyISAM heritage. Available in MariaDB
distributions.
As you may notice, there are lots of storage engines preloaded into the MySQL/
MariaDB server. We are going to look into the most popular ones: MyISAM/Aria,
InnoDB/XtraDB and MEMORY.
2.1. MyISAM/Aria
MyISAM was the default storage engine for MySQL versions prior to 5.5.5. It is based on
the older ISAM code but has many useful extensions. The major deficiency of MyISAM
is the absence of transactions support.
Aria is another storage engine with MyISAM heritage and is a MyISAM replacement
in all MariaDB distributions. The main difference is that Aria is crash safe, whereas
MyISAM is not. Being crash safe means that an Aria table can recover from unexpected
failures in a much better way than a MyISAM table can. In most circumstances, backup
operations for MyISAM and Aria are almost identical.
MyISAM uses table-level locking. It stores indexes in one file and data in another.
MyISAM tables are generally more compact in size on disk when compared to InnoDB
tables. MyISAM uses key buffers for caching indexes and leaves the data caching
management to the operating system. Given the table-level locking and lack of
transaction support, the recommended way to back up MyISAM tables is to acquire a
global read lock using FLUSH TABLES WITH READ LOCK (FTWRL) to make MySQL
read-only temporarily, or to use the LOCK TABLES statement explicitly. Without that,
MyISAM backups will be inconsistent.
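As a minimal sketch (the session layout is illustrative; the actual dump or file copy would normally run from a second session or a script):

mysql> FLUSH TABLES WITH READ LOCK;   -- server is now read-only
-- ... run the MyISAM dump or file copy from another session ...
mysql> UNLOCK TABLES;                 -- writes resume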
2.2. InnoDB/XtraDB
InnoDB is the default storage engine for MySQL and MariaDB. It provides the standard
ACID-compliant transaction features, along with foreign key support and row-level
locking.
Percona’s XtraDB is an enhanced version of the InnoDB storage engine for MySQL
and MariaDB. It features some improvements that make it perform better in certain
situations. It is backwards compatible with InnoDB, so it can be used as a drop-in
replacement.
There are a number of key components in InnoDB that directly influence the behaviour
of backup and restore operations:
• Transactions
• Crash recovery
• Multiversion concurrency control (MVCC)
2.2.1. Transactions
InnoDB supports transactions. A transaction will never be completed unless each
individual operation within the group is successful (COMMIT). If any operation within
the transaction fails, the entire transaction will fail and any changes will be undone
(ROLLBACK).

BEGIN;
UPDATE account.saving SET balance = (balance - 10) WHERE id = 2;
UPDATE account.current SET balance = (balance + 10) WHERE id = 2;
COMMIT;
A transaction starts with a BEGIN and ends with a COMMIT or ROLLBACK. In the above
example, if the MySQL server crashed after the first UPDATE statement completed, that
update would be rolled back.
2.2.3. MVCC
InnoDB is a multiversion concurrency control (MVCC) storage engine which means
many versions of a single row can exist at the same time. Due to this nature, unlike
MyISAM, InnoDB does not require a global read lock to get a consistent read. It relies on
the isolation property of its ACID-compliant transactions. Isolation is the "I" in the
acronym ACID - the isolation level determines the ability of a transaction to read or
write data that is being accessed by other transactions.
From the highest level of consistency and protection to the lowest, the isolation levels
supported by InnoDB are:
• SERIALIZABLE
• REPEATABLE READ (default)
• READ COMMITTED
• READ UNCOMMITTED
Covering all isolation levels in this context is unnecessary. In order to get a consistent
snapshot of InnoDB tables, one could simply start a transaction with REPEATABLE
READ isolation level. In REPEATABLE READ, a read view is created at the start of the
transaction, and this read view is held open for the duration of the transaction. For
example, if you execute a SELECT statement at 6 AM, and come back in an open
transaction at 6 PM, when you run the same SELECT, then you will see the exact same
resultset that you saw at 6 AM. This is part of MVCC capability and it is accomplished
using row versioning and UNDO information.
A logical backup tool like mysqldump uses this approach to generate a consistent backup
of InnoDB tables without an explicit table lock that would make the MySQL server read-only.
Details are covered in the Backup Tools chapter.
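A minimal sketch of how such a consistent snapshot can be opened by hand (the table name reuses the earlier example):

mysql> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;
mysql> SELECT * FROM account.saving;   -- every read in this transaction sees the data as of the snapshot
mysql> COMMIT;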
2.3. MEMORY
The MEMORY storage engine (formerly known as HEAP) creates special-purpose tables
with contents that are stored in memory. Because the data is vulnerable to crashes,
hardware issues, or power outages, only use these tables as temporary work areas or
read-only caches for data pulled from other tables.
Despite the in-memory processing for MEMORY tables, they are not necessarily faster
than InnoDB tables on a busy server, for general-purpose queries, or under a read/write
workload. In particular, the table locking involved with performing updates can slow
down concurrent usage of MEMORY tables from multiple sessions.
Due to the transient nature of data from MEMORY tables (data is not persisted to disk),
only logical backup is capable of backing up these tables. Backup in physical format is
not possible.
Feature           | MyISAM                                                                                   | InnoDB                                                                             | MEMORY
Cluster Indexes   | No                                                                                       | Yes                                                                                | Yes
Data Compression  | No                                                                                       | Yes                                                                                | No
Data caches       | No                                                                                       | Yes                                                                                | N/A
Storage of table  | One table stored in 3 separate files: .FRM (table format), .MYD (data), .MYI (indexes)  | Table stored in a tablespace consisting of several files (or raw disk partitions)  | In memory
The above comparison shows the differences in storage engine characteristics. This
helps us understand the way backup procedures should work, and ultimately reduce the
risk of recovery failure when it really matters.
Backup Tools
A backup tool is an application specifically designed to perform backup and restore
of your database. In this whitepaper, we will cover mysqldump, Percona Xtrabackup and
the binary log. We are not going to cover other tools such as MySQL Enterprise Backup
(mysqlbackup), mydumper or storage snapshot technologies.
3.1. mysqldump
This standard backup tool comes with every MySQL/MariaDB client package. The
mysqldump client utility performs logical backups. It queries the MySQL/MariaDB server
and produces a set of SQL statements that can be executed to reproduce the original
database object definitions and table data.
For each database schema and table, a dump performs these steps:
1. LOCK TABLE table.
2. SHOW CREATE TABLE table.
3. SELECT * FROM table INTO OUTFILE temporary file.
4. Write the contents of the temporary file to the end of the dump file.
5. UNLOCK TABLES
[Diagram: mysqldump with --lock-tables, locking each table and writing it to the dump file]
By default mysqldump doesn’t include routines and events in its output - you have to
explicitly set --routines and --events flags. One must be aware of the contents of
the database and execute mysqldump with parameters that take this into consideration,
as described in the next section.
[Diagram: mysqldump with --lock-all-tables, issuing FLUSH TABLES WITH READ LOCK (FTWRL) before writing all tables to the dump file]
FLUSH TABLES WITH READ LOCK is the only way to guarantee a consistent snapshot
of MyISAM tables while the MySQL server is running. This will make the MySQL server
become read-only until UNLOCK TABLES is executed.
Consider the following mysqldump command for a database that only contains InnoDB
tables:
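A sketch of such a command (the exact flags of the original example are not preserved in this copy; --single-transaction provides the consistent snapshot):

$ mysqldump --single-transaction --all-databases > backup.sql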
[Diagram: mysqldump with --single-transaction reading InnoDB/XtraDB tables within a single transaction and writing them to the dump file]
When the binary logs are flushed, the numeric suffix of the binary log file is incremented by one relative to the previous file. Consider the
following mysqldump command against InnoDB storage engine to include binary log
coordinates:
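A sketch of such a command (a reconstruction; --master-data=2 records the coordinates as a comment in the dump):

$ mysqldump --single-transaction --master-data=2 --flush-logs --all-databases > backup.sql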
[Diagram: mysqldump with --single-transaction, --master-data and --flush-logs: a brief FTWRL captures the binary log coordinates and FLUSH LOGS rotates the binary log, then InnoDB/XtraDB tables are dumped within a transaction]
With --master-data flag, mysqldump has to acquire a global lock for a short period
of time and release it back once the binary log coordinates are retrieved. It can then
proceed with retrieving data consistently without the need to lock every table.
3.1.2. Advantages
mysqldump is probably the most popular backup method for MySQL. Advantages
include the convenience and flexibility of viewing or even modifying the output using
standard text tools before restoring. You can clone databases for development and DBA
work, or produce slight variations of an existing database for testing. It is also more
practical to do partial restore, where you just want to restore only certain rows or tables.
SQL dump files are also compression-friendly. Depending on the compression level
and tool used, you can achieve up to 15 times compression of the backup size. The
mysqldump command can also generate output in CSV, other delimited text or XML
format.
Indirectly, mysqldump also detects any corrupted data files. For example when a
mysqldump is taken, the data must be in a good state or an error would be generated
during the dump process.
3.1.3. Disadvantages
A mysqldump backup is slower than a physical backup, notably on a large dataset
because the server must access the database and convert the physical data into a
logical format. Mysqldump is a single-threaded tool and this is its most significant
drawback - performance is ok for small databases but it quickly becomes unacceptable
if the data set grows to tens of gigabytes.
Mysqldump will, for each database and for each table, run "SELECT * FROM .." and
write the content to the dump file. The problem with the "SELECT * FROM .." is
its impact if you have a data set that does not fit in the InnoDB Buffer Pool. The active
data set (the one your application uses) takes a hit when the "SELECT * FROM .."
loads data from disk and stores the pages in the InnoDB Buffer Pool, expunging pages
belonging to the active data set in the process. Hence you will get a performance
degradation on that node, since the active data set is no longer in RAM but on disk.
There is no way to do an incremental backup with SQL dump. A full backup is executed
each time. This can be very time-consuming especially in large databases.
When using mysqldump with a non-transactional storage engine like MyISAM, a dump
holds a global read lock on all tables, blocking writes from other connections for the
duration of the full backup. The locking is optional though; however, without the table
lock, there is no guarantee of backup consistency.
The mysqldump backup does not include any MySQL related logs or configuration files,
or other database-related files that are not part of databases.
3.2. Percona Xtrabackup
Percona XtraBackup is the most popular, open-source, MySQL/MariaDB hot backup
software that performs non-blocking backups for InnoDB and XtraDB databases. It falls
into the physical backup category, which consists of exact copies of the MySQL data
directory and files underneath it.
Xtrabackup does not lock your database during the backup process, provided the
tables are running on InnoDB or XtraDB storage engine. For large databases (100+ GB),
it provides much better restoration time as compared to mysqldump. The restoration
process involves preparing MySQL data from the backup files, before replacing or
switching it with the current data directory on the target node.
The xtrabackup and innobackupex tools allow operations such as streaming and
incremental backups with various combinations of copying the data files, copying the
log files, and applying the logs to the data.
[Diagram: xtrabackup/innobackupex copying InnoDB data files from the data directory to the backup destination, followed by a second stage under FTWRL for the remaining files]
Since the data files are copied while the server may still be modifying them, the backup
content is inconsistent and Percona Xtrabackup requires an additional step before
restoration, called the prepare process. During this step, Percona XtraBackup
performs crash recovery against the copied data files, using the transaction log file.
After this is done, the database is ready to restore and use. The final step is to overwrite
(or swap) the content of MySQL datadir on the target server with the directory of the
prepared backup.
The innobackupex program adds more convenience and functionality by also allowing
to back up MyISAM tables and .frm files. It starts the xtrabackup process, waits until
it finishes copying files, and then issues FLUSH TABLES WITH READ LOCK to prevent
further changes to MySQL‘s data and flush all MyISAM tables to disk. During that
time, no query will be executed on the host. innobackupex holds this lock, copies the
MyISAM files, and then releases the lock.
The backed-up MyISAM and InnoDB tables will eventually be consistent with each other,
because after the prepare (recovery) process, InnoDB‘s data is rolled forward to the
point at which the backup completed, not rolled back to the point at which it started.
This point in time matches where the FLUSH TABLES WITH READ LOCK was taken, so
the MyISAM data and the prepared InnoDB data are in sync. In other words, the actual
point-in-time is a moving target until the backup process is complete. For example, if
Percona XtraBackup starts at midnight and lasts till 1:15 AM, then the backup’s actual
point-in-time is 1:15 AM.
3.2.2. Advantages
This method of raw backup is quicker than a logical backup (e.g. mysqldump), because
it does not convert the contents of the database into SQL queries. It simply copies data
files and the output is more compact than a logical backup. The main advantage of
xtrabackup over logical backups is its speed - performance is limited by your disk or
network throughput.
Percona Xtrabackup is very flexible. It supports multiple threads to copy the files
quicker, or use compression to minimize size of the backup. It is possible to create
a backup locally or stream it over the network using SSH tunnel or netcat. It is also
possible to create incremental backups which take significantly less disk space, as well as
less time to execute.
For large-scale full recovery, Percona Xtrabackup is usually faster to restore. The restore
step is basically a simple copy of the prepared binary files.
In addition to database data, the backup can include any related files such as log and
configuration files.
3.2.3. Disadvantages
Percona Xtrabackup needs to access the MySQL data directory locally. If you would
like to perform a remote backup, the xtrabackup process must be run on the MySQL
server locally and stream the backup to a separate host where the backup will be
stored - for example via SSH tunnel or netcat. Performing offline backup is not possible
since Percona Xtrabackup needs to access the MySQL server to check the version and
generate a list of tablespaces.
If your tables are primarily InnoDB tables, then you can perform a virtually non-blocking
backup. However, if you have a mix of InnoDB and MyISAM tables, or primarily MyISAM
tables, xtrabackup will impact the non-transactional tables during the FLUSH TABLES
WITH READ LOCK. Depending on the size of those tables, this may take a while. During
that time, no query will be executed on the host and MySQL is considered read only.
When restoring incremental backups, the overall restoration process is slower as deltas
have to be applied one after another and it may take a significant amount of time.
Some of the options also make it possible to restore down to table level, however it
cannot go down to row level. If one of the incremental backups is corrupted, the rest
will not be usable.
Percona Xtrabackup is portable only to other machines that have identical or similar
hardware characteristics. If the backup was taken on a Linux machine, there is no
guarantee that it will work on a Windows or BSD machine.
There is no option to FLUSH LOGS when taking the backup. Data from MEMORY tables
cannot be backed up in physical format because their contents are not stored on disk.
3.3. Binary Log
[Diagram: the binary log as a sequence of binlog files containing events, tracked by the binlog index file]
You can control the format to use when writing to the binary log using the option
binlog-format with following values:
• STATEMENT causes logging to be statement based.
• ROW causes logging to be row based.
• MIXED causes logging to use mixed (STATEMENT or ROW) format.
The FLUSH LOGS command writes all logs to disk and creates a new file to continue
writing the binary log. This can be useful when administering recovery images for point-
in-time-recovery. Reading from an active open binlog file can have unexpected results,
so it is advisable to force an explicit flush before trying to use binlog files for recovery.
Flushing binary logs is usually used together with mysqldump where you can instruct it,
with a flag, to flush the binary logs when performing backup. This will make the backup
consistent and the newly generated binary log will start fresh to record new database
changes after the backup.
MySQL provides a tool called mysqlbinlog to work with binary logs. It can be used to
display the contents in text format or display the contents of relay log files written by a
slave server in a replication setup, since relay logs have the same format as binary logs.
You can also use this tool to backup binary logs located locally or remotely.
To backup binary logs using mysqlbinlog, we first have to retrieve the names of the
binary logs currently available in the MySQL server, by using the following command:
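For example (a sketch):

mysql> SHOW BINARY LOGS;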
Then, you can use the --read-from-remote-server option to connect and create a
copy of the binary logs in the backup destination:
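A sketch of such a command (host, user and destination path are placeholders):

$ mysqlbinlog --read-from-remote-server --host=db1 --user=backupuser --password --raw --result-file=/storage/backups/binlogs/ binlog.000006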
This tool only allows you to back up one binary log at a time, so some iteration might be
required to automate the process. Without the --result-file option, mysqlbinlog will
default to writing in the current directory using the same name as the original log file.
Take note that you can just copy the binary log directory from the filesystem, but keep
in mind to skip the active binary log file (i.e., the one that is currently open by the
MySQL server). That is why FLUSH LOGS is important before each direct copying.
| binlog.000005 |       217 |
| binlog.000006 |      8025 |
+---------------+-----------+
5 rows in set (0.00 sec)
Then, flush the log files so MySQL creates a new active binary log and keeps the rest
inactive:
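For example (a sketch):

mysql> FLUSH LOGS;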
You should see that a new file, in this example binlog.000007, has been created.
Finally, copy all the binlogs except the recently flushed binlog.000007 from the MySQL
data directory to the backup location:

$ cd /var/lib/mysql
$ cp binlog.000002 binlog.000003 binlog.000004 binlog.000005 binlog.000006 /storage/backups/binlogs
3.3.2. Advantages
The binary log enables point-in-time recovery. A backup reflects the state of the
database at a certain point in time, but the changes between two backup points are
not recorded. What if the server crashes a minute before the next backup should run?
You can restore from the last backup, but what about the transactions until the point
when the server crashed? By replaying the binary log on a server, repeating the changes
that were recorded in it, the MySQL server can be brought back to the most up-to-date
state of the database right before the crash.
Because the binary log keeps a record of all changes, you can also use it for auditing
purposes to see what happened in the database.
3.3.3. Disadvantages
In the real world, though, replaying binlogs is a slow and painful process. Of course, your
mileage may vary - it all depends on the amount of modifications to the database. The
replaying process which involves the mysqlbinlog utility can be complicated.
Running a server with binary logging enabled comes with a performance impact. Binary
logs can eat up a significant amount of disk space if you have high database traffic, so
setting an appropriate expire_logs_days value is important. Otherwise, you have to purge
binary logs more frequently.
If you are using InnoDB tables and the transaction isolation level is READ COMMITTED
or READ UNCOMMITTED, only row-based logging can be used. It is possible to change
the logging format to STATEMENT, but doing so at runtime leads very rapidly to errors
because InnoDB can no longer perform inserts.
3.3.4. Restoring with Binary Logs

3.3.4.1. Full Restore

1. Check the binary log files available in the MySQL data directory:

$ ls -al /var/lib/mysql
-rw-rw---- 1 mysql mysql     126 Oct 22 05:18 binlog.000001
-rw-rw---- 1 mysql mysql    1197 Oct 22 14:46 binlog.000002
-rw-rw---- 1 mysql mysql     126 Oct 22 14:46 binlog.000003
-rw-rw---- 1 mysql mysql 6943553 Oct 23 18:38 binlog.000004
2. Take note of the binary log file and position for the restored data when Percona
Xtrabackup was executed:
$ cat /var/lib/mysql/xtrabackup_binlog_info
binlog.000004	5840269
3. Replay the binary log up from the start position and send the output to the
MySQL Server:
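A sketch of the replay command (paths and coordinates follow the earlier listing; connection options are placeholders):

$ mysqlbinlog /var/lib/mysql/binlog.000004 --start-position=5840269 | mysql -uroot -p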
The cluster will start to replay the log and catch up until the determined point.
3.3.4.2. Partial Restore
If you want to do a partial restore of a truncated table, you can replay the binary logs
on a staging server until right before the TRUNCATE event that caused the missing rows.
Then, export the table to an SQL dump file and import it back into the running MySQL
server.
1. Use the mysqlbinlog tool with --base64-output=decode-rows to decode the
binlog and send the output to a file called decoded.txt:
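A sketch of such a command (file names follow the earlier examples):

$ mysqlbinlog --base64-output=decode-rows --verbose /var/lib/mysql/binlog.000004 > decoded.txt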
3. Look up the position number before the TRUNCATE event. In this case, the
binlog should be replayed up until position 6494999 because position 6495077
indicates the unwanted TRUNCATE event:
4. By tailing the last 15 lines before the TRUNCATE event, we can conclude that
after restoring the backups, we should replay the binlog from the recorded
binlog file and position of the backup set, up until binlog.000004 on position
6494999.
Replay the binary log up until the determined position and send the output to
the MySQL Server:
$ mysqlbinlog /var/lib/mysql2/binlog.000004 --start-position=5840269 --stop-position=6494999 | mysql -uroot -p
5. Export the data for the table so we can load it into the running cluster. We
will export all columns except the primary key column because the AUTO_
INCREMENT values have been repeated since the truncate happened. This will
avoid DUPLICATE ENTRY errors:
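One way to export only the wanted columns is a SELECT ... INTO OUTFILE on the staging server (database, table and column names below are hypothetical):

mysql> SELECT data, created_at FROM mydb.mytable INTO OUTFILE '/tmp/mytable_restore.txt';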
Now you can import the data into the running MySQL server. Log into one of
the MySQL servers and start the import process:
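For the column-level export sketched above, the matching import would be a LOAD DATA statement (again with hypothetical names):

mysql> LOAD DATA INFILE '/tmp/mytable_restore.txt' INTO TABLE mydb.mytable (data, created_at);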
Restoring from binary logs is not straightforward, but it is a safe bet for your data. It
increases the probability of your data being restored to a correct state in almost any
kind of data loss scenario.
Performing Backup Efficiently
All of the backup methods have their pros and cons. They also have their requirements
when it comes to how they affect regular workloads. As usual, your backup strategy will
depend on the business requirements, the environment you operate in and resources at
your disposal.
Backup should be planned according to the restoration requirement. Data loss can be
full or partial. For instance, you do not always need to recover the whole dataset. In some
cases, you might just want to do a partial recovery by restoring missing tables or rows.
In the next sections, we’ll look at the different factors that contribute to efficient backup
and restore procedures.
Then, GRANT the user the specific privileges needed for backup purposes and FLUSH the
privileges table:
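A sketch of such a grant (the privilege list is an assumption covering what mysqldump and Percona Xtrabackup typically need; user and host are placeholders):

mysql> GRANT SELECT, SHOW VIEW, EVENT, TRIGGER, LOCK TABLES, RELOAD, PROCESS, REPLICATION CLIENT ON *.* TO 'backupuser'@'localhost';
mysql> FLUSH PRIVILEGES;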
By default, mysqldump loads all options under the [mysqldump] directive while
Percona Xtrabackup reads [xtrabackup] directive inside the MySQL configuration file,
or the user’s option file. Setting this up before performing any backup operations will
reduce the complexity of the backup commands, since we do not have to specify the
loaded options anymore.
Inside the MySQL configuration file (my.cnf), adding the following lines will do the trick:
[mysqldump]
user=backupuser
password=backuppassword
host=localhost

[xtrabackup]
user=backupuser
password=backuppassword
host=localhost
Now you can perform a mysqldump command without the need to specify host and
user credentials:
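For example (a sketch):

$ mysqldump --all-databases > /storage/backups/full-backup.sql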
If you have storage engines that do not support transactions (e.g. MyISAM, Aria,
MEMORY), mysqldump and Percona Xtrabackup will likely have to lock the tables
while the backup is taken. Locking tables makes the MySQL server read-only to ensure
consistency during the backup. This is a crucial factor in determining the most efficient
way to perform a logical backup, and extra options may be necessary in the backup
command line.
If the storage engine supports transactions (e.g. InnoDB), mysqldump does not
require table locking. For InnoDB, it is sufficient to use "--single-transaction" to get
a consistent backup. When using this option, you should keep in mind that only
InnoDB tables are dumped in a consistent state. For example, any MyISAM or MEMORY
tables dumped while using this option may still change state:
$ mysqldump --single-transaction --all-databases > backup.sql
If you do have a hybrid mix of storage engines, Percona Xtrabackup handles this with
more efficiency. The locking will only happen during the MyISAM phase of the backup.
The bottom line is that one should avoid using MyISAM tables if possible, except for the
mysql system tables.
For the Aria storage engine, there is a limitation in Percona Xtrabackup. The issue is that
the engine uses recovery log files and an aria_log_control file that are not backed
up by xtrabackup. When MariaDB is started without the aria_log_control file, it will
mark all the Aria tables as corrupted the first time a CHECK is run on the table.
This means that the Aria tables from an xtrabackup backup must be repaired before
being usable (this can take quite a long time depending on the size of the table). Another
option is to perform a check on all Aria tables present in the backup after the prepare
phase.
Summing up the data_length column in INFORMATION_SCHEMA.TABLES will give you a
ballpark figure of the expected dump size. The index_length column is not relevant for
mysqldump because it does not dump indexes, only data.
Bigger datasets usually mean longer backup times. As a simple rule of thumb, if
your database is less than 10GB in size and fits into the InnoDB buffer pool,
using mysqldump with binary logs enabled is a safe bet. The reason is that for most
workloads you will not notice much performance degradation, although the restoration
time may vary. For a dataset with hundreds of gigabytes of data, mysqldump is
too slow to be useful and it can literally take days to restore a couple of hundred
gigabytes. Usually when you need to restore from a backup, you are in some sort of
emergency, and a restore process that takes days is not an option.
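With Percona Xtrabackup, the backup must first be prepared before it can be restored. A sketch of the prepare command (the backup directory path is a placeholder):

$ innobackupex --apply-log /storage/backups/full-backup-dir/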
Ensure you see the last line contains “completed OK!”. It indicates the backup is
prepared and is ready to be restored. It is important to note that the MySQL server
needs to be shut down before restore is performed. You can’t restore to a datadir of a
running mysqld instance (except when importing a partial backup). With a single copy
command, you should then be ready to start the MySQL server with the prepared data
(assuming in my.cnf, you have “datadir=/var/lib/mysql”):
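A sketch of that copy step and server start (paths and the service name may differ per distribution):

$ innobackupex --copy-back /storage/backups/full-backup-dir/
$ chown -R mysql:mysql /var/lib/mysql
$ service mysql start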
Take note that when restoring Xtrabackup incremental backups, the overall restoration
process is slower as deltas have to be applied one after another (using the
"--apply-log-only" option).
With a small dataset, many might choose mysqldump instead because it is more
straightforward to restore. However, with large data sizes, even if the mysqldump
process takes a reasonable time, restoring the data can be very slow. Replaying the SQL
statements involves disk I/O for insertion, index creation, and so on.
The following commands suffice to take a full backup on mysqldump and Percona
Xtrabackup:
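Sketches of both (flags follow the examples used elsewhere in this chapter; destination paths are placeholders):

$ mysqldump --single-transaction --triggers --events --routines --all-databases > full-backup.sql
$ innobackupex /storage/backups/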
If Global Transaction Identifier (GTID) with InnoDB (GTIDs aren’t available with MyISAM)
is enabled, one should use the --set-gtid-purged=OFF option for portability:
$ mysqldump --single-transaction --set-gtid-purged=OFF --triggers --events --routines --all-databases > full-backup.sql
If you are using binary columns to store blobs, it is recommended to use --hex-blob,
to safeguard against special characters that might be there. Mysqldump will use
hexadecimal notation instead, for example, 'abc' becomes 0x616263:
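For example (a sketch):

$ mysqldump --single-transaction --hex-blob --all-databases > full-backup.sql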
In some occasions, you might need to use the backup for partial recovery, restoring
only a single row, table or database. Having mysqldump is more practical since you
can generate a dump file per database and directly view/modify the content of the
dump file via a text editor. It is recommended to back up data and schema separately
and disable "--extended-insert" to get a more organized view of the SQL statements in
the dump file. The following commands perform mysqldump against the InnoDB storage
engine, and generate separate dump files per database:
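A sketch of such a loop (the excluded system schemas and file naming are assumptions):

$ for DB in $(mysql -N -e "SHOW DATABASES" | grep -vE "^(information_schema|performance_schema|sys)$"); do
    mysqldump --single-transaction --no-data $DB > ${DB}_schema.sql
    mysqldump --single-transaction --no-create-info --skip-extended-insert $DB > ${DB}_data.sql
  done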
Percona Xtrabackup also comes with an option called “--export”, which basically
allows restoring individual tables. However the destination server must be running
either Percona Server with XtraDB or MySQL 5.6 with innodb_file_per_table
enabled. Restoring partial backup with Xtrabackup should be done by importing the
tablespace, not by using the --copy-back option. During the prepare stage, one
would perform the following:
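A sketch of the prepare step with --export (the backup path is a placeholder):

$ innobackupex --apply-log --export /storage/backups/full-backup-dir/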
You should see three files being created on the exported backup, as per below:
Then, import the table by discarding the current tablespace, copy them to the target
database directory and import the copied tablespace:
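A sketch of those steps (database, table and file names are hypothetical; the exact set of exported files depends on the server version):

mysql> ALTER TABLE mydb.mytable DISCARD TABLESPACE;
$ cp /storage/backups/full-backup-dir/mydb/mytable.ibd /storage/backups/full-backup-dir/mydb/mytable.exp /var/lib/mysql/mydb/
$ chown mysql:mysql /var/lib/mysql/mydb/mytable.*
mysql> ALTER TABLE mydb.mytable IMPORT TABLESPACE;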
Once this is executed, data in the imported table will be available.
For MySQL Replication, the backup should be performed on a slave provided it does
not lag behind during the backup time. If you have binary logging enabled on the
slave (e.g. GTID replication), it is recommended to append "--master-data" together with
"--apply-slave-statements" to the mysqldump command options. This simplifies
the process of staging a new slave: the two options set up replication during the
restoration of the mysqldump, skipping the statements that you would otherwise have
to execute explicitly to start the slave. If you look at the content of the dump file,
you should see the following lines:
STOP SLAVE;
SET @@GLOBAL.GTID_PURGED= .. ;  -- if GTID is enabled
CHANGE MASTER .. ;
<dump content>
START SLAVE;
If the backup is taken using Percona Xtrabackup, the default options will automatically
include a file under the backup directory called xtrabackup_binlog_info (as well
as xtrabackup_info) which contains the binary log file, position and GTID of the last
change (if enabled). Take note that Percona Xtrabackup requires the same major version
of MySQL servers on the new slave. For example, if the backup was taken on MySQL
5.5, the target server must be running on MySQL 5.5 as well. If you would like to mix the
MySQL versions in a single replication chain, you should use mysqldump instead.
In case of Galera Cluster, the backup might occasionally stall the cluster during the
process. Fortunately, you can perform the backup in desynchronization mode with
“wsrep_desync=ON”. When you allow the node to desync from the cluster momentarily,
the rest of the cluster won't be degraded for the duration of the desync, which is
suitable for backup workloads. However, there is a risk that if the node does not get
back in sync before desync is disabled, it may still cause some performance impact on
the cluster.
The above is also true for Percona Xtrabackup, and you can also use --galera-info
with Percona Xtrabackup. It then creates the xtrabackup_galera_info file which
contains information about the local node state at the time of the backup:
$ innobackupex --galera-info /storage/backups/galera
Another option is to attach an asynchronous slave to the Galera Cluster for a loosely-
coupled setup, which brings additional benefits as explained in the Dedicated Backup
Server section further below. Enabling binary logging might be unnecessary in Galera
Cluster because you have an exact copy of the data on the other cluster nodes.
However, if you would like to have an asynchronous slave attached to one of the
Galera nodes, it is recommended to enable binary logging only on one designated master.
To make an incremental backup, one must begin with a full backup as shown in the
following example:
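A sketch using innobackupex (paths and timestamped directory names are placeholders):

$ innobackupex /storage/backups/full
$ innobackupex --incremental /storage/backups/inc1 --incremental-basedir=/storage/backups/full/<TIMESTAMP>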
When restoring the incremental backups, use the --apply-log-only option during
the prepare stage except for the last one:
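A sketch of the prepare sequence (with innobackupex, the flag corresponding to xtrabackup's --apply-log-only is --redo-only; directory names are placeholders):

$ innobackupex --apply-log --redo-only /storage/backups/full/<TIMESTAMP>
$ innobackupex --apply-log --redo-only /storage/backups/full/<TIMESTAMP> --incremental-dir=/storage/backups/inc1/<TIMESTAMP>
$ innobackupex --apply-log /storage/backups/full/<TIMESTAMP> --incremental-dir=/storage/backups/inc2/<TIMESTAMP>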
There are lots of compression tools available out there, namely gzip, bzip2, zip, rar and
7z. These tools can do both compression and archiving (packing multiple files into one).
They differ in speed, availability and typical compression ratio; any rating is somewhat
subjective, but in general terms the two most popular tools for compression are gzip and
bzip2, which are widely available in UNIX environments. bzip2 offers a better
compression ratio but is slower, while gzip is overall faster. If having a smaller
backup size is important in your environment, use bzip2. Otherwise, gzip is a good
choice.
Normally, mysqldump can have very good compression rates as it is a flat text file.
Depending on the compression tool and ratio, a compressed mysqldump can be up to
6 times smaller than the original backup size. To compress the backup, you can pipe the
mysqldump output to a compression tool and redirect it to a destination file:
$ mysqldump --single-transaction --all-databases | gzip > /storage/backups/all-databases.sql.gz
$ mysqldump --single-transaction --all-databases | bzip2 > /storage/backups/all-databases.sql.bz2
If you want a smaller dump size, you can also skip several things like comments, lock
tables statement (if InnoDB), skip GTID purged and triggers:
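A sketch of such a command (the flag set is an assumption based on the options mentioned above):

$ mysqldump --single-transaction --skip-comments --skip-add-locks --set-gtid-purged=OFF --skip-triggers --all-databases > backup.sql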
With Percona Xtrabackup, you can use the streaming mode (innobackupex), which
sends the backup to STDOUT in special tar or xbstream format instead of copying files
to the backup directory. Having a compressed backup could save you up to 50% of the
original backup size, depending on the dataset. Append the --compress option in the
backup command as per below:
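For example (a sketch):

$ innobackupex --compress /storage/backups/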
By using the xbstream in streaming backups, you can additionally speed up the
compression process by using the --compress-threads option. This option specifies
the number of threads created by xtrabackup for parallel data compression. The default
value for this option is 1. To use this feature, simply add the option to a local backup, for
example:
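For example (a sketch):

$ innobackupex --compress --compress-threads=4 /storage/backups/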
Before applying logs during the preparation stage, compressed files will need to be
decompressed using xbstream:
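A sketch of the extraction (file and directory names are placeholders):

$ xbstream -x -C /storage/backups/restore < /storage/backups/fullbackup.xbstream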
Then, use qpress to extract each file ending with .qp in their respective directory before
running --apply-log command to prepare the MySQL data.
4.8. Encryption
If your MySQL server or backup destination is located in an exposed infrastructure like
public cloud, hosting provider or connected through an untrusted WAN network, it is
probably a good idea to enforce encryption to enhance the security of backup data. A
simple use case to enforce encryption is where you want to push the backup to an off-
site backup storage located in the public cloud.
When creating an encrypted backup, one thing to have in mind is that it usually takes
more time to recover. The backup has to be decrypted prior to any recovery activities.
With a large dataset, this could introduce some delays to the RTO. On the other hand,
if you are using a private key for encryption, make sure to store the key in a safe place.
If the private key is missing, the backup will be useless and unrecoverable. If the key is
stolen, all created backups that use the same key would be compromised as they are no
longer secured. You can use the popular GnuPG or OpenSSL to generate the private or
public keys.
To perform mysqldump encryption using GnuPG, generate a private key and follow the
wizard accordingly:
$ gpg --gen-key
Encrypt the dump file and remove the older plain backup:
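A sketch of those steps (the recipient is the identity created above; names are placeholders):

$ gpg --encrypt --recipient backupuser@example.com backup.sql
$ rm -f backup.sql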
GnuPG will automatically append .gpg extension on the encrypted file. To decrypt,
simply run the gpg command with --decrypt flag:
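For example (a sketch):

$ gpg --output backup.sql --decrypt backup.sql.gpg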
To create an encrypted mysqldump using OpenSSL, one has to generate a private key
and a public key:
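One way to do this is with a self-signed certificate, where the certificate carries the public key (a sketch; the key size and file names are assumptions):

$ openssl req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out dump.pub.pem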
This private key (dump.priv.pem) must be kept in a safe place for future decryption. For
mysqldump, an encrypted backup can be created by piping the content to openssl, for
example:
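A sketch using the OpenSSL smime interface with the key pair generated above:

$ mysqldump --single-transaction --all-databases | openssl smime -encrypt -binary -aes256 -out backup.sql.enc -outform DER dump.pub.pem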
To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:
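For example (a sketch, matching the encryption command above):

$ openssl smime -decrypt -in backup.sql.enc -binary -inform DER -inkey dump.priv.pem -out backup.sql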
Percona XtraBackup can be used to encrypt or decrypt local or streaming backups with
the xbstream option in order to add another layer of protection to the backups. Encryption
is done with the libgcrypt library. Both the --encrypt-key and --encrypt-key-file
options can be used to specify the encryption key. Encryption keys can be
generated with commands like:
$ openssl rand -base64 24
bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1
This value then can be used as the encryption key. Example of the innobackupex
command using the --encrypt-key:
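A sketch of such a command (the cipher and backup path are assumptions; the key is the value generated above):

$ innobackupex --encrypt=AES256 --encrypt-key="bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1" /storage/backups/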
The output of the above OpenSSL command can also be redirected to a file and can be
treated as a key file:
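For example (a sketch; the key file path is a placeholder):

$ openssl rand -base64 24 > /etc/mysql/xtrabackup.key
$ innobackupex --encrypt=AES256 --encrypt-key-file=/etc/mysql/xtrabackup.key /storage/backups/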
Taking a MySQL backup on a dedicated backup server will simplify your backup plans.
Since it uses loosely-coupled asynchronous replication, it is unlikely to cause additional
overhead on the production database. However, this server is a single point
of failure, with the possibility of an inconsistent backup if the backup server regularly lags
behind. A best practice when automating the process is to ensure the backup server
has caught up with the designated master prior to executing the backup. To check how
far behind the slave is, you can use the following statement and look for the
"Seconds_Behind_Master" value:
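For example (run on the backup slave):

mysql> SHOW SLAVE STATUS\G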
If the backup server is dedicated for backup storage, you can stream the backup over
network to this server using a combination of compression (gzip, tar and xbstream)
alongside network interaction tools like SSH, rsync or netcat. With mysqldump, you can
use gzip and SSH to compress and stream the created backup to another server:
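A sketch of such a pipeline (host and destination path are placeholders):

$ mysqldump --single-transaction --all-databases | gzip | ssh root@storage "cat - > /storage/backups/all-databases.sql.gz"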
Or, use mysqldump to connect to the target MySQL server remotely and perform the
dump (provided you are running the same MySQL client version as that of the target
server):
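For example (a sketch; host and credentials are placeholders):

$ mysqldump --single-transaction --all-databases --host=db1 --user=backupuser -p > /storage/backups/all-databases.sql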
With Percona Xtrabackup, you can use --stream option (available in innobackupex)
to send it to another server instead of storing it locally. There are two streaming tools
supported, tar and xbstream:
# using tar
$ innobackupex --stream=tar ./ | ssh root@storage "cat - > /storage/backups/fullbackup.tar"

# using xbstream
$ innobackupex --compress --stream=xbstream /storage/backups/ | ssh root@storage "xbstream -x -C /storage/backups/"
The main advantage with this setup is that it simplifies the management of backup
storage by consolidating backups in one centralized location. By keeping data in one
place, it’s easier to manage both the hardware and the data itself. That means closer
control on data protection, version control and security with a consistent set of data.
It also means better control of hardware configuration, capacity and performance. By
focusing your efforts in one place, it should also reduce expenditure and risk. Another
benefit of having a dedicated backup server is that you can use it as a sandbox to
perform regular backup verifications, create a staging MySQL server to extract partial
data, or prepare the restored data before copying the MySQL data directory onto the
target server.
Backup Management
Making sure that backups run successfully every day can be a chore – checking whether
each job has completed successfully, re-running jobs that failed, swapping out disks and
removing data off site. All these tasks take up time and add zero business value. Then
there is also the task of restoring data or configurations, when there is a problem with
the database or request from the application developer or QA.
Performing a backup is easy. The harder part is to ensure the backups are organized,
usable, available and manageable from the Ops perspective. The number of backup
files will grow, database sizes will grow over months and years, and backup procedures
will become more complex - especially with all the utilities required to make it all work.
Therefore, we must carefully plan our backup strategy from the beginning in order to
avoid issues further down the line.
Apart from regular backup schedules, you might also need to backup your data
occasionally before making significant changes, for example, schema, software or
hardware changes. In conjunction with binary logging, you will then avoid data loss and
you can at least revert to the position just before the failed change (e.g an erroneous
drop table).
You should also schedule backup verification, to verify that backups are usable
and restorable. Once a month, you may try restoring a random backup from each of the
multiplexed devices (i.e., local server, external server, SAN or tape).
5.2. Backup Verification and Integrity
To verify that your backup has been successful, restore the backup data on a different
server and run the MySQL daemon (mysqld) on the new data directory. Nothing is
better than testing a restore, and it should be a periodic procedure. You should be able
to start mysqld without problems. Once you have mysqld running, you need to test
each table’s usability. You can then execute SHOW statements to verify the database
and table structures, and execute queries to further verify details of the database.
You may add other verification criteria to trigger alerts. For example, the size of the
backup file should be more than x GB (depending on the standard backup size you get);
an alarm is triggered if it is smaller than that. When copying or moving the backup
files from one location to another, checksum the file to verify its integrity. You can use
the md5sum command to calculate the checksum and compare before and after the
operation, for example:
## On local server
$ md5sum backup.sql
71e41ff4ebf84db6f07eb73bddcd6073  backup.sql
## Copy to storage server
$ scp backup.sql root@storage:/backups/dump/
## On storage server
$ md5sum /backups/dump/backup.sql
71e41ff4ebf84db6f07eb73bddcd6073  /backups/dump/backup.sql
There are also some tools available in the MySQL ecosystem to verify the integrity of a
backup, as shown in the next sections.
5.2.1. mysqlcheck
MySQL provides utility tools to check for database consistency and check for errors.
One of them is mysqlcheck, which uses the SQL statements CHECK TABLE, REPAIR TABLE,
ANALYZE TABLE, and OPTIMIZE TABLE in a convenient way for the user. It determines
which statements to use for the operation you want to perform, and then sends the
statements to the server to be executed. Mysqlcheck is also invoked by the
mysql_upgrade script to check tables and repair them if necessary.
5.2.2. mysqldbcompare
MySQL provides a utility called mysqldbcompare, to compare the objects and data
from two databases to find differences. This tool is only available as part of the
mysql-utilities package. It identifies objects having different definitions in the two databases
and presents them in a diff-style format. However, the data must not change during the
comparison, as unexpected errors may then occur.
If you are using mysqldump to backup a single database in MySQL replication or Galera
Cluster with asynchronous slave, you can use one of the slave servers for backups and
also periodically test the restore. The process of testing can be done as per the example
below:
1. Stop the slave process (so database does not get updated)
2. Run the backup command using mysqldump for the selected database
3. Create a new database for restore purpose, for example restore_db1
4. Restore the data from backup into restore_db1
5. Use mysqldbcompare to compare the two databases
6. Drop restore_db1 database
7. Start the slave process again
At the end, you can see the result whether the databases are consistent. At this point,
the backup is verified to be working and you can safely store it to an appropriate
backup location.
5.2.3. pt-table-checksum
Another way to verify that the backup is consistent is by setting up replication and
running pt-table-checksum. This can be used to verify any type of backup, but before
setting up replication, the backup should be prepared and able to run. This means
that incremental backups should be merged with full backups, encrypted backups
should be decrypted, and so on. The tool performs an online replication consistency check by
executing checksum queries on the master, which produce different results on replicas/
slaves that are inconsistent with the master.
$ ./pt-table-checksum
            TS ERRORS  DIFFS     ROWS CHUNKS SKIPPED    TIME TABLE
04-30T11:31:50      0      0   633135      8       0   5.400 exampledb.aka_name
04-30T11:31:52      0      0   290859      1       0   2.692 exampledb.aka_title
Checksumming exampledb.user_info:  16% 02:27 remain
Checksumming exampledb.user_info:  34% 01:58 remain
Checksumming exampledb.user_info:  50% 01:29 remain
Checksumming exampledb.user_info:  68% 00:56 remain
Checksumming exampledb.user_info:  86% 00:24 remain
04-30T11:34:38      0      0 22187768    126       0 165.216 exampledb.user_info
04-30T11:38:09      0      0        0      1       0   0.033 mysql.time_zone_name
04-30T11:38:09      0      0        0      1       0   0.052 mysql.time_zone_transition
04-30T11:38:09      0      0        0      1       0   0.054 mysql.time_zone_transition_type
04-30T11:38:09      0      0        8      1       0   0.064 mysql.user
If all the values in the DIFFS column are 0, that means that the backup is consistent with
the current setup. At this point, the backup is verified to be working and you can safely
store it to an appropriate backup location.
5.3. Backup Availability
A local backup is performed on the same host where the MySQL/MariaDB server runs,
whereas a remote backup is executed from a different host. Mysqldump can perform a
backup against a local or remote MySQL server and the backup output is stored in the
location where the process is initiated. On the other hand, Percona Xtrabackup needs
to access the filesystem and MySQL data directory. So the backup has to be initiated
locally on the MySQL server, with options to store the backup locally or stream it over
the network to another host.
5.3.1. Onsite Storage
If the backup should be stored locally on the MySQL server, try to avoid using
compression since this process is CPU intensive and can directly impact the
performance of your MySQL server. However, some environments have a period of low
database traffic where compression can save you a lot of space. It is a bit of a tradeoff
between processing power and storage space.
5.3.2. Offsite Storage
The major bottleneck here is the data transfer speed. Unless it operates on
a high speed LAN backbone, a remote backup can be ineffective as it is tied to the
maximum upstream speed of the network. To save bandwidth, one would compress the
backup on the MySQL server before transferring it over the network.
5.3.3. Hybrid Storage
This is handy in cases where you can get the best of both worlds, onsite and
offsite. However, data center bandwidth is usually costly for such seamless
integration. You do not want the syncing process to eat the allocated bandwidth and
compromise the reliability of your production database server. Security is another
concern, as you have to enforce encryption over the line to the public cloud.
A good use case of hybrid storage is to have a dedicated backup server, with a
dedicated bandwidth line to sync up backup files into the cloud.
5.4. Backup Housekeeping
To keep backup storage space at an optimum level, you should regularly delete
backups that are no longer needed for recovery. Ideally, a full backup with associated
incremental backups and binary logs can be purged after you exceed the
expire_logs_days value. However, you have to ensure that your backup files are verified and
restorable before purging them. You can also use the PURGE BINARY LOGS statement to
remove the older logs once they are copied:
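For example (the file name follows the earlier binary log examples):

mysql> PURGE BINARY LOGS TO 'binlog.000007';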
Occasionally, for Percona Xtrabackup, one would decompress and prepare a backup in
the MySQL server after the recovery process has been carried out. The prepared data
could eat up significant disk space and this can lead to operating system instability.
A good approach for this is to perform this exercise (decompress and prepare) in a
“dump” directory, and schedule a garbage collector command or script to clear the data
on a daily basis.
5.5. Backup Failover
A scheduled backup can fail, for example if the backup host goes down or the backup
process is interrupted. To overcome this, ensure you have a safeguard mechanism to check
whether the backup process has completed correctly, or otherwise move to the next
available node for that particular backup set.
script that you are using. It is also a good approach to use another independent server
(for example, monitoring server) to trigger the backup process on another server. You
can also review your backup schedule to avoid overlap with maintenance activities.
ClusterControl as Backup Manager
Backup managers are third-party tools that simplify backup operation and
management. They do not add features to the underlying backup methods - they organize,
optimize and use what is available at the operating system or database level. Managing
backups can become complex when you have to deal with large datasets, growing
database workloads, or multiple servers.
There are interesting features that are adapted to the database topology used. For
Galera clusters, ClusterControl can desync a node during backup, so it won’t affect
the running database cluster. It can also automatically failover the backup to the other
host, in case the primary backup host fails. Backup files can be stored locally on the
node where the backup is taken, or they can also be streamed to the controller node
and compressed on the fly. Incremental backups are grouped with the appropriate
full backup, into backup sets. This is a neat way to organize backups, and reduce the
complexity of the recovery process.
All incremental backups after a full backup will be part of the same backup set.
By clicking on the Restore button, you are two clicks away from a full restoration of the
completed backups. ClusterControl will automatically perform all the necessary backup
preparation processes for Percona Xtrabackup and do the final copy-back before re-
bootstrapping the cluster. You will end up with a running and fully restored cluster,
where you can immediately proceed to do point-in-time recovery using binary logs if
necessary.
ClusterControl also provides operational reports for all database systems it manages.
The backup report contains two sections and basically gives you a short summary of when the last
backup was created, and whether it completed successfully or failed. You can also check the list of
backups executed on the cluster with their state, type and size. This is as close as you can
get to checking that backups work correctly without running a full recovery test. However,
we definitely recommend that such tests are performed regularly.
Conclusion
Things fail. It is wise to take measures that prevent failures - redundant hardware,
mirrored storage, replication and clustering, failover technology and multi datacenter
architectures are some of them. These can minimize the need for a full recovery of your
data, but no amount of planning can prevent an unexpected failure. A sound backup
and recovery plan is your insurance policy, and one you probably need if you value your
data. The amount of data handled by a database server is also growing. Not too long
ago, it was common to talk in terms of tens or a few hundred gigabytes on a single
server. Now half a terabyte and upwards is common - on a single server. Businesses
are generating more data in general, and commodity servers nowadays have plenty
of RAM, CPU and SSD storage to handle higher data volumes. Therefore, an efficient
backup strategy is key to business continuity.
About Severalnines
Severalnines provides automation and management software for database clusters. We
help companies deploy their databases in any environment, and manage all operational
aspects to achieve high-scale availability.
Severalnines' products are used by developers and administrators of all skill levels
to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them
from the complexity and learning curves that are typically associated with highly
available database clusters. The company has enabled over 8,000 deployments to date
via its popular ClusterControl solution, and currently counts BT, Orange, Cisco, CNRS,
Technicolour, AVG, Ping Identity and Paytrail as customers. Severalnines is a private
company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo,
Japan. To see who is using Severalnines today, visit http://severalnines.com/customers.
© 2016 Severalnines AB. All rights reserved. Severalnines and the Severalnines logo(s) are
trademarks of Severalnines AB. Other names may be trademarks of their respective owners.