PostgreSQL Replication - Second Edition - Sample Chapter
Preface
Since the first edition of PostgreSQL Replication, many new technologies have emerged
or been improved. In the PostgreSQL community, countless people around the globe have
been working on important techniques and technologies to make PostgreSQL even
more useful and more powerful.
To make sure that readers can enjoy all those new features and powerful tools, I
have decided to write a second, improved edition of PostgreSQL Replication. Due
to the success of the first edition, the hope is to make this one even more useful to
administrators and developers alike around the globe.
All the important new developments have been covered and most chapters have
been reworked to make them easier to understand, more complete and absolutely
up to date.
I hope that all of you can enjoy this book and benefit from it.
Chapter 3, Understanding Point-in-time Recovery, is the next logical step and outlines
how the PostgreSQL transaction log will help you to utilize Point-in-time Recovery
to move your database instance back to a desired point in time.
Chapter 4, Setting Up Asynchronous Replication, describes how to configure
asynchronous master-slave replication.
Chapter 5, Setting Up Synchronous Replication, is one step beyond asynchronous
replication and offers a way to guarantee zero data loss if a node fails. You will
learn about all the aspects of synchronous replication.
Chapter 6, Monitoring Your Setup, covers PostgreSQL monitoring.
Chapter 7, Understanding Linux High Availability, presents a basic introduction to
Linux-HA and presents a set of ideas for making your systems more available and
more secure. Since the first edition, this chapter has been completely rewritten and
made a lot more practical.
Chapter 8, Working with PgBouncer, deals with PgBouncer, which is very often used
along with PostgreSQL replication. You will learn how to configure PgBouncer and
boost the performance of your PostgreSQL infrastructure.
Chapter 9, Working with pgpool, covers one more tool capable of handling replication
and PostgreSQL connection pooling.
Chapter 10, Configuring Slony, contains a practical guide to using Slony and shows
how you can use this tool quickly and efficiently to replicate sets of tables.
Chapter 11, Using SkyTools, offers you an alternative to Slony and outlines how
you can introduce generic queues to PostgreSQL and utilize Londiste replication
to dispatch data in a large infrastructure.
Chapter 12, Working with Postgres-XC, offers an introduction to a synchronous
multimaster replication solution capable of partitioning a query across many nodes
inside your cluster while still providing you with a consistent view of the data.
Chapter 13, Scaling with PL/Proxy, describes how you can break the chains and
scale out infinitely across a large server farm.
Chapter 14, Scaling with BDR, describes the basic concepts and workings of the
BDR replication system. It shows how BDR can be configured and how it operates
as the basis for a modern PostgreSQL cluster.
Chapter 15, Working with Walbouncer, shows how the transaction log can be replicated
partially using the walbouncer tool. It dissects the PostgreSQL XLOG and makes
sure that the transaction log stream can be distributed to many nodes in the cluster.
Setting Up Asynchronous Replication
After performing our first PITR, we are ready to work on a real replication setup.
In this chapter, you will learn how to set up asynchronous replication and streaming.
The goal is to make sure that you can achieve higher availability and higher
data security.
In this chapter, we will cover the following topics:
Understanding streaming
Managing timelines
At the end of this chapter, you will be able to easily set up streaming replication in
a couple of minutes.
In this scenario, streaming replication will be the solution to your problem. With
streaming replication, the replication delay will be minimal and you can enjoy an
extra level of protection for your data.
Let's talk about the general architecture of the PostgreSQL streaming infrastructure.
The following diagram illustrates the basic system design:
You have already seen this type of architecture. What we have added here is the
streaming connection. It is basically a normal database connection, just as you would
use in any other application. The only difference is that in the case of a streaming
connection, the connection will be in a special mode so as to be able to carry the XLOG.
Now that the master knows that it is supposed to produce enough XLOG, handle
XLOG senders, and so on, we can move on to the next step.
For security reasons, you must configure the master to enable streaming replication
connections. This requires changing pg_hba.conf as shown in the previous
chapter. Again, this is needed to run pg_basebackup and the subsequent streaming
connection. Even if you are using a traditional method to take the base backup, you
still have to allow replication connections to stream the XLOG, so this step
is mandatory.
Once your master has been successfully configured, you can restart the database (to
make wal_level and max_wal_senders work) and continue working on the slave.
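For reference, here is a minimal sketch of the relevant postgresql.conf entries on the master (the values are examples, not recommendations):

# postgresql.conf on the master (example values)
wal_level = hot_standby      # produce enough XLOG for a standby
max_wal_senders = 5          # allow up to five XLOG sender processes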
Now that we have taken a base backup, we can move ahead and configure
streaming. To do so, we have to write a file called recovery.conf (just like before).
Here is a simple example:
standby_mode = on
primary_conninfo= ' host=sample.postgresql-support.de port=5432 '
Note that from PostgreSQL 9.3 onwards, there is a -R flag for pg_basebackup,
which is capable of automatically generating recovery.conf. In other words,
a new slave can be generated using just one command.
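Assuming the master used in this chapter, such a one-command setup could look like the following sketch (the hostname and the target directory are placeholders):

pg_basebackup -D /target_directory -h sample.postgresql-support.de \
    --checkpoint=fast --xlog-method=stream -R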
We have two new settings:
standby_mode: This setting will make sure that PostgreSQL does not stop
once it runs out of XLOG. Instead, it will wait for new XLOG to arrive. This
setting is essential in order to make the second server a standby, which
replays XLOG constantly.
primary_conninfo: This setting will tell our slave where to find the master.
You have to put a standard PostgreSQL connect string (just like in libpq)
here. The primary_conninfo variable is central and tells PostgreSQL to
stream XLOG.
For a basic setup, these two settings are totally sufficient. All we have to do now is to
fire up the slave, just like starting a normal database instance:
iMac:slavehs$ pg_ctl -D /target_directory start
server starting
LOG:  database system was interrupted; last known up at ...
LOG:  entering standby mode
LOG:  streaming replication successfully connected to primary
LOG:  redo starts at ...
LOG:  consistent recovery state reached at ...
The database instance has successfully started. It detects that normal operations have
been interrupted. Then it enters standby mode and starts to stream XLOG from the
primary system. PostgreSQL then reaches a consistent state and the system is ready
for action.
If you try to connect to the instance at this point, you will be greeted with the following error message:
psql: FATAL:  the database system is starting up
This is the default configuration. The slave instance is constantly in backup mode
and keeps replaying XLOG.
If you want to make the slave readable, you have to adapt postgresql.conf on the
slave system; hot_standby must be set to on. You can set this straightaway, but you
can also make this change later on and simply restart the slave instance when you
want this feature to be enabled:
iMac:slavehs$ pg_ctl -D ./target_directory restart
waiting for server to shut down....
LOG:  received smart shutdown request
FATAL:  terminating walreceiver process due to administrator command
LOG:  shutting down
LOG:  database system is shut down
done
server stopped
server starting
LOG:  database system was shut down in recovery at ... CET
LOG:  entering standby mode
LOG:  consistent recovery state reached at ...
LOG:  database system is ready to accept read only connections
LOG:  streaming replication successfully connected to primary
The restart will shut down the server and fire it back up again. This is not too much
of a surprise; however, it is worth taking a look at the log. You can see that a process
called walreceiver is terminated.
Once we are back up and running, we can connect to the server. Logically, we are
only allowed to perform read-only operations:
test=# CREATE TABLE x (id int4);
ERROR:  cannot execute CREATE TABLE in a read-only transaction
The server will not accept writes, as expected. Remember, slaves are read-only.
The wal_sender instances are processes on the master instance that serve XLOG
to their counterpart on the slave, called wal_receiver. Each slave has exactly one
wal_receiver process, and this process is connected to exactly one wal_sender
on the data source.
How does this entire thing work internally? As we have stated before, the connection
from the slave to the master is basically a normal database connection. The
transaction log uses more or less the same method as a COPY command would do.
Inside the COPY mode, PostgreSQL uses a little micro language to ship information
back and forth. The main advantage is that this little language has its own parser,
and so it is possible to add functionality fast and in a fairly easy, non-intrusive way.
As of PostgreSQL 9.4, the following commands are supported:
IDENTIFY_SYSTEM: This asks the server to identify itself. The response contains the system identifier, the current timeline ID, and the current XLOG position.
TIMELINE_HISTORY tli: This requests the server to send the timeline history file for a given timeline. The response consists of the filename and content.
CREATE_REPLICATION_SLOT slot_name { PHYSICAL | LOGICAL output_plugin }: This creates a physical or logical replication slot. In the case of a logical replication slot, an output plugin for formatting the data returned by the replication slot is mandatory.
DROP_REPLICATION_SLOT slot_name: This drops a replication slot.
START_REPLICATION [SLOT slot_name] [PHYSICAL] XXX/XXX [TIMELINE tli]: This instructs the server to start streaming the transaction log from the given XLOG position.
BASE_BACKUP [...]: This instructs the server to stream a base backup. Its behavior is controlled by a set of optional parameters.
What you see is that the protocol level is pretty close to what pg_basebackup offers
as command-line flags.
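One way to experiment with this protocol yourself (a sketch that assumes the hs user from the pg_hba.conf examples and a server that permits replication connections) is to open a replication connection with psql and issue one of the commands manually:

psql "host=sample.postgresql-support.de user=hs replication=1" -c "IDENTIFY_SYSTEM"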
It is definitely not very beneficial for the master to have, say, 100 slaves.
An additional use case is as follows: having a master in one location and a couple of
slaves in some other location. It does not make sense to send a lot of data over a long
distance over and over again. It is a lot better to send it once and dispatch it to the
other side.
To make sure that not all servers have to consume the transaction log from a single
master, you can make use of cascaded replication. Cascading means that a master
can stream its transaction log to a slave, which will then serve as the dispatcher and
stream the transaction log to further slaves.
The slaves to the far right of the diagram could serve as dispatchers again. With this
very simple method, you can basically create a system of infinite size.
The procedure to set things up is basically the same as that for setting up a single
slave. You can easily take base backups from an operational slave (postgresql.conf
and pg_hba.conf have to be configured just as in the case of a single master).
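As a sketch, the recovery.conf of a second-level slave looks just like before; the only difference is that primary_conninfo points to the intermediate slave rather than to the master (the hostname is a placeholder):

standby_mode = on
primary_conninfo = 'host=intermediate-slave.example.com port=5432'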
Be aware of timeline switches; these can easily cause issues in the
event of a failover. Check out the Dealing with the timelines section to
find out more.
PostgreSQL offers some simple ways to do this. The first way, and most likely the
most convenient way, to turn a slave into a master is by using pg_ctl:
iMac:slavehs$ pg_ctl -D /target_directory promote
server promoting
iMac:slavehs$ psql test
psql (9.2.4)
Type "help" for help.
test=# CREATE TABLE sample (id int4);
CREATE TABLE
The promote command will signal the postmaster and turn your slave into a master.
Once this is complete, you can connect and create objects.
If you've got more than one slave, make sure that those slaves are
manually repointed to the new master before the promotion.
An alternative way to promote a slave is to define a trigger_file in recovery.conf and create that file when the promotion should happen. The slave's log will then show something like this:
LOG:  trigger file found: ...
FATAL:  terminating walreceiver process due to administrator command
LOG:  redo done at ...
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
PostgreSQL will check for the file you have defined in recovery.conf every 5
seconds. For most cases, this is perfectly fine and by far fast enough.
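Assuming a trigger_file entry such as the one shown later in this chapter, the promotion itself then boils down to creating that file on the slave:

iMac:slavehs$ touch /some_path/start_me_up.txt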
max_wal_senders = 5
# we used five here to have some spare capacity
# pg_hba.conf on the master
local   replication     hs                              trust
host    replication     hs      127.0.0.1/32            trust
host    replication     hs      ::1/128                 trust
host    replication     all     192.168.0.0/16          md5
In our case, we have simply opened an entire network to allow replication (to keep
the example simple).
Once we have made these changes, we can restart the master and take a base backup
as shown earlier in this chapter.
In the next step, we can write a simple recovery.conf file and put it into the main
data directory:
restore_command = 'cp /archive/%f %p'
standby_mode = on
primary_conninfo = ' host=sample.postgresql-support.de port=5432 '
trigger_file = '/some_path/start_me_up.txt'
Error scenarios
The most important advantage of a dual strategy is that you can create a cluster that
offers a higher level of security than just plain streaming-based or plain file-based
replay. If streaming does not work for some reason, you can always fall back on
the files.
In this section, we will discuss some typical error scenarios in a dual strategy cluster.
If the streaming connection fails, PostgreSQL will try to keep syncing itself through
the file-based channel. Should the file-based channel also fail, the slave will sit there
and wait for the network connection to come back. It will then try to fetch XLOG and
simply continue once this is possible again.
There are two ways to protect a slave against XLOG that has already been removed from the master:
Using wal_keep_segments: Teach the master to keep a fixed number of extra XLOG files around
Physical replication slots: Teach the master to recycle the XLOG only when it has been consumed
Using wal_keep_segments
To make your setup much more robust, we recommend making heavy use of wal_
keep_segments. The idea of this postgresql.conf setting (on the master) is to teach
the master to keep more XLOG files around than theoretically necessary. If you set
this variable to 1000, it essentially means that the master will keep 16 GB more
XLOG around than needed (1,000 segments of 16 MB each). In other words, your slave can be
gone for 16 GB of changes to the master longer than usual. This greatly increases the odds that a slave
can join the cluster without having to completely resync itself from scratch. For a 500
MB database, this is not worth mentioning, but if your setup has to hold hundreds of
gigabytes or terabytes, it becomes an enormous advantage. Producing a base backup
of a 20 TB instance is a lengthy process. You might not want to do this too often, and
you definitely won't want to do this over and over again.
If you want to update a large base backup, it might be beneficial to
incrementally update it using rsync and the traditional method of
taking base backups.
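As a rough sketch (the hostname, the paths, and the database used for psql are placeholders, and the copy is wrapped in pg_start_backup() and pg_stop_backup() just as with any traditional base backup), such an incremental refresh could look like this:

psql -h master.postgresql-support.de -d postgres -c "SELECT pg_start_backup('resync', true)"
rsync -av --delete --exclude=pg_xlog master.postgresql-support.de:/data/ /target_directory/
psql -h master.postgresql-support.de -d postgres -c "SELECT pg_stop_backup()"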
What are the reasonable values for wal_keep_segments? As always, this largely
depends on your workloads. From experience, we can tell that a multi-GB implicit
archive on the master is definitely an investment worth considering. Very low values
for wal_keep_segments might be risky and not worth the effort. Nowadays, disk space is
usually cheap. Small systems might not need this setting, and large ones should have
sufficient spare capacity to absorb the extra requirements. Personally, I am always in
favor of using at least some extra XLOG segments.
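As a sketch, a generous setting on the master might look like this (the value is only an example; size it according to your disk space and change rate):

# postgresql.conf on the master
wal_keep_segments = 1000    # roughly 16 GB of extra XLOG (1,000 x 16 MB)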
test=# SELECT * FROM pg_create_physical_replication_slot('repl_slot');
 slot_name | xlog_position
-----------+---------------
 repl_slot |
(1 row)

test=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
 slot_name | slot_type | active
-----------+-----------+--------
 repl_slot | physical  | f
(1 row)
Once the base backup has happened, the slave can be configured easily:
standby_mode = 'on'
primary_conninfo = 'host=master.postgresql-support.de port=5432 user=hans password=hanspass'
primary_slot_name = 'repl_slot'
The configuration is just as if there were no replication slots. The only change is
that the primary_slot_name variable has been added. The slave will pass the name
of the replication slot to the master, and the master knows when to recycle the
transaction log. As mentioned already, if a slave is not in use anymore, make sure
that the replication slot is properly deleted to avoid trouble on the master, such as
running out of disk space. The problem is that forgotten slots are insidious: slaves,
being optional, are not always monitored as well as they should be. It is therefore a
good idea to regularly compare pg_stat_replication with pg_replication_slots
and to investigate any mismatches.
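A sketch of such a check, based on the catalog columns available in PostgreSQL 9.4, could look like this; the last statement shows how an orphaned slot can be removed:

-- slots that currently have no consumer attached
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE NOT active;

-- compare the number of connected XLOG senders with the number of slots
SELECT (SELECT count(*) FROM pg_stat_replication)  AS wal_senders,
       (SELECT count(*) FROM pg_replication_slots) AS replication_slots;

-- drop a slot that is no longer needed
SELECT pg_drop_replication_slot('repl_slot');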
The following script shows how you can clean any XLOG that is older than a day:
#!/bin/sh
find /archive -type f -mtime +1 -exec rm -f {} \;
Keep in mind that your script can be of any kind of complexity. You have to decide
on a proper policy to handle the XLOG. Every business case is different, and you
have all the flexibility to control your archives and replication behavior.
Again, you can use this to clean the old XLOG, send out notifications, or perform any
other kind of desired action.
Conflict management
In PostgreSQL, the streaming replication data flows in one direction only. The XLOG
is provided by the master to a handful of slaves, which consume the transaction log
and provide you with a nice copy of the data. You might wonder how this could ever
lead to conflicts. Well, there can be conflicts.
Consider the following scenario: as you know, data is replicated with a very small
delay. So, the XLOG ends up at the slave after it has been made on the master. This
tiny delay can cause the scenario shown in the following diagram:
Let's assume that a slave starts to read a table. It is a long read operation. In the
meantime, the master receives a request to actually drop the table. This is a bit of a
problem, because the slave will still need this data to perform its SELECT statement.
On the other hand, all the requests coming from the master have to be obeyed under
any circumstances. This is a classic conflict.
In the event of a conflict, PostgreSQL will issue the Terminating
connection due to conflict with recovery error message.
To resolve such a conflict, PostgreSQL essentially has two options:
Don't replay the conflicting transaction log before the slave has terminated the operation in question
Cancel the query on the slave so that the transaction log can be replayed right away
The first option might lead to ugly delays during the replay process, especially if
the slave performs fairly long operations. The second option might frequently kill
queries on the slave. The database instance cannot know by itself what is best for
your application, so you have to find a proper balance between delaying the replay
and killing queries.
To find this delicate balance, PostgreSQL offers two parameters in postgresql.
conf:
max_standby_archive_delay = 30s
# max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
max_standby_streaming_delay = 30s
# max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
The first part of the XLOG files is an interesting thing. You can observe that so far,
there was always a 1 near the beginning of our filename. This is not so anymore.
By checking the first part of the XLOG filename, you can see that the number has
changed over time (after turning the slave into a master, we have reached timeline
number 2).
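As an illustration (the file names are made up), the pg_xlog directory of a freshly promoted server might contain segments from both timelines along with a small history file:

000000010000000000000007
000000020000000000000007
00000002.history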
It is important to mention that (as of PostgreSQL 9.4) you cannot simply pump the
XLOG of timeline 5 into a database instance that is already at timeline 9. It is simply
not possible.
Since PostgreSQL 9.3, these timelines can be handled a little more flexibly: timeline
changes are written to the transaction log, and a slave can follow a timeline shift easily.
Timelines are especially something to be aware of when cascading
replication and working with many slaves. After all, you have to
connect your slaves to some server if your master fails.
Delayed replicas
So far, two main scenarios have been discussed in this book: restoring a database instance to a desired point in time using the transaction log archive, and keeping an up-to-date copy of the data on a slave through streaming replication.
Both scenarios are highly useful and can serve people's needs nicely. However, what
happens if databases, and especially change volumes, start being really large? What
if a 20 TB database has produced 10 TB of changes and something drastic happens?
Somebody might have accidentally dropped a table or deleted a couple of million
rows, or maybe somebody set data to a wrong value. Taking a base backup and
performing a recovery might be way too time consuming, because the amount of
data is just too large to be handled nicely.
The same applies to performing frequent base backups. Creating a 20 TB base backup
is just too large, and storing all those backups might be pretty space consuming.
Of course, there is always the possibility of getting around certain problems on the
filesystem level. However, it might be fairly complex to avoid all of those pitfalls in
a critical setup.
Since PostgreSQL 9.4, the database platform provides an additional, easy-to-use
feature. It is possible to tell PostgreSQL that the slave is supposed to stay a couple
of minutes/hours/days behind the master. If a transaction commits on the master,
it is not instantly replayed on the slave but applied some time later (for example, 6
hours). The gap between the master and the slave can be controlled easily, and if the
master crashes, the administrator has a convenient 6-hour window (in my example)
to roll forward to a desired point in time. The main advantage is that there is already
a base backup in place (the lagging standby in this case), and in addition to that, the
time frame, which has to be recovered, is fairly small. This leads to less downtime
and faster recovery. Replaying only a small time frame is way more desirable than
having to fiddle around with large base backups and maybe even a larger amount
of XLOG. Having a slave that lags behind is of course no substitute for a proper
backup; however, it can definitely help.
To configure a slave that lags behind, recovery_min_apply_delay can be
added to recovery.conf. Just use replication slots and set the desired value
of recovery_min_apply_delay. Then your system will work as expected.
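A minimal recovery.conf sketch for such a lagging slave (the hostname and slot name are taken from the earlier examples; the delay value is arbitrary) could look like this:

standby_mode = on
primary_conninfo = 'host=master.postgresql-support.de port=5432 user=hans password=hanspass'
primary_slot_name = 'repl_slot'
recovery_min_apply_delay = '6h'    # stay six hours behind the master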
Handling crashes
It is generally wise to use a transaction log archive when using a lagging slave. In
addition to that, a crash of the master itself has to be handled wisely. If the master
crashes, the administrator should make sure that they decide on a point in time to
recover to. Once this point has been found (which is usually the hardest part of the
exercise), recovery_target_time can be added to recovery.conf. Once the slave
has been restarted, the system will recover to this desired point in time and go live. If
the time frames have been chosen wisely, this is the fastest way to recover a system.
In a way, recovery_min_apply_delay is a mixture of classical PITR and hot standby slaves.
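As a sketch, the additional entries in the lagging slave's recovery.conf for such a recovery might look like this (the timestamp is a placeholder for the point in time you decided on):

restore_command = 'cp /archive/%f %p'
recovery_target_time = '2015-03-15 12:00:00'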
Summary
In this chapter, you learned about streaming replication. We saw how a streaming
connection can be created, and what you can do to configure streaming to your
needs. We also briefly discussed how things work behind the scenes.
It is important to keep in mind that replication can indeed cause conflicts, which
need proper treatment.
In the next chapter, it will be time to focus our attention on synchronous replication,
which is logically the next step. You will learn to replicate data without potential
data loss.