Howto Setup PostgreSQL High Availability With Pgpool-II

[Link]

0. Introduction
About two weeks ago I decided to spend a day or two implementing PostgreSQL high availability
(HA) using pgpool-II. Today I still don't have it implemented. It turned out to be much more
complicated than I initially expected. Well, I often underestimate the work that needs to be done,
but in this case it was especially painful since it happened when I was already pretty short on time.
It is true that I don't have any serious experience with PostgreSQL beyond basic usage (apt-get and
basic settings in postgresql.conf and pg_hba.conf). However, it is also true that I'm kinda good with
computers, and I've managed to set up things like an ElasticSearch server and a Cassandra cluster
in less than two days each, despite having no prior experience. But PostgreSQL / pgpool-II turned
out to be a different kind of monster...
It is important to say that this tutorial is written with all the details, without assuming any preexisting
knowledge. It covers all the steps, so you won't need to search other resources to be able to
understand parts of this tutorial. In short, this is truly a Dummy-to-Expert kind of tutorial.

1.1. What is Actually HA?


Don't worry, I won't bother you with theory, but I must be precise about what we are actually trying
to accomplish; it will also help us understand some basic terms often used in this area. At the
highest level of abstraction I will define the following expectation:
The database cluster should be implemented in such a way that the database remains available even
if any of the servers goes down.
Fair enough. Let's see what it means at a bit lower, more technical level of abstraction:
- The data should be distributed between cluster members in such a way that all the members have
the most recent data. (It's not 100% true, but let's leave it at that for now.) This part is
accomplished by replication.
- When the primary server goes down, a standby server should take over its role. Note: a
PostgreSQL cluster always includes a primary server and standby server(s). This part is
accomplished by (preferably automatic) failover.
- Not mentioned in our original request but assumed: a failed server should be easily
replaceable / recoverable.
- Optional: When the system is in its regular state (all servers are running), the overall load should
be distributed. This means that not all queries will be executed on the primary server; some will
be directed to standby server(s). This is accomplished by load balancing.
As a wise man said, "knowing where you want to go will significantly increase your chances of
actually getting there." For this reason I will be more precise about what I want to accomplish (and I
suggest you do the same):
My primary aim is to get replication and failover up and running. At the moment I'm not interested
in load balancing, although the solution implemented by the procedure described here will actually
allow load balancing as well.

1.2. Why Two Products and Who Does What?


At the moment of this writing (PostgreSQL version 9.5) it is not possible to implement a complete
HA deployment using only PostgreSQL. Looking at the HA parts defined in the previous section we
can say that:
- PostgreSQL offers a variety of replication options, so the replication part is fully covered;
- PostgreSQL implements an easy way to perform failover (as will be shown later, it is enough to
create a trigger file and a standby server will take over the primary role), but it does not perform
it automatically. Moreover, PostgreSQL does not implement any kind of tracking (awareness) of
whether the primary server is down or not. It means that we need another product for the
automatic failover feature.
- PostgreSQL itself does not implement any load-balancing feature. The server that receives a
query will execute that query. It means that we need another product for load balancing (if we
need the feature in the first place).
If you wonder why PostgreSQL does not implement the mentioned features - the answer is pretty
simple: the failover part of the whole HA implementation is very risky. A lot of problems (resulting in
data loss) can be caused if there is more than one primary server in the cluster, and that may happen if
one of the standby servers falsely concludes that the primary server is not working and that it should
take over its role. For example, it may happen if a standby server loses network connectivity, causing
it to conclude "wow, all other servers are down, I must go primary..." Another example: the primary
server actually goes down, but comes back again, still thinking that it is the primary server although
some other standby server has already taken over that role...
Anyway, the PostgreSQL team obviously didn't want to include this risk in their product. This way
they and their product are protected from such problems, being able to say "Well, it is not caused by
our database. You shouldn't allow two primary servers in the same cluster." Honestly, I fully agree
with their decision - it was probably one of the smartest, life-saving decisions they ever made.
During this procedure we need to be aware of the fact that we are actually working with two
products. It is especially important when it comes to configuration - at every moment, for every
configuration step, you should be aware which of the two products it relates to.

1.2.1. Selecting "The Other" Product


As you can see at
[Link] ,
there are quite a few products that can jump in and help accomplish PostgreSQL HA. My reasons
for selecting pgpool-II are:
- I wanted a product that relies on the existing PostgreSQL replication instead of introducing its
own. Many products (for example Slony) actually implement their own replication system instead
of using the built-in one. In my opinion, no one can know better how to implement PostgreSQL
replication than the PostgreSQL team itself. pgpool-II relies on the built-in PostgreSQL
replication implementation.
- At the moment of this writing pgpool-II was the only product that implements all three additional
features used in the comparison matrix: connection pooling, load balancing and query
partitioning. Although at the moment I'm not interested in the latter two, it is generally a good
idea to be prepared when a need for such features arises.
- Although pgpool-II is not too mature, it seems to be well supported and actively developed.
Nevertheless, I must admit that I haven't investigated all the products in detail, meaning that
another factor was important - a lucky pick.

1.2.2. Selecting Replication Model


We've already decided to use PostgreSQL's built-in replication mechanism, but not exactly which
one of the many variants ([Link]
[Link]). You can literally spend days and weeks researching all of them. Things get
even more complicated when even the official PostgreSQL documentation starts introducing
new terms besides the mentioned list (i.e. binary replication in
[Link]). If you've clicked on the previous link
you might have noticed the following note: "NB: there is some duplication with the page on Streaming
Replication ([Link])". Really? So you can go
on discovering all the nuts and bolts of all the replication models, which will ultimately lead you
to an interesting type of lunacy, or you can stick with me and my choice. After some research I've
decided to go with "Transaction Log Shipping" using "streaming replication". Besides my rough
research, the fact that influenced my decision is that this replication model is the one most often
mentioned in other online resources dealing with PostgreSQL HA.

1.3. Physical Infrastructure and Implementation Plan


In my case I'll implement a two-server (single standby) cluster, although the procedure is the same if
you want to implement multiple standbys. In fact I will prepare the primary server to be able to accept
more than one standby, just in case. The next question that needs to be answered is how pgpool-II
will be implemented. An often-used scenario is to have a single pgpool-II server in front of the
PostgreSQL cluster, as illustrated in the following image:

However, this scenario again introduces a single point of failure - the pgpool-II server itself. If the
pgpool-II server goes down, we would lose database connectivity although both database servers are
actually running. Nevertheless, if you are using decent hardware dedicated to pgpool-II, this risk is
not too big. pgpool-II (thanks to the fact that it does not torture hard drives) is one of those
turn-on-and-forget tools that can run for eons on dedicated hardware. But that stands only if pgpool
has its own physical machine. In every other case you should use some redundancy. Actually, who am
I trying to fool? It is always better to have more than one instance running. If we have more than one
pgpool-II instance we can deploy them on the same servers the PostgreSQL database is deployed on.
Long story short: I'll use two Ubuntu 14.04 servers, each carrying PostgreSQL 9.5 and pgpool-II
3.5.2. The architecture is described in the following diagram (borrowed from
[Link]).
Btw., the post the image is borrowed from is OK, but it deals only with pgpool-II, not covering any
configuration needed on the PostgreSQL side.

1.3.1. Watchdog
Before starting with the actual implementation, I believe it is important to demystify one pgpool-II
component - watchdog. The purpose of this component is to periodically check whether other
pgpool-II instances (especially the active one) are still running. If the active instance fails, thanks to
watchdog a standby instance will become aware of this failure and take over the active role. If there's
more than one standby instance running, the one with the highest priority will become the active one
(we can configure the priority of a particular instance in the pgpool-II configuration file, as will be
mentioned below). Honestly, I don't know what happens if multiple standby instances are configured
with the same priority; hopefully this is handled internally by pgpool-II in an appropriate way.
Finally, to avoid any confusion, let me stress that watchdog checks other pgpool-II instances, not
PostgreSQL databases. The health of the databases is checked by all pgpool-II instances. In a manner
of speaking we can say that watchdog checks pgpool-II, which further checks PostgreSQL.
If you take a look at the [Link] download page, you may notice that there's also a product called
"pgpool-HA" (or something like that). This product was used with earlier versions of pgpool-II for
a purpose similar to watchdog in newer versions. It means that pgpool-HA, thanks to watchdog, is
now obsolete. I'm not exactly sure in which version of pgpool-II watchdog was implemented for the
first time (3.1 or so, I think), but if it is important to you, you should be able to find this information.
Anyway, watchdog has been present for the last few years, and chances are that you are already using
a version with watchdog. On the other hand, if you are just starting with pgpool-II you will start with
the newest version, like me, of course.

1.3.2. Virtual IP
Another term to explain is "virtual IP". You might have heard of a similar term (floating IP) which is
often used in server high availability. Virtual IP is the exact same thing. For those who don't know,
I'll briefly explain how it works using an example with two servers, but the principle is the same for
any number of servers. Our infrastructure will be installed on two servers with IP addresses ip1
and ip2. Besides these addresses we will introduce another IP address (let's call it ipV) that will
be used by both servers. How can the same IP address be used by multiple servers? Well, it can't.
In reality it is used only by the server where the active pgpool-II instance is running. But if it
happens that this server fails, thanks to the previously explained watchdog, another instance will
become the active one, and this newly promoted instance will also take over the virtual IP. This way
it cannot happen that two servers are using the virtual IP at the same moment.
The benefit of introducing a virtual IP is obvious: all other applications and systems in our
infrastructure will continue to use the same IP (the virtual IP) for database access, even if the active
pgpool-II instance (or the primary PostgreSQL instance) fails. It means that no reconfiguration of
other systems is needed on failover. The only important thing related to the virtual IP is that we must
select an IP address that is not used by any other system/server in our network, of course.
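To make this less abstract, here is a minimal sketch of the watchdog-related part of pgpool.conf for one instance. The parameter names are real pgpool-II 3.5 watchdog settings, but the addresses, hostnames and network interface below are placeholders you would replace with your own; the full pgpool-II configuration is covered later in this series.

use_watchdog = on
wd_hostname = 'it-rdbms01'              # this node's own address (placeholder)
wd_port = 9000
delegate_IP = '192.168.1.50'            # the virtual IP, unused elsewhere (placeholder)
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'    # adjust interface/netmask
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
wd_priority = 1                         # higher value wins when several standbys compete
other_pgpool_hostname0 = 'it-rdbms02'   # the peer pgpool-II instance (placeholder)
other_pgpool_port0 = 5432
other_wd_port0 = 9000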

1.3.3. What about pgpool-II???


Originally it wasn't my intention to describe pgpool-II, but since I've explained some of its parts, it
would be unfair not to provide any explanation of pgpool-II itself. I won't go into details about all its
features (i.e. load balancing, query partitioning, etc.). Instead I will explain only its basic role.
Basically pgpool-II behaves as a PostgreSQL HA proxy. It means that pgpool-II exposes the same
interface to the outside world as PostgreSQL does, so all database clients will actually be connected
to pgpool-II instead of to PostgreSQL itself, without even being aware of it. On the other side, when
pgpool-II receives a query from the outside world, it decides what to do with it. It knows which
PostgreSQL instance is down, which is the primary, and to which it should forward the query, and it
does that completely transparently to outside-world clients.
Similar transparency holds from the database's point of view; from the PostgreSQL perspective
pgpool-II is nothing more than another database client. Probably the only direct client, but still
nothing more than a client. Basically pgpool-II does a great job while remaining completely invisible
to all other participants.
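In practice this means a client connects to pgpool-II exactly as it would connect to PostgreSQL, just pointing at the virtual IP and pgpool-II's port. A minimal illustration (the IP address and database name below are placeholders, not values from this tutorial):

psql -h 192.168.1.50 -p 5432 -U postgres -d mydb

The client neither knows nor cares whether it is talking to pgpool-II or directly to a PostgreSQL instance.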

Where to Go Next?
Once we've met our enemy, thus being significantly less afraid, we can continue with installing
PostgreSQL servers and establishing replication in PostgreSQL HA with pgpool-II - Part 2.

Part 2
In this part we'll go through installing PostgreSQL and configuring replication.

2.1. Infrastructure
Just to remind you: we'll use two servers, in my case Ubuntu 14.04:
FQDN      IP Address   Purpose 1                     Purpose 2
[Link]    [Link]       Primary PostgreSQL instance   Active pgpool-II instance
[Link]    [Link]       Standby PostgreSQL instance   Standby pgpool-II instance
The virtual IP that will be used is [Link].
Keep in mind that PostgreSQL team recommends that all the servers included in replication should
be similar, "at least from database's point of view".
2.2. Installing PostgreSQL
This installation should be done on both servers, of course. The official Ubuntu PostgreSQL packages
are stuck at PostgreSQL version 9.3, and we would like to go with a newer version (at least 9.4,
since some significant improvements regarding replication were introduced there). For this reason the
first thing to do is to add the PostgreSQL apt repository. It is well described at the PostgreSQL wiki,
but for your convenience I will repeat it here:
#sh -c 'echo "deb [Link] $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

Note:
I will assume that you're executing commands as root, so I will not use sudo. If this is not the case,
prefix the commands with sudo where needed.

Install prerequisites, repository key, and PostgreSQL itself:


#apt-get install wget ca-certificates
#wget --quiet -O - [Link] | apt-key add -
#apt-get update
#apt-get upgrade
#apt-get install postgresql-9.5 postgresql-9.5-pgpool2

A few notes:
- If you are using sudo you'll need to prefix all the lines from the previous snippet with sudo,
except in the second line, where sudo is needed by the second command and should be placed
after the pipe character (|).
- The postgresql-9.5-pgpool2 package is not needed for establishing replication, but it will be
needed in the third part of this tutorial when we install pgpool-II. Unfortunately for me, the
existence of this package is not mentioned in the pgpool-II manual ([Link]), so I spent a lot of
time trying to compile it from source. But lucky you will skip this pain.

Once the database is installed, it is good practice to change the postgres user's password:
#su - postgres
$psql
postgres=# ALTER USER postgres WITH PASSWORD 'pgsql123';
postgres=# \q
The previous code snippet shows how you can enter an interactive PostgreSQL session (the first two
lines) and how to exit it (the last line). The third line is the actual SQL command that needs to be
executed. In the rest of this tutorial I will not repeat the entering/exiting steps, only the command that
needs to be executed.
In the rest of the tutorial I will continue with the default cluster created during installation of
PostgreSQL. If you want, you can change/create a new cluster by using the initdb command
([Link]). Note that the term cluster used here has a
different meaning than the one we have used so far and will continue to use in the rest of this tutorial.
Here it refers to a "collection of databases that are managed by a single server instance". Unfortunate
and confusing terminology introduced by PostgreSQL, but we have to adapt.
By default, during package installation PostgreSQL creates the following directories on Ubuntu:
- /etc/postgresql/9.5/main - configuration files like postgresql.conf are placed there, so let's
name it the configuration directory;
- /var/lib/postgresql/9.5/main - where the actual data is (and will be) stored, so we'll name it
the data directory;
- /usr/lib/postgresql/9.5 - where PostgreSQL binaries are installed. It is not important for us,
but let's name it the installation directory.
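If you want to double-check these locations on your own system, the cluster overview tool shipped with Ubuntu's postgresql-common package and a couple of simple SQL queries will confirm them (shown here as a quick sanity check, not a required step):

#pg_lsclusters
#sudo -u postgres psql -c "SHOW data_directory;"
#sudo -u postgres psql -c "SHOW config_file;"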

2.3. Configuring Replication


The main resource for me in this part was the Streaming Replication article at the PostgreSQL wiki
([Link]), but I've also peeked a few times at
[Link] and the [Link]
article ([Link]). The last one is
pretty old, dealing with PostgreSQL 9.0 (way before the new replication features), but it was useful
for comparing things. Anyway, you won't need to peek anywhere else besides this very article you're
enjoying so much right now.
This procedure should also be done on both servers. Let's start with creating a user named
replication with REPLICATION privileges:
postgres=# CREATE ROLE replication WITH REPLICATION PASSWORD
'reppassword' LOGIN;
Obviously, replication will be performed using the previously created account. In some cases (i.e. the
pg_basebackup command used below) you won't be able to specify the password. For this reason
you need to create a .pgpass file ([Link]
[Link]) and store the password there. The password file resides in the user's home directory, but in
the case of the postgres user it is not /home/postgres as you might expect. For security reasons its
home directory is /var/lib/postgresql instead. So you need to create / modify the
/var/lib/postgresql/.pgpass file and ensure that it contains the following line:
*:*:*:replication:reppassword
The first three asterisks denote "any host, any port, any database". The last two fields are the username
and password, respectively. Basically we've just allowed the postgres UNIX user to authenticate as
the replication role without typing the password. The password file requires strict permissions, so we
also need to execute:
#chown postgres:postgres /var/lib/postgresql/.pgpass
#chmod 0600 /var/lib/postgresql/.pgpass
The password file is needed on the standby server, but it won't harm to create it on both servers.
Btw., we had to do this whole password file thing only because the PostgreSQL team wants us to
suffer; everything would be much easier if pg_basebackup could simply be called with the password
specified. But no. They decided to implement some pretty useless flags (i.e. --password and
--no-password), but not an option to actually specify the password. Why? They would probably
answer "for security reasons", but the truth is that they simply want you to suffer.
Next, change the following entries in the postgresql.conf file:
listen_addresses = '*'
port = 5433

A few notes again:

- The PostgreSQL instance does not have to listen on all IP addresses. Precisely, it does not have to
listen on the virtual IP address, but it has to listen on the server's main IP so that pgpool-II installed
on the other server can access it, and it should listen on the localhost address if the pgpool-II
instance installed on the same server accesses it that way. Anyway, there's no harm in setting
PostgreSQL to listen on all available addresses.
- Note that I've changed the default PostgreSQL port (5432) to 5433. The reason is that I want to use
5432 for the pgpool-II instance so that all outside-world clients can connect to pgpool-II using this
well-known port. Of course, if you don't want to set the ports this way, you don't have to.
Add/change the following entries in the pg_hba.conf file:
host replication replication [Link]/32 md5
host replication replication [Link]/32 md5
host all postgres [Link]/16 md5
Notes:
- The first two lines allow the replication user to access the database from the specified IP address
(you should change the actual IP addresses appropriately). Strictly speaking, not both pg_hba.conf
files (on both servers) have to contain both lines. The file on the primary server could contain only
the second line, while the file on the standby server could contain only the first line, but again
there's no harm in having the same file with both lines on both servers.
- The third line is not needed for establishing replication, but I've added it so that I can access the
server with the postgres account from my local network, to be able to administer it remotely. You
can skip this line if you want. Of course, if you keep this line, change the IP network appropriately.

2.3.1. Configuring Primary Server

2.3.1.1. Replication Slots


This is the point where the tough part starts, and where we must give up on many resources,
including the most important one - the PostgreSQL wiki. The reason for this is that we are choosing
to take a slightly different and better direction - we'll use so-called replication slots. This feature was
introduced in PostgreSQL 9.4, and it is intended for logical replication (not to be explained here),
but it can also be used with the streaming replication we are planning to implement. You can read
more about the technology in [Link]
replication-slots/ and [Link], but you don't have to - I've
already done that, and I'll present the essence here.
First, let me introduce the new technology as briefly as possible. To do that I first need to briefly
explain how "log-shipping" replication works: basically it transfers the transaction log (WAL files)
from the primary to the standby server, and the standby uses these WAL files to reconstruct the
database state. In this type of replication the standby is basically in a constant recovery state
(constantly recovering itself by reading new WAL files). Every once in a while the primary server
frees its pg_xlog by deleting old WAL files. The problem with such replication arises when the
standby server gets too far behind the primary server (for example after a long period of being down).
When the standby tries to catch up again, it can't get the WAL files because they have been deleted,
meaning that replication fails. Prior to replication slots the problem was solved by one of two means
(or both combined, as, for example, in the PostgreSQL wiki):
- By defining the minimal number of WAL files kept (the wal_keep_segments parameter in
postgresql.conf). Basically we were able to set this parameter high enough so that the primary
server keeps WAL files long enough for the standby to catch up.
- Instead of deleting WAL files, by archiving and storing them in a place where the standby can
access them (the archive_mode and archive_command parameters in postgresql.conf).
Replication slots introduce a new approach: they basically allow the primary server to be aware of
each standby and its replication state, and to keep WAL files as long as needed - no more, no less.
With this technology the primary server will retain WAL files basically forever, waiting for the
standby to pick them up (unless it goes down itself due to pg_xlog overflow). It means that we can
simply turn on a standby server weeks after it went down, and it will catch up without any additional
intervention on our side. On the other hand, it also means that if we give up on some standby for
good, we have to tell the primary server; otherwise it will go down sooner or later. For this reason
replication slots are neither created nor deleted automatically. We have to create the slot before
connecting a standby, and we also have to delete the slot after giving up on a particular standby.
Replication slots make our life easier when it comes to recovery after longer delays (no need for
manual resynchronizing), but they also take away one interesting feature that was available with
WAL archiving - so-called point-in-time recovery. With WAL archiving we were able not only to
restore a new standby to the current state of the primary server, but also to restore it to the state of
the database at any moment before (for example before you've accidentally deleted some table).
With replication slots this is not possible; a standby has to be restored to the current state of the
primary server.
In my case I'll go with replication slots, but I will still also provide instructions for those who decide
to go with WAL archiving.
If you go with replication slots, as I will, you need to create a replication slot. To do that, execute the
following command on the primary server:
postgres=# SELECT * FROM pg_create_physical_replication_slot('it_rdbms02');
I've named the slot it_rdbms02 (obviously to correspond to the hostname of my standby server),
but you can name it as you want. Also on the primary server you need to additionally change
postgresql.conf as follows:
wal_level = hot_standby
max_replication_slots = 3
max_wal_senders = 3

Notes:
- The first line sets the WAL level needed for replicas that will run in hot_standby mode (i.e.
streaming replication with read-only standbys).
- In the second line I've set the maximum number of replication slots to 3, although I will use only
one for now.
- The third line defines the maximum number of concurrent connections from standby servers.

Finally start (or restart) PostgreSQL.
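Once the server is up you can verify that the slot has been registered. The pg_replication_slots system view is standard in 9.4+; the active column will show f until a standby actually attaches to the slot:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;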

2.3.1.2. WAL Archiving


This section is only for those who decided to go with WAL archiving! You need to change
postgresql.conf as described in the PostgreSQL wiki:
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'

Start (or restart) PostgreSQL.


2.3.2. Configuring Standby Server
The first and very important step is to stop the PostgreSQL server.
The next thing to do is to delete everything from the PostgreSQL data directory, including any
tablespace directories. The PostgreSQL wiki explains how you can do this, but I prefer the shortcut -
deleting the data directory itself. After that you will execute the pg_basebackup command in order to
get the initial state from the primary server. All of this is accomplished by executing the following
commands as the postgres user:
#sudo -i -u postgres
$cd /var/lib/postgresql/9.5
$rm -rf main
$pg_basebackup -v -D main -R -P -h [Link] -p 5433 -U replication
$logout

Explanation:
- The first command enters a postgres user impersonation session;
- The second navigates to the data directory's parent folder;
- The third deletes the data directory;
- The fourth imports the data from the primary server into the newly created data directory main;
- The last exits the postgres user impersonation session.

Add / change the following lines in the postgresql.conf file:

hot_standby = on
hot_standby_feedback = on

Explanations:
- The first line tells the standby server that it will be used for read-only queries (load balancing).
- The second line prevents "pruning of tuples whose removal would cause replication conflicts",
whatever that means. At the moment I'm not sure if it relates only to replication slots or can be
used with WAL archiving as well, but I suggest setting it to 'on' in either case.

2.3.2.1. recovery.conf in Replication Slots


This section is only for those who decided to go with replication slots! You need to create / change
the recovery.conf file in the data directory so that it contains the following:
standby_mode = 'on'
primary_slot_name = 'it_rdbms02'
primary_conninfo = 'host=[Link] port=5433 user=replication password=reppassword'
trigger_file = '/etc/postgresql/9.5/main/im_the_master'

Explanations:
- The first line specifies that the server should be started as a standby;
- The second line tells the server that replication slots will be used, and the slot name it should use
(must be the same as defined above when creating the slot);
- The third line represents the connection string used by the standby server to connect to the primary
(change the IP address, port and password appropriately);
- The fourth line specifies the trigger file (mentioned in the previous part) whose presence causes
streaming replication to end - meaning failover. You can define any path and name for the trigger
file. I've selected the configuration directory as the location (since it is the first place an
administrator usually checks), and the descriptive name im_the_master. Since the actual primary
server (as configured above) does not contain a recovery.conf file, it does not need the trigger file
either in order to be the primary server. But for consistency I suggest you always have this file on
the primary server - this way its role is obvious at first glance. Of course, you should not allow
more than one server from the same cluster to have this file.
A common mistake with recovery.conf is placing it in the wrong place - in the configuration
directory, together with postgresql.conf. Don't do that! Place it in the data directory instead.
Start standby server.

2.3.2.2. recovery.conf in WAL Archiving


This section is only for those who decided to go with WAL archiving! The recovery.conf file in this
case is similar to the one used in the replication slots scenario, with a few changes:
standby_mode = 'on'
primary_conninfo = 'host=[Link] port=5433 user=replication password=reppassword'
trigger_file = '/etc/postgresql/9.5/main/im_the_master'
restore_command = 'cp /path_to/archive/%f "%p"'

Note that restore_command must point to the same location as the archive_command defined
in postgresql.conf on the primary server above.
Start standby server.

2.4. Testing Replication


Before actually testing the replication you can first check the postgresql service status. If the
replication is running you should get the following output on Ubuntu (probably similar on other OSes):
# Primary server:
service postgresql status
9.5/main (port 5433): online

# Standby server:
service postgresql status
9.5/main (port 5433): online,recovery
As already mentioned, in replication the standby server is always in recovery state.
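You can also confirm the replication status from SQL. Both of the following are standard in PostgreSQL 9.5: on the primary, pg_stat_replication should show one row per connected standby with state 'streaming', while pg_is_in_recovery() returns f on the primary and t on the standby:

# Primary server:
postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;

# Either server:
postgres=# SELECT pg_is_in_recovery();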
The next test is obvious, and actually proves that the replication works. First we'll create a temporary
database on the primary server:
#sudo -u postgres psql
postgres=# CREATE DATABASE replicationtest;
CREATE DATABASE
postgres=# \l
(The third line is not a command you should enter, but the response to the command from the second
line.) The last command (\l) lists the existing databases, so you'll get:
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
---------------+---------+----------+-----------+-----------+-----------------------
postgres | postgres| UTF8 |en_US.UTF-8|en_US.UTF-8|
replicationtest| postgres| UTF8 |en_US.UTF-8|en_US.UTF-8|
template0 | postgres| UTF8 |en_US.UTF-8|en_US.UTF-8| =c/postgres
| | | | | postgres=CTc/postgres
template1 | postgres| UTF8 |en_US.UTF-8|en_US.UTF-8| =c/postgres
| | | | | postgres=CTc/postgres
(4 rows)
You can close this list by pressing the q key. Now you should get the same list of databases on the
standby server, by executing:
sudo -u postgres psql
postgres=# \l
You can also try to delete the newly created database on the standby server by executing:
postgres=# DROP DATABASE replicationtest;
ERROR: cannot execute DROP DATABASE in a read-only transaction
Obviously we cannot delete the database on the standby server, and this is OK. Let's try the same
command on the primary server:
postgres=# DROP DATABASE replicationtest;
DROP DATABASE
On the primary server the deletion obviously succeeded. You can recheck the database list on both
servers to confirm that the test database is absent.
In the rest of this page we'll deal with some failover / recovery scenarios, but without pgpool-II
(pretending that the replication itself was our final objective). It is useful for you to understand how
failover / recovery works from replication's point of view, although in the next part of the tutorial
we'll introduce pgpool-II and deal with failover / recovery in a different way, through pgpool-II.
The rest of this page covers only the replication slots scenario. If you've used WAL archiving instead,
things are probably similar, but you should recheck this with another resource.

2.5. How to Recover Standby Server?


Depending on what happened to the standby server after its failure, there are two scenarios:
1. If the standby server is repaired without losing its content, meaning that the old data is still there,
you can simply connect the repaired standby server and turn it on. It will synchronize
automatically after some time.
2. If the old standby server is lost, and a new, freshly installed one will take its place, the first thing
you need to do on the primary server is to delete the replication slot which was used by the old
standby server, and create a new slot for the new standby server (see the example below). Then
you need to configure the new standby server the same way you configured the old one, by
following the same exact steps described here.
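A minimal sketch of the slot housekeeping for scenario 2, using the slot name from this tutorial; the replacement slot name (it_rdbms03 here) is just an illustration, pick whatever matches your new standby:

-- On the primary, drop the slot of the lost standby:
postgres=# SELECT pg_drop_replication_slot('it_rdbms02');
-- ...and create a slot for its replacement:
postgres=# SELECT * FROM pg_create_physical_replication_slot('it_rdbms03');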

2.6. How to Recover Primary Server?


Well, the sad news is that you cannot do this. The primary server cannot be recovered.
I'm eager to see your face right now, while yelling, "But what the heck were we doing so far
then???" You probably think that I'm fooling you, but no, it is true: the primary server cannot be
recovered. But it is also true that we wouldn't ever want to recover it anyway. The trick is that the
question "How to recover the primary server?" is the wrong question. The right one would be: "What
to do when the primary server fails?" So let's start again, this time using the right question:

2.6. What to Do when Primary Server Fails?


The first thing to do when the primary server fails (if it is not already done by some tool such as
pgpool-II) is to promote the standby server to the primary role.
So there's the catch: instead of recovering the primary server you actually promote the surviving
standby server to the primary role, and later you'll actually recover a standby server.
You can easily make the standby server take over the primary role - simply by creating the trigger file.
But you should be aware that there is more to do sooner or later, and I'll refer to these other steps
that need to be done as full promotion.
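Creating the trigger file is literally a one-liner; assuming the trigger_file path configured earlier in recovery.conf, run this on the standby:

#touch /etc/postgresql/9.5/main/im_the_master

Within a few seconds the standby leaves recovery and starts accepting writes.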

2.6.1. Failover vs Full Promotion


When the failover is performed (by creating the trigger file), the failover server starts to behave as the
primary, but it is still not a full primary server; it's more like a TDY (temporary duty) primary server.
Let me explain.
The new server becomes writable, and the cluster behaves normally when viewed from the outside
world. But the new server is not capable of accepting standby servers (existing or new ones). It
means that all other standby servers (if any) won't be used in any way, as if they had failed together
with the old primary server. In order to join them back (as well as any new standby server), we need
to fully promote the TDY primary server. Full promotion basically assumes (a minimal sketch follows
below):
- Delete the recovery.conf file;
- Change the postgresql.conf file appropriately for the new role (as described on this page) and
restart the postgresql service;
- Create replication slots for the standby server(s) (as described on this page).
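A rough manual sketch of these three steps, assuming the paths and slot naming used in this tutorial (the slot name it_rdbms01 is just an example for the old primary that will re-join as a standby); the promote.sh script in Part 3 automates essentially the same sequence:

#sudo -u postgres rm /var/lib/postgresql/9.5/main/recovery.conf
# ...edit /etc/postgresql/9.5/main/postgresql.conf for the primary role (wal_level, max_replication_slots, max_wal_senders)...
#service postgresql restart
#sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('it_rdbms01');"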
Long story short - full promotion must take place sooner or later, and we can choose when to
perform it. As usual, there are some pros and cons:
- Failover itself does not require a postgresql service restart, meaning that no additional downtime is
introduced. On the other side, full promotion does require a postgresql service restart, which will
cause a minimal additional interruption (probably no more than one second). In my case this
up-to-one-second interruption is acceptable, but someone else may decide that it is better to wait a
few hours and do this during a low-load period (night time, for example). Nevertheless, don't
forget that we've already had a few seconds of downtime - the period between the old primary
server's failure and the failover procedure. Even if it is done automatically by pgpool-II, it does not
happen instantly (some time is needed for pgpool-II to decide that failover should take place).
- On the other side, failover without full promotion has a huge disadvantage: as long as you are
running without full promotion and at least one standby server, your system is in a so-called
degraded state, meaning that there are no more alternatives - if the failover server fails, you'll end
up with data loss.
To conclude: my decision is to perform full promotion immediately, and to join at least one standby
as soon as possible.
There's another thing you should be aware of: even when the primary server is fully promoted, you
should be cautious about joining standby servers if you plan to join more than one. Don't forget that
each standby server will actually be restored from scratch (all its previous data will be deleted). It
means that if you have a lot of data, synchronization can take a while and put some load on your
network infrastructure. I'm not sure if this is handled internally by PostgreSQL in some smart
prioritizing way, but if not, you can get performance degradation due to network overload. For this
reason I suggest joining only one standby immediately, and all the others later, during a low-load
time, one by one, waiting for the current one to fully synchronize before starting the next one.

2.6.2. But I Insist to Keep the Same Server as Primary!


If for any reason you really want to have the same server as primary again after it is repaired (for
example if it has slightly better hardware), you can achieve this. But first you need to create it as a
standby and give it some time to synchronize all the data. After that you can promote it to primary
again by intentionally killing the TDY primary server. Then you'll have to repeat all the steps to join
the killed server as a standby again.

Where to Go Next?
Well, it depends on your success so far. If your replication does not work, start again from the top of
this page, or even from the start of this tutorial. But if your replication works as expected - you're
lucky! Then the next step for us hard working people is BEER TIME!!!
Tomorrow we'll continue with PostgreSQL HA with pgpool-II - Part 3 where we'll automate the
procedure explained here.

Part 3
As mentioned, this part will deal with automating the procedure for creating replication described in
PostgreSQL HA with pgpool-II - Part 2. Everything we build here will be very useful later when we
implement pgpool-II.
You should be aware that the content of this page is not based on the official documentation; it is
nothing more than my way of accomplishing the task in the most efficient manner. I suggest you
follow along, but it's your choice.

Warning
Please don't blindly copy/paste scripts from this page! The scripts are based on the procedure
explained in PostgreSQL HA with pgpool-II - Part 2, and I do allow you to use them, but without
warranty of any kind.

Ubuntu Only
The scripts presented here are created for and tested on Ubuntu. If you are using some other OS you
should adjust the scripts appropriately (e.g. file paths, how the postgresql service is managed, etc.).

Replication Slots Only


The scripts are created for the replication slots scenario. If you are using WAL archiving or some
other method you'll need to adjust the scripts appropriately.
3.1. Objective
In order to be perfectly clear about what I'm trying to accomplish here, I'll define my objective as:
To prepare servers, configurations and script files so that replication configuration tasks (installing
and configuring primary and standby servers, promoting a standby server to the primary role, etc.)
can be performed efficiently and easily.

3.2. Preparation
Besides the obvious preparation step - installing the PostgreSQL 9.5 package - there are a few more
things we can do on any server, no matter which replication role it will have later.

3.2.1. Enabling Passwordless SSH for postgres User


There are a few cases where the postgres UNIX user on one host should be able to execute a
command on another host through SSH. For this reason, after installing the PostgreSQL package, we
need to enable this. In case you don't know how to accomplish it, I've created another post
that will help you: [Link]
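A minimal sketch of one way to do this, run on each server as root (the hostname it-rdbms02 is a placeholder for the other server; the linked post covers the details and variations):

#sudo -u postgres mkdir -p /var/lib/postgresql/.ssh
#sudo -u postgres ssh-keygen -t rsa -N "" -f /var/lib/postgresql/.ssh/id_rsa
# append the generated .ssh/id_rsa.pub to /var/lib/postgresql/.ssh/authorized_keys on the OTHER server, then test:
#sudo -u postgres ssh postgres@it-rdbms02 hostname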

3.2.2. PostgreSQL Configuration Files


Although it is true that the PostgreSQL configuration files are different for the primary and the
standby server, we can do the following:
- Create the pg_hba.conf file as described in PostgreSQL HA with pgpool-II - Part 2, since it is the
same in both cases (for primary and standby servers);
- Create two versions (templates) of the postgresql.conf file (one for the primary and the other for
the standby role), and store them both on the target server. This way, when the actual role of the
server is determined (changed), we can simply copy the appropriate file.
So let's create a repltemplates directory where the template files will be stored. I'll place this
directory in the PostgreSQL configuration directory. Next we'll copy the configuration file templates
there, so that we get the following file structure:
- /etc/postgresql/9.5/main/repltemplates (directory)
- postgresql.conf.primary - created by following the procedure for the primary server described
in PostgreSQL HA with pgpool-II - Part 2.
- postgresql.conf.standby - created by following the procedure for the standby server described
in PostgreSQL HA with pgpool-II - Part 2.
Finally we'll ensure that the postgres user owns all these files/directories:
#chown postgres:postgres /etc/postgresql/9.5/main/pg_hba.conf
#chown postgres:postgres -R /etc/postgresql/9.5/main/repltemplates

3.3. Introducing Some Conventions


In order to have an easy way to always determine (either from code or at a glance) whether a
particular server is primary or standby, we'll introduce the following convention:
- The primary server must contain the following trigger file:
/etc/postgresql/9.5/main/im_the_master
- A standby server must contain the following standby file:
/etc/postgresql/9.5/main/im_slave
- No server can contain both the trigger and the standby file at the same time.
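Under this convention a server's role can be checked with a few lines of shell; this is just an illustration of the idea, not one of the scripts below:

if [ -f /etc/postgresql/9.5/main/im_the_master ]; then
    echo "This server is the primary."
elif [ -f /etc/postgresql/9.5/main/im_slave ]; then
    echo "This server is a standby."
else
    echo "PostgreSQL is currently disabled on this server."
fi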
3.4. Automation Scripts
All the scripts shown here are available for download as attachments to this page.
Before starting with the scripts I need to mention that within the scripts I'm often using the Ubuntu
service command to start / stop / restart the postgresql service. On the other side, you may notice that
other documentation mostly uses the pg_ctl command for this purpose. In my case there's no
difference between the two. You can learn more about the differences in my other post:
Managing PostgreSQL Process on Ubuntu - service, pg_ctl and pg_ctlcluster
([Link]
+service%2C+pg_ctl+and+pg_ctlcluster).

Finally we can prepare some scripts that will make our life easier later. I'll place these scripts in the
/etc/postgresql/9.5/main/replscripts directory. But I need to remind you: DON'T
PANIC! Although the scripts are rather long, the biggest part is usually boilerplate code not related to
the replication we are dealing with. For example, every script starts with a giant while loop which
does nothing more than gather the provided input arguments. The parts of the scripts that are related
to replication will always be additionally explained.

3.4.1. disable_postgresql.sh
It's already been mentioned that we should not allow the presence of multiple primary servers in the
same cluster at the same time. For this reason I'll create a script that disables PostgreSQL and
prevents it from running either as primary or standby. Besides the mentioned while loop, the script is
simple - just go through the comments and you'll understand what it is doing.
/etc/postgresql/9.5/main/replscripts/disable_postgresql.sh

#!/bin/sh
# By Fat Dragon, 05/24/2016
# Stopping and disabling postgresql service if running
# NOTE: The script should be executed as postgres user

echo "disable_postgresql - Start"

# Defining default values


trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"

while test $# -gt 0; do

case "$1" in

-h|--help)

echo "Disables PostgreSQL"


echo " "
echo "disable_postgresql [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file
path"
echo " Optional, default:
/etc/postgresql/9.5/main/im_the_master"
echo "-s, --standby_file=FILE specify standby file
path"
echo " Optional, default:
/etc/postgresql/9.5/main/im_slave"
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed
as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format
of provided flags and"
echo " arguments or if a mandatory argument is
missing."
exit 0
;;

-t)

shift

if test $# -gt 0; then

trigger_file=$1

else

echo "ERROR: -t flag requires trigger file to be


specified."
exit 2

fi

shift
;;

--trigger-file=*)

trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;

-s)

shift

if test $# -gt 0; then


standby_file=$1

else

echo "ERROR: -s flag requires standby file to be


specified."
exit 2

fi

shift
;;

--standby-file=*)

standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;

*)

echo "ERROR: Unrecognized option $1"


exit 2
;;

esac

done

# Ensuring that 'postgres' runs the script


if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

echo "ERROR: The script must be executed as 'postgres' user."


exit 1

fi

echo "INFO: Stopping postgresql service..."


service postgresql stop

# Moving postgresql.conf in order to prevent the service from being started
# (the .disabled suffix is just the name used here for the parked file)
if [ -f /etc/postgresql/9.5/main/postgresql.conf ]; then

    if [ -f /etc/postgresql/9.5/main/postgresql.conf.disabled ]; then
        rm /etc/postgresql/9.5/main/postgresql.conf.disabled
    fi

    echo "INFO: Renaming postgresql.conf file to prevent future service start."
    mv /etc/postgresql/9.5/main/postgresql.conf \
       /etc/postgresql/9.5/main/postgresql.conf.disabled

fi

# Deleting recovery.conf file
echo "INFO: Checking if recovery.conf file exists..."
if [ -f /var/lib/postgresql/9.5/main/recovery.conf ]; then

    echo "INFO: recovery.conf file found. Deleting..."
    rm /var/lib/postgresql/9.5/main/recovery.conf
fi

# Deleting trigger file


echo "INFO: Checking if trigger file exists..."
if [ -f $trigger_file ]; then

echo "INFO: Trigger file found. Deleting..."


rm $trigger_file

fi

# Deleting standby file


echo "INFO: Checking if standby file exists..."
if [ -f $standby_file ]; then

echo "INFO: Standby file found. Deleting..."


rm $standby_file

fi

# Deleting primary info file


echo "INFO: Checking if primary info file exists..."
if [ -f /var/lib/postgresql/9.5/main/primary_info ]; then

echo "INFO: primary_info file found. Deleting..."


rm /var/lib/postgresql/9.5/main/primary_info

fi

echo "disable_postgresql - Done!"


exit 0

3.4.2. promote.sh
The promote.sh script will promote a standby server to the primary server role. What the script
actually does (related to the replication) is:
- Checks if the trigger / standby files are present. In general the script will refuse to run if the trigger
file is missing or the standby file is present, but this behavior can be changed by specifying the flag
-f. If -f is specified then the script will create a new trigger file if it is missing, and will delete the
standby file if present.
- If the -d flag (representing the previous primary server that should be disabled) is specified
(followed by a hostname), the script will try to execute the disable_postgresql.sh script on the
previous primary server through SSH.
- Removes the recovery.conf file if present, since it is not needed on a primary server.
- Checks if the postgresql.conf file should be changed and changes it (by copying from the prepared
template) if needed, then restarts the postgresql service.
- Ensures that the replication role exists, with the appropriate password. The replication user and its
password can be set using the -u and -p flags, respectively. Note that here we are setting the
password, not checking it against an existing one.
- Finally it writes the primary info file (/var/lib/postgresql/9.5/main/primary_info). This file
will be used later by the recovery_1st_stage.sh script (explained in PostgreSQL HA with pgpool-II
- Part 5) for performing recovery of a standby server.
/etc/postgresql/9.5/main/replscripts/promote.sh

#!/bin/sh
# By Fat Dragon, 05/24/2016
# Promoting standby to primary node
# NOTE: The script should be executed as postgres user

echo "promote - Start"

# Defining default values


trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
demote_host=""
replication_user="replication"
replication_password=""
force=false

debug=true

while test $# -gt 0; do

case "$1" in

-h|--help)

echo "Promotes a standby server to primary role"


echo " "
echo "promote [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file
path"
echo " Optional, default:
/etc/postgresql/9.5/main/im_the_master"
echo "-s, --standby_file=FILE specify standby file
path"
echo " Optional, default:
/etc/postgresql/9.5/main/im_slave"
echo "-d, --demote=HOST specify old primary to
demote"
echo " Optional, if not specified no demotion will
be performed."
echo "-u, --user specify replication
role"
echo " Optional, default: replication"
echo "-p, --password=PASSWORD specify password for
--user (mandatory)"
echo "-f, --force Forces promotion
regardless of existence"
echo " of trigger / standby
files."
echo " Optional, default: N/A"
echo " Description: Without this flag the
script will require"
echo " presence of trigger file."
echo " With the flag set the
script will create"
echo " trigger file as needed."
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed
as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format
of provided flags and"
echo " arguments or if a mandatory argument is
missing."
echo " 3 - Inapropriate trigger / standby files. See
-f flag for details."
echo " 4 - Error creating/deleting/copying
configuration files"
echo " ([Link] and [Link])."
echo " Hint: ensure that templates exist and check
permissions."
echo " 5 - Error creating / altering
replication_user."
exit 0
;;

-t)

shift

if test $# -gt 0; then

trigger_file=$1
else

echo "ERROR: -t flag requires trigger file to be


specified."
exit 2

fi

shift
;;

--trigger-file=*)

trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`


shift
;;

-s)

shift

if test $# -gt 0; then

standby_file=$1

else

echo "ERROR: -s flag requires standby file to be


specified."
exit 2

fi
shift
;;

--standby-file=*)

standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;

-d)

shift

if test $# -gt 0; then

demote_host=$1

else

echo "ERROR: -d flag requires host that will be


demoted to be specified."
exit 2

fi

shift
;;
--demote-host=*)

demote_host=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;

-u)

shift

if test $# -gt 0; then

replication_user=$1

else

echo "ERROR: -u flag requires replication user to


be specified."
exit 2

fi

shift
;;
--user=*)

replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;
-p)

shift

if test $# -gt 0; then

replication_password=$1

else

echo "ERROR: -p flag requires replication password


to be specified."
exit 2

fi
shift
;;

--password=*)

replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`

shift
;;

-f|--force)

force=true
shift
;;

*)

echo "ERROR: Unrecognized option $1"


exit 2
;;

esac

done

# Ensuring that 'postgres' runs the script


if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

echo "ERROR: The script must be executed as 'postgres' user."


exit 1

fi

if [ "$replication_password" = "" ]; then

echo "ERROR: --password is mandatory. For help execute 'promote


-h'"
exit 2

fi

if $debug; then

echo "DEBUG: The script will be executed with the following


arguments:"
echo "DEBUG: --trigger-file=$trigger_file"
echo "DEBUG: --standby_file=$standby_file"
echo "DEBUG: --demote-host=$demote_host"
echo "DEBUG: --user=$replication_user"
echo "DEBUG: --password=$replication_password"
if $force; then
echo "DEBUG: --force"
fi

fi

echo "INFO: Checking if standby file exists..."


if [ -e $standby_file ]; then

if $force; then

echo "INFO: Standby file found. Deleting..."


rm $standby_file

else

echo "ERROR: Cannot promote server that contains standby


file: ${standby_file}"
exit 3

fi

fi

echo "INFO: Checking if trigger file exists..."


if [ ! -e $trigger_file ]; then

if $force; then

echo "INFO: Trigger file not found. Creating a new one..."


echo "Promoted at: $(date)" >> $trigger_file

else

echo "ERROR: Cannot promote server that does not contain


trigger file: ${trigger_file}"
exit 3

fi

fi

success=false

# Disabling postgresql on demote host (if specified):


if [ "$demote_host" != "" ]; then

echo "INFO: Trying to disable postgresql at ${demote_host}..."


    ssh -T postgres@$demote_host \
        /etc/postgresql/9.5/main/replscripts/disable_postgresql.sh \
        -t $trigger_file -s $standby_file && success=true

    if ! $success ; then
        echo "WARNING: Failed to execute 'disable_postgresql.sh' at demoted host."
fi

fi

if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then

    echo "INFO: Deleting recovery.conf file..."

    success=false
    rm /var/lib/postgresql/9.5/main/recovery.conf && success=true

    if ! $success ; then

        echo "ERROR: Failed to delete '/var/lib/postgresql/9.5/main/recovery.conf' file."
        exit 4

fi

fi

echo "INFO: Checking if [Link] file exists..."


if [ -e /etc/postgresql/9.5/main/[Link] ]; then

echo "INFO: [Link] file found. Checking if it is for


primary server..."
if diff /etc/postgresql/9.5/main/[Link]
/etc/postgresql/9.5/main/repltemplates/[Link]
>/dev/null ; then

echo "INFO: [Link] file corresponds to primary


server file. Nothing to do."

else

echo "INFO: [Link] file does not correspond to


primary server file. Deleting..."

success=false
rm /etc/postgresql/9.5/main/[Link] && success=true

if ! $success ; then

echo "ERROR: Failed to delete


'/etc/postgresql/9.5/main/[Link]' file."
exit 4

fi

echo "INFO: Copying new [Link] file..."


success=false
cp
/etc/postgresql/9.5/main/repltemplates/[Link]
/etc/postgresql/9.5/main/[Link] && success=true

if ! $success ; then

echo "ERROR: Failed to copy new [Link] file."


exit 4

fi

if service postgresql status ; then

echo "INFO: Restarting postgresql service..."


service postgresql restart

fi

fi

else

echo "INFO: [Link] file not found. Copying new one..."

success=false
cp
/etc/postgresql/9.5/main/repltemplates/[Link]
/etc/postgresql/9.5/main/[Link] && success=true

if ! $success ; then

echo "ERROR: Failed to copy new [Link] file."


exit 4

fi

if service postgresql status ; then

echo "INFO: Restarting postgresql service..."


service postgresql restart

fi

fi

if service postgresql status ; then

echo "INFO: postgresql already running."

else
echo "INFO: Starting postgresql service..."
service postgresql start

fi

echo "INFO: Ensuring replication role and password..."

success=false
rolecount=$(psql -Atc "SELECT count (*) FROM pg_roles WHERE
rolname='${replication_user}';") && success=true

if ! $success ; then

echo "ERROR: Failed to check existence of '${replication_user}'


role."
exit 5

fi

if [ "$rolecount" = "0" ]; then

echo "INFO: Replication role not found. Creating..."

success=false
psql -c "CREATE ROLE ${replication_user} WITH REPLICATION
PASSWORD '${replication_password}' LOGIN;" && success=true

if ! $success ; then

echo "ERROR: Failed to create '${replication_user}' role."


exit 5

fi

else

echo "INFO: Replication role found. Ensuring password..."

success=false
psql -c "ALTER ROLE ${replication_user} WITH REPLICATION
PASSWORD '${replication_password}' LOGIN;" && success=true

if ! $success ; then

echo "ERROR: Failed to set password for '$


{replication_user}' role."
exit 5

fi

fi
echo "INFO: Creating primary info file..."
if [ -e /var/lib/postgresql/9.5/main/primary_info ]; then
rm /var/lib/postgresql/9.5/main/primary_info
fi

echo "REPL_USER=${replication_user}\nREPL_PASSWORD=$
{replication_password}\nTRIGGER_FILE=$
{trigger_file}\nSTANDBY_FILE=${standby_file}\n" >>
/var/lib/postgresql/9.5/main/primary_info

chown postgres:postgres /var/lib/postgresql/9.5/main/primary_info


chmod 0600 /var/lib/postgresql/9.5/main/primary_info

echo "promote - Done!"


exit 0

3.4.3. create_slot.sh
This script will (re)create a replication slot with the specified name. Again, if you ignore the boilerplate code, the actual script is short and simple:
 Checks if the trigger file exists, and refuses to run if not (a replication slot can be created only on the primary server).
 Checks if the slot exists, and (re)creates it as needed. If the -r flag is specified, the script will first delete the slot (if it exists) and create a new one. If the flag is not specified, the script won't do anything if a slot with the specified name already exists.
This script is called from the next one (initiate_replication.sh), so you can check there for a usage example.
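If you ever want to run it by hand on the primary, a direct invocation would look something like this (the slot name here is only an illustration, following the lowercase-hostname-with-underscores convention used by the next script):
# Run as postgres on the primary server; -r recreates the slot if it already exists:
sudo -u postgres /etc/postgresql/9.5/main/replscripts/create_slot.sh -r it_rdbms02
# Verify that the slot is there:
sudo -u postgres psql -Atc "SELECT slot_name, active FROM pg_replication_slots;"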
/etc/postgresql/9.5/main/replscripts/create_slot.sh

#!/bin/sh
# By Fat Dragon, 05/24/2016
# (Re)creates replication slot.
# NOTE: The script should be executed as postgres user

echo "create_slot - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
slot_name=""
recreate=false
debug=true

while test $# -gt 0; do
    case "$1" in
        -h|--help)
            echo "Creates replication slot"
            echo " "
            echo "create_slot [options]"
            echo " "
            echo "options:"
            echo "-h, --help               show brief help"
            echo "-t, --trigger_file=FILE  specify trigger file path"
            echo "                         Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-n, --name=NAME          slot name (mandatory)"
            echo "                         Slot name can also be specified without using flags"
            echo "                         (i.e. 'create_slot myslot')"
            echo "-r, --recreate           Forces re-creation if the slot already exists"
            echo "                         Optional, default: N/A"
            echo "                         Description: Without this flag the script won't do anything"
            echo "                         if the slot with the defined name already exists."
            echo "                         With the flag set, if the slot with the defined name already"
            echo "                         exists it will be deleted and re-created."
            echo " "
            echo "Error Codes:"
            echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
            echo "  2 - Argument error. Caused either by bad format of provided flags and"
            echo "      arguments or if a mandatory argument is missing."
            echo "  3 - Inappropriate trigger / standby files. This script REQUIRES the trigger"
            echo "      file to be present."
            echo "  4 - Error executing a slot-related operation (query/create/drop)."
            exit 0
            ;;
        -t)
            shift
            if test $# -gt 0; then
                trigger_file=$1
            else
                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2
            fi
            shift
            ;;
        --trigger-file=*)
            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -n)
            if [ "$slot_name" != "" ]; then
                echo "ERROR: Invalid command. For help execute 'create_slot -h'"
                exit 2
            fi
            shift
            if test $# -gt 0; then
                slot_name=$1
            else
                echo "ERROR: -n flag requires slot name to be specified."
                exit 2
            fi
            shift
            ;;
        --name=*)
            if [ "$slot_name" != "" ]; then
                echo "ERROR: Invalid command. For help execute 'create_slot -h'"
                exit 2
            fi
            slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -r|--recreate)
            recreate=true
            shift
            ;;
        *)
            if [ "$slot_name" != "" ]; then
                echo "ERROR: Invalid command. For help execute 'create_slot -h'"
                exit 2
            fi
            slot_name=$1
            shift
            ;;
    esac
done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1
fi

if [ "$slot_name" = "" ]; then
    echo "ERROR: Slot name is mandatory. For help execute 'create_slot -h'"
    exit 2
fi

if $debug; then
    echo "DEBUG: The script will be executed with the following arguments:"
    echo "DEBUG: --trigger-file=${trigger_file}"
    echo "DEBUG: --name=${slot_name}"
    if $recreate; then
        echo "DEBUG: --recreate"
    fi
fi

echo "Checking if trigger file exists..."
if [ ! -e $trigger_file ]; then
    echo "ERROR: Cannot create replication slot if the server does not contain trigger file: ${trigger_file}"
    exit 3
fi

success=false

echo "INFO: Checking if slot '${slot_name}' exists..."
slotcount=$(psql -Atc "SELECT count(*) FROM pg_replication_slots WHERE slot_name='${slot_name}';") && success=true

if ! $success ; then
    echo "ERROR: Cannot check for '${slot_name}' slot existence."
    exit 4
fi

if [ "$slotcount" = "0" ]; then
    echo "INFO: Slot not found. Creating..."

    success=false
    psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true

    if ! $success ; then
        echo "ERROR: Cannot create '${slot_name}' slot."
        exit 4
    fi
elif $recreate ; then
    echo "INFO: Slot found. Removing..."

    success=false
    psql -c "SELECT pg_drop_replication_slot('${slot_name}');" && success=true

    if ! $success ; then
        echo "ERROR: Cannot drop existing '${slot_name}' slot."
        exit 4
    fi

    echo "INFO: Re-creating the slot..."

    success=false
    psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true

    if ! $success ; then
        echo "ERROR: Cannot create '${slot_name}' slot."
        exit 4
    fi
fi

echo "create_slot - Done!"
exit 0

3.4.4. initiate_replication.sh
The last script we'll create here is the one that initiates replication (initiates the standby server). Again, after ignoring the boilerplate code we can say that the script:
 Checks the trigger / standby files. In this regard the script has the same behavior as the promote.sh script explained earlier, with the only difference being that this script demands the standby file and refuses the trigger file. The -f flag has the same meaning.
 Ensures that the PostgreSQL password file (.pgpass, explained in PostgreSQL HA with pgpool-II - Part 2) contains the replication user / password;
 Tries to recreate the replication slot at the specified primary server, and exits if this attempt fails;
 Stops the postgresql service and deletes the PostgreSQL data directory;
 Executes pg_basebackup to get the initial backup;
 Creates the recovery.conf file and sets its permissions;
 Deletes the postgresql.conf file, creates a new one from the template, and sets its permissions;
 Starts the postgresql service.
/etc/postgresql/9.5/main/replscripts/initiate_replication.sh

#!/bin/sh
# By Fat Dragon, 05/24/2016
# Initiates replication (sets up this server as a standby).
# NOTE: The script should be executed as postgres user

echo "initiate_replication - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
primary_host=""
primary_port="5432"
# Default slot name: lowercase hostname with dashes replaced by underscores
# (tr is used instead of the bash-only ${var//-/_} substitution because this runs under /bin/sh)
slot_name=$(echo "$HOSTNAME" | tr '[:upper:]' '[:lower:]' | tr '-' '_')
replication_user="replication"
replication_password=""
force=false
debug=true

while test $# -gt 0; do
    case "$1" in
        -h|--help)
            echo "Initiates replication - sets up this server as a standby"
            echo " "
            echo "initiate_replication [options]"
            echo " "
            echo "options:"
            echo "-h, --help               show brief help"
            echo "-t, --trigger_file=FILE  specify trigger file path"
            echo "                         Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-s, --standby_file=FILE  specify standby file path"
            echo "                         Optional, default: /etc/postgresql/9.5/main/im_slave"
            echo "-H, --primary-host=HOST  specify primary host (Mandatory)"
            echo "-P, --primary-port=PORT  specify primary port"
            echo "                         Optional, default: 5432"
            echo "-n, --slot_name=NAME     specify slot name"
            echo "                         Optional, defaults to lowercase hostname with dashes replaced"
            echo "                         by underscores."
            echo "-u, --user               specify replication role"
            echo "                         Optional, default: replication"
            echo "-p, --password=PASSWORD  specify password for --user"
            echo "                         Optional, default: empty"
            echo "-f, --force              Forces initiation regardless of trigger / standby files."
            echo "                         Optional, default: N/A"
            echo "                         Description: Without this flag the script will require"
            echo "                         presence of the standby file. With the flag set the script"
            echo "                         will create the standby file as needed."
            echo " "
            echo "Error Codes:"
            echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
            echo "  2 - Argument error. Caused either by bad format of provided flags and"
            echo "      arguments or if a mandatory argument is missing."
            echo "  3 - Inappropriate trigger / standby files. See -f flag for details."
            echo "  4 - Error creating/deleting/copying configuration files"
            echo "      (recovery.conf and postgresql.conf)."
            echo "      Hint: ensure that templates exist and check permissions."
            echo "  5 - Error in communicating with the primary server (to create the"
            echo "      slot or get the initial data)."
            echo "  6 - Error deleting old data directory."
            exit 0
            ;;
        -t)
            shift
            if test $# -gt 0; then
                trigger_file=$1
            else
                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2
            fi
            shift
            ;;
        --trigger-file=*)
            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -s)
            shift
            if test $# -gt 0; then
                standby_file=$1
            else
                echo "ERROR: -s flag requires standby file to be specified."
                exit 2
            fi
            shift
            ;;
        --standby-file=*)
            standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -H)
            shift
            if test $# -gt 0; then
                primary_host=$1
            else
                echo "ERROR: -H flag requires primary host to be specified."
                exit 2
            fi
            shift
            ;;
        --primary-host=*)
            primary_host=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -P)
            shift
            if test $# -gt 0; then
                primary_port=$1
            else
                echo "ERROR: -P flag requires port to be specified."
                exit 2
            fi
            shift
            ;;
        --primary-port=*)
            primary_port=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -n)
            shift
            if test $# -gt 0; then
                slot_name=$1
            else
                echo "ERROR: -n flag requires slot name to be specified."
                exit 2
            fi
            shift
            ;;
        --slot-name=*)
            slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -u)
            shift
            if test $# -gt 0; then
                replication_user=$1
            else
                echo "ERROR: -u flag requires replication user to be specified."
                exit 2
            fi
            shift
            ;;
        --user=*)
            replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -p)
            shift
            if test $# -gt 0; then
                replication_password=$1
            else
                echo "ERROR: -p flag requires replication password to be specified."
                exit 2
            fi
            shift
            ;;
        --password=*)
            replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;
        -f|--force)
            force=true
            shift
            ;;
        *)
            echo "ERROR: Unrecognized option $1"
            exit 2
            ;;
    esac
done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1
fi

if [ "$primary_host" = "" ]; then
    echo "ERROR: Primary host is mandatory. For help execute 'initiate_replication -h'"
    exit 2
fi

if [ "$replication_password" = "" ]; then
    echo "ERROR: --password is mandatory. For help execute 'initiate_replication -h'"
    exit 2
fi

if $debug; then
    echo "DEBUG: The script will be executed with the following arguments:"
    echo "DEBUG: --trigger-file=$trigger_file"
    echo "DEBUG: --standby_file=$standby_file"
    echo "DEBUG: --primary-host=$primary_host"
    echo "DEBUG: --primary-port=$primary_port"
    echo "DEBUG: --slot-name=$slot_name"
    echo "DEBUG: --user=$replication_user"
    echo "DEBUG: --password=$replication_password"
    if $force; then
        echo "DEBUG: --force"
    fi
fi

echo "INFO: Checking if trigger file exists..."
if [ -e $trigger_file ]; then
    if $force; then
        echo "INFO: Trigger file found. Deleting..."
        rm $trigger_file
    else
        echo "ERROR: Cannot initiate server as standby while it contains trigger file: ${trigger_file}"
        exit 3
    fi
fi

echo "INFO: Checking if standby file exists..."
if [ ! -e $standby_file ]; then
    if $force; then
        echo "INFO: Standby file not found. Creating new one..."
        echo "Initiated at: $(date)" >> $standby_file
    else
        echo "ERROR: Cannot initiate server as standby if it does not contain standby file: ${standby_file}"
        exit 3
    fi
fi

echo "INFO: Ensuring replication user and password in password file (.pgpass)..."
password_line="*:*:*:${replication_user}:${replication_password}"

if [ ! -f /var/lib/postgresql/.pgpass ]; then
    echo $password_line >> /var/lib/postgresql/.pgpass
elif ! grep -q "$password_line" /var/lib/postgresql/.pgpass ; then
    sed -i -e '$a\' /var/lib/postgresql/.pgpass
    echo $password_line >> /var/lib/postgresql/.pgpass
    sed -i -e '$a\' /var/lib/postgresql/.pgpass
fi

chown postgres:postgres /var/lib/postgresql/.pgpass
chmod 0600 /var/lib/postgresql/.pgpass

success=false

echo "INFO: Creating replication slot at the primary server..."
ssh -T postgres@$primary_host /etc/postgresql/9.5/main/replscripts/create_slot.sh -r $slot_name && success=true

if ! $success ; then
    echo "ERROR: Creating replication slot at the primary server failed."
    exit 5
fi

service postgresql stop

if [ -d /var/lib/postgresql/9.5/main ]; then
    echo "INFO: Deleting old data..."

    success=false
    rm -rf /var/lib/postgresql/9.5/main && success=true

    if ! $success ; then
        echo "ERROR: Deleting data directory failed."
        exit 6
    fi
fi

echo "INFO: Getting the initial backup..."

success=false
pg_basebackup -D /var/lib/postgresql/9.5/main -h $primary_host -p $primary_port -U $replication_user && success=true

if ! $success; then
    echo "ERROR: Initial backup failed."
    exit 5
fi

if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then
    echo "INFO: Removing old recovery.conf file..."

    success=false
    rm /var/lib/postgresql/9.5/main/recovery.conf && success=true

    if ! $success; then
        echo "ERROR: Removing old recovery.conf failed."
        exit 4
    fi
fi

echo "INFO: Creating recovery.conf file..."
cat >/var/lib/postgresql/9.5/main/recovery.conf <<EOL
standby_mode = 'on'
primary_slot_name = '${slot_name}'
primary_conninfo = 'host=${primary_host} port=${primary_port} user=${replication_user} password=${replication_password}'
trigger_file = '${trigger_file}'
EOL

chown postgres:postgres /var/lib/postgresql/9.5/main/recovery.conf
chmod 0644 /var/lib/postgresql/9.5/main/recovery.conf

if [ -e /etc/postgresql/9.5/main/postgresql.conf ]; then
    echo "INFO: Removing old postgresql.conf file..."

    success=false
    rm /etc/postgresql/9.5/main/postgresql.conf && success=true

    if ! $success; then
        echo "ERROR: Removing old postgresql.conf failed."
        exit 4
    fi
fi

echo "INFO: Copying new postgresql.conf file..."

success=false
cp /etc/postgresql/9.5/main/repltemplates/[Link] /etc/postgresql/9.5/main/postgresql.conf && success=true

if ! $success; then
    echo "ERROR: Copying new postgresql.conf failed."
    exit 4
fi

chown postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
chmod 0644 /etc/postgresql/9.5/main/postgresql.conf

echo "INFO: Starting postgresql service..."
service postgresql start

echo "initiate_replication - Done!"
exit 0

3.4.5. Permissions
Finally we need to set appropriate permissions for these script files:
chown postgres:postgres -R /etc/postgresql/9.5/main/replscripts
chmod 0744 -R /etc/postgresql/9.5/main/replscripts
3.5. Result
Once everything is prepared as described in this page, the new primary server can be promoted by executing a single line at the server that will be the primary (in my case on IT-RDBMS01):
sudo -u postgres /etc/postgresql/9.5/main/replscripts/promote.sh -f -p replicationpassword
Note that the scripts have to be executed as the postgres user.
In this call I've mostly relied on default values for the arguments. You can see more details about the arguments and their defaults if you execute the script with the -h (help) flag.
Similarly you can initiate the standby server by executing a single line at the server that will be the standby (in my case IT-RDBMS02):
sudo -u postgres /etc/postgresql/9.5/main/replscripts/initiate_replication.sh -f -H IT-RDBMS01 -P 5433 -p replicationpassword
This time I had to specify a few more arguments (primary host and port), but anyway the replication is up and running. You can test it in the same way we did in PostgreSQL HA with pgpool-II - Part 2.
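One quick way to confirm that streaming replication is actually working (not part of the scripts above, just a convenience check) is:
# On the primary (IT-RDBMS01) - one row per connected standby is expected:
sudo -u postgres psql -xc "SELECT application_name, state, sync_state FROM pg_stat_replication;"
# On the standby (IT-RDBMS02) - should return 't':
sudo -u postgres psql -Atc "SELECT pg_is_in_recovery();"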

Where to Go Next?
Continue with PostgreSQL HA with pgpool-II - Part 4 where we'll finally install pgpool-II.

Part 4
This part deals with another painful thing: installing pgpool-II. The ultimate resource for this is the pgpool-II manual. Here I'll extract the essence and present it in an easier, step-by-step way. Let's start.

4.1. Installing pgpool-II


For Ubuntu users (like me), problems start immediately. The official Ubuntu repository is stuck at pgpool-II version 3.3, and again we want the latest (3.5 at the moment of this writing). Again we have a good reason to insist on the newest, since pgpool-II versions 3.4 and 3.5 introduced significant performance improvements (as you can read here). How to obtain the latest version? Well, if you are using CentOS - you're lucky - pgpool maintains a yum repository. But if you are an Ubuntu guy (as I am), you need to compile from source. I'm not sure why pgpool ignores the Debian family and an apt repository, but they do. Luckily the installation from source is not too hard. The pain comes later, with configuration, so CentOS guys shouldn't celebrate too much. Also, even if you are using CentOS, read through the next section in order to stay in sync with us, and to get an understanding of the files included in the install.

4.1.1. Installing from Source


Let's start with installing packages that will be needed for compilation:
apt-get update
apt-get install libpq-dev make
Besides the mentioned packages this will of course install all their dependencies. We'll also need the postgresql-9.5-pgpool2 package, but if you followed the previous part of the tutorial it is already installed.
The next thing to do is to download and extract the source tarball:
# Download the tarball:
wget [Link] -O pgpool-II-3.5.2.tar.gz
# Extract the tarball:
tar -xzf pgpool-II-3.5.2.tar.gz
# Delete the tarball once extracted:
rm pgpool-II-3.5.2.tar.gz
# cd to source directory:
cd pgpool-II-3.5.2
After that we'll configure and compile the package. There are more configuration options, but I'll use only the one that defines the installation directory, and I'll set it to /usr/share/pgpool2/3.5.2. If you want, you can change the installation directory to some other location (e.g. another good pick would be /opt/pgpool2/3.5.2). Still in the source directory, execute the following:
./configure --prefix=/usr/share/pgpool2/3.5.2

make
make install
This will compile and install binaries (and some other things) in the specified location. The next thing to do is to create and populate the configuration directory. Do the following:
 Pick and create a configuration directory. To be consistent with PostgreSQL I like my configuration directory to be /etc/pgpool2/3.5.2, so I'll create it there.
 Move the content of the /usr/share/pgpool2/3.5.2/etc directory to your new configuration directory. (Note that /usr/share/pgpool2/3.5.2 used here is the installation directory you selected in the compilation process.) The content that will be moved consists of example configuration files (pgpool.conf.sample and a few others). Example move command (do the same for all other files from the directory):
mv /usr/share/pgpool2/3.5.2/etc/pgpool.conf.sample /etc/pgpool2/3.5.2/
The files that need to be moved this way are:
 pgpool.conf.sample
 pcp.conf.sample
 pgpool.conf.sample-master-slave
 pgpool.conf.sample-replication
 pgpool.conf.sample-stream
 pool_hba.conf.sample
 Copy, move, or link the binary files from your installation directory to /usr/sbin. I prefer using symbolic links, but you can also move or copy the files. Here's an example for the file pcp_attach_node (you should do the same for all the other files):
# Create symbolic link:
ln -s /usr/share/pgpool2/3.5.2/bin/pcp_attach_node /usr/sbin/pcp_attach_node
# OR move:
mv /usr/share/pgpool2/3.5.2/bin/pcp_attach_node /usr/sbin/
# OR copy:
cp /usr/share/pgpool2/3.5.2/bin/pcp_attach_node /usr/sbin/
The files that need to be linked / moved / copied this way are:
 pcp_attach_node
 pcp_detach_node
 pcp_node_count
 pcp_node_info
 pcp_pool_status
 pcp_proc_count
 pcp_proc_info
 pcp_promote_node
 pcp_recovery_node
 pcp_stop_pgpool
 pcp_watchdog_info
 pg_md5
 pgpool
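If you'd rather not repeat the ln -s command for every binary, a small loop over the list above achieves the same thing (a sketch; adjust the paths if you picked a different installation directory):
# Link all pgpool-II binaries from the install prefix into /usr/sbin:
for f in pcp_attach_node pcp_detach_node pcp_node_count pcp_node_info \
         pcp_pool_status pcp_proc_count pcp_proc_info pcp_promote_node \
         pcp_recovery_node pcp_stop_pgpool pcp_watchdog_info pg_md5 pgpool; do
    ln -s "/usr/share/pgpool2/3.5.2/bin/$f" "/usr/sbin/$f"
done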

Note for CentOS (yum) users


If you are on CentOS and you've installed pgpool-II from the yum repository, the binary files should already be in /usr/sbin - please confirm that. As for the configuration files - find where they are installed, but I don't recommend moving them (the installed pgpool service is probably configured for that existing location).

4.1.2. Extensions and SQL Scripts


There's one thing I must admit before starting this section: not all scripts that we'll prepare here are
needed in our scenario. You can check original documentation to see when these scripts and
extensions are used. Nevertheless, for the sake of completeness of pgpool-II installation I will cover
all the scripts here, and it won't hurt too much if you also do so.
pgpool-II comes with some PostgreSQL extensions and SQL scripts. You can find these in the source directory (extracted from the tarball), in the src/sql subdirectory. The original pgpool-II documentation will instruct you to compile these, but you shouldn't do that - it is already covered by the installed package postgresql-9.5-pgpool2. You only need to confirm that the needed binaries (pgpool_adm, pgpool-recovery, and pgpool-regclass) are already available in the PostgreSQL library directory (/usr/lib/postgresql/9.5/lib). But the extensions and SQL scripts should be copied anyway.
Extensions should be copied to the extension subdirectory of the PostgreSQL installation directory (/usr/share/postgresql/9.5/extension). Each extension consists of two files: *.control and *.sql, and to copy an extension we need to copy both files. Besides extensions there are some SQL script files we also want to copy, but we will copy them to the sql subdirectory of the PostgreSQL configuration directory (/etc/postgresql/9.5/main/sql). Again, both the extensions and the SQL script files are at the moment in the src/sql subdirectory of the source directory (extracted from the tarball). So let's start copying:
# Create SQL scripts directory:
mkdir /etc/postgresql/9.5/main/sql
# Navigate to source directory:
cd ~/pgpool-II-3.5.2
# Navigate to src/sql subdirectory:
cd src/sql
# While there let's copy the first script file:
cp insert_lock.sql /etc/postgresql/9.5/main/sql/
# Navigate to pgpool_adm (the first extension) subdirectory:
cd pgpool_adm
# Let's copy pgpool_adm extension:
cp pgpool_adm.control /usr/share/postgresql/9.5/extension/
cp pgpool_adm--[Link] /usr/share/postgresql/9.5/extension/
# While there let's copy the SQL script file also (note that I'm changing the extension of the file as well):
cp pgpool_adm.sql.in /etc/postgresql/9.5/main/sql/pgpool_adm.sql
# Navigate up in order to select another extension:
cd ..
# Navigate to pgpool-recovery (the next extension) subdirectory:
cd pgpool-recovery
# Let's copy pgpool-recovery extension:
cp pgpool_recovery.control /usr/share/postgresql/9.5/extension/
cp pgpool_recovery--[Link] /usr/share/postgresql/9.5/extension/
# While there let's copy the SQL script files also (note that I'm changing the extension of the file as well):
cp pgpool-recovery.sql.in /etc/postgresql/9.5/main/sql/pgpool-recovery.sql
cp uninstall_pgpool-recovery.sql /etc/postgresql/9.5/main/sql/
# Navigate up in order to select another extension:
cd ..
# Navigate to pgpool-regclass (the next extension) subdirectory:
cd pgpool-regclass
# Let's copy pgpool-regclass extension:
cp pgpool_regclass.control /usr/share/postgresql/9.5/extension/
cp pgpool_regclass--[Link] /usr/share/postgresql/9.5/extension/
# While there let's copy the SQL script files also (note that I'm changing the extension of the file as well):
cp pgpool-regclass.sql.in /etc/postgresql/9.5/main/sql/pgpool-regclass.sql
cp uninstall_pgpool-regclass.sql /etc/postgresql/9.5/main/sql/

# Navigate to your home directory and delete the pgpool source directory, since we've copied everything we need:
cd
rm -r pgpool-II-3.5.2
Now we need to change the copied SQL script files. Basically we need to replace MODULE_PATHNAME with the appropriate module path. We also need to replace $libdir with the actual PostgreSQL library directory (/usr/lib/postgresql/9.5/lib). MODULE_PATHNAME is simply that library path plus the actual module name. As an example, here is pgpool-recovery.sql before and after the change:
pgpool-recovery.sql - before the change

CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text, text)
RETURNS bool
AS 'MODULE_PATHNAME', 'pgpool_recovery'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_remote_start(text, text)
RETURNS bool
AS 'MODULE_PATHNAME', 'pgpool_remote_start'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_pgctl(text, text)
RETURNS bool
AS '$libdir/pgpool-recovery', 'pgpool_pgctl'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_switch_xlog(text)
RETURNS text
AS 'MODULE_PATHNAME', 'pgpool_switch_xlog'
LANGUAGE C STRICT;

pgpool-recovery.sql - after the change

CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text, text)
RETURNS bool
AS '/usr/lib/postgresql/9.5/lib/pgpool-recovery', 'pgpool_recovery'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_remote_start(text, text)
RETURNS bool
AS '/usr/lib/postgresql/9.5/lib/pgpool-recovery', 'pgpool_remote_start'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_pgctl(text, text)
RETURNS bool
AS '/usr/lib/postgresql/9.5/lib/pgpool-recovery', 'pgpool_pgctl'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_switch_xlog(text)
RETURNS text
AS '/usr/lib/postgresql/9.5/lib/pgpool-recovery', 'pgpool_switch_xlog'
LANGUAGE C STRICT;

You need to do the same for all the other SQL script files (from /etc/postgresql/9.5/main/sql). As a hint I'll tell you that the module name used as a replacement always corresponds to the name of the file you are changing (i.e. while changing the file pgpool-recovery.sql you'll use the module name pgpool-recovery). Another important thing to check is whether the actual module exists in the PostgreSQL library directory (/usr/lib/postgresql/9.5/lib).
Here I need to admit something: in some circumstances PostgreSQL is smart enough to automatically change MODULE_PATHNAME before executing the script. For example it happens while we are installing extensions, and thanks to that we don't need to change the extension files as well, although they also contain MODULE_PATHNAME. For example, the pgpool_recovery--[Link] file we previously copied to the extension directory also contains MODULE_PATHNAME, but we don't need to change it because PostgreSQL will do this automatically on execution. But with regular SQL script files I don't know how to accomplish this, and for this reason I've changed everything manually. Apart from a bit more work, there's no harm in doing that.
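If you prefer not to edit every file by hand, a sed substitution along these lines should give the same result (shown for pgpool-recovery.sql only; repeat with the matching module name for each of the other files, and double-check the outcome):
cd /etc/postgresql/9.5/main/sql
# Replace MODULE_PATHNAME and $libdir with the full path of the pgpool-recovery module:
sed -i -e "s|'MODULE_PATHNAME'|'/usr/lib/postgresql/9.5/lib/pgpool-recovery'|g" \
       -e 's|\$libdir/pgpool-recovery|/usr/lib/postgresql/9.5/lib/pgpool-recovery|g' \
       pgpool-recovery.sql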
It's probably not necessary, but I like to have all directories and files in /etc/postgresql
owned by postgres user, so I'll execute:
chown postgres:postgres -R /etc/postgresql/9.5/main/sql
Note for CentOS (yum) users
I really don't know if (and where) installing from yum package copies extensions and scripts, so
please try to figure out what is where.

4.1.3. Service Script
Of course we would like pgpool-II to run as a service, so we need to create and register a service script. I've created one by modifying the script that gets installed when you install the pgpool-II v3.3 package from the official Ubuntu repository. Here I'll provide this script, but you can change it if you want. Actually there are two parts to this: a defaults file which contains default configuration options, and the script itself. I'll provide both files here. Note that if you've used different directories, you need to adjust the files appropriately.
Defaults file:
/etc/default/pgpool2

# Defaults for pgpool initscript
# sourced by /etc/init.d/pgpool2

# set to "yes" if you want to enable debugging messages to the log
PGPOOL_LOG_DEBUG=no

# config file
PGPOOL_CONFIG_FILE=/etc/pgpool2/3.5.2/pgpool.conf

# hba file
PGPOOL_HBA_CONFIG_FILE=/etc/pgpool2/3.5.2/pool_hba.conf

# pcp config file
PGPOOL_PCP_CONFIG_FILE=/etc/pgpool2/3.5.2/pcp.conf

# PID file. Must be the same as defined in pgpool.conf (pid_file_name)
PGPOOL_PID_FILE=/var/run/postgresql/pgpool.pid

Service script:
/etc/init.d/pgpool2

#! /bin/sh

### BEGIN INIT INFO
# Provides:          pgpool2
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Should-Start:      postgresql
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: start pgpool-II
# Description:       pgpool-II is a connection pool server and replication
#                    proxy for PostgreSQL.
### END INIT INFO

PATH=/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/pgpool

test -x $DAEMON || exit 5

# Include pgpool defaults if available
if [ -f /etc/default/pgpool2 ] ; then
    . /etc/default/pgpool2
fi

PIDFILE=${PGPOOL_PID_FILE:-/var/run/postgresql/pgpool.pid}

PGPOOL_CONFIG_FILE=${PGPOOL_CONFIG_FILE:-/etc/pgpool2/3.5.2/pgpool.conf}
if [ ! -f $PGPOOL_CONFIG_FILE ]; then
    echo "Config file not found."
    log_failure_msg "Config file not found."
    exit 1
fi

if [ x"$PGPOOL_LOG_DEBUG" = x"yes" ]; then
    OPTS="-d -f $PGPOOL_CONFIG_FILE"
else
    OPTS="-f $PGPOOL_CONFIG_FILE"
fi

PGPOOL_PCP_CONFIG_FILE=${PGPOOL_PCP_CONFIG_FILE:-/etc/pgpool2/3.5.2/pcp.conf}
if [ -f $PGPOOL_PCP_CONFIG_FILE ]; then
    OPTS="$OPTS -F $PGPOOL_PCP_CONFIG_FILE"
fi

STOPOPTS=$OPTS

PGPOOL_HBA_CONFIG_FILE=${PGPOOL_HBA_CONFIG_FILE:-/etc/pgpool2/3.5.2/pool_hba.conf}
if [ -f $PGPOOL_HBA_CONFIG_FILE ]; then
    OPTS="$OPTS -a $PGPOOL_HBA_CONFIG_FILE"
fi

. /lib/lsb/init-functions

is_running() {
    pidofproc -p $PIDFILE $DAEMON >/dev/null
}

d_start() {
    if ! test -d /var/run/postgresql; then
        install -d -m 2775 -o postgres -g postgres /var/run/postgresql
    fi

    if ! test -d /var/log/pgpool; then
        install -d -m 2775 -o postgres -g postgres /var/log/pgpool
    fi

    if is_running; then
        :
    else
        echo "FD - Starting pgpool-II by executing:"
        echo "$DAEMON -n $OPTS >> /var/log/pgpool/pgpool.log 2>&1 &"
        su -c "$DAEMON -n $OPTS >> /var/log/pgpool/pgpool.log 2>&1 &" - postgres
    fi
}

d_stop() {
    echo "FD - Stopping pgpool-II by executing:"
    echo "$DAEMON $STOPOPTS -m fast stop"
    su -c "$DAEMON $STOPOPTS -m fast stop" - postgres
}

d_reload() {
    echo "FD - Reloading pgpool-II by executing:"
    echo "$DAEMON $OPTS reload"
    su -c "$DAEMON $OPTS reload" - postgres
}

case "$1" in
    start)
        log_daemon_msg "Starting pgpool-II" pgpool
        d_start
        log_end_msg $?
        ;;
    stop)
        log_daemon_msg "Stopping pgpool-II" pgpool
        d_stop
        log_end_msg $?
        ;;
    status)
        is_running
        status=$?
        if [ $status -eq 0 ]; then
            log_success_msg "pgpool-II is running."
        else
            log_failure_msg "pgpool-II is not running."
        fi
        exit $status
        ;;
    restart|force-reload)
        log_daemon_msg "Restarting pgpool-II" pgpool
        d_stop && sleep 1 && d_start
        log_end_msg $?
        ;;
    try-restart)
        if $0 status >/dev/null; then
            $0 restart
        else
            exit 0
        fi
        ;;
    reload)
        log_daemon_msg "Reloading pgpool-II" pgpool
        d_reload
        log_end_msg $?
        ;;
    *)
        log_failure_msg "Usage: $0 {start|stop|status|restart|try-restart|reload|force-reload}"
        exit 2
        ;;
esac

Now we can register the service by executing:


update-rc.d pgpool2 defaults
But since we haven't configured pgpool-II yet, let's temporarily disable the service:
update-rc.d pgpool2 disable
You might have noticed in the previous script that I've decided to run the pgpool-II service as the postgres user. I've done so because I believe it'll make my life easier later, since a lot of scripts need to be run as the postgres user. Nevertheless, if you install pgpool-II from a package you'll see that the original script also runs the service this way.

Note for CentOS (yum) users


If you've installed pgpool-II from package, chances are that service script is already installed -
please check.

Where to Go Next?
I'm getting tired of this. Hopefully we'll get HA in the next part - PostgreSQL HA with pgpool-II - Part 5. (I've already mentioned that I'm writing this tutorial and doing my implementation in parallel, so I still don't have HA up and running.)

Part 5
In this part we'll deal with configuring pgpool-II, and installing pgpoolAdmin. Again, the main
resource I've used is pgpool-II manual, but here I'll provide the essence.
Unless explicitly noted otherwise, everything described in this page should be implemented on both
nodes.

5.1 Preparing PostgreSQL for pgpool-II


Let's first see which scripts / extensions we'll install and use. In the previous part of this tutorial we have prepared the following scripts / extensions:
 insert_lock.sql - According to the original documentation, this script is used when pgpool-II runs in replication mode, but we will use master/slave mode instead. As far as I understand that means we don't need it, and I won't install it.
 pgpool-regclass.sql / pgpool_regclass.control extension - According to the original documentation, it is needed only if you are using a PostgreSQL version prior to 9.4, so we won't install it.
 pgpool-recovery.sql / pgpool_recovery.control extension - According to the original documentation, it is needed for online recovery. I'm still not 100% sure if it is actually needed with replication slots, but I will install it.
 pgpool_adm.sql / pgpool_adm.control extension - It should be installed on every PostgreSQL instance used by pgpool-II, so we'll install it. Note: although the original documentation says it should be installed on every PostgreSQL server, in our case (master/slave streaming replication) it can be installed on the primary server only. As we already know, we cannot change the read-only standby server anyway.
As you can see, except for insert_lock.sql, we have the option to install a particular feature either by using an SQL script (i.e. pgpool-recovery.sql) or by using an extension (i.e. the pgpool_recovery.control extension). You can use either, but not both. Here I will first show how we can use the SQL scripts, and then how to install an extension. Before starting I'll remind you about something: when creating a new database, PostgreSQL uses the existing template1 database as a template. It means that by installing a particular script / extension on the template1 database it will also be applied to any future databases. But if you already have databases created before the feature was installed on template1 - you should install the feature on those databases also.
Here I'll show how to install the features of interest by using the SQL scripts:
# Navigate to SQL scripts directory:
cd /etc/postgresql/9.5/main/sql
# Execute scripts:
sudo -u postgres psql -f pgpool-recovery.sql template1
sudo -u postgres psql -f pgpool_adm.sql template1
Or the same thing by using extensions:
sudo -u postgres psql template1
=# CREATE EXTENSION pgpool_recovery;
=# CREATE EXTENSION pgpool_adm;
=# \q
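Keep in mind that template1 only affects databases created afterwards; for databases that already exist you would repeat the same step against each of them, for example (mydb is just a hypothetical database name):
sudo -u postgres psql mydb
=# CREATE EXTENSION pgpool_recovery;
=# CREATE EXTENSION pgpool_adm;
=# \q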
5.2. Preparing Scripts
pgpool-II is capable of deciding when a failover should be performed, but it doesn't actually know how to perform it. For this reason we need to create a failover script that will be used by pgpool-II to actually perform the failover. Similarly, pgpool-II needs a recovery script. But when it comes to scripts, there's always an infinite number of ways to accomplish the task. Basically, the failover script should simply create the trigger file (explained in PostgreSQL HA with pgpool-II - Part 2) on the newly promoted primary server. Similarly, the recovery script should do all the steps described in PostgreSQL HA with pgpool-II - Part 2 related to the standby server and establishing the replication. You can check the following resources to see how it is done:
 pgpool-II manual
 pgpool-II Tutorial [watchdog in master-slave mode]
 Simple Streaming replication setting with pgpool-II (multiple servers version)
Personally I wasn't fully satisfied with any of these, so I'll do the same thing my own way, and I'll rely on the scripts we've already created in PostgreSQL HA with pgpool-II - Part 3.
There's one important feature of all the scripts we'll create in this section that you need to be aware of: they do not necessarily affect the host they reside on, or the host they are executed from. In general they will act upon another host by using SSH. In fact, we should design them to behave that way.

5.2.1. failover.sh
As the name implies, this script should perform the failover. As we already know, that is easy to do - we simply create a trigger file on the server which should take over the primary role. Here's an example script (which I've picked up from some of the resources enumerated above, and which we will not actually use):
[Link]
#!/bin/bash -x
FALLING_NODE=$1         # %d
OLDPRIMARY_NODE=$2      # %P
NEW_PRIMARY=$3          # %H
PGDATA=$4               # %R

if [ $FALLING_NODE = $OLDPRIMARY_NODE ]; then
    if [ $UID -eq 0 ]
    then
        su postgres -c "ssh -T postgres@$NEW_PRIMARY touch $PGDATA/trigger"
    else
        ssh -T postgres@$NEW_PRIMARY touch $PGDATA/trigger
    fi
    exit 0;
fi;
exit 0;
The script obviously does what needs to be done - it creates the trigger file at the failover server. You may ask why not use it then? Well, there are two things I don't like about this script:
 It does not deal with the old primary in any way. As we know, it can be dangerous if the old primary server comes back still thinking that it is the primary. It is true that we can disable the old primary server by using some other script, or in some other way, but I believe that the best place to implement this is the same script (to prevent forgetting this step).
 Another reason is that this script performs failover, not full promotion (see PostgreSQL HA with pgpool-II - Part 2) of the server to the primary role. I want to perform full promotion immediately.
Let's see the script that we'll actually use:
/etc/pgpool2/3.5.2/failover.sh
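Since the full listing isn't reproduced here, below is only a rough sketch of a script with the behavior described in the points that follow (logging, full promotion via promote.sh, -d pointing at the old primary). Treat it as an illustration rather than the exact file; in particular, mapping the old primary's node ID to its hostname is left out:
#!/bin/bash
# Arguments passed by pgpool-II (matching the failover_command defined later):
# $1 = %d - ID of the failed node
# $2 = %P - old primary node ID
# $3 = %H - hostname of the new master node
# $4      - replication password
# $5      - trigger file path
FALLING_NODE=$1
OLDPRIMARY_NODE=$2
NEW_PRIMARY=$3
REPL_PASS=$4
TRIGGER_FILE=$5

LOG=/var/log/pgpool/failover.log
echo "$(date) failover.sh: falling=$FALLING_NODE old_primary=$OLDPRIMARY_NODE new_primary=$NEW_PRIMARY" >> $LOG

if [ "$FALLING_NODE" = "$OLDPRIMARY_NODE" ]; then
    # The old primary's hostname has to be derived from its node ID; with only two
    # nodes it is simply "the other host". That lookup is omitted in this sketch.
    OLD_PRIMARY_HOST="<old-primary-hostname>"

    # Full promotion of the new primary; -d also tries to disable the old primary.
    ssh -T postgres@$NEW_PRIMARY \
        /etc/postgresql/9.5/main/replscripts/promote.sh -f -t "$TRIGGER_FILE" \
        -d "$OLD_PRIMARY_HOST" -p "$REPL_PASS" >> $LOG 2>&1
fi

exit 0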
At first glance this script is very similar to the previous one, but the key differences are:
 The new script logs every execution. This will help us understand when a particular script is executed, and bridge the enormous gap in the official documentation this way.
 Instead of simply creating the trigger file at the new primary, this script executes the promote.sh script we created in PostgreSQL HA with pgpool-II - Part 3, which performs the full promotion. Also note that the -d flag with the old primary server is specified, meaning that the promote.sh script will also try to disable the old primary server.
 The new script cuts out the PGDATA argument which was used in the first script, since it is not needed here. On the other side the new script introduces two new arguments (REPL_PASS and TRIGGER_FILE). Later in this page we'll see how we can instruct pgpool-II to pass these parameters as well when calling the failover.sh script.
The last thing to do is to ensure script ownership and permissions:
chown postgres:postgres /etc/pgpool2/3.5.2/failover.sh
chmod 0700 /etc/pgpool2/3.5.2/failover.sh
5.2.2. recovery_1st_stage.sh
Frankly speaking, when it comes to recovery of a standby server we would be quite happy even without pgpool-II's help. The only thing we need to do is execute the initiate_replication.sh script explained in PostgreSQL HA with pgpool-II - Part 3. But for the sake of completeness I will configure recovery through pgpool-II as well.
I've already mentioned that the official pgpool-II documentation is poor, but when it comes to the recovery script it becomes even worse, and I'll probably dedicate another post to enumerating some important omissions of the official documentation. Here I'll list my conclusions, based on painful hours of research:
 The standby recovery script should be specified in the recovery_1st_stage_command key of the pgpool.conf file. It is explained below.
 The recovery_1st_stage_command script is not customizable in terms of input arguments. There are a few input arguments, and all of them are predetermined.
 According to the documentation, recovery_1st_stage_command must reside in the PostgreSQL data directory (/var/lib/postgresql/9.5/main), for security reasons.
 The current primary server is not specified by the input arguments. I spent a while trying to understand how to get this information within the script until I realized that the script always executes on the primary server, so we can get the primary server hostname by querying the $HOSTNAME environment variable. pgpool-II team, thanks for not sharing this with us! If you continue this way you'll end up developing a toy for yourself.
Besides my hard efforts there are still some things I don't know at the moment, and that I'll discover by logging every script execution and then turning different servers on and off. For example:
 When is recovery_1st_stage_command executed? Is it executed automatically by pgpool-II in some circumstances, or only when triggered by human interaction? pgpool-II team, please don't tell us that! You'll spoil the surprise if you do.
But my frustrations aside, let's see the actual file I've finally come up with:
/var/lib/postgresql/9.5/main/recovery_1st_stage.sh
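Again the full listing isn't reproduced here; a rough sketch consistent with the explanation below might look like this (an illustration only - the argument meanings follow pgpool-II's usual convention for recovery commands, and the log path is my own choice):
#!/bin/bash
# Predetermined arguments supplied by pgpool-II (not customizable):
DATA_DIR=$1          # primary's data directory
REMOTE_HOST=$2       # host that should be recovered as a standby
REMOTE_DATA_DIR=$3   # standby's data directory

LOG=/var/log/pgpool/recovery_1st_stage.log
echo "$(date) recovery_1st_stage.sh: remote_host=$REMOTE_HOST" >> $LOG

# The script always runs on the current primary, so $HOSTNAME is the primary.
if [ "$REMOTE_HOST" = "$HOSTNAME" ]; then
    echo "ERROR: refusing to recover the primary server itself." >> $LOG
    exit 1
fi

# Read the replication settings written by promote.sh:
PRIMARY_INFO=/var/lib/postgresql/9.5/main/primary_info
if [ ! -f "$PRIMARY_INFO" ]; then
    echo "ERROR: $PRIMARY_INFO not found - is this really the primary?" >> $LOG
    exit 1
fi
. "$PRIMARY_INFO"   # provides REPL_USER, REPL_PASSWORD, TRIGGER_FILE, STANDBY_FILE

# Sanity checks: the primary must hold the trigger file and must not hold the standby file.
if [ ! -e "$TRIGGER_FILE" ] || [ -e "$STANDBY_FILE" ]; then
    echo "ERROR: trigger/standby files are not in the expected state." >> $LOG
    exit 1
fi

# Re-initialize the remote standby; -f recreates its standby file as needed.
ssh -T postgres@$REMOTE_HOST \
    /etc/postgresql/9.5/main/replscripts/initiate_replication.sh \
    -f -H "$HOSTNAME" -P 5433 -u "$REPL_USER" -p "$REPL_PASSWORD" >> $LOG 2>&1

exit $?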
Explanation:
 The script logs execution so that we can bridge the missing documentation gap;
 Checks if the $REMOTE_HOST argument refers to the primary host itself, and exits with an error if it does;
 Checks for the primary info file created by the promote.sh script (explained in PostgreSQL HA with pgpool-II - Part 3), and reads additional data (REPL_USER, REPL_PASSWORD, TRIGGER_FILE and STANDBY_FILE) from it;
 Checks if the trigger and standby files are OK (the trigger file must exist, while the standby file must not exist), and exits with an error if something is wrong;
 Executes the initiate_replication.sh script (explained in PostgreSQL HA with pgpool-II - Part 3) at $REMOTE_HOST through SSH, impersonating the postgres user if necessary.
You might have noticed that I've included several checks in this script before the action is done. The reason is that executing this script against the primary server would be dangerous - it would destroy the primary server, and thus the HA cluster.
Another thing you might have noticed is that I haven't used pg_start_backup and pg_stop_backup, which are often used in other similar scripts you can find online. As far as I know these instructions are needed when a manual rsync is used to copy the backup, not when the pg_basebackup command is used (in my case it is used internally by the initiate_replication.sh script). I believe that if they were needed, the PostgreSQL team would have included them in pg_basebackup, right?
Finally let's ensure script ownership and permissions:
chown postgres:postgres
/var/lib/postgresql/9.5/main/recovery_1st_stage.sh
chmod 0700 /var/lib/postgresql/9.5/main/recovery_1st_stage.sh
5.2.3. pgpool_remote_start
It is another not-so-necessary script. It is called by pgpool-II after recovery of a standby server is finished, and its purpose is to start the database. In our case the postgresql service is automatically started by the initiate_replication.sh script called by the recovery_1st_stage.sh script, so a new script is not strictly necessary. But I will create a trivial script that ensures that the postgresql service is running:
/var/lib/postgresql/9.5/main/pgpool_remote_start
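The full listing isn't reproduced here either; a trivial sketch that matches the description below could be (the two arguments are what pgpool-II conventionally passes to this script; treat it as an illustration):
#!/bin/bash
# Called by pgpool-II after recovery; arguments are the remote host and its data directory.
REMOTE_HOST=$1
REMOTE_DATA_DIR=$2

LOG=/var/log/pgpool/pgpool_remote_start.log
echo "$(date) pgpool_remote_start: remote_host=$REMOTE_HOST" >> $LOG

# Make sure PostgreSQL is up on the recovered node (it normally already is,
# because initiate_replication.sh starts it at the end).
ssh -T postgres@$REMOTE_HOST "service postgresql status || service postgresql start" >> $LOG 2>&1

exit 0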
In this script I've used 'service postgresql start' to ensure that PostgreSQL is started. In other resources you'll often see pg_ctl used for this purpose. In my case there's no difference between the two. You can learn more about the differences in my other post Managing PostgreSQL Process on Ubuntu - service, pg_ctl and pg_ctlcluster.
This script is not configurable in pgpool.conf, so it has to be named exactly pgpool_remote_start (without extension), and it has to be placed in the PostgreSQL data directory (in our case /var/lib/postgresql/9.5/main).
Again we'll ensure file ownership and permissions:
chown postgres:postgres
/var/lib/postgresql/9.5/main/pgpool_remote_start
chmod 0700 /var/lib/postgresql/9.5/main/pgpool_remote_start
5.3. Configuring pgpool-II
Once we have PostgreSQL prepared and all the scripts in place, we can finally start with
configuring pgpool-II.

5.3.1. postgresql.conf
We need to slightly change PostgreSQL's main configuration file, so at the end of the postgresql.conf file add the following line:
pgpool.pg_ctl = '/usr/lib/postgresql/9.5/bin/pg_ctl'
Adding this line will allow us to use the pgpool_pgctl function (which will actually call the executable we've specified here). Please confirm that the path I've provided here is valid in your case - it must point to an existing pg_ctl file.

5.3.2. pcp.conf
Let's continue with another easy part - the pcp.conf file. This file is used by the pgpool-II control interface for authentication, meaning that in this file you specify who can access the pgpool-II control interface. During the installation of pgpool-II (PostgreSQL HA with pgpool-II - Part 4), we created a sample file (/etc/pgpool2/3.5.2/pcp.conf.sample). Let's copy the sample file and create the version which we'll actually use:
cp /etc/pgpool2/3.5.2/pcp.conf.sample /etc/pgpool2/3.5.2/pcp.conf
The next thing to do is to add one or more lines in the following format:
username:[md5 hash of the password]
where username should be replaced with the actual username, and the part in square brackets with the MD5 hash of the password. You can use the pg_md5 command to hash passwords. Let me show an example: I'll create user "admin" with password "pa55w0rd". The first thing I'll do is hash the password by executing:
pg_md5 pa55w0rd
97bf34d31a8710e6b1649fd33357f783
The second line is the result, of course. Now I'll use this result and add the following line to the pcp.conf file:
admin:97bf34d31a8710e6b1649fd33357f783
And that's it. You should do the same for your user(s) and password(s).
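If you want to do it in one shot, the pg_md5 output can be appended directly (assuming the pcp.conf path used above):
echo "admin:$(pg_md5 pa55w0rd)" >> /etc/pgpool2/3.5.2/pcp.conf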
Although the user set here doesn't have to be a PostgreSQL user ("admin" used here is not a PostgreSQL user), if you want to use some superuser features in pgpoolAdmin later, you need to set a PostgreSQL superuser here (i.e. "postgres"), with the same password as used in PostgreSQL. Actually, since you can set multiple users, it might be a good idea to set both users here, and then select the one to log in to pgpoolAdmin with based on the task that should be performed.
If you add the "postgres" user here (or some other superuser), I have to tell you that I couldn't make it work without adding the following lines to the pg_hba.conf file on both servers:
host    all    postgres    [Link]/32    trust
host    all    postgres    [Link]/32    trust
I couldn't make it work with md5 - only with the trust method. Keep in mind that this can be a significant security weakness.

5.3.3. pgpool.conf
Well, this one will cause us more pain, but let's start anyway. We'll start by copying from the template file we created while installing pgpool-II:
cp /etc/pgpool2/3.5.2/pgpool.conf.sample-stream /etc/pgpool2/3.5.2/pgpool.conf
We've selected the pgpool.conf.sample-stream template because it is prepared for master/slave streaming replication (our scenario). Now we'll adjust it, bit by bit. I assume that config values that are not mentioned in this section are left unchanged, but if you want (and you know what you are doing) - you can change them as well.
Let's start with connection settings:
listen_addresses = '*'
port = 5432
socket_dir = '/var/run/postgresql'
I'll skip the values that are obvious and comment only on the ones worth noting:
 port - As you might remember, while installing PostgreSQL (PostgreSQL HA with pgpool-II - Part 2) we moved its usual port 5432 to 5433 in order to reserve the former for pgpool-II. Well, now we are using it as planned.
 socket_dir - I've selected /var/run/postgresql not only because it is recommended in the template file, but also because the same directory is set as the default PID file directory in the /etc/default/pgpool2 file (see PostgreSQL HA with pgpool-II - Part 4).
pgpool communication manager connection settings:
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/postgresql'
These are all defaults except for the last one (pcp_socket_dir) which is again set this way due
to the same reasons as socket_dir is.
In backend connection settings we'll actually specify our PostgreSQL instances:
backend_hostname0 = 'IT-RDBMS01'
backend_port0 = 5433
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.5/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'IT-RDBMS02'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.5/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
A configuration option worth noting here is backend_weight (0 and 1). It is used in load balancing, and allows you to specify how the load should be distributed. For example, if you set backend_weight1 = 0, the second node won't be used in load balancing at all. If you, for example, want the first node to get twice as many queries as the second node, you can specify backend_weight0 = 2 and backend_weight1 = 1. Effectively it means that about 66.7% of the queries will be sent to the first node, and about 33.3% to the second. Nevertheless, don't forget that only read-only queries are subject to load balancing. All write queries have to be sent to the primary node anyway.
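For example, the 2:1 split mentioned above would be expressed like this (only the weight lines change; everything else stays as configured above):
# Roughly 66.7% of read-only queries go to node 0, 33.3% to node 1:
backend_weight0 = 2
backend_weight1 = 1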
In "FILE LOCATIONS" section of the file we'll set:
pid_file_name = '/var/run/postgresql/pgpool.pid'
Important
It is important that the pid_file_name value defined above is the same as the one used for PGPOOL_PID_FILE in /etc/default/pgpool2 (see PostgreSQL HA with pgpool-II - Part 4).
In "REPLICATION MODE" section of the file we'll leave default values. Recheck the following:
replication_mode = off
Reminder
Don't forget, in pgpool-II terminology we are not using replication mode but master/slave streaming mode.
In "LOAD BALANCING MODE" section we'll leave defaults, ensuring that:
load_balance_mode = on
Optional
Actually this setting is optional. If you don't want to use load balancing, and want all the queries to
be directed to the primary server, you can set off here.
In "MASTER/SLAVE MODE" once again we'll leave default values. Ensure that:
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'postgres'
sr_check_password = 'changeit'
Notes:
 sr_check_user - I'm not sure if it has to be postgres;
 sr_check_password - Well yeah, change it.
In "HEALTH CHECK" section of the file set the following:
helth_check_period = 5
health_check_timeout = 0
helth_check_user = 'postgres'
health_check_password = 'p0579r35'
Explanations:
 helth_check_period - By selecting non-zero value we are turning on health check and
automatic failover. Setting the value to 5 means that health check will be performed every 5
seconds.
 helth_check_user - Does not have to be postgres. If you pick another user make sure
that he has read permissions on "postgres" database, (or the database specified
in health_check_database).
 health_check_password - Put your own super secret password - don't use my
In "FAILOVER AND FAILBACK" section put the following:
failover_command = '/etc/pgpool2/3.5.2/failover.sh %d %P %H myreplicationpassword /etc/postgresql/9.5/main/im_the_master'
Let's explain this piece by piece:
 /etc/pgpool2/3.5.2/failover.sh - is the failover.sh script file we've created above;
 %d %P %H - are placeholders telling pgpool-II that the script needs the following arguments:
 %d - Backend ID of the attached node;
 %P - Old primary node ID;
 %H - Hostname of the new master node.
 myreplicationpassword - is the replication user's password in my case. Again, put your own here, don't use mine.
 /etc/postgresql/9.5/main/im_the_master - is the full path of the trigger file.
Obviously we've ordered the arguments as we need them in the failover.sh file created above.
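To make the substitution concrete: if the primary IT-RDBMS01 (backend 0) fails and IT-RDBMS02 becomes the new master, pgpool-II would effectively execute something like (illustrative values only):
/etc/pgpool2/3.5.2/failover.sh 0 0 IT-RDBMS02 myreplicationpassword /etc/postgresql/9.5/main/im_the_master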
In "ONLINE RECOVERY" section of the file put the following:
recovery_user = 'postgres'
recovery_password = 'pa55w0rd'
recovery_1st_stage_command = 'recovery_1st_stage.sh'
recovery_2nd_stage_command = ''
Explanations:
 recovery_user - this time it has to be the postgres user;
 recovery_password - you'll know what to do...;
 recovery_1st_stage_command - is set to the recovery_1st_stage.sh script we've created above. Note that this time only the script name is used, without path and parameters. (To remind you: the script has to be placed in the PostgreSQL data directory for security reasons.)
In "WATCHDOG" section of the file ensure:
use_watchdog = on
trusted_servers = 'DC1,DC2'
Here I need to explain the trusted_servers setting. To my big surprise there's not a single example online with this option set to anything other than an empty value, although in my opinion this setting is very important. As the comment in the pgpool.conf file itself says, the list of servers specified there is used to confirm network connectivity.
Let's for a moment forget this setting (leave it empty as everyone else does), and consider what would happen in our own scenario (two machines, each machine having one pgpool-II and one PostgreSQL instance) if one of the nodes (machines) loses its network connection. Both pgpool-II instances will lose the connection to the other pgpool-II instance, and to the PostgreSQL instance on the other machine, but they'll still have a connection to their local PostgreSQL instance. Based on that, both pgpool-II instances (even the one that actually lost the connection) can conclude that the other is down, and promote themselves to the active pgpool-II instance. Even worse, both will conclude that the PostgreSQL instance on the other machine is dead and that the local one should be promoted to primary, and will perform failover and promotion! When the connection is established again, we'll end up with a very bad situation: two primary PostgreSQL instances. A disaster caused by losing network connectivity on one node for just 10 seconds or so!
To prevent this we need to specify the trusted_servers option so that both pgpool-II instances can easily conclude something like: "Hey, I can't connect to the other pgpool-II and one of the backend databases - they are probably dead, let's promote a new primary! But wait, I also can't connect to any of the trusted servers, meaning that the other pgpool-II might be OK - I'm the one who lost the connection, so I won't change anything." Assuming that this feature is correctly implemented in pgpool-II - it is a lifesaver in the mentioned scenario.
To conclude: put a couple of stable, pingable servers in this configuration key. In my case I've put my domain controllers (DC1 and DC2), but you'll have to put some servers from your own network.
There is more to set in the same section:
wd_hostname = 'IT-RDBMS01'
wd_port = 9000
wd_priority = 2
Obviously we've come to the part where the settings will differ between the nodes.
Explanations:
 wd_hostname - Hostname or IP address of this watchdog. Meaning that on IT-RDBMS01 this value will be IT-RDBMS01, and on IT-RDBMS02 it will be IT-RDBMS02. Just to remind you: IT-RDBMS01 and IT-RDBMS02 are the hostnames in my case - in your case they'll be different.
 wd_port - It'll be the same on both nodes. Let's leave it at the default value (9000).
 wd_priority - Priority of this watchdog in the leader election. The higher value wins, meaning that if there are two watchdogs (two pgpool-II instances), the active instance (master) will be the one with the higher wd_priority value. In my case I'll set the higher priority on the node which hosts the primary PostgreSQL instance (IT-RDBMS01). This way I'll reduce the network communication needed.
I'll repeat the same settings for my other host (IT-RDBMS02):
wd_hostname = 'IT-RDBMS02'
wd_port = 9000
wd_priority = 1
And there's more to set in the same section:
wd_ipc_socket_dir = '/var/run/postgresql'
delegate_IP = '[Link]'
Explanations:
 wd_ipc_socket_dir - Set to this value for the same reasons as socket_dir and pcp_socket_dir above;
 delegate_IP - Is actually the virtual IP that is explained in PostgreSQL HA with pgpool-II - Part 1, and selected in PostgreSQL HA with pgpool-II - Part 2. In my case it is [Link]; you should change it appropriately.
And now I have to admit one thing: we won't finish configuring pgpool-II in this part of the tutorial. A few things we have to leave for the next one, PostgreSQL HA with pgpool-II - Part 6, where we'll (hopefully) finish with pgpool-II and install pgpoolAdmin. The following configuration values will be configured in the next part: if_cmd_path and arping_path.
But no, it's not over yet. There's more to set in this same section:
wd_lifecheck_method = 'heartbeat'
wd_interval = 3
wd_heartbeat_port = 9694
The heartbeat settings also require us to point at the other pgpool, and they will of course differ between the nodes. In my case, on IT-RDBMS01 it will be:
heartbeat_destination0 = 'IT-RDBMS02'
heartbeat_destination_port0 = 9694
and on IT-RDBMS02 it will be:
heartbeat_destination0 = 'IT-RDBMS01'
heartbeat_destination_port0 = 9694
Still in the same section we also need to set other pgpool-II settings. Again, it will be different on
our two nodes, of course. In my case on IT-RDBMS01 host:
other_pgpool_hostname0 = 'IT-RDBMS02'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
and similarly on IT-RDBMS02:
other_pgpool_hostname0 = 'IT-RDBMS01'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
You might wonder why we need to specify the other pgpool-II multiple times, but I can't help you with that. I'm wondering too. It is again about the lack of good documentation, and even worse - confusing and contradictory existing documentation. For example, you can find two almost identical tutorials at [Link], both having the same title "pgpool-II Tutorial [watchdog in master-slave mode]" (here and here), where the first one does not use heartbeat, while the second uses it. The first doesn't explain why it is not used, nor does the second explain why it is used. Again, pgpool-II suffers a lot from its poor documentation.

Where to Go Next?
I believe that you're tired of everything, but believe me, I'm sick of everything! You've spent a few
hours on this tutorial, while I've spent more than a month gathering everything needed and putting it
together. But both of us have to be patient a little more. In PostgreSQL HA with pgpool-II - Part 6
we'll hopefully finish the pgpool-II configuration and install pgpoolAdmin.

Part 6

6.1. Additional Packages


We need to install a few more packages:
apt-get install iputils-arping apache2 php5 libapache2-mod-php5 php5-pgsql
First I must admit that I'm not sure if iputils-arping is the right choice. Watchdog needs the
arping command, but there are two packages in the Ubuntu repositories that offer it: iputils-
arping (used above) and arping. I'm not 100% sure which should be installed, and I selected
the first one only because it understands the flags that are used in the [Link] template
file (arping_cmd = 'arping -U $_IP_$ -w 1'). If you install the arping package instead, the
arping command will complain about the -U flag.
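If you're not sure which package your arping binary actually came from, a quick check like the following should tell you (a sketch, assuming a dpkg-based system such as Ubuntu):
# Resolve the arping binary and ask dpkg which package owns it
dpkg -S "$(readlink -f "$(which arping)")"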
Apache and PHP are needed for pgpoolAdmin we'll install later.

6.2. ip and arping


If you remember from the last part, we've left two configuration options from the [Link] file
(if_cmd_path and arping_path) for the next (this) part. Let's explain what the issue is with
these: in order to manipulate the virtual IP, pgpool-II needs to be able to execute the ip and
arping commands. But there's a catch: these commands require root access, and as you
might remember the pgpool-II service runs under the postgres user, which doesn't have root
permissions. It is true that we could let the service run as root, but even that wouldn't solve the
problem - since we'll install and use pgpoolAdmin (which runs under Apache), the www-data user
(the Apache user on Ubuntu) also needs to be able to execute these commands.

There are several ways to accomplish this, and the many-times-mentioned pgpool-II Tutorial [watchdog
in master-slave mode] from [Link] does it by copying the command binaries to the user's home and
changing permissions appropriately. Nevertheless, the tutorial also says:
"Note that explained above should be used for tutorial purpose only. In the real world you'd better
create setuid wrapper programs to execute ifconfig and arping. This is left for your exercise."
Well guys, thanks for the exercise, but it would be much more helpful if you actually showed
how it should be done. Thanks for a tutorial that shows how it shouldn't be done.
Once again I'm left alone to find a way. I've already mentioned that there are a lot of ways to
accomplish this, and I've selected one of them (not necessarily the best or the easiest).

6.2.1. sudoers file


The first thing I'll do is allow the postgres and www-data users to sudo-execute these commands
without being prompted for the root password. I've accomplished this by adding the following lines to
the sudoers file:
postgres ALL=(root) NOPASSWD: /bin/ip
www-data ALL=(root) NOPASSWD: /bin/ip
postgres ALL=(root) NOPASSWD: /usr/bin/arping
www-data ALL=(root) NOPASSWD: /usr/bin/arping
Never edit the sudoers file with a normal text editor! Always use the visudo command instead! If
you don't know how to do that, here is one resource that may help.
You can add these lines at the end of the file (or use a drop-in file, as sketched below). Of course,
confirm that the paths provided are correct and that they point to existing ip and arping binaries.
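Instead of editing the main sudoers file, you could also keep the four rules above in a separate drop-in file under /etc/sudoers.d, which Ubuntu includes by default (the file name pgpool-vip below is just my example):
# Opens the drop-in file in an editor and validates the syntax on save
visudo -f /etc/sudoers.d/pgpool-vip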

6.2.2. Command Wrappers


The next thing we'll do is creating simple wrapper scripts for mentioned commands. I'll place each
script in the same directory where the command it wraps is, and I'll name the scripts by the wrapped
command with suffix "_w". Let's start with the wrapper for ip command:
/bin/ip_w
#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps ip command

if [ $UID -eq 0 ]
then
    #echo "Executing: /bin/ip $@"
    /bin/ip "$@"
else
    #echo "Executing: sudo /bin/ip $@"
    sudo /bin/ip "$@"
fi

# Propagate the exit status of the wrapped command
exit $?
Similarly, the script for arping command will be:
/usr/bin/arping_w
#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps arping command

if [ $UID -eq 0 ]
then
    #echo "Executing: /usr/bin/arping $@"
    /usr/bin/arping "$@"
else
    #echo "Executing: sudo /usr/bin/arping $@"
    sudo /usr/bin/arping "$@"
fi

# Propagate the exit status of the wrapped command
exit $?
Basically, what the scripts do is force sudo execution of the wrapped command if the currently
executing user is not root.
After the scripts are saved you'll need to set the permissions:
chmod 0755 /bin/ip_w
chmod 0755 /usr/bin/arping_w
After that you can confirm that the postgres user is able to sudo-execute the commands without
being prompted for a password:
root@IT-RDBMS01:~# sudo -i -u postgres
postgres@IT-RDBMS01:~$ ip_w a
Executing: sudo /bin/ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state
UNKNOWN group default
link/loopback [Link] brd [Link]
inet [Link]/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state
UP group default qlen 1000
link/ether [Link] brd [Link]
inet [Link]/16 brd [Link] scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::215:5dff:fe05:520/64 scope link
valid_lft forever preferred_lft forever
postgres@IT-RDBMS01:~$ logout
root@IT-RDBMS01:~#
Notice the third line (Executing: sudo /bin/ip a) - obviously we've succeeded in executing the
ip command with sudo, without being prompted for a password. By the way, this line will not appear
if you comment out the tracing echo statements in the scripts, as I've already done above. Actually I
reactivated the echo lines only for this test execution, and then commented them out again (a good
wrapper should return the exact output of the original command, nothing more).

6.3. Finishing pgpool-II Configuration


Finally we are ready to finish pgpool-II configuration. Set the following values in [Link]
file:
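In my case they ended up looking something like this (a sketch - eth0 and the /24 subnet are assumptions that depend on your network, and the paths depend on where you placed the wrapper scripts):
if_cmd_path = '/bin'
if_up_cmd = 'ip_w addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip_w addr del $_IP_$/24 dev eth0'
arping_path = '/usr/bin'
arping_cmd = 'arping_w -U $_IP_$ -w 1'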
Notes:
 The previous values are very similar to the default ones. I've changed only the paths (if_cmd_path
and arping_path) and the command names (ip_w instead of ip and arping_w instead of arping).
 I want to bring your attention to the $_IP_$/24 part: the actual IP address will be set
automatically by pgpool (replacing the $_IP_$ placeholder), but you should set the subnet
(/24) part appropriately, depending on your network.

6.4. Authentication, Access Control and pool_hba.conf


When you think that the beast is defeated - pgpool-II strikes back with yet another catch. This time
it is authentication, and this time it is really hard to find any sane reason (besides the we-want-you-
to-suffer one) for them to implement authentication the way they did. Let me explain what it is all about.
pgpool-II introduces its own authentication mechanism, so that no client can connect to it unless
properly authenticated. Sounds like a reasonable decision, right? But wait, you need to know the
whole story. The unreasonable thing is that pgpool's authentication does not replace PostgreSQL
authentication (which would also be wrong), but simply adds another layer of authentication, so
that a user must first authenticate with pgpool, and then with PostgreSQL itself, in order to
execute a query. Having two authentications for a single query (a single SELECT statement if you
like) is already pointless and wrong. But the actual problem is even bigger due to the fact that pgpool's
authentication is poorly implemented and does not use the existing PostgreSQL mechanism. It means
that for every database user you'll have to:
 Manage the password in two different systems. The password has to be the same in both
systems, but you must manage it separately, meaning that if you want to change the
password you'll have to do so in both PostgreSQL and pgpool.
 Manage two host-based access (hba) files - pg_hba.conf (the well-known
PostgreSQL hba file) and pool_hba.conf (pgpool's hba file). Again, the settings in these
files must be equivalent. For example, I initially planned to set trust authentication for all
users in pool_hba.conf and md5 in pg_hba.conf, to basically disable pgpool
authentication and do the actual authentication at PostgreSQL. But it is not possible - if
different authentication methods are used, authentication will fail.
It makes sense to have authentication for pgpool's administrative tasks, but introducing another
authentication layer for query execution is pointless, and frankly speaking - stupid. When it comes
to query execution pgpool should simply pass through - it is not its responsibility to authenticate. It
is not a special-ultra-security product, but a failover / load-balancing product.
Sorry for the criticism! Let's do this. There are a few things we need to do, so let's start by setting the
following in the [Link] file:

enable_pool_hba = on
pool_passwd = 'pool_passwd'
Next we'll create the pool_hba.conf file by copying it from the template (remember, while installing
pgpool-II in PostgreSQL HA with pgpool-II - Part 4 we prepared some templates):
cp /etc/pgpool2/3.5.2/pool_hba.[Link]
/etc/pgpool2/3.5.2/pool_hba.conf
pool_hba.conf is very similar to pg_hba.conf, except for a few limitations (see the pgpool
manual for details). For the purpose of this tutorial I'll only add one line that allows all users to
access all databases from my network ([Link]/16) by using md5 authentication:
host all all [Link]/16 md5
In order to enable md5 authentication we have to create the pool_passwd file. The path and name of
the file are specified in the [Link] file (see above). Another interesting decision by the pgpool team is
that the path of the file is specified relative to the [Link] file itself, meaning that in our case the
pool_passwd file has to be placed in the same directory as the [Link] file. The content of the
pool_passwd file is (in a way) very similar to the content of the [Link] file we've created in the
previous part of this tutorial, but the pool_passwd file cannot contain comments or empty lines.
Another difference is that the md5 hash of the password cannot be created in the same way as for the
[Link] file (no doubt they want you to suffer). Entries in the pool_passwd file should be
created in the following way:
pg_md5 -f /etc/pgpool2/3.5.2/[Link] -m -u postgres
postgrespassword
Here I've added the user "postgres" with password "postgrespassword" to the pool_passwd file. The
command executes without any output - it adds the user and password to the pool_passwd file
automatically (if the file does not exist it will be created). As you can see, it also requires the path to
the [Link] file as an input argument (obviously to see where the pool_passwd file is). After the
execution you can check the /etc/pgpool2/3.5.2/pool_passwd file content, and you'll find
something like the following there:
postgres:md55cb5bf77d7027e6c4e50fa4112df4d63
If you have multiple users, you'll have multiple lines in the file. You can also add lines manually,
but if you do so you need to:
 Ensure that the newly added line ends with a new-line character. If there's no new-line character,
at the next execution the pg_md5 command will concatenate the next user and its password onto
the same line.
 Find a way to create the password hash. As I've already mentioned, it is not the same hash as in the
[Link] file, and I don't have a clue how this one can be generated (but see the guess sketched below).
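For what it's worth, judging by the md5 prefix in the entry above, the hash looks like the same format PostgreSQL itself stores: "md5" followed by the md5 of the password concatenated with the username. I haven't verified this, so treat the following one-liner as a guess rather than a fact:
# Guess: "md5" + md5(password || username), here for password "postgrespassword" and user "postgres"
echo "md5$(echo -n 'postgrespasswordpostgres' | md5sum | cut -d' ' -f1)"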
Finally you need to ensure that md5 access for a particular user is also enabled in pg_hba.conf.
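For example, a matching pg_hba.conf entry would use the same network and method as the pool_hba.conf line above (the network below is a placeholder - use your own):
host all all 192.168.0.0/16 md5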
Important
Don't forget: usernames and passwords have to be exactly the same as in PostgreSQL.

6.5. Starting pgpool-II


We have finally finished with pgpool-II configuration, so we can enable and start the service:
update-rc.d pgpool2 enable
service pgpool2 start
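If the service refuses to start, checking its status and the system log is the quickest way to see why (a sketch - where pgpool actually logs depends on your log_destination setting and the init script; syslog is a common default on Ubuntu):
service pgpool2 status
grep -i pgpool /var/log/syslog | tail -n 20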

6.6. Testing pgpool-II


If you've done everything right, you should now be able to see pgpool-II running. You can test the
following:
 'service pgpool2 status' should report that the service is running on both
machines;
 'ifconfig -a' should show you that one machine has an additional IP address (the virtual IP)
labeled eth0:0;
 You should be able to connect to pgpool-II from any other server by using the virtual IP and port
5432. You can try with pgAdmin3, for example, or with psql as sketched below.
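A quick command-line test from another machine could look like this (a sketch - it assumes the psql client is installed there, and 192.168.0.100 is a placeholder for your delegate_IP / virtual IP):
psql -h 192.168.0.100 -p 5432 -U postgres -c "SELECT version();"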

6.7. Installing pgpoolAdmin


This is another part of the procedure that is not well documented, but luckily it is not too hard to
figure out. About the official documentation I'll tell you just two things:
 It dates back to 2006-2008, and it was created for version 2.1 (the current version is 3.5.2);
 It is written in almost incomprehensible English (e.g. "To execute pgpool and the pcp tool
set up from the Apach user, the right of access is set.")

6.7.1. Preparing .pcppass File


This file is needed by pgpoolAdmin for authenticating pcp commands. As you might remember, in
the previous part we created the [Link] file that contains the authentication info for executing
pcp commands (e.g. pcp_node_count, pcp_node_info, etc.). The very same file and the
same authentication info are used for authenticating to the pgpoolAdmin portal. For example, we
created there the user "admin" with password "pa55w0rd", and now we'll use this combination to log in
to the pgpoolAdmin portal. But that is not the end of authentication - even when you are logged in to the
pgpoolAdmin portal, various portal functions try to execute pcp commands in the background,
and every execution must be authenticated. Although you've submitted a username/password while
logging in, the portal does not store this info, and needs the username/password for every pcp command
execution. This is where the .pcppass file becomes important - the portal will always read the
username/password from the file, without bothering you to enter it again and again.
The .pcppass file is very similar to the .pgpass file we've created in PostgreSQL HA with pgpool-II -
Part 4, with the only difference that the .pcppass file is missing the database part. It means that the
format of .pcppass file entries is:
hostname:port:username:password
In our case we'll create .pcppass file as:
*:*:admin:pa55w0rd
You can read this as: all hosts, all ports, user admin, password pa55w0rd. There are several things to
note about the file and its content:
 The username / password combination must match the one used while creating the [Link] file,
with the difference that in the [Link] file we need to use the md5 hash of the password, while in
the .pcppass file we need to use the plain password itself.
 The .pcppass file should be placed in the user's home directory. Actually it is possible to place
the file wherever you want and specify its location in an environment variable (as
explained here), but we'll use the first approach.
 When the file is used by the pgpoolAdmin portal (as we are doing right now), it should be placed
in the home directory of the user account under which Apache runs, meaning in www-data's
home directory. The home directory of the www-data user is /var/www, so we need to place the
.pcppass file there (see the commands sketched after this list).
 When used with the pgpoolAdmin portal, the file must contain the same username/password we
use to log in to the portal.
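Putting that together, creating the file for the Apache user could look like this (a sketch - the 0600 permission mirrors what the pcp tools expect for this file, similar to .pgpass):
# Create .pcppass in www-data's home directory and restrict its permissions
echo '*:*:admin:pa55w0rd' > /var/www/.pcppass
chown www-data:www-data /var/www/.pcppass
chmod 0600 /var/www/.pcppass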

[Link]. .pcppass File Beyond Apache


Although it is not related to the pgpoolAdmin portal we are dealing with here, it is worth noting
that the file can be useful even when you are executing pcp commands from the command line. In this
case it will save you from always being prompted for a password. But in this case it has:
 To be placed in the home directory of the user who is executing the commands. For example, if
you are executing commands as the root user, the file should be placed at /root/.pcppass.
 To contain an appropriate username/password combination. When you are executing pcp
commands from the command line you can specify the username by using the -u flag (i.e.
'pcp_node_info -u admin ...'). If you've specified the username this way, the file
must contain this username and its corresponding password. Of course, the same
username/password must also be specified in the [Link] file. If you don't specify the username
by using the -u flag, then your UNIX username will be used, meaning that if you are executing the
command as the root user, the .pcppass file (and again the [Link] file also) must contain the "root"
username and its corresponding password. In this case your UNIX password does not have to
match the password specified in the [Link] and .pcppass files.
Still talking about command-line usage, here's how you can tell whether the .pcppass file is set up as
needed: if you get a password prompt when you try to execute any pcp command, it means that the
file is not set up appropriately.

6.7.2. [Link] and [Link] File Permissions


Among other things, the pgpoolAdmin portal provides an interface for changing the [Link]
file, and an interface for changing the password (meaning changing the [Link] file). But to be able to
save the changes, the portal has to have write permissions on the mentioned files. For this reason we'll
change the ownership of these files:
chown www-data /etc/pgpool2/3.5.2/[Link]
chown www-data /etc/pgpool2/3.5.2/[Link]
This ownership change won't break anything in our previous setup; the postgres user (which the pgpool2
service runs under) does not change these files anyway, only reads them, so it does not have to be the owner.

6.7.3. Installing the Portal


Finally we can install the portal itself. As I've mentioned, the pgpoolAdmin installation is not too hard,
and I'll simply provide a script that performs it, with explanations in comments:
# Navigate to temporary directory
cd /tmp

# If archive exists delete it
if [ -f [Link] ]; then
    rm [Link]
fi

# Download installation archive
wget [Link] -O [Link]

# If extracted directory exists delete it
if [ -d pgpoolAdmin-3.5.2 ]; then
    rm -r [Link]
fi

# Extract the archive
tar -xzf [Link]

# Delete archive file
rm [Link]

# If virtual directory exists delete it
if [ -e /var/www/html/pgpooladmin ]; then
    rm -r /var/www/html/pgpooladmin
fi

# Move extracted archive to the new location (under Apache root directory)
mv pgpoolAdmin-3.5.2 /var/www/html/pgpooladmin

# Change ownership of the directory
chown root:root -R /var/www/html/pgpooladmin

# Adjust file and folder permissions
chmod 0777 /var/www/html/pgpooladmin/templates_c
chown www-data /var/www/html/pgpooladmin/conf/[Link]
chmod 0644 /var/www/html/pgpooladmin/conf/[Link]
After executing the script you should be able to access the portal, so first check whether Apache / PHP
are working as expected by opening [Link] (you should change the hostname appropriately). At this
location you should find the standard phpinfo page. Two parts of it are especially important:
Multibyte Support should be enabled, and PostgreSQL Support should be enabled.
If you don't see the pgsql section at all, it's probably because the Apache server was started before you
installed the php5-pgsql package. Try restarting Apache (service apache2 restart)
and refreshing the page. If the section is still missing, ensure that the php5-pgsql package is actually
installed.

[Link]. Installation Wizard


Once you've ensured that everything is OK with PHP and Apache, start the installation wizard by
opening [Link] (again, change the host). You should get
something like:

Select your language and click "Next".


The second step of the wizard is "Directory Check". You should see two green checks. If not, you
probably haven't set the appropriate file permissions (recheck the last three lines from the script
above). Click "Next".
In the third step there are many fields with green checks and red X's. In order to finish the
installation we need to ensure that there are no more red X's, but DON'T PANIC! Probably the only
cause of the red X's is a wrong path, and we'll fix it. Here's what I had to change to make it all green:
[Link] File: /etc/pgpool2/3.5.2/[Link]
[Link] File: /etc/pgpool2/3.5.2/[Link]
pgpool Command: /usr/sbin/pgpool
PCP directory: /usr/sbin
If you've used different directories during the pgpool-II installation process, you'll have to
change the values here accordingly.
After changing the values click the "Check" button again and you should get all greens. Click "Next".
You should get something like:
If you see error messages like "failed to execute pcp_...", "e1002", etc., chances are that you
haven't set up the .pcppass file appropriately. Look again above and recheck that the file is set up as
needed.
Note that since I've used the "admin" user (which is not a PostgreSQL user) to log in to this session,
there's "superuser: unknown (Connection error)" on the previous screenshot. Also, the majority of the
buttons are grayed out. But if you log in using the "postgres" user, you'll get something like:
All Problems Solved???
Well, no. Obviously pgpool-II is a very immature (probably even too immature) product that won't
ever stop causing headaches. At this moment I'm struggling with a new question: why are two
pgpool-II instances (which are targeting the same cluster) showing me different info? I'll give
you an example - a screenshot of my IT-RDBMS01 pgpool instance (the same screenshot as the
previous one):
As you can see, even a single instance provides contradictory info: node 1 is at the same time "Down"
and "Up". And then there's the button "Return", which is never described anywhere. What the heck does
this button do? And what does it mean in the first place? The term "return" isn't used in PostgreSQL
replication terminology, so obviously the pgpool team has introduced a new one without any
explanation. Guys, you can name a button "Pray it back" or "Normalize it up", but at least you
should explain what this button actually does. Does this button initialize recovery
(recovery_1st_stage, etc.) or not?
And here's the screenshot from my IT-RDBMS02 instance:
The same cluster, the same servers, the same moment, two pgpool-II instances. Disappointing...
Very disappointing.

Where to Go Next?
Well, go wherever you want! I'm not your mom!
Just kidding, of course. Now you know how to implement PostgreSQL High Availability with
pgpool-II so the world is yours! But if you insist on my advice - beer is always a good choice.
