PostgreSQL Essentials v16
© EDB 2024 — All Rights Reserved.
Course Agenda
Introduction and Architectural Overview
System Architecture
Installation
User Tools - Command Line Interfaces
Database Clusters
Database Configuration
Data Dictionary
Creating and Managing Database Objects
Database Security
Monitoring and Admin Tools Overview
SQL Primer
Backup and Recovery
Routine Maintenance Tasks
Data Loading
Data Replication and High Availability
Introduction
Module Objectives
EDB Portfolio
History of PostgreSQL
Major Features
Architectural Overview
General Database Limits
Common Database Object Names
EDB Supported Databases
Postgres
Open source Postgres. EDB continues to be committed to advancing features in collaboration with the broader community.

Postgres Extended
EDB proprietary distribution for EDB Postgres Distributed use cases, with Transparent Data Encryption. SQL compatible with Postgres, extended for stringent availability and advanced replication needs. Formerly known as 2ndQPostgres.

Postgres Advanced Server
EDB proprietary distribution with Transparent Data Encryption. SQL compatible with Oracle, reduces effort to migrate applications and data to Postgres. Additional value-add enterprise features.
PostgreSQL
The open-source database of choice
Performance - Handles enterprise workloads with 50% improvement in the last 4 years
Scalability - Multiple technical options for operating Postgres at scale
Extensibility - Supported by a wide array of extensions plus multiple SQL and NoSQL data models
Community-driven - Multiple companies and individuals contribute to the project and drive innovation
Facts about PostgreSQL
The world’s most advanced open-source database
Designed for extensibility and customization
ANSI/ISO compliant SQL support
Actively developed for more than 30 years
University Postgres (1986-1993)
Postgres95 (1994-1995)
PostgreSQL (1996-current)
PostgreSQL Lineage
PostgreSQL History
[Lineage diagram, 1975 to today] Ingres (UC Berkeley) led to Postgres (UC Berkeley), which became PostgreSQL (Community) and the basis for EnterpriseDB. Ingres also led to Sybase SQL Server, later Microsoft SQL Server, and to Ingres Corporation, later Computer Associates' CA-OS Ingres. Postgres also spawned Illustra, sold to Informix, which went to IBM.
EDB Postgres Extended Server

Replication Enhancements - Enables EDB Postgres Distributed functionality such as:
Group Commit, Commit at Most Once, and Eager all-node synchronous replication
Timestamp-based Snapshots
Estimates for Replication Catch-up times
Selective Backup of a Single Database
Hold back freezing to assist resolution of UPDATE/DELETE conflicts
Multi-node PITR
Application Assessment
Only available for use with an additional subscription for Extreme HA
EDB Postgres Advanced Server

Oracle Compatibility - Compatibility for schemas, data types, indexes, users, roles, partitioning, packages, views, PL/SQL triggers, stored procedures, functions, and utilities
Additional Security - Password policy management, session tag auditing, data redaction, SQL injection protection, and procedural language code obfuscation
Developer Productivity - Over 200 pre-packaged utility functions, user-defined object types, autonomous transactions, nested tables, synonyms, advanced queueing
DBA Productivity - Throttle CPU and I/O at the process level, over 55 extended catalog views to profile all the objects and processing that occurs in the database
Performance - Query optimizer hints, SQL session/system wait diagnostics
Replication Enhancements - Enables EDB Postgres Distributed functionality such as Group Commit, Commit at Most Once and Eager all-node synchronous replication, timestamp-based snapshots, estimates for replication catch-up times, selective backup of a single database, hold back freezing to assist resolution of UPDATE/DELETE conflicts, multi-node PITR
Database Servers - High Level Overview

Database Server             | PostgreSQL | EDB Postgres Extended Server | EDB Postgres Advanced Server: Berkeley | EDB Postgres Advanced Server: Redwood
SQL Compatibility           | PostgreSQL | PostgreSQL                   | PostgreSQL                             | PostgreSQL + Oracle
Binary Compatibility        | Yes        | No                           | No                                     | No
Advanced PGD Features       |            | ✔                            | 14+                                    | 14+
Transparent Data Encryption |            | 15+                          | 15+                                    | 15+
Advanced Security           |            |                              | ✔                                      | ✔
Advanced SQL                |            |                              | ✔                                      | ✔
Advanced Performance        |            |                              | ✔                                      | ✔
Resource Manager            |            |                              | ✔                                      | ✔
Bulk Data Loader            |            |                              | ✔                                      | ✔
Oracle Compatibility        |            |                              |                                        | ✔
Capabilities And Tools

Management/Monitoring: Postgres Enterprise Manager, pgAdmin
High Availability: EDB Postgres Distributed, Failover Manager, Repmgr, Patroni
Backup and Recovery: Barman, pgBackRest
Migration: Migration Portal, Migration Toolkit, Replication Server
Integration: Connectors, Foreign Data Wrappers, Connection Poolers
Kubernetes: EDB Postgres for Kubernetes, CloudNativePG
Major Features
Portable:
Written in ANSI C
Supports Windows, Linux, Mac OS/X and major UNIX platforms
Reliable:
ACID Compliant
Supports Transactions and Savepoints
Uses Write Ahead Logging (WAL)
Scalable:
Uses Multi-version Concurrency Control
Table Partitioning and Tablespaces
Parallel Sequential Scans and DDL (Table and Index Creation)
Major Features (continued)

Secure:
Employs Host-Based Access Control, SSL Connections and Logging
Provides Object-Level Permissions and Row Level Security
Recovery and Availability:
Physical and Logical Streaming Replication
Support for Sync, Async and Cascaded Replication
Supports Hot-Backup using pg_basebackup and Point-in-Time Recovery
Advanced:
Supports Triggers, Functions and Procedures using Custom Procedural Languages
Major Database Version Upgrades using pg_upgrade
Unlogged Tables and Materialized Views
Postgres for Big Data
Postgres enables you to support a wider range of workloads with your relational database
An object-relational design and decades of proven reliability make Postgres the most flexible, extensible and performant database available
Document store capabilities: XML, JSON, PLV8; HStore (key-value store); non-durable storage; full text indexing
Architectural Overview
Connectors: libpq, ODBC, JDBC, .NET, ECPG, Perl DBI, Python, Node.js, Tcl
PostgreSQL server: Shared Memory, Background Processes, and one User Process per session, on top of the OS Kernel Cache and Storage
General Database Limits
Limit Value
Maximum Database Size Unlimited
Maximum Table Size 32 TB
Maximum Row Size 1.6 TB
Maximum Field Size 1 GB
Maximum Rows per Table Unlimited
Maximum Columns per Table 250-1600 (Depending on Column types)
Maximum Indexes per Table Unlimited
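The 32 TB table limit is derivable rather than arbitrary: block numbers in Postgres are 32-bit, so with the default 8 KB block size a table tops out at 2^32 blocks. A quick shell sanity check of that arithmetic:

```shell
# 2^32 addressable blocks x 8 KB default block size = 32 TB maximum table size.
blocks=$((2**32))
block_size=8192
max_bytes=$((blocks * block_size))
echo "$((max_bytes / 2**40)) TB"   # bytes divided by 2^40 gives TB
```

A larger block size chosen at compile time would raise this limit proportionally.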
Common Database Object Names
Industry Term Postgres Term
Table or Index Relation
Row Tuple
Column Attribute
Data Block Page (when block is on disk)
Page Buffer (when block is in memory)
Lab Setup Guidelines
All the instructor demos and labs are based on Linux
A Rocky Linux machine or virtual machine with at least 1 GB RAM and 20 GB storage space is recommended
Participants using Linux must follow instructor during the installation
module and install PostgreSQL
Module Summary
EDB Portfolio
History of PostgreSQL
Major Features
Architectural Overview
General Database Limits
Common Database Object Names
System Architecture
Module Objectives
Architectural Summary
Process and Memory Architecture
Utility Processes
Connection Request-Response
Disk Read Buffering
Disk Write Buffering
Background Writer Cleaning Scan
Commit and Checkpoint
Statement Processing
Physical Database Architecture
Data Directory Layout
Installation Directory Layout
Page Layout
Architectural Summary
Postgres uses processes, not threads
The “Postmaster” process acts as a supervisor
Several utility processes perform background work
Postmaster starts them and restarts them if they die
One backend process is created per user session
Postmaster listens for new connections
Process and Memory Architecture
[Diagram] The Postmaster supervises:
Shared Memory: Shared Buffers, WAL Buffers, Process Array
Background processes: BGWRITER, CHECKPOINTER, WAL WRITER, AUTOVACUUM, LOGICAL REPLICATION, LOGGER, ARCHIVER
On-disk files: Data Files, WAL Segments, Archived WAL, Error Log Files
Utility Processes
Background writer
Writes dirty data blocks to disk
WAL writer
Flushes write-ahead log to disk
Checkpointer
Automatically performs a checkpoint based on config parameters
Logging collector
Routes log messages to syslog, eventlog, or log files
More Utility Processes
Autovacuum launcher
Starts Autovacuum workers as needed
Autovacuum workers
Recover free space for reuse
Archiver
Archives write-ahead log files
Logical replication launcher
Starts logical replication apply process for logical replication
Postmaster as Listener

Postmaster is the main process, called postgres
Listens on one, and only one, TCP port
Receives client connection requests
User Backend Process

Postmaster process spawns a new server process (postgres) for each connection request detected
Communication is done using semaphores and shared memory
Authentication - IP, user and password
Authorization - Verify permissions
Respond to Client

The user backend process is called postgres
Calls back to the client
Waits for SQL
Query is transmitted using plain text
Disk Read Buffering

The Postgres buffer cache (shared_buffers) reduces OS reads
Read the block once, then examine it many times in cache
Disk Write Buffering

Blocks are written to disk only when needed:
To make room for new blocks
At checkpoint time
Background Writer Cleaning Scan

The background writer cleaning scan attempts to ensure an adequate supply of clean buffers
Backends write dirty buffers as needed
Write Ahead Logging (WAL)

Backends write data to WAL buffers
WAL buffers are flushed periodically (by the WAL writer), on commit, or when the buffers are full
Group commit
Transaction Log Archiving

The archiver spawns a task to copy away pg_wal log files when they are full
Commit and Checkpoint

Before commit: uncommitted updates are in memory
After commit: WAL buffers are written to the disk (write-ahead log file) and shared buffers are marked as committed
After checkpoint: modified data pages are written from shared memory to the data files
Statement Processing

Parse:
Check syntax
Call traffic cop
Identify query type
Command processor if needed
Break query into tokens
Optimize:
Planner generates a plan
Uses database statistics
Apply Optimizer Hints
Query cost calculation
Choose best plan
Execute:
Execute query based on query plan
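The phases are easy to observe from psql: EXPLAIN stops after parse and optimize and prints the chosen plan, while EXPLAIN ANALYZE also executes it. A sketch, assuming a running server and a hypothetical customers table:

```sql
-- Parse + optimize only: prints the plan the planner chose, without running it
EXPLAIN SELECT * FROM customers WHERE customerid = 100;

-- Parse + optimize + execute: adds actual row counts and timings to the plan
EXPLAIN ANALYZE SELECT * FROM customers WHERE customerid = 100;
```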
Physical Database Architecture
Database Cluster
Collection of databases managed by single server instance
Each cluster has a separate
Data Directory TCP port Set of Processes
Databases
A cluster can contain multiple databases
Installation Directory Layout
Default Installation Directory Location:
Linux - /usr/pgsql-16
bin – Programs
lib – Libraries
share – Shared data
Default Data directory - /var/lib/pgsql/16/data
Database Cluster Data Directory Layout

The DATA directory contains:
global - Cluster-wide database objects
base - Contains the databases
pg_tblspc - Symbolic links to tablespaces
pg_wal - Write ahead logs
pg_log - Startup logs
log - Error logs
Status directories - pg_xact, pg_multixact, pg_snapshots, pg_stat, pg_subtrans, pg_notify, pg_serial, pg_replslot, pg_logical, pg_dynshmem
Configuration files - postgresql.conf, pg_hba.conf, pg_ident.conf, postgresql.auto.conf
Postmaster info files
Physical Database Architecture
File-per-table, file-per-index
A tablespace is a directory
Each database that uses that tablespace gets a subdirectory
Each relation using that tablespace/database combination gets one or more files, in 1GB chunks
Additional files are used to hold auxiliary information (free space map, visibility map)
Each file name is a number (see pg_class.relfilenode)
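A small sketch of the 1 GB chunking rule (the relfilenode and table size here are made up): a 2.5 GB heap needs three segment files, named after the relfilenode with numeric suffixes for the second and later chunks.

```shell
# Hypothetical relation: relfilenode 16384 holding 2.5 GB of heap data.
relfilenode=16384
size_mb=2560
seg_mb=1024                                    # 1 GB per segment file
nsegs=$(( (size_mb + seg_mb - 1) / seg_mb ))   # ceiling division
names="$relfilenode"
i=1
while [ "$i" -lt "$nsegs" ]; do
  names="$names $relfilenode.$i"               # 16384.1, 16384.2, ...
  i=$((i + 1))
done
echo "$names"
```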
Sample - Data Directory Layout

[Diagram] DATA/base contains one subdirectory per database OID (e.g. 14297, 14299, 14405), each holding relation files named by relfilenode (e.g. 14300, 14301, 14307, 14312, 14498). DATA/pg_tblspc contains symbolic links named by tablespace OID (e.g. 16650) pointing to external locations such as /storage/pg_tab, which in turn hold per-database subdirectories and relation files (e.g. 16651, 16700, 16701).
Page Layout

Page header
General information about the page
Pointers to free space
24 bytes long
Row/index pointers
Array of offset/length pairs pointing to the actual rows/index entries
4 bytes per item
Free space
Unallocated space
New pointers allocated from the front, new rows/index entries from the rear
Row/index entry
The actual row or index entry data
Special
Index access method specific data
Empty in ordinary tables
Page Structure

[Diagram] An 8K page: Page Header, then the item pointer array (Item, Item, Item) growing from the front; free space in the middle; tuples (Tuple, Tuple, Tuple) growing from the rear; Special space at the end.
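The header and pointer sizes above can be turned into a rough rows-per-page estimate. A back-of-the-envelope sketch, where the 100-byte row size is an assumption and real pages also lose space to per-tuple headers and alignment padding:

```shell
page=8192        # default page size
header=24        # page header size, from the slide
item_ptr=4       # one line pointer per row
row=100          # hypothetical average row size
usable=$((page - header))
rows_per_page=$(( usable / (row + item_ptr) ))
echo "$rows_per_page"
```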
Module Summary
Architectural Summary
Shared Memory
Inter-process Communication
Utility Processes
Disk Read Buffering
Disk Write Buffering
Background Writer Cleaning Scan
Commit and Checkpoint
Statement Processing
Physical Database Architecture
Data Directory Layout
Installation Directory Layout
Page Layout
PostgreSQL Installation
Module Objectives
Deployment Options
OS User and Permissions
Package Installation
Installation Example and Practice Labs
Setting Environmental Variables
Deployment Options
Deployment methods for PostgreSQL and supported Tools:
BigAnimal: Fully managed database-as-a-service.
CloudNativePG: Operator designed for managing PostgreSQL workloads on Kubernetes clusters.
Native packages or installers: PostgreSQL Yum Repository can be used for YUM and RPM based
installation
Source Code Installation: PostgreSQL source code is open-source and free to use
Note: This training is based on PostgreSQL YUM Repository deployment.
OS User and Permissions
PostgreSQL runs as a daemon (Unix / Linux) or service (Windows)
The PostgreSQL Installation requires superuser/admin access
All processes and data files must be owned by a user in the OS
During installation, a locked user named postgres will be created on Linux
On Windows a password is required
SELinux must be set to permissive mode on systems with SELinux
The postgres User Account
It is advised to run Postgres under a separate user account
This user account should only own the data directory that is
managed by the server
The useradd or adduser Unix command can be used to add a user
The user account named postgres is used throughout this training
Practice Lab - Add postgres User
Connect to your Linux machine as root or sudo user
Use useradd command to create a new user:
[root@Base ~]# useradd postgres
[root@Base ~]# passwd postgres
Changing password for user postgres.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Package Installation Options
Wizard Installer
Interactive Method
Graphical or Command Line Mode, available for Windows
Easy Download from www.enterprisedb.com
RPM Installer
Preferred Installation Method on Linux
Dependencies are resolved manually
Can be used to install Postgres in Isolated Environments
YUM/APT Installer
Attempts to install required package dependencies
YUM Installation
Configure Repositories
PostgreSQL can be installed using yum or apt repo:
https://www.postgresql.org/download/linux/redhat/
https://www.postgresql.org/download/linux/ubuntu/
On this page, select the version and the platform on which
PostgreSQL needs to be installed
It will provide you with the repository location and the post-installation steps to be performed for setup of the initial database cluster
Example – Download the PostgreSQL YUM Repository
Example – Download the PostgreSQL APT Repository
Practice Lab - Install PostgreSQL on Rocky Linux
Install the repository RPM:
sudo dnf install -y
https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-
x86_64/pgdg-redhat-repo-latest.noarch.rpm
Disable the built-in PostgreSQL module:
sudo dnf -qy module disable postgresql
Install PostgreSQL:
sudo dnf install -y postgresql16-server
Configure a Package Installation Using the Service Configuration File (Optional)
# /usr/lib/systemd/system/postgresql-16.service
Create a database cluster and start the cluster using services:
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16
After Installation
Database Cluster Defaults
Data directory – /var/lib/pgsql/16/data
Default authentication – peer and scram-sha-256
Default database superuser – postgres
Default password of database superuser – blank
Default port – 5432
Practice Lab - Connecting to a Database
Connect to the default database using psql and change password of the superuser postgres:
[root@pgsrv1 ~]# su - postgres
[postgres@pgsrv1 ~]$ /usr/pgsql-16/bin/psql -d postgres -U postgres
postgres=# ALTER USER postgres PASSWORD 'postgres';
ALTER ROLE
postgres=# \q
Change the authentication method to scram-sha-256 in pg_hba.conf file and reload the server
[postgres@pgsrv1 ~]$ vi /var/lib/pgsql/16/data/pg_hba.conf
local all all scram-sha-256
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256
[postgres@pgsrv1 ~]$ /usr/pgsql-16/bin/pg_ctl -D /var/lib/pgsql/16/data/ reload
server signaled
Setting Environmental Variables
Setting environment variables is important for trouble-free startup and shutdown of the database server
PATH – should point to correct bin directory
PGDATA – should point to correct data cluster directory
PGPORT – should point to correct port on which database cluster is running
PGUSER – specifies the default database user name
PGDATABASE – specify the default database
PGPASSWORD – specify default password
Edit .profile or .bash_profile to set the variables
On Windows, set these variables using the System Properties page
Example – Environmental Variables Setup

Edit the user profile:
[postgres@pgsrv1 ~]$ vi .bash_profile
PATH=/usr/pgsql-16/bin/:$PATH:$HOME/.local/bin:$HOME/bin
export PATH
export PGDATA=/var/lib/pgsql/16/data/
export PGUSER=postgres
export PGPORT=5432
export PGDATABASE=postgres

Log off and log in, then verify the environment settings:
[postgres@pgsrv1 ~]$ exit
logout
[root@pgsrv1 ~]# su - postgres
[postgres@pgsrv1 ~]$ which psql
/usr/pgsql-16/bin/psql
[postgres@pgsrv1 ~]$ pg_ctl status
pg_ctl: server is running (PID: 1663)
/usr/pgsql-16/bin/postgres "-D" "/var/lib/pgsql/16/data/"
Module Summary
Deployment Options
OS User and Permissions
Package Installation
Installation Example and Practice Labs
Setting Environmental Variables
Lab Exercise - 1
Choose the platform on which you want to install PostgreSQL
Download the PostgreSQL installer from the postgresql.org for
the chosen platform
Prepare the platform for installation
Install PostgreSQL and connect to a database using psql
User Tools - Command Line Interfaces
Module Objectives
Introduction to psql
Connecting to Database
psql Command Line Parameters
psql Meta-Commands
Conditional and Information Commands
psql
Introduction to psql
psql is a command line interface (CLI) to Postgres
Can be used to execute SQL queries and psql meta commands
[postgres@pgsrv1 ~]$ psql -p 5432 -U postgres -d postgres
Password for user postgres:
psql (16.0)
Type "help" for help.
postgres=# \q
Connecting to a Database
psql Connection Options: Environmental Variables
-d <Database Name> PGDATABASE, PGHOST,
PGPORT and PGUSER
-h <Hostname>
-p <Database Port>
-U <Database Username>
Conventions
psql has its own set of commands, all of which start with a backslash (\).
Some commands accept a pattern. This pattern is a modified regex. Key points:
* and ? are wildcards
Double-quotes are used to specify an exact name, ignoring all special characters and
preserving case
On Startup…
During startup, psql considers environment variables for connection
psql will then execute commands from the $HOME/.psqlrc file; this can be skipped using the -X option
-f FILENAME will execute the commands in FILENAME, then exit
-c COMMAND will execute COMMAND (SQL or internal) and then exit
--help will display all the startup options, then exit
--version will display version info and then exit
Entering Commands
psql uses the command line editing capabilities that are available
in the native OS. Generally, this means:
Up and Down arrows cycle through command history
On UNIX, there is tab completion for various things, such as SQL commands
History and Query Buffer
\s will show the command history
\s FILENAME will save the command history
\e will edit the query buffer and then execute it
\e FILENAME will edit FILENAME and then execute it
\w FILENAME will save the query buffer to FILENAME
Controlling Output
psql -o FILENAME or meta command \o FILENAME will send query
output (excluding STDERR) to FILENAME
\g FILENAME executes the query buffer sending output to FILENAME
\watch <seconds> can be used to run previous query repeatedly
Advanced Features - Variables
psql provides variable substitution
Variables are simply name/value pairs
Use \set meta command to set a variable
=> \set city Edmonton
=> \echo :city
Edmonton
Use \unset to delete a variable
=> \unset city
Advanced Features - Special Variables
Settings can be changed at runtime by altering special variables
Some important special variables include:
AUTOCOMMIT, ENCODING, HISTFILE, ON_ERROR_ROLLBACK, ON_ERROR_STOP, PROMPT1
and VERBOSITY
Example:
=# \set AUTOCOMMIT off
Once AUTOCOMMIT is set to off use COMMIT/ROLLBACK to complete the running transaction
Conditional Commands
Conditional commands are primarily helpful for scripting
\if EXPR begin conditional block
\elif EXPR alternative within current conditional block
\else final alternative within current conditional block
\endif end conditional block
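Put together in a script, these read naturally. A sketch, where the file name and the on_prod variable are hypothetical and would be set on the command line with psql -v on_prod=true -f cleanup.sql:

```
\if :on_prod
    \echo 'production instance - skipping cleanup'
\else
    DROP TABLE IF EXISTS scratch_results;
\endif
```

psql evaluates :on_prod as a boolean, so true/false, on/off, and 1/0 all work.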
Information Commands
\d[(i|s|t|v|b|S)][+] [pattern]
List of objects (indexes, sequences, tables, views, tablespaces and dictionaries)
\d[+] [pattern]
Describe structure details of an object
\l[ist][+]
Lists the databases in a database cluster
Information Commands (continued)
\dn+ [pattern]
Lists schemas (namespaces)
+ adds permissions and description to output
\df[+] [pattern]
Lists functions
+ adds owner, language, source code and description to output
Common psql Meta Commands
\q or ^d or quit or exit
Quits the psql program
\cd [ directory ]
Change current working directory
Tip - To display your current working directory, use \! pwd
\! [ command ]
Executes the specified Unix or Windows command
If no command is specified, escapes to a separate Unix shell (CMD.EXE in Windows)
Help
\conninfo
Current connection information
\?
Shows help information about psql commands
\h [command]
Shows information about SQL commands
If command isn't specified, lists all SQL commands
psql --help
Lists command line options for psql
Module Summary
Introduction to psql
Connecting to Database
psql Command Line Parameters
psql Meta-Commands
Conditional and Information Commands
Prepare Lab Environment
In the training materials provided by EnterpriseDB there is a script file edbstore.sql that can be
executed using psql to create a sample edbstore database. Here are the steps:
Download the edbstore.sql file and place in a directory which is accessible to the postgres user
Login as postgres OS user
Run the psql command with the -f option to execute the edbstore.sql file and install all the sample
objects required for this training
psql -p 5432 -f edbstore.sql -d postgres -U postgres
Enter postgres database user password
After successful execution, a new database named edbstore owned by a new database user edbuser
is created. Default password for edbuser is edbuser
Connect to edbstore database and verify newly created objects using psql meta commands.
psql -p 5432 -h localhost -d edbstore -U edbuser
Lab Exercise - 1

In this lab exercise you will have a chance to practice what you have learned through using command line interfaces:
1. Connect to a database using psql
2. Switch databases
3. Describe the customers table
4. Describe the customers table including description
5. List all databases
6. List all schemas
7. List all tablespaces
8. Execute a sql statement, saving the output to a file
9. Do the same thing, just saving data, not the column headers
10. Create a script via another method, and execute from psql
11. Turn on the expanded table formatting mode
12. List tables, views and sequences with their associated access privileges
13. Which meta command displays the SQL text for a function?
14. View the current working directory
Database Clusters
Module Objectives
Database Clusters
Creating a Database Cluster
Starting and Stopping the
Server (pg_ctl)
Connecting to the Server
Using psql
Database Clusters
A Database Cluster is a collection of databases managed by
a single server instance
Database Clusters are comprised of:
Data directory
Port
Default databases are created named:
template0
template1
postgres
Creating a Database Cluster
Choose the data directory location for new cluster
Initialize the database cluster storage area (data directory) using the initdb
utility
initdb will create the data directory if it doesn’t exist
You must have permissions on the parent directory so that initdb can create
the data directory
The data directory can be created manually by superuser and the ownership
can be given to postgres user
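initdb expects the data directory to be inaccessible to other users (mode 0700, or 0750 for group read access). A minimal sketch of preparing one manually, using a throwaway temp path; the chown to postgres from the slide would additionally require root, so it is omitted here:

```shell
# Create a data directory with the 0700 mode initdb expects.
parent=$(mktemp -d)
datadir="$parent/data"
mkdir -m 0700 "$datadir"   # -m sets the mode regardless of umask
stat -c '%a' "$datadir"
```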
initdb Utility
$ initdb [OPTION]... [DATADIR]
Options:
-D, --pgdata location for this database cluster
-E, --encoding set default encoding for new databases
-U, --username database superuser name
-W, --pwprompt prompt for a password for the new superuser
-X, --waldir location for the write-ahead log directory
--wal-segsize size of WAL segments, in megabytes
-k, --data-checksums use data page checksums
-?, --help show this help, then exit
If the data directory is not specified, the environment variable PGDATA is used
Example - initdb

[root@Base ~]# mkdir /edbstore
[root@Base ~]# chown postgres:postgres /edbstore
[root@Base ~]# su - postgres
[postgres@Base ~]$ initdb -D /edbstore --wal-segsize 1024 -W

In the above example the database system will be owned by user postgres
The postgres user is the database superuser
--wal-segsize 1024 specifies a 1024 MB write-ahead log file segment size
-W is used to force initdb to prompt for the superuser password
The default server config file, postgresql.conf, will be created in /edbstore
pg_ctl Utility
pg_ctl is a command line utility provided by Postgres to initialize, start, stop and
control a Postgres instance
It provides options for redirecting start log, controlled startup and shutdown
-D option or environmental variable PGDATA can be used to specify cluster
data directory
pg_ctl -D datadir [start | stop | restart | reload | status | promote | init | logrotate | kill]
Starting a Database Cluster
After initializing a database cluster, a unique port must be assigned
Choose a unique port for postmaster in postgresql.conf
Start the database cluster using pg_ctl utility
Example:
[postgres@Base ~]$ vi /edbstore/postgresql.conf
port = 5434
[postgres@Base ~]$ pg_ctl -D /edbstore/ -l /edbstore/startlog start
waiting for server to start.... done
server started
[postgres@Base ~]$ pg_ctl -D /edbstore/ status
pg_ctl: server is running (PID: 62239)
Connecting To a Database Cluster
The psql and pgAdmin can be used for connections
[postgres@Base ~]$ psql -p 5434 -d edb -U postgres
Type "help" for help.
edb=# show port;
port
------
5434
(1 row)
edb=# show data_directory;
data_directory
----------------
/edbstore
(1 row)
edb=# \q
Reload a Database Cluster
Some configuration parameter changes do not require a restart
Changes can be reloaded using the pg_ctl utility
Changes can also be reloaded using pg_reload_conf()
Syntax:
$ pg_ctl reload [options]
-D location of the database cluster’s data directory
-s only print errors, no informational messages
Stopping a Database Cluster
pg_ctl supports three modes of shutdown:
smart - quit after all clients have disconnected
fast - quit directly, with proper shutdown (default)
immediate - quit without complete shutdown; will lead to recovery on restart
Syntax:
$ pg_ctl stop [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
Example:
[postgres@Base ~]$ pg_ctl -D /edbstore/ stop
waiting for server to shut down.... done
server stopped
[postgres@Base ~]$ pg_ctl -D /edbstore/ status
pg_ctl: no server running
View Cluster Control Information
pg_controldata can be used to view the control information for
a database cluster
It can be run with data directory as an option
[postgres@Base ~]$ pg_controldata /edbstore/
……………………………………………………………………………………………….
Database system identifier: 6724770293870218226
Database cluster state: shut down
Latest checkpoint location: 0/41A3AA40
Latest checkpoint's REDO WAL file: 000000010000000000000001
Latest checkpoint's TimeLineID: 1
Backup start location: 0/0
Backup end location: 0/0
wal_level setting: replica
Database block size: 8192
WAL block size: 8192
Data page checksum version: 0
Module Summary
Database Clusters
Creating a Database Cluster
Starting and Stopping the
Server (pg_ctl)
Connecting to the Server
Using psql
Lab Exercise - 1
1. A new website is to be developed for an online music store.
Create a new cluster edbdata with ownership of postgres user
Start your edbdata cluster
Reload your cluster with pg_ctl utility and using pg_reload_conf() function
Stop your edbdata cluster with fast mode
Configuration
Module Objectives
Server Parameter File - postgresql.conf
Viewing and Changing Server Parameters
Configuration Parameters - Security, Resources and WAL
Configuration Parameters - Error Logging, Planner and Maintenance
Viewing Compilation Settings
Using File Includes
Setting Server Parameters
There are many configuration parameters that affect the behavior of the database system
All parameter names are case-insensitive
Every parameter takes a value of one of five types:
boolean
integer
floating point
string
enum
One way to set these parameters is to edit the file postgresql.conf, which is
normally kept in the data directory
The Server Parameter File - postgresql.conf
Holds parameters used by a cluster
Parameters are case-insensitive
Normally stored in data directory
initdb installs default copy
Some parameters only take effect on server restart (pg_ctl restart)
# used for comments
One parameter per line
Use include directive to read and process another file
Can also be set using the command-line option
Viewing and Changing Server Parameters
Configuration parameters can be viewed using:
The SHOW command
pg_settings
pg_file_settings
Configuration parameters can be modified for:
A single session using the SET command
A database user using ALTER USER
A single database using ALTER DATABASE
Changing Configuration Parameter at Cluster Level

Use the ALTER SYSTEM command to edit cluster level settings without editing postgresql.conf
ALTER SYSTEM writes the new setting to the postgresql.auto.conf file, which is read last during server reload/restart
Parameters can be modified using ALTER SYSTEM when required

[postgres@pgsrv1 ~] psql edb postgres
edb=# ALTER SYSTEM SET work_mem=20480;
ALTER SYSTEM
edb=# SELECT pg_reload_conf();
edb=# ALTER SYSTEM RESET work_mem;
ALTER SYSTEM
edb=# SELECT pg_reload_conf();
Connection Settings
listen_addresses (default *) - Specifies the addresses on which the server is to listen for
connections. Use * for all
port (default 5432) - The port the server listens on
max_connections (default 100) - Maximum number of concurrent connections the server can
support
superuser_reserved_connections (default 3) - Number of connection slots reserved for
superusers
reserved_connections (default 0) - Reserved slots for users with the pg_use_reserved_connections role
unix_socket_directories (default /tmp) - Directories to be used for UNIX socket connections to the server
unix_socket_permissions (default 0777) - Access permissions of the Unix-domain socket
Security and Authentication Settings
authentication_timeout (default is 1 minute) – Maximum time to
complete client authentication, in seconds
row_security (default is on) – Controls row security policy behavior
password_encryption (default scram-sha-256) – Determines the
algorithm used to encrypt passwords
ssl (default: off) - Enables SSL connections
SSL Settings
ssl_ca_file - Specifies the name of the file containing the SSL server certificate
authority (CA)
ssl_cert_file - Specifies the name of the file containing the SSL server certificate
ssl_key_file - Specifies the name of the file containing the SSL server private key
ssl_ciphers - List of SSL ciphers that may be used for secure connections
ssl_dh_params_file – Specifies file name for custom OpenSSL DH parameters
Memory Settings
shared_buffers - Size of the shared buffer pool for a cluster (server-wide)
temp_buffers - Amount of memory used for caching temporary tables (per session)
work_mem - Amount of memory used for sorting and hashing operations (per session)
maintenance_work_mem - Amount of memory used by maintenance commands
autovacuum_work_mem - Amount of memory used by autovacuum workers
temp_file_limit - Amount of disk space used for temporary files (per session)
Query Planner Settings
random_page_cost (default 4.0) - Estimated cost of a random page fetch.
May need to be reduced to account for caching effects
seq_page_cost (default 1.0) - Estimated cost of a sequential page fetch.
effective_cache_size (default 4GB) - Used to estimate the cost of an index
scan.
plan_cache_mode (default auto) – Controls custom or generic plan execution
for prepared statements. Can be set to auto, force_custom_plan and
force_generic_plan
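As an illustrative example (the value is hypothetical, not a recommendation), random_page_cost is often lowered on SSD storage, where a random page fetch costs nearly the same as a sequential one:

  edb=# ALTER SYSTEM SET random_page_cost = 1.1;
  edb=# SELECT pg_reload_conf();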
Write Ahead Log Settings
wal_level (default replica) - Determines how much information is written to the WAL. Other
values are minimal and logical
fsync (default on) – Forces WAL buffer flush at each commit. Turning this off can lead to
arbitrary data corruption in case of a system crash
wal_buffers (default -1, autotune) - The amount of memory used in shared memory for WAL
data. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers
min_wal_size (default 80 MB) – The WAL size to start recycling the WAL files
max_wal_size (default 1GB) – The WAL size to start checkpoint. Controls the number of WAL
Segments(16MB each) after which checkpoint is forced
checkpoint_timeout (default 5 minutes) - Maximum time between checkpoints
wal_compression (default off) – Compresses full-page writes before they are written to WAL
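A sketch of how these settings might appear together in postgresql.conf (the values shown are simply the defaults described above):

  wal_level = replica
  min_wal_size = 80MB
  max_wal_size = 1GB
  checkpoint_timeout = 5min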
Where To Log
log_destination - Controls the logging destination for a database cluster. Can be set to stderr, csvlog, jsonlog, syslog, and eventlog
logging_collector - Enables the logger process to capture stderr and csv logging messages. These messages can be redirected based on configuration settings
Log File and Directory Settings:
  log_directory - Directory where log files are written
  log_filename - Format of log file name (e.g. postgresql-%Y-%m-%d_%H%M%S.log)
  log_file_mode - Permissions for log files
  log_rotation_age - Used for file age-based log rotation
  log_rotation_size - Used for file size-based log rotation
When To Log
log_min_messages - Messages of this severity level or above are sent to the server log
log_min_error_statement - When a message of this severity or higher is written to the server log, the statement that caused it is logged along with it
Duration and sampling:
  log_min_duration_statement - When a statement runs for at least this long, it is written to the server log
  log_autovacuum_min_duration - Logs any autovacuum activity running for at least this long
  log_statement_sample_rate - Percentage of queries (above log_min_duration_sample) to be logged
  log_transaction_sample_rate - Samples a percentage of transactions by logging all their statements
What To Log
log_connections Log successful connections to the server log
Log some information each time a session disconnects, including the duration of
log_disconnections the session
log_temp_files Log temporary files of this size or larger, in kilobytes
log_checkpoints Causes checkpoints and restart points to be logged in the server log
log_lock_waits Log information if a session waits longer than deadlock_timeout to acquire a lock
log_error_verbosity How detailed the logged message is. Can be set to default, terse or verbose
log_line_prefix Additional details to log with each line. Default is '%m [%p] ' which logs a timestamp and the process ID
log_statement Legal values are none, ddl, mod (DDL and all other data-modifying statements), or all
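For illustration, a hypothetical logging configuration in postgresql.conf combining several of the parameters above (values are examples only):

  logging_collector = on
  log_directory = 'log'
  log_min_duration_statement = 5000   # log statements running 5 seconds or longer
  log_connections = on
  log_disconnections = on
  log_line_prefix = '%m [%p] %u@%d '  # timestamp, PID, user and database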
Background Writer Settings
bgwriter_delay (default 200 ms) - Specifies time between activity rounds for
the background writer
bgwriter_lru_maxpages (default 100) - Maximum number of pages that the
background writer may clean per activity round
bgwriter_lru_multiplier (default 2.0) - Multiplier on buffers scanned per round.
By default, if system thinks 10 pages will be needed, it cleans 10 *
bgwriter_lru_multiplier of 2.0 = 20
Primary tuning technique is to lower bgwriter_delay
Statement Behavior
search_path - This parameter specifies the order in which schemas are
searched. The default value for this parameter is "$user", public
default_tablespace - Name of the tablespace in which objects are created by
default
temp_tablespaces - Tablespaces name(s) in which temporary objects are
created
statement_timeout - Postgres will abort any statement that takes over the
specified number of milliseconds. A value of zero (the default) turns this off
idle_in_transaction_session_timeout – Terminates any session with an open
transaction that has been idle for longer than the specified duration in
milliseconds
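A hypothetical session showing these behaviors (the schema name and values are examples):

  edb=# SET search_path = sales, public;
  edb=# SET statement_timeout = '30s';
  edb=# SET idle_in_transaction_session_timeout = '5min';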
Parallel Query Scan Settings
Advanced Server supports parallel execution of read-only queries
Can be enabled and configured by using configuration parameters
max_parallel_workers_per_gather (default 2): Enables parallel query scan
parallel_tuple_cost (default 0.1): Estimated cost of transferring one tuple from a parallel worker
process to another process
parallel_setup_cost (default 1000): Estimates cost of launching parallel worker processes
min_parallel_table_scan_size (default 8MB): Sets minimum amount of table data that must be
scanned in order for a parallel scan
min_parallel_index_scan_size (default 512 KB): Sets the minimum amount of index data that must
be scanned in order for a parallel scan
debug_parallel_query (default off; called force_parallel_mode before PostgreSQL 16): Useful when
testing parallel query plans even when there is no performance benefit
Parallel Maintenance Settings
PostgreSQL supports parallel processes for creating an index
Currently this feature is only available for btree index type
max_parallel_maintenance_workers (default 2): Enables parallel index creation
(Illustration: index build times with max_parallel_maintenance_workers=0 vs. max_parallel_maintenance_workers=4)
Vacuum Cost Settings
vacuum_cost_delay (default 0 ms) - The length of time, in milliseconds, that the process will
wait when the cost limit is exceeded
vacuum_cost_page_hit (default 1) - The estimated cost of vacuuming a buffer found in the
buffer pool
vacuum_cost_page_miss (default 10) - The estimated cost of vacuuming a buffer that must be
read into the buffer pool
vacuum_cost_page_dirty (default 20) - The estimated cost charged when vacuum modifies a
buffer that was previously clean
vacuum_cost_limit (default 200) - The accumulated cost that will cause the vacuuming process
to sleep
vacuum_buffer_usage_limit (default 256kB) - The size of the Buffer Access Strategy used by the
VACUUM and ANALYZE commands
Autovacuum Settings
autovacuum (default on) - Controls whether the autovacuum launcher runs, and
starts worker processes to vacuum and analyze tables
log_autovacuum_min_duration (default -1) - Autovacuum tasks running longer
than this duration are logged
autovacuum_max_workers (default 3) - Maximum number of autovacuum
worker processes which may be running at one time
autovacuum_work_mem (default -1, to use maintenance_work_mem) -
Maximum amount of memory used by each autovacuum worker
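As a sketch, these settings might be tuned in postgresql.conf as follows (the values are illustrative, not recommendations):

  autovacuum = on
  autovacuum_max_workers = 6
  log_autovacuum_min_duration = 1000   # log autovacuum runs over 1 second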
Just-in-Time Compilation
Just-in-Time (JIT) compilation is a core feature of Postgres for accomplishing high performance
JIT in Postgres supports accelerating expression evaluation and tuple deforming
JIT configuration parameters include jit, jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost
Preset Options - Read Only Parameters
Postgres sources are compiled using various settings.
Various read-only configuration parameters can be used to view build settings
block_size data_directory_mode
wal_block_size server_encoding
segment_size max_function_args
wal_segment_size max_index_keys
data_checksums ssl_library
Configuration File Includes
The postgresql.conf file can now contain include directives
Allows configuration file to be divided in separate files
Usage in postgresql.conf file:
include 'filename'
include_dir 'directory name'
Module Summary
Server Parameter File - postgresql.conf
Viewing and Changing Server Parameters
Configuration Parameters - Security, Resources and WAL
Configuration Parameters - Error Logging, Planner and Maintenance
Viewing Compilation Settings
Using File Includes
Lab Exercise - 1
1. You are working as a DBA. It is recommended to keep a backup copy of the
postgresql.conf file before making any changes. Make the necessary changes
in the server parameter file for the following settings:
Allow up to 200 connected users on the server
Reserve 10 connection slots for DBA users on the server
Maximum time to complete client authentication will be 10 seconds
Lab Exercise - 2
1. Working as a DBA is a challenging job and to track down certain activities
on the database server, logging has to be implemented. Go through the
server parameters that control logging and implement the following:
Save all the error messages in a file inside the log folder in your cluster data directory
(e.g. c:\edbdata or /edbdata)
Log all queries which are taking more than 5 seconds to execute, and their time
Log the users who are connecting to the database cluster
Make the above changes and verify them
Lab Exercise - 3
1. Perform the following changes recommended by a senior DBA and
verify them. Set:
Shared buffer to 256MB
Effective cache for indexes to 512MB
Maintenance memory to 64MB
Temporary memory to 8MB
Lab Exercise - 4
1. Vacuuming is an important maintenance activity and needs to be
properly configured. Change the following autovacuum parameters
on the production server. Set:
Autovacuum workers to 6
Autovacuum threshold to 100
Autovacuum scale factor to 0.3
Auto analyze threshold to 100
Autovacuum cost limit to 100
Data Dictionary
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
The System Catalog Schema
System Information Tables
and Views
System Information and
Administration Functions
The System Catalog Schema
Stores information about table and other objects
Created and maintained automatically in pg_catalog schema
pg_catalog is always effectively part of the search_path
Contains:
System Tables like pg_class etc.
System Function like pg_database_size() etc.
System Views like pg_stat_activity etc.
System Information Tables
\dS in psql prompt will give you the list of pg_* tables and views
This list is from pg_catalog schema
pg_tables list of tables
pg_constraint list of constraints
pg_indexes list of indexes
pg_trigger list of triggers
pg_views list of views
More System Information Tables
pg_file_settings - Provides a summary of the contents of the server configuration file
pg_policy - Stores row level security policies for tables
pg_policies - Provides access to useful information about each row-level security policy in the database
System Information Functions
current_database() current_schema[()] pg_postmaster_start_time() version()
current_user current_schemas(boolean) pg_current_logfile() txid_status()
pg_conf_load_time() pg_jit_available()
System Administration Functions
current_setting, set_config Return or modify configuration variables
pg_cancel_backend Cancel a backend's current query
pg_terminate_backend Terminates backend process
pg_reload_conf Reload configuration files
pg_rotate_logfile Rotate the server's log file
pg_backup_start, pg_backup_stop Used with point-in-time recovery (called pg_start_backup/pg_stop_backup before version 15)
pg_ls_logdir() Returns the name, size, and last modified time of each file in the log directory
pg_ls_waldir() Returns the name, size, and last modified time of each file in the WAL directory
More System Administration Functions
pg_*_size - Disk space used by a tablespace, database, or relation; pg_total_relation_size includes indexes and toasted data
pg_column_size - Bytes used to store a particular value
pg_size_pretty - Converts a raw size to something more human-readable
pg_ls_dir, pg_read_file - File operation functions. Restricted to superuser use and only on files in the data or log directories
pg_blocking_pids() - Function to reliably identify which sessions block others
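A hypothetical psql session combining some of these functions (the database and table names are placeholders):

  edb=# SELECT pg_size_pretty(pg_database_size('edb'));
  edb=# SELECT pg_size_pretty(pg_total_relation_size('accounts'));
  edb=# SELECT pg_reload_conf();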
System Information Views
pg_stat_activity Details of open connections and running transactions
pg_locks List of current locks being held
pg_stat_database Details of databases
pg_stat_user_* Details of tables, indexes and functions
pg_stat_archiver Status of the archiver process
pg_stat_progress_basebackup View pg_basebackup progress
pg_stat_progress_vacuum Provides progress reporting for VACUUM operations
pg_stat_progress_analyze Provides progress details for ANALYZE operations
pg_hba_file_rules Provides a summary of the contents of the client authentication configuration file, pg_hba.conf
pg_stat_io Provides I/O information
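For example, a query such as the following (illustrative) lists connected sessions and their state using pg_stat_activity:

  edb=# SELECT pid, usename, datname, state
  edb-# FROM pg_stat_activity WHERE state IS NOT NULL;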
Module Summary
The System Catalog Schema
System Information Tables
and Views
System Information and
Administration Functions
Lab Exercise - 1
1. You are working with
different schemas in a
database. After a while you
need to determine all the
schemas in your search
path. Write a query to find
the list of schemas currently
in your search path.
Lab Exercise - 2
1. You need to determine the
names and definitions of all of
the views in your schema.
Create a report that retrieves
view information - the view
name and definition text.
Lab Exercise - 3
1. Create a report of all the users
who are currently connected. The
report must display total session
time of all connected users.
2. You found that a user has
connected to the server for a very
long time and have decided to
gracefully kill its connection. Write
a statement to perform this task.
Lab Exercise - 4
1. Write a query to display the
name and size of all the
databases in your cluster.
Size must be displayed using
a meaningful unit.
Creating and
Managing
Databases
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Object Hierarchy
Users and Roles
Tablespaces
Databases
Access Control
Creating Schemas
Schema Search Path
Object Hierarchy
Database Cluster
  Users/Groups (Roles)
  Tablespaces
  Database
    Catalogs
    Schemas
      Tables, Views, Sequences, Functions, Event Triggers
    Extensions
Users and Roles
Database Users
Are global within a database cluster
Are not the operating system users
Are used for connecting to a database
Have a unique name not starting with pg_
postgres is a predefined superuser
Creating Users Using psql
How to create? CREATE USER SQL command
How to delete? DROP USER SQL command
superuser or createrole privilege is required for creating a database user
Syntax:
CREATE USER name [ [ WITH ] option [ ... ] ]
where option can be:
    SUPERUSER | CREATEDB | CREATEROLE
  | INHERIT | LOGIN | NOLOGIN | REPLICATION | BYPASSRLS
  | CONNECTION LIMIT connlimit
  | [ ENCRYPTED ] PASSWORD 'password'
  | VALID UNTIL 'timestamp'
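A hypothetical example combining several options (the user name and password are placeholders):

  edb=# CREATE USER devuser LOGIN PASSWORD 'secret'
  edb-# CONNECTION LIMIT 5 VALID UNTIL '2025-01-01';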
Creating Users Using createuser
The createuser utility can also be used to create a user
Syntax:
$ createuser [OPTION]... [ROLENAME]
Use --help option to view the full list of options available
Example:
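A hypothetical invocation (the role name is a placeholder):

  $ createuser --login --pwprompt devuser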
Roles
Role is a collection of cluster and object level privileges
Role makes it easier to manage multiple privileges
How to create? CREATE ROLE statement
How to assign? GRANT statement
Who it can be assigned to? user or a group
Predefined Roles
Provide certain administrative capabilities using these default roles
Example: create a new user with read only access or a new user with
access to view monitoring data only
pg_checkpoint pg_read_server_files
pg_database_owner pg_signal_backend
pg_execute_server_program pg_stat_scan_tables
pg_monitor pg_write_all_data
pg_read_all_data pg_create_subscription
pg_read_all_settings pg_use_reserved_connections
pg_read_all_stats pg_write_server_files
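For example, a monitoring-only user might be created as follows (the user name and password are placeholders):

  edb=# CREATE USER monuser LOGIN PASSWORD 'secret';
  edb=# GRANT pg_monitor TO monuser;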
Tablespaces
Tablespaces and Data Files
Data is stored logically in tablespaces and physically in data files
Tablespaces:
Can belong to only one database cluster
Consist of multiple data files
Can be used by multiple databases
Data Files:
Can belong to only one tablespace
Are used to store database objects
Cannot be shared by multiple tables (one or more per table)
Advantages of Tablespaces
Control the disk layout for a database cluster
Store indexes and data physically separated for performance
(Illustration: indexes and transactional tables placed in Tablespace A on fast storage; historical tables and seldom used partitions in Tablespace B on slow storage)
Pre-Configured Tablespaces
pg_global tablespace - PGDATA/global directory - Cluster-wide tables and catalog objects
pg_default tablespace - PGDATA/base directory - Databases, schemas and other objects
Creating Tablespaces
How to create? CREATE TABLESPACE command
The tablespace directory must already exist, with appropriate ownership and permissions
Syntax:
CREATE TABLESPACE tablespace_name [ OWNER user_name ] LOCATION 'directory';
On disk: the pg_tblspc directory inside the cluster data directory holds a symbolic link (named after the tablespace OID) to the physical tablespace directory, which contains a catalogue-version directory with one directory per database holding the database objects (files)
Example - CREATE TABLESPACE
[training@Base ~]$ sudo mkdir /newtab1
[training@Base ~]$ sudo chown postgres:postgres /newtab1
[training@Base ~]$ su - postgres
[postgres@Base ~]$ psql -p 5432 postgres postgres
postgres=# CREATE TABLESPACE fast_tab LOCATION '/newtab1';
CREATE TABLESPACE
postgres=# \db
List of tablespaces
Name | Owner | Location
------------+--------------+----------
fast_tab | postgres | /newtab1
pg_default | postgres |
pg_global | postgres |
(3 rows)
Using Tablespaces
Use the TABLESPACE keyword while creating databases, tables and indexes
edb=# CREATE TABLE account(acno INT PRIMARY KEY,
ac_hldr_fname VARCHAR(20)) TABLESPACE fast_tab;
CREATE TABLE
Default and Temp Tablespace
default_tablespace server parameter sets default tablespace
default_tablespace parameter can also be set using the SET command at the session level
temp_tablespaces parameter determines the placement of temporary tables and indexes and
temporary files
temp_tablespaces can be a list of tablespace names
edb=# show default_tablespace;
default_tablespace
--------------------
(1 row)
edb=# show temp_tablespaces;
temp_tablespaces
------------------
(1 row)
Altering Tablespaces
ALTER TABLESPACE can be used to rename a tablespace, change ownership
and set a custom value for a configuration parameter
Only the owner or superuser can alter a tablespace
The seq_page_cost and random_page_cost parameters can be altered
for a tablespace
Example - Alter Tablespace
Syntax:
ALTER TABLESPACE name RENAME TO new_name
ALTER TABLESPACE name OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
ALTER TABLESPACE name SET ( tablespace_option = value [, ... ] )
ALTER TABLESPACE name RESET ( tablespace_option [, ... ] )
edb=# ALTER TABLESPACE fast_tab RENAME TO new_tab;
ALTER TABLESPACE
edb=# \db
List of tablespaces
Name | Owner | Location
------------+--------------+----------
new_tab | postgres | /newtab1
pg_default | postgres |
pg_global | postgres |
Dropping a Tablespace
DROP TABLESPACE removes a tablespace from the system
Only the owner or superuser can drop a tablespace
The tablespace must be empty
If a tablespace is listed in the temp_tablespaces parameter,
make sure current sessions are not using the tablespace
DROP TABLESPACE cannot be executed inside a transaction
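For illustration, dropping the tablespace renamed earlier (assuming it is empty):

  edb=# DROP TABLESPACE new_tab;
  DROP TABLESPACE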
Databases
What Is a Database?
A database is a named collection of SQL objects
A running Postgres instance can manage multiple databases
How to create? CREATE DATABASE command
How to delete? DROP DATABASE command
To determine the set of existing databases:
SQL - SELECT datname FROM pg_database;
psql META COMMAND - \l (backslash lowercase L)
Creating Databases
Database can be created using:
1. createdb utility program
2. CREATE DATABASE SQL command
SQL Command syntax:
CREATE DATABASE name [ [ WITH ] [ OWNER [=] user_name ]
  [ TEMPLATE [=] template ]
  [ ENCODING [=] encoding ]
  [ TABLESPACE [=] tablespace_name ]
  [ ALLOW_CONNECTIONS [=] allowconn ]
  [ CONNECTION LIMIT [=] connlimit ] ]
Example - Creating Databases
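A hypothetical example using both methods (the database, owner and tablespace names are placeholders):

  $ createdb prod
  edb=# CREATE DATABASE hr OWNER appuser TABLESPACE fast_tab
  edb-# CONNECTION LIMIT 50;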
Accessing a Database
pgAdmin4 or psql can be used to access a database
To use psql, open a terminal and execute:
$ psql -U postgres -d prod
Note: If PATH is not set, you can execute the psql command from the bin
directory of the Postgres installation
Privileges
Cluster level
Granted to a user during CREATE or later using ALTER USER
These privileges are granted by superuser
Object Level
Granted to user using GRANT command
These privileges allow a user to perform particular actions on a database object, such as
tables, views, or sequences
Can be granted by owner, superuser or someone who has been given permission to grant
privileges (WITH GRANT OPTION)
GRANT Statement
Grants object level privileges to database users, groups or roles
GRANT can also be used to grant a role to a user
How to view syntax and available privileges?
Type \h GRANT in psql
Example – GRANT Statement
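An illustrative example (the schema, table and user names are placeholders):

  edb=# GRANT USAGE ON SCHEMA sales TO devuser;
  edb=# GRANT SELECT, INSERT ON sales.accounts TO devuser;
  edb=# GRANT SELECT ON sales.accounts TO devuser WITH GRANT OPTION;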
REVOKE Statement
Revokes object level privileges from database users, groups or roles
REVOKE [ GRANT OPTION FOR ] can be used to revoke only the grant
option without revoking the actual privilege
How to view syntax and available privileges?
Type \h REVOKE in psql
Example - REVOKE Statement
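An illustrative example (the schema, table and user names are placeholders):

  edb=# REVOKE INSERT ON sales.accounts FROM devuser;
  edb=# REVOKE GRANT OPTION FOR SELECT ON sales.accounts FROM devuser;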
Database Schemas
What is a Schema
A schema is a named collection of database objects - tables, views, sequences, functions and domains - owned by a database user
Benefits of Schemas
A database can contain one or more named schemas
By default, all databases contain a public schema
There are several reasons why one might want to use schemas:
To allow many users to use one database without interfering with each other
To organize database objects into logical groups to make them more manageable
Third-party applications can be put into separate schemas so they cannot collide
with the names of other objects
Creating Schemas
Schemas can be added using the CREATE SCHEMA SQL command
Syntax:
CREATE SCHEMA IF NOT EXISTS schema_name [ AUTHORIZATION
role_specification ]
Example:
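A hypothetical example (the schema and role names are placeholders):

  edb=# CREATE SCHEMA IF NOT EXISTS sales AUTHORIZATION appuser;
  CREATE SCHEMA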
What is a Schema Search Path
The schema search path determines which schemas are searched
for matching table names
Search path is used when fully qualified object names are not used
in a query
Example:
SELECT * FROM employee;
This statement will find the first employee table from the schemas listed in the
search path
Determine the Schema Search Path
To show the current search path, execute the following command in psql:
SHOW search_path;
Default search_path is "$user",public
Modifying search_path:
Cluster/Instance Level: postgresql.conf or ALTER SYSTEM
Database Level: ALTER DATABASE
User Level: ALTER USER
Session Level: SET
Object Ownership
Every object in the hierarchy - databases, tablespaces, catalogs, schemas, extensions, tables, views, sequences, functions and event triggers - has an owner, normally the user/role that created it
Module Summary
Object Hierarchy
Users and Roles
Tablespaces
Databases
Access Control
Creating Schemas
Schema Search Path
Lab Exercise - 1
An e-music online store website
application developer wants to add an
online buy/sell facility and has asked you
to separate all tables used in online
transactions. Here you have suggested to
use schemas. Implement the following
suggested options:
Create an ebuy user with password 'lion'
Create an ebuy schema which can be used
by user ebuy
Login as the ebuy user, create a table
sample1 and check whether that table
belongs to the ebuy schema or not
Lab Exercise - 2
Retrieve a list of databases
using a SQL query
Retrieve a list of databases
using the psql meta command
Retrieve a list of tables in the
edbstore database and check
which schema and owner they
have
Database
Security
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Database Security Requirements and
Protection Plan
Levels of Security in Postgres
Access Control using pg_hba.conf
Introduction to Row Level Security
Data Encryption
General Security Recommendations
Why Database Security
Databases are a core component
of many computing systems
Confidential data like SIN (Social Insurance Number),
healthcare and banking details is stored and
shared using databases
It is very critical to protect stored
information from hackers,
insiders, and other groups who
intend to steal valuable data
Database Security is a
mechanism to protect the data
against threats
Data Security Requirements
Stopping improper disclosure,
modification and denial of access
to information is very important
No one wants an employee finding out the
boss's salary, changing their own salary,
or stopping HR from printing paychecks
Database Security Requirements
includes:
Confidentiality
Integrity
Availability
Protection Plan – We all need one
A protection plan helps prevent attacks, protect data and discover incidents:
  Access Control - Authentication and Authorization
  Data Control - Views, Row Level Security, Encryption
  Network Control - SSL Connections, Firewalls
  Auditing and Monitoring
Levels of Security
Server and Application - Check Client IP, User/Password (pg_hba.conf)
Database - Connect Privilege, Schema Permissions
Object - Table Level Privileges, Grant/Revoke
Host Based Access Control
(Illustration: a client at IP 10.8.99.30 connecting as user appuser1 is checked by the postmaster against pg_hba.conf, optionally using an SSL certificate)
pg_hba.conf can be used to restrict the ability to connect to a database
SSL can be forced for selected clients based on hostname or IP address
Different authentication methods can be used
Superuser access can be locked down to certain IPs using pg_hba.conf
Host-Based Access Control File - pg_hba.conf
Location: Cluster data directory
Read Behavior: Loaded at startup; Any changes require a service reload
File Structure: Comprises individual records (one per line); Records are processed from top
to bottom
Record Details: Specifies connection type, database name, user name, client IP(Hostnames,
IPv6, and IPv4), and authentication method
Authentication Methods: trust, reject, scram-sha-256, md5, password, gss, sspi, ident, peer, pam, ldap, radius,
bsd, cert
Password Encryption: Availability of password-based authentication methods depends on
the password_encryption setting
Authentication Methods
trust Allows unconditional connection, no password required.
Reject Unconditionally rejects the connection; useful for blocking specific hosts.
scram-sha-256 Performs SCRAM-SHA-256 authentication for password verification.
md5 Performs SCRAM-SHA-256 or MD5 authentication for password verification.
password Requires an unencrypted password, not recommended for untrusted networks.
gss Uses GSSAPI for user authentication (TCP/IP connections only).
sspi Uses SSPI for user authentication (Windows only).
ident Obtains the client's OS username by contacting the ident server for TCP/IP connections.
peer Obtains the client's OS username and matches it with the requested database user name (local connections).
ldap Authenticates using an LDAP server.
radius Authenticates using a RADIUS server.
cert Authenticates using SSL client certificates.
pam Authenticates using Pluggable Authentication Modules (PAM) provided by the OS.
bsd Authenticates using the BSD Authentication service provided by the OS.
pg_hba.conf Example
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all peer
# IPv4 local connections:
host all all 127.0.0.1/32 scram-sha-256
# IPv6 local connections:
host all all ::1/128 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all peer
host replication all 127.0.0.1/32 scram-sha-256
host replication all ::1/128 scram-sha-256
SQL:
select rule_number,type,database,user_name,address,netmask,auth_method
from pg_hba_file_rules ;
Authentication Problems
FATAL: no pg_hba.conf entry for host "192.168.10.23", user
"edbuser", database "edbstore"
FATAL: password authentication failed for user "edbuser"
FATAL: user "edbuser" does not exist
FATAL: database "edbstore" does not exist
Self-explanatory message is displayed
Verify database name, username and Client IP in pg_hba.conf
Reload Cluster after changing pg_hba.conf
Check server log for more information
Row Level Security
Row Level Security (RLS)
GRANT and REVOKE can be used at table level
PostgreSQL supports security policies for limiting access at row level
By default, all rows of a table are visible
Once RLS is enabled on a table, all queries must go through the security policy
Security policies are controlled by the DBA rather than the application
RLS offers stronger security as it is enforced by the database
(Illustration: an account/balance table where a policy restricts which rows each user can see)
Example - Row Level Security
For example, to enable row level security for the table accounts :
Create the table first
postgres=# CREATE TABLE accounts (manager text, company text,
contact_email text);
Then alter the table
postgres=# ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
Syntax:
CREATE POLICY name ON table_name
[ AS { PERMISSIVE | RESTRICTIVE } ]
[ FOR { ALL | SELECT | INSERT | UPDATE | DELETE } ]
[ TO { role_name | PUBLIC | CURRENT_USER | SESSION_USER } [, ...] ]
[ USING ( using_expression ) ]
[ WITH CHECK ( check_expression ) ]
Example - Row Level Security (continued)
To create a policy on the accounts table to allow the managers role to view
the rows of their accounts, the CREATE POLICY command can be used:
postgres=# CREATE POLICY account_managers ON accounts TO managers
USING (manager = current_user);
To allow all users to view their own row in a user table, a simple policy can be
used:
postgres=# CREATE POLICY user_policy ON users USING (user_name =
current_user);
Data Encryption
Database Level Encryption
Encrypting everything does not make data secure
Resources are consumed when you query encrypted data
pgcrypto provides mechanism for encrypting selected columns
pgcrypto supports one-way and two-way data encryption
Install pgcrypto using CREATE EXTENSION command
CREATE EXTENSION pgcrypto;
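As a sketch of both modes (the data values and key are placeholders): crypt() with gen_salt() provides one-way password hashing, while pgp_sym_encrypt()/pgp_sym_decrypt() provide two-way symmetric encryption:

  edb=# SELECT crypt('mypassword', gen_salt('bf'));          -- one-way hash
  edb=# SELECT pgp_sym_encrypt('4111-1111', 'secretkey');    -- encrypt
  edb=# SELECT pgp_sym_decrypt(card_enc, 'secretkey') FROM accounts;  -- decrypt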
General Security
Recommendations
General Recommendations - Database Server
Always keep your system patched to the latest version
Don't put a postmaster port on the Internet
Firewall this port appropriately
If that's not possible, make a read-only Replica database available on the port, not a R/W master
Isolate the database port from other network traffic
Don't rely solely on your front-end application to prevent unauthorized access to
your database
Avoid using trust authentication in pg_hba.conf
General Recommendations - Database Users
Provide each user with their own login
Shared credentials make auditing more complicated and violate HIPAA, PCI, etc.
Allow users the minimum access to do their jobs
Use Roles and classes of privileges
Use Views and View Security Barriers
Use Row Level Security
General Recommendations - Connection Pooling
When not practical to provide each user with their own login (i.e. connection
pooling is in use):
Have one or more logins related to the application
Limit access to the database by the specific IP addresses where the
application is certified to run
Ensure the login(s) have minimum rights needed to do their work (e.g. SELECT
rights and only to specified tables)
General Recommendations - Database Superuser
Only allow the database superuser to log in from the server machine
itself, with local or localhost connection
Reserve use of superuser accounts for tasks or roles where it is
absolutely required
Make as few objects owned by the superuser as necessary
Restrict access to configuration files (postgresql.conf and pg_hba.conf)
and error log files to administrators
Disallow host system login by database superuser roles ('postgres')
General Recommendations - Database Superuser
(continued)
Do not allow the superuser to log into the database server OS. Use a personal OS login
and then "sudo" to create an audit trail
Use a separate database login to own each database and own everything in it
General Recommendations - Database Backups
Keep backups and have a tested recovery plan. No matter how well you secure
things, it's still possible an intruder could get in and delete or modify your data
Have scripts perform backups and immediately test them and alert DBA on any
failures
Keep backups physically separate from the database server. A disaster can
strike and take out an entire location, whether that’s environmental (e.g.
earthquake), malicious (e.g. hacker, insider), or human error
General Recommendations - Think AAA
Authenticate - Verify the user is who they claim to be
Authorize - Verify the user is allowed access
Audit - Record which user did what and when they did it
Module Summary
Database Security Requirements and Protection Plan
Levels of Security in Postgres
Access Control using pg_hba.conf
Introduction to Row Level Security
Data Encryption
General Security Recommendations
Lab Exercise - 1
1. You are working as a Postgres DBA. Your server box has 2 network cards with
IP addresses 192.168.30.10 and 10.4.2.10. 192.168.30.10 is used for the
internal LAN and 10.4.2.10 is used by the web server to connect users from an
external network. Your server should accept TCP/IP connections both from
internal and external users.
Configure your server to accept connections from external and internal networks.
Lab Exercise - 2
1. A new developer has joined the team with ID number 89
Create a new user by name dev89 and password password89
Then assign the necessary privileges to dev89 so they can
connect to the edbstore database and view all tables
Lab Exercise - 3
1. A new developer joins e-music corp. They have the IP address
192.168.30.89. They are not able to connect from their machine to
Postgres and get the following error on the server:
FATAL: no pg_hba.conf entry for host "192.168.30.89", user
"dev89", database "edbstore", SSL off
2. Configure your server so that the new developer can connect from
their machine
Monitoring and
Admin Tools
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Overview and Features of pgAdmin
Access pgAdmin
Register and Connect to a Database Server
General Database Administration
Object Browser - View Data, Query Tool, Server Status
Overview of Postgres Enterprise Manager
Introduction to pgAdmin
Open-source graphical user interface for Postgres
Create, manage and maintain database objects
In web (server) mode, pgAdmin requires the Apache HTTP server
Download and Install: https://www.pgadmin.org/download/
pgAdmin Features
Multi-platform
Supports PostgreSQL and EDB Postgres Advanced Server
Multi-deployment Mode – Desktop, Server
Integrated SQL IDE
pl/pgsql and edb-spl Debugger
Schema Diff Tool
ERD Tool
Perform Maintenance Tasks – Vacuum, Backups, Restore etc.
Job Scheduler
Multibyte server-side encoding support
Installing pgAdmin on Linux
Install the EPEL repository:
sudo dnf install -y epel-release
Install the pgAdmin repository:
sudo rpm -i
https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-
repo-2-1.noarch.rpm
sudo dnf makecache
Use the dnf command to install the pgAdmin package:
sudo dnf install -y pgadmin4
Post Installation Script
Run post-install script to configure the Apache-HTTP:
sudo dnf install -y policycoreutils-python-utils
/usr/pgadmin4/bin/setup-web.sh
[rocky@pgsrv1 ~]$ sudo /usr/pgadmin4/bin/setup-web.sh
Setting up pgAdmin 4 in web mode on a Redhat based platform...
……………………
Enter the email address and password to use for the initial pgAdmin user account:
Email address:
[email protected] Password:
Retype password:
pgAdmin 4 - Application Initialisation
======================================
Creating storage and log directories...
Configuring SELinux...
……………………
The Apache web server is not running. We can enable and start the web server for you to finish pgAdmin 4
installation. Continue (y/n)? Y
Created symlink /etc/systemd/system/multi-user.target.wants/httpd.service → /usr/lib/systemd/system/httpd.service.
Apache successfully enabled.
Apache successfully started.
You can now start using pgAdmin 4 in web mode at http://127.0.0.1/pgadmin4
[rocky@pgsrv1 ~]$
Practice Lab - Install pgAdmin
Connect to the Linux VM as sudo user
Execute following commands to install and configure pgAdmin:
sudo dnf install -y epel-release
sudo rpm -i
https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-
repo-2-1.noarch.rpm
sudo dnf makecache
sudo dnf install -y pgadmin4 policycoreutils-python-utils
/usr/pgadmin4/bin/setup-web.sh
Access pgAdmin Web Interface
Open a browser and type: http://<IP>/pgadmin4
Enter email address and password provided during post install script.
Click Login
pgAdmin - User Interface
Registering a Server
Right Click on the
server to add a server
Common Connection Problems
There are two common error messages that you may encounter while
connecting to a PostgreSQL database:
Could not connect to Server - Connection refused
This error occurs when the database server isn't running, or when port 5432 is not
open on the database server to accept external TCP/IP connections.
FATAL: no pg_hba.conf entry
This means your server can be contacted over the network, but it is not configured to
accept the connection. The client is not recognized as an allowed user for the database.
You will have to add an entry for each of your clients to the pg_hba.conf file.
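For example, a single client host can be allowed with one pg_hba.conf line (addresses and names here are illustrative):

```
host  edbstore  dev89  192.168.30.89/32  scram-sha-256
```

Reload the configuration afterwards (e.g. `pg_ctl reload` or `SELECT pg_reload_conf();`) for the entry to take effect.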
Query Tool
Click on a
Database
Click on
Query Tool
Query Tool - Data Output
Type SQL Query Click on Execute Button
View Results
Databases
The databases menu allows you to create a new database
The menu for an individual database allows you to perform
operations on that database
Create a new object in the database
Drop the database
Open the Query Tool with a script to re-create the database
Perform maintenance
Backup or Restore
Modify the database properties
Creating a Database
Backup and Restore
Schemas
Schemas - Grant Wizard
Domains
Sequences
Tables
Tables - Indexes
Tables - Maintenance
Rules
Rules can be
applied to tables
or views
Triggers
Create a trigger
function before
creating a trigger
Views
Create Tablespaces
Roles
Dashboard
Server Sessions
Transactions per second
Tuples in
Tuples out
Block I/O
Server activity - sessions
Overview of Postgres
Enterprise Manager (PEM)
Postgres Enterprise Manager
Manage, monitor, and tune Postgres at scale
Manage from one interface - One place to visualize and manage everything
Optimize database performance - In-depth diagnostics for database reports and tuning
Monitor system health - Built-in dashboards and customizable alert thresholds
Integrate with other tools - APIs and webhooks to fetch data, send alerts, and manage servers
PEM - Features
Manage, Monitor and Tune PostgreSQL and EDB Postgres Advanced
Server running on multiple Platforms
Management: Integrated SQL IDE, built-in query debugger, user/group access
management, Schema Diff, job scheduling, backup and failover management
Monitoring: Customizable charts and dashboards, predefined and custom alerts via
email or SNMP, user-defined metrics, log analysis, session profiling, database and
OS level monitoring, web hooks and REST API for integrations
Tuning: Detailed performance diagnostics, SQL profiler, capacity management, log
manager, expert wizards for configuration setup
PEM Architecture
A client (browser) connects over HTTPD to the PEM web application running on the
PEM server host machine. A PEM agent on each managed host collects monitoring
data from the local PostgreSQL or EPAS instance and sends it to the PEM storage
backend (a Postgres database named pem) on the PEM server.
Module Summary
Overview and Features of pgAdmin
Access pgAdmin
Register and Connect to a Database Server
General Database Administration
Object Browser - View Data, Query Tool, Server Status
Overview of Postgres Enterprise Manager
Lab Exercise 1
Open pgAdmin 4 and connect to the default PostgreSQL database cluster
Create a user named pguser
Create a database named pgdb owned by pguser
After creating the pgdb database change its connection limit to 4
Create a schema named pguser inside the pgdb database
The schema owner should be pguser
Lab Exercise 2
You have created the pgdb database with the pguser schema. Create following
objects in the pguser schema:
Table - Teams with columns TeamID, TeamName, TeamRatings
Sequence - seq_teamid start value - 1 increment by 1
Columns - Change the default value for the TeamID column to seq_teamid
Constraint - TeamRatings must be between 1 and 10
Index - Primary Key TeamID
View - Display all teams in ascending order of their ratings. Name the view as vw_top_teams
Lab Exercise 3
View all rows in the Teams table.
Using the Edit data window, you just opened in the previous step, insert the
following rows into the Teams table:
TeamID TeamName TeamRatings
Auto generated Oilers 1
Auto generated Rangers 6
Auto generated Canucks 8
Auto generated Blackhawks 5
Auto generated Bruins 2
SQL Primer
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Data Types Sequences
Structured Query Language Domains
DDL, DML and DCL Statements SQL Joins and Functions
Transaction Control Statements Explain Plans
Tables and Constraints Quoting in PostgreSQL
Views and Materialized Views Indexes
Data Types
Common Data Types:
Numeric Types: NUMERIC, INTEGER, SERIAL
Character Types: CHAR, VARCHAR, TEXT
Date/Time Types: TIMESTAMP, DATE, TIME, INTERVAL
Other Types: BYTEA, BOOL, MONEY, XML, JSON, JSONB
Structured Query Language
Data Definition Language: CREATE, ALTER, DROP, TRUNCATE
Data Manipulation Language: INSERT, UPDATE, DELETE
Data Control Language: GRANT, REVOKE
Transaction Control Language: COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION
DDL Statements
Statement Syntax
CREATE TABLE:
CREATE [ TEMPORARY ] [ UNLOGGED ] TABLE table_name
( [ column_name data_type [ column_constraint ] [, ...] ] )
[ INHERITS ( parent_table ) ]
[ TABLESPACE tablespace_name ]
[ USING INDEX TABLESPACE tablespace_name ]
[ PARTITION BY { RANGE | LIST | HASH } ( column_name | ( expression ) ) ]
ALTER TABLE:
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ] action [, ...]
DROP TABLE:
DROP TABLE [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
TRUNCATE TABLE:
TRUNCATE [ TABLE ] [ ONLY ] name [ * ] [, ...]
DML Statements
Statement Syntax
INSERT:
INSERT INTO table_name [ ( column_name [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
UPDATE:
UPDATE [ ONLY ] table_name
SET column_name = { expression | DEFAULT } [, ...]
[ WHERE condition ]
DELETE:
DELETE FROM [ ONLY ] table_name
[ WHERE condition ]
SELECT:
SELECT [ ALL | DISTINCT ] [ * | expression [, ...] ]
[ FROM table [, ...] ]
DCL Statements
Statement Syntax
GRANT:
GRANT { { SELECT | INSERT | UPDATE | ... } [, ...] | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role_specification [, ...] [ WITH GRANT OPTION ]
REVOKE:
REVOKE [ GRANT OPTION FOR ]
{ { SELECT | INSERT | UPDATE | ... } [, ...] | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
FROM { [ GROUP ] role_name | PUBLIC } [, ...]
Transaction Control Language
Statement Syntax
COMMIT COMMIT [ WORK | TRANSACTION ]
ROLLBACK ROLLBACK [ WORK | TRANSACTION ]
SAVEPOINT SAVEPOINT savepoint_name
SET TRANSACTION SET TRANSACTION transaction_mode [, …]
Database Objects
Object Description
TABLE Named collection of rows
VIEW Virtual table, can be used to hide complex queries
SEQUENCE Used to automatically generate integer values that follow a pattern
INDEX A common way to enhance query performance
DOMAIN A data type with optional constraints
Tables
A table is a named collection of rows
Each table row has the same set of columns
Each column has a data type
Tables can be created using the CREATE TABLE statement
Syntax:
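A minimal example (the teams table mirrors the lab exercises; names and types are illustrative):

```sql
CREATE TABLE teams (
    teamid      integer      PRIMARY KEY,
    teamname    varchar(50)  NOT NULL,
    teamratings numeric(3,1) CHECK (teamratings BETWEEN 1 AND 10)
);
```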
Types of Constraints
Constraints are used to enforce data integrity
PostgreSQL supports different types of constraints:
NOT NULL
CHECK
UNIQUE
PRIMARY KEY
FOREIGN KEY
Constraints can be defined at the column level or table level
Constraints can be added to an existing table using the ALTER TABLE
statement
Constraints can be declared DEFERRABLE or NOT DEFERRABLE
Dependencies created by constraints (e.g. foreign keys) prevent dropping a table unless CASCADE is used
Views
A View is a Virtual Table and can be used to hide complex queries
Can also be used to represent a selected view of data
Simple views are automatically updatable; columns that are not simple column references remain read-only
Views can be created using the CREATE VIEW statement
Syntax:
=> CREATE [ OR REPLACE ] VIEW name [ ( column_name [, ...] ) ]
[ WITH ( view_option_name [= view_option_value] [, ... ] ) ]
AS query
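For instance (assuming a teams table like the one used in the labs):

```sql
CREATE OR REPLACE VIEW vw_top_teams AS
    SELECT teamname, teamratings
      FROM teams
     ORDER BY teamratings;
```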
Sequences
A sequence is used to automatically generate integer values that follow a
pattern
A sequence has a name, start point and an end point
Sequence values can be cached for performance
Sequence values are retrieved using the CURRVAL and NEXTVAL functions
Syntax:
=> CREATE SEQUENCE name [ INCREMENT [ BY ] increment ]
[ MINVALUE minvalue] [ MAXVALUE maxvalue]
[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
[ OWNED BY { table_name.column_name | NONE } ]
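A short usage sketch (the sequence and table names follow the lab exercises):

```sql
CREATE SEQUENCE seq_teamid START WITH 1 INCREMENT BY 1;
SELECT nextval('seq_teamid');   -- advances the sequence and returns the next value
SELECT currval('seq_teamid');   -- last value returned to this session
ALTER TABLE teams
    ALTER COLUMN teamid SET DEFAULT nextval('seq_teamid');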
Domains
A domain is a data type with optional constraints
Domains can be used to create a data type which allows a selected list of values
Example: a domain city with allowed values Edmonton, Calgary and Red Deer can serve
as the data type of several columns: emp.cityname, shop.shoplocation and clients.res_city
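The city domain described above could be created and used like this (a sketch; the table definition is abbreviated):

```sql
CREATE DOMAIN city AS varchar(30)
    CHECK (VALUE IN ('Edmonton', 'Calgary', 'Red Deer'));

CREATE TABLE emp (
    empno    integer,
    cityname city        -- the domain is used like any other data type
);
```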
Types of JOINS
Type Description
INNER JOIN - Returns all matching rows from both tables
LEFT OUTER JOIN - Returns all matching rows, plus rows from the left-hand table even if there is no corresponding row in the joined table
RIGHT OUTER JOIN - Returns all matching rows, plus rows from the right-hand table even if there is no corresponding row in the joined table
FULL OUTER JOIN - Returns all matching as well as non-matching rows from both tables
CROSS JOIN - Returns the Cartesian product of the rows of both tables
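Assuming the emp and dept tables used elsewhere in this course, the first two join types look like:

```sql
SELECT e.ename, d.dname
  FROM emp e
  JOIN dept d ON e.deptno = d.deptno;        -- INNER JOIN: matching rows only

SELECT e.ename, d.dname
  FROM emp e
  LEFT JOIN dept d ON e.deptno = d.deptno;   -- also keeps employees without a department
```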
Using SQL Functions
Can be used in SELECT statements and WHERE clauses
Includes
String Functions
Format Functions
Date and Time Functions
Aggregate Functions
Example:
=> SELECT lower(name) FROM departments;
=> SELECT * FROM departments
WHERE lower(name) = 'development';
SQL Format Functions
Function - Return Type - Description - Example
to_char(timestamp, text) - text - convert time stamp to string - to_char(current_timestamp, 'HH12:MI:SS')
to_char(interval, text) - text - convert interval to string - to_char(interval '15h 2m 12s', 'HH24:MI:SS')
to_char(int, text) - text - convert integer to string - to_char(125, '999')
to_char(double precision, text) - text - convert real/double precision to string - to_char(125.8::real, '999D9')
to_char(numeric, text) - text - convert numeric to string - to_char(-125.8, '999D99S')
to_date(text, text) - date - convert string to date - to_date('05 Dec 2000', 'DD Mon YYYY')
to_number(text, text) - numeric - convert string to numeric - to_number('12,454.8-', '99G999D9S')
to_timestamp(text, text) - timestamp with time zone - convert string to time stamp - to_timestamp('05 Dec 2000', 'DD Mon YYYY')
to_timestamp(double precision) - timestamp with time zone - convert Unix epoch to time stamp - to_timestamp(1284352323)
Execution Plan
An execution plan shows the detailed steps necessary to execute a SQL statement
Planner is responsible for generating the execution plan
The Optimizer determines the most efficient execution plan
Optimization is cost-based, cost is estimated resource usage for a plan
Cost estimates rely on accurate table statistics, gathered with ANALYZE
Costs also rely on seq_page_cost, random_page_cost, and others
The EXPLAIN command is used to view a query plan
EXPLAIN ANALYZE is used to run the query to get actual runtime stats
Execution Plan Components
Execution Plan Components:
Cardinality - Row Estimates
Access Method - Sequential or Index
Join Method - Hash, Nested Loop etc.
Join Type, Join Order
Sort and Aggregates
Syntax:
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
SETTINGS [ boolean ]
GENERIC_PLAN [ boolean ]
BUFFERS [ boolean ]
WAL [ boolean ]
TIMING [ boolean ]
SUMMARY [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
Explain Example
Example
postgres=# EXPLAIN SELECT * FROM emp;
QUERY PLAN
------------------------------------------------------
Seq Scan on emp (cost=0.00..1.14 rows=14 width=145)
The numbers that are quoted by EXPLAIN are:
Estimated start-up cost
Estimated total cost
Estimated number of rows output by this plan node
Estimated average width (in bytes) of rows output by this plan node
PEM - Query Tool’s Visual Explain
Quoting
Single quotes and dollar quotes are used to specify non-numeric values
Example:
'hello world'
'2011-07-04 13:36:24'
'{1,4,5}'
$$A string "with" various 'quotes' in.$$
$foo$A string with $$ quotes in $foo$
Double quotes are used for names of database objects which either clash with
keywords, contain mixed case letters, or contain characters other than a-z, 0-9
or underscore
Example:
SELECT * FROM "select"
CREATE TABLE "HelloWorld" ...
SELECT * FROM "Hi everyone and everything"
Indexes
Indexes are a common way to enhance performance
Postgres supports several index types:
B-tree (default)
Hash
Block Range Index (BRIN)
GIN
GiST
SP-GiST
Indexes on Expressions
Example Index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [
ONLY ] table_name [ USING method ] ( { column_name | ( expression ) } [
COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [
ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ INCLUDE ( column_name [, ...] ) ]
[ NULLS [ NOT ] DISTINCT ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
Example:
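Illustrative statements (the teams table and the index names are assumptions, mirroring the labs):

```sql
CREATE INDEX idx_teams_ratings ON teams (teamratings);
CREATE UNIQUE INDEX CONCURRENTLY idx_teams_name
    ON teams (lower(teamname));   -- an index on an expression
```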
Module Summary
Data Types Sequences
Structured Query Language Domains
DDL, DML and DCL Statements SQL Joins and Functions
Transaction Control Statements Explain Plans
Tables and Constraints Quoting in PostgreSQL
Views and Materialized Views Indexes
Lab Exercise - 1
Test your knowledge:
1. Initiate a psql session
2. psql commands access the database True/False
3. The following SELECT statement executes successfully: True/False
=> SELECT ename, job, sal AS Salary FROM emp;
4. The following SELECT statement executes successfully: True/False
=> SELECT * FROM emp;
5. There are coding errors in the following statement. Can you identify them?
=> SELECT empno, ename, sal * 12 annual salary FROM emp;
Lab Exercise - 2
The staff in the HR department wants to hide some of the data in the EMP table.
They want a view called EMPVU based on the employee numbers, employee
names, and department numbers from the EMP table. They want the heading
for the employee name to be EMPLOYEE.
Confirm that the view works. Display the contents of the EMPVU view.
Using your EMPVU view, write a query for the SALES department to display all
employee names and department numbers.
Lab Exercise - 3
You need a sequence that can be used with the primary key column of the dept
table. The sequence should start at 60 and have a maximum value of 200. Have
your sequence increment by 10. Name the sequence dept_id_seq.
To test your sequence, write a script to insert two rows in the dept table.
Backup, Recovery
and PITR
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Backup Types
Database SQL Dumps
Restoring SQL Dumps
Offline Physical Backups
Continuous Archiving
Online Physical Backups Using pg_basebackup
Point-in-time Recovery
Recovery Settings
Backup Tools – Barman and pgBackRest
Types of Backup
As with any database, PostgreSQL databases should be backed up regularly
Logical Backups
Database SQL Dumps using pg_dump
Database Cluster SQL Dump using pg_dumpall
Physical Backups
Offline File System Level Backups using OS commands
Online File System Level Backups using pg_basebackup
Backup Tool – Barman and pgBackRest
Logical Backups
Database SQL Dump
Generate a text file with SQL commands
PostgreSQLprovides the utility program pg_dump for this purpose
pg_dump does not block readers or writers
pg_dump does not operate with special permissions
Dumps created by pg_dump are internally consistent, that is, the dump
represents a snapshot of the database as of the time pg_dump begins running
Syntax:
$ pg_dump [options] [dbname]
pg_dump Options
-a - Data only. Do not dump the data definitions (schema)
-s - Data definitions (schema) only. Do not dump the data
-n <schema> - Dump from the specified schema only
-t <table> - Dump specified table only
-f <file name> - Send dump to specified file. Filename can be specified using absolute or relative location
-Fp - Dump in plain-text SQL script (default)
-Ft - Dump in tar format
-Fc - Dump in compressed, custom format
-Fd - Dump in directory format
-j njobs - Dump in parallel by dumping njobs tables simultaneously. Only supported with -Fd
-B, --no-blobs - Exclude large objects from the dump
-v - Verbose option
SQL Dump - Large Databases
If the operating system has maximum file size limits, it can cause problems
when creating large pg_dump output files
Standard Unix tools can be used to work around this potential problem
Use a compression program, for example gzip:
$ pg_dump dbname | gzip > filename.gz
The split command allows you to split the output into smaller files:
$ pg_dump dbname | split -b 1m - filename
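The split-and-reassemble mechanics can be tried without a database by using a dummy file in place of pg_dump output (file names are arbitrary):

```shell
printf 'line %d\n' $(seq 1 500) > dump.sql     # stand-in for pg_dump output
gzip -c dump.sql > dump.sql.gz                 # compressed copy
split -b 1k dump.sql dump_part_                # 1 KB pieces: dump_part_aa, _ab, ...
cat dump_part_* > restored.sql                 # shell glob order reassembles them
cmp -s dump.sql restored.sql && echo "reassembly OK"
```

A gzip dump would be restored analogously with `gunzip -c filename.gz | psql dbname`.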
Restore – SQL Dump
Backups taken using pg_dump in plain-text format (-Fp), and backups taken using
pg_dumpall, are restored with the psql client
Backups taken using pg_dump in custom (-Fc), tar (-Ft) or directory (-Fd) formats
are restored with the pg_restore utility
pg_restore supports parallel jobs during restore
Selected objects can be restored
pg_restore Options
-l - Display TOC of the archive file
-F [c|d|t] - Backup file format
-d <database name> - Connect to the specified database. Also restores to this database if -C option is omitted
-C - Create the database named in the dump file and restore directly into it
-a - Restore the data only, not the data definitions (schema)
-s - Restore the data definitions (schema) only, not the data
-n <schema> - Restore only objects from specified schema
-N <schema> - do not restore objects in this schema
-t <table> - Restore only specified table
-v - Verbose option
Entire Cluster - SQL Dump
pg_dumpall is used to dump an entire database cluster in plain-text SQL format
Dumps global objects - users, groups, and associated permissions
Use psql to restore
Syntax:
$ pg_dumpall [options…] > filename.backup
pg_dumpall Options
-a - Data only. Do not dump schema
-s - Data definitions (schema) only
-g - Dump global objects only - not databases
-r - Dump only roles
-c - Clean (drop) databases before recreating
-O - Skip restoration of object ownership
-x - do not dump privileges (grant/revoke)
-v - Verbose option
--disable-triggers - disable triggers during data-only restore
--no-role-passwords - do not dump passwords for roles. This allows use of pg_dumpall by non-superusers
--exclude-database - exclude databases whose names match the given pattern
Physical Backups
Backup - File system level backup
An alternative backup strategy is to directly copy the files that Postgres uses to
store the data in the database
You can use whatever method you prefer for doing usual file system backups,
for example:
$ tar -cf backup.tar /usr/local/edb/data
The database server must be shut down or in backup mode in order to get a
usable backup
File system backups only work for complete backup and restoration of an entire
database cluster
Two types of File system backup
Offline backups
Online backups
File System Backups
Offline Backups
Taken using OS Copy command
Database Server must be shutdown
Cluster Level Backup and Restore
Online Backups
Continuous archiving must be enabled
Database server start/end backup mode
Cluster Level Backup and Restore with PITR
Methods - pg_basebackup, Barman, pgBackRest
Continuous Archiving
Postgres maintains WAL files for all transactions in pg_wal directory
Postgres automatically maintains the WAL logs which are full and switched
Continuous archiving can be setup to keep a copy of switched WAL Logs which
can be later used for recovery
It also enables online file system backup of a database cluster
Requirements:
wal_level must be set to replica
archive_mode must be set to on (can be set to always)
archive_command must be set in postgresql.conf which archives WAL logs and supports PITR
Continuous Archiving Methods
Archiver Process:
Parameters in postgresql.conf file:
wal_level = replica
archive_mode = on
archive_command = 'cp -i %p /edb/archive/%f'
Restart the database server
Archive files are generated after every log switch
Streaming WAL:
Parameters in postgresql.conf file:
wal_level = replica
archive_mode = on
max_wal_senders = 3
Restart the database server
pg_receivewal -h localhost -D /edb/archive
Transactions are streamed and written to archive files
Base Backup Using pg_basebackup Tool
pg_basebackup can take an online base backup of a database cluster
This backup can be used for PITR or Streaming Replication
pg_basebackup makes a binary copy of the database cluster files
System is automatically put in and out of backup mode
pg_basebackup - Online Backup
Steps require to take Base Backup:
Modify pg_hba.conf
host replication postgres [IPv4 address of client]/32 scram-sha-256
Modify postgresql.conf
wal_level = replica
archive_command = 'cp -i %p /home/postgres/archive/%f'
archive_mode = on
max_wal_senders = 3
wal_keep_size = 512
Backup Command:
$ pg_basebackup [options] ..
Options for pg_basebackup command
-D <directory name> - Location of backup
-F <p or t> - Backup files format. Plain(p) or tar(t)
-R - Write standby.signal and append connection settings to postgresql.auto.conf
-T OLDDIR=NEWDIR - relocate tablespace in OLDDIR to NEWDIR
--waldir - Write ahead logs location
-z - Enable compression(tar) for files
-Z - Compress backup based on setting set to none, client or server
-P - Progress Reporting
-h host - host on which cluster is running
-p port - cluster port
To create a base backup of the server at localhost and store it in the local
directory /home/postgres/pgbackup
$ pg_basebackup -h localhost -D /home/postgres/pgbackup
Verify Base Backups
Verify backup taken by pg_basebackup using pg_verifybackup utility
Backup is verified against a backup_manifest generated by the server at the
time of the backup
Only plain format backups can be verified
Restoring Physical Backups
Point-in-time Recovery
Point-in-time recovery (PITR) is the ability to restore a database cluster
up to the present or to a specified point of time in the past
Uses a full database cluster backup and the write-ahead logs found in
the /pg_wal subdirectory
Must be configured before it is needed (write-ahead log archiving must
be enabled)
Performing Point-in-Time Recovery
Prepare: Stop the server. Take a file system level backup if possible. Clean the
data directory.
Restore: Copy data cluster files and folders from the backup location to the data
directory. Use cp -rp to preserve privileges.
Configure: Configure recovery settings in the postgresql.conf file. Create a
recovery.signal file in the data directory.
Recover: Start the server using the service or pg_ctl utility. Check the error log
for any issues. The recovery.signal file is removed automatically after recovery.
Point-in-Time Recovery Settings
Restoring archived WAL using restore_command parameter:
Unix:
restore_command = 'cp /home/postgres/archive/%f "%p"'
Windows:
restore_command = 'copy "c:\\mnt\\server\\archivedir\\%f" "%p"'
Recovery target settings:
recovery_target_name
recovery_target_time
recovery_target_xid
recovery_target_action
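Put together, a targeted recovery configuration in postgresql.conf might look like this (the archive path follows the earlier slides; the target time is only an example):

```
restore_command        = 'cp /home/postgres/archive/%f "%p"'
recovery_target_time   = '2024-05-01 12:00:00'
recovery_target_action = 'promote'
```

A recovery.signal file in the data directory tells the server to start in targeted-recovery mode.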
Backup and Recovery Tools
Backup And Recovery Manager(Barman)
Open-source administration tool for remote backups and disaster recovery
Manage backups and the recovery phase of multiple servers from one location
Distributed under GNU GPL 3 and maintained by EDB
Barman Architecture http://docs.pgbarman.org/
One Barman server can manage multiple Postgres servers (primary and replica)
Standard connection to Postgres for management, coordination and monitoring
Standard replication connection for running pg_basebackup and pg_receivewal
Supports rsync/SSH
Backups flow from the processing tier to a local tier or a remote tier (S3/Azure
Barman storage)
Barman - Features https://www.pgbarman.org/about/
Remote backup and restore with rsync and the PostgreSQL protocol
Support for file level incremental backups with rsync
Retention policy support
WAL Archive Compression with gzip, bzip2, or pigz
Backup data verification
Backup with RPO=0 using a synchronous physical streaming replication
connection
Rate limiting
Postgres Backup And Restore
pgBackRest
Fully supported open-source backup and restore tool
Solves common bottleneck problems with parallel processing for backup,
compression, restore and archiving
Supports capabilities like symmetric encryption and partial restore
Feature comparison
Capability - Added value - Barman - pgBackRest - pg_basebackup
SSH protocol support Yes Yes -
PostgreSQL protocol Works without passwordless ssh. Yes - Yes
Incremental backups Yes Yes -
RPO=0 Restore up to the last commit Yes - -
Rate limiting Preserve IO for Postgres Yes - Yes
Retention and List backups Yes Yes -
Backup compression Less backup space required - Yes -
Symmetric encryption Lower security footprint for the backup data - Yes -
Partial restore (only selected databases) Restore required data for analysis purposes - Yes -
S3 and Azure Blob Support Use flexible Cloud Storage for backup storage Yes Yes -
Nagios integration Monitor your backups with Nagios Yes Yes -
Module Summary
Backup Types
Database SQL Dumps
Restoring SQL Dumps
Offline Physical Backups
Continuous Archiving
Online Physical Backups Using pg_basebackup
Point-in-time Recovery
Recovery Settings
Backup Tools – Barman and pgBackRest
Lab Exercise - 1
1. The edbstore website database is all setup and as a DBA you need to plan a
proper backup strategy and implement it
As the root user, create a folder /pgbackup and assign ownership to the Postgres user
using the chown utility or the Windows security tab in folder properties
Take a full database dump of the edbstore database with the pg_dump utility. The dump
should be in plain text format
Name the dump file as edbstore_full.sql and store it in the /pgbackup directory
Lab Exercise - 2
1. Take a dump of the edbuser schema from the edbstore database
and name the file as edbstore_schema.sql
2. Take a data-only dump of the edbstore database, disable all triggers
for a faster restore, use the INSERT command instead of COPY, and
name the file as edbstore_data.sql
3. Take a full dump of customers table and name the file as
edbstore_customers.sql
Lab Exercise - 3
1. Take a full database dump of edbstore in compressed format using the
pg_dump utility, name the file as edbstore_full_fc.dmp
2. Take a full database cluster dump using pg_dumpall. Remember pg_dumpall
supports only plain text format; name the file edbdata.sql
Lab Exercise - 4
In this exercise you will demonstrate your ability to restore a database.
1. Drop database edbstore.
2. Create database edbstore with owner edbuser.
3. Restore the full dump from edbstore_full.sql and verify all the objects
and their ownership.
4. Drop database edbstore.
5. Create database edbstore with edbuser owner.
6. Restore the full dump from the compressed file edbstore_full_fc.dmp
and verify all the objects and their ownership.
Lab Exercise - 5
1. Create a directory /opt/arch or c:\arch and give ownership to the
Postgres user.
2. Configure your cluster to run in archive mode and set the archive log location
to be /opt/arch or c:\arch.
3. Take a full online base backup of your cluster in the /pgbackup directory
using the pg_basebackup utility.
Routine
Maintenance
Tasks
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Updating Optimizer Statistics
Handling Data Fragmentation using Routine Vacuuming
Preventing Transaction ID Wraparound Failures
Automatic Maintenance using Autovacuum
Re-indexing in Postgres
Database Maintenance
Data files become fragmented as data is modified and deleted
Database maintenance helps reconstruct the data files
If done on time, nobody notices; when not done, everyone knows
Must be done before you need it
Improves performance of the database
Saves database from transaction ID wraparound failures
Maintenance Tools
Maintenance thresholds can be configured using the pgAdmin Client
Postgres maintenance thresholds can be configured in postgresql.conf
Manual scripts can be written to watch statistics tables like pg_stat_user_tables
Maintenance commands:
ANALYZE
VACUUM
CLUSTER
Maintenance command vacuumdb can be run from OS prompt
Autovacuum can help in automatic database maintenance
Optimizer Statistics
Optimizer statistics play a vital role in query planning
They are not updated in real time
ANALYZE collects information about relations, including size, row counts, average
row size, and a sample of rows
Statistics are stored permanently in catalog tables
The maintenance command ANALYZE updates the statistics
Example - Updating Statistics
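The example slide can be reconstructed as a psql session (a sketch; the emp table from the course's edbstore sample schema is assumed):

```sql
-- Update statistics for every table in the current database
ANALYZE;

-- Update statistics for a single table, with progress messages
ANALYZE VERBOSE emp;

-- Check when statistics were last refreshed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'emp';
```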
Data Fragmentation and Bloat
Data is stored in data file pages
An update or delete of a row
does not immediately remove
the row from the disk page
Eventually this row space
becomes obsolete and causes
fragmentation and bloating
Routine Vacuuming
Obsoleted rows can be removed or reused using vacuuming
Helps in shrinking data file size when required
Vacuuming can be automated using autovacuum
VACUUM FULL locks tables in ACCESS EXCLUSIVE mode; plain VACUUM takes only a
SHARE UPDATE EXCLUSIVE lock
Long-running transactions may prevent vacuuming from removing dead rows, so vacuuming
should be done during low-usage times
Vacuuming Commands
When executed, the VACUUM command:
Can recover or reuse disk space occupied by obsolete rows
Updates data statistics
Updates the visibility map, which speeds up index-only scans
Protects against loss of very old data due to transaction ID wraparound
The VACUUM command can be run in two modes:
VACUUM
VACUUM FULL
Vacuum and Vacuum Full
VACUUM
Removes dead rows and marks the space available for future reuse
Does not return the space to the operating system
Space is reclaimed if obsolete rows are at the end of a table
VACUUM FULL
More aggressive algorithm compared to VACUUM
Compacts tables by writing a complete new version of the table file with no dead space
Takes more time
Requires extra disk space for the new copy of the table, until the operation completes
VACUUM Syntax
VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
Options:
FULL [ boolean ]
FREEZE [ boolean ]
VERBOSE [ boolean ]
ANALYZE [ boolean ]
DISABLE_PAGE_SKIPPING [ boolean ]
SKIP_LOCKED [ boolean ]
TRUNCATE [ boolean ]
PARALLEL integer
INDEX_CLEANUP { AUTO | ON | OFF }
PROCESS_MAIN [ boolean ]
PROCESS_TOAST [ boolean ]
SKIP_DATABASE_STATS [ boolean ]
ONLY_DATABASE_STATS [ boolean ]
BUFFER_USAGE_LIMIT [ size ]
Example - Vacuuming
Example – Vacuuming (continued)
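The example slides can be reconstructed as a psql session (a sketch using the emp table from the sample schema):

```sql
-- Plain VACUUM: mark dead row space reusable; the table stays online
VACUUM emp;

-- With options: also update statistics and report details
VACUUM (VERBOSE, ANALYZE) emp;

-- VACUUM FULL: rewrite the table to return space to the OS
-- (takes an ACCESS EXCLUSIVE lock and needs extra disk space)
VACUUM FULL emp;

-- Gauge bloat by watching dead tuple counts
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```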
Preventing Transaction ID Wraparound Failures
MVCC depends on transaction ID numbers
Transaction IDs have limited size (32 bits at this writing)
A cluster that runs for a long time (more than 4 billion transactions)
would suffer transaction ID wraparound
This causes a catastrophic data loss
To avoid this problem, every table in the database must be
vacuumed at least once for every two billion transactions
Vacuum Freeze
VACUUM FREEZE will mark rows as frozen
Postgres reserves a special XID, FrozenTransactionId
FrozenTransactionId is always considered older than every normal XID
VACUUM FREEZE replaces transaction IDs with FrozenTransactionId, thus
rows will appear to be “in the past”
vacuum_freeze_min_age controls when a row will be frozen
VACUUM normally skips pages without dead row versions, but some rows may
need FREEZE
vacuum_freeze_table_age controls when a whole table must be scanned
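A short sketch of how freezing can be monitored and forced (the emp table is from the sample schema):

```sql
-- Age (in transactions) of the oldest unfrozen XID per table;
-- vacuum before this approaches autovacuum_freeze_max_age
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC;

-- Aggressively freeze all rows in one table
VACUUM (FREEZE) emp;
```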
The Visibility Map
Each heap relation has a Visibility Map which keeps track of which pages
contain only tuples that are known to be visible to all active transactions
Stored at <relfilenode>_vm
Helps vacuum to determine whether pages contain dead rows
Can also be used by index-only scans to answer queries
VACUUM command updates the visibility map
The visibility map is vastly smaller, so can be cached easily
vacuumdb Utility
The VACUUM command has a command-line executable wrapper called
vacuumdb
vacuumdb can VACUUM all databases using a single command
Syntax:
vacuumdb [OPTION]... [DBNAME]
Available options can be listed using:
vacuumdb --help
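Typical vacuumdb invocations (a sketch; the edbstore database and emp table come from the course labs):

```shell
# Vacuum and analyze every database in the cluster
vacuumdb --all --analyze

# Vacuum a single table in one database, verbosely
vacuumdb --table=emp --verbose edbstore

# Only update optimizer statistics, using 4 parallel jobs
vacuumdb --all --analyze-only --jobs=4
```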
Autovacuuming
Highly recommended feature of Postgres
It automates the execution of VACUUM, FREEZE and ANALYZE commands
Autovacuum consists of a launcher and many worker processes
A maximum of autovacuum_max_workers worker processes are allowed
Launcher will start one worker within each database every
autovacuum_naptime seconds
Workers check for inserts, updates and deletes and execute VACUUM and/or
ANALYZE as needed
track_counts must be set to TRUE as autovacuum depends on statistics
Temporary tables cannot be accessed by autovacuum
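Autovacuum activity can be confirmed from the statistics views (a sketch):

```sql
-- Confirm autovacuum and statistics collection are enabled
SHOW autovacuum;
SHOW track_counts;

-- See when each table was last maintained automatically
SELECT relname, last_autovacuum, last_autoanalyze,
       autovacuum_count, autoanalyze_count
FROM pg_stat_user_tables;
```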
Autovacuuming Parameters
Autovacuum Launcher Process:
autovacuum
autovacuum_naptime
Autovacuum Worker Processes:
autovacuum_max_workers
Vacuuming Thresholds:
autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold
autovacuum_vacuum_insert_scale_factor
autovacuum_vacuum_insert_threshold
autovacuum_freeze_max_age
Per-Table Thresholds
Autovacuum workers are resource intensive
Table-by-table autovacuum parameters can be configured for large tables
Configure the following parameters using ALTER TABLE or CREATE TABLE:
autovacuum_enabled
autovacuum_vacuum_threshold
autovacuum_vacuum_scale_factor
autovacuum_analyze_threshold
autovacuum_analyze_scale_factor
autovacuum_vacuum_insert_scale_factor
autovacuum_vacuum_insert_threshold
autovacuum_freeze_max_age
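Per-table settings are applied as storage parameters. A sketch (the orders table is from the sample schema; staging_load is a hypothetical bulk-load table):

```sql
-- Make autovacuum more eager on a large, busy table:
-- trigger VACUUM after ~1% of rows change instead of the 20% default
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- Disable autovacuum for a table that is bulk-reloaded anyway
ALTER TABLE staging_load SET (autovacuum_enabled = false);

-- Inspect per-table overrides
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;
```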
Routine Reindexing
Indexes are used for faster data access
UPDATE and DELETE on a table modify underlying index entries
Indexes are stored on data pages and become fragmented over time
REINDEX rebuilds an index using the data stored in the index's table
Time required depends on:
Number of indexes
Size of indexes
Load on server when running command
When to Reindex
There are several reasons to use REINDEX:
An index has become "bloated", meaning it contains many empty or nearly-empty pages
You have altered a storage parameter (such as fillfactor) for an index
An index built with the CONCURRENTLY option failed, leaving an "invalid" index
Syntax:
=> REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
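A few representative invocations (a sketch; emp_pk_idx is a hypothetical index name):

```sql
-- Rebuild a single index (blocks writes to its table)
REINDEX INDEX emp_pk_idx;

-- Rebuild all indexes on a table, with progress messages
REINDEX (VERBOSE) TABLE emp;

-- Rebuild without blocking writes (cannot run inside a transaction block)
REINDEX INDEX CONCURRENTLY emp_pk_idx;
```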
Module Summary
Updating Optimizer Statistics
Handling Data Fragmentation using Routine Vacuuming
Preventing Transaction ID Wraparound Failures
Automatic Maintenance using Autovacuum
Re-indexing in Postgres
Lab Exercise - 1
1. While monitoring table statistics on the edbstore database, you found that
some tables are not automatically maintained by autovacuum. You decided to
perform manual maintenance on these tables. Write a SQL script to perform
the following maintenance:
Reclaim obsolete row space from the customers table.
Update statistics for emp and dept tables.
Mark all the obsolete rows in the orders table for reuse.
2. Execute the newly created maintenance script on edbstore database.
Lab Exercise - 2
1. The composite index named ix_orderlines_orderid on (orderid,
orderlineid) columns of the orderlines table is performing very slowly.
Write a statement to reindex this index for better performance.
Moving Data
Using COPY
Command
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Loading flat files
Import and export data using COPY
Examples of COPY Command
Using COPY FREEZE for performance
Loading Flat Files into Database Tables
A "flat file" is a plain text or mixed text file which usually
contains one record per line
Postgres COPY command can be used to load flat files
into a database table
The COPY Command
COPY moves data between Postgres tables and standard file-system files
COPY TO copies the contents of a table or a query to a file
COPY FROM copies data from a file to a table
The file must be accessible to the server
COPY Command Syntax
Copy From:
COPY table_name [ ( column_list ) ] FROM 'filename' | PROGRAM 'command' | STDIN [ options ] [ WHERE condition ]
Copy To:
COPY table_name [ ( column_list ) ] | ( query ) TO 'filename' | PROGRAM 'command' | STDOUT [ options ]
Copy Command Options
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
DEFAULT 'default_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
Example Export to File
=> COPY emp (empno,ename,job,sal,comm,hiredate) TO '/tmp/emp.csv' CSV HEADER;
COPY
=> \! cat /tmp/emp.csv
empno,ename,job,sal,comm,hiredate
7369,SMITH,CLERK,800.00,,17-DEC-80 00:00:00
7499,ALLEN,SALESMAN,1600.00,300.00,20-FEB-81 00:00:00
7521,WARD,SALESMAN,1250.00,500.00,22-FEB-81 00:00:00
7566,JONES,MANAGER,2975.00,,02-APR-81 00:00:00
7654,MARTIN,SALESMAN,1250.00,1400.00,28-SEP-81 00:00:00
7698,BLAKE,MANAGER,2850.00,,01-MAY-81 00:00:00
7782,CLARK,MANAGER,2450.00,,09-JUN-81 00:00:00
7788,SCOTT,ANALYST,3000.00,,19-APR-87 00:00:00
7839,KING,PRESIDENT,5000.00,,17-NOV-81 00:00:00
7844,TURNER,SALESMAN,1500.00,0.00,08-SEP-81 00:00:00
7876,ADAMS,CLERK,1100.00,,23-MAY-87 00:00:00
7900,JAMES,CLERK,950.00,,03-DEC-81 00:00:00
7902,FORD,ANALYST,3000.00,,03-DEC-81 00:00:00
7934,MILLER,CLERK,1300.00,,23-JAN-82 00:00:00
Example Import from File
edb=# CREATE TEMP TABLE empcsv (LIKE emp);
CREATE TABLE
edb=# COPY empcsv (empno, ename, job, sal, comm, hiredate)
edb-# FROM '/tmp/emp.csv' CSV HEADER;
COPY
edb=# SELECT * FROM empcsv;
empno | ename | job | mgr | hiredate | sal | comm | deptno
-------+--------+-----------+-----+--------------------+---------+---------+--------
7369 | SMITH | CLERK | | 17-DEC-80 00:00:00 | 800.00 | |
7499 | ALLEN | SALESMAN | | 20-FEB-81 00:00:00 | 1600.00 | 300.00 |
7521 | WARD | SALESMAN | | 22-FEB-81 00:00:00 | 1250.00 | 500.00 |
7566 | JONES | MANAGER | | 02-APR-81 00:00:00 | 2975.00 | |
7654 | MARTIN | SALESMAN | | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 |
7698 | BLAKE | MANAGER | | 01-MAY-81 00:00:00 | 2850.00 | |
7782 | CLARK | MANAGER | | 09-JUN-81 00:00:00 | 2450.00 | |
7788 | SCOTT | ANALYST | | 19-APR-87 00:00:00 | 3000.00 | |
7839 | KING | PRESIDENT | | 17-NOV-81 00:00:00 | 5000.00 | |
7844 | TURNER | SALESMAN | | 08-SEP-81 00:00:00 | 1500.00 | 0.00 |
7876 | ADAMS | CLERK | | 23-MAY-87 00:00:00 | 1100.00 | |
7900 | JAMES | CLERK | | 03-DEC-81 00:00:00 | 950.00 | |
7902 | FORD | ANALYST | | 03-DEC-81 00:00:00 | 3000.00 | |
7934 | MILLER | CLERK | | 23-JAN-82 00:00:00 | 1300.00 | |
(14 rows)
Example - COPY Command on Remote Host
COPY command on remote host using psql
$ cat emp.csv | ssh 192.168.192.83 "psql -U edbstore edbstore -c 'copy emp from stdin;'"
COPY FREEZE
FREEZE option of the COPY statement
Adds rows to a newly created table and freezes them immediately
The table must have been created or truncated in the current subtransaction
Improves performance of an initial bulk load
Violates the normal MVCC visibility rules: the rows become immediately visible to all sessions
Usage:
=> COPY tablename FROM 'filename' (FREEZE);
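Because FREEZE requires the table to have been created or truncated in the same (sub)transaction, a typical bulk reload looks like this (a sketch reusing the empcsv table and /tmp/emp.csv file from the earlier examples):

```sql
BEGIN;
-- TRUNCATE in the same transaction makes the table eligible for FREEZE
TRUNCATE empcsv;
COPY empcsv FROM '/tmp/emp.csv' (FORMAT csv, FREEZE, HEADER);
COMMIT;
```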
Module Summary
Loading flat files
Import and export data using COPY
Examples of COPY Command
Using COPY FREEZE for performance
Lab Exercise - 1
In this lab exercise you will demonstrate your ability to copy data:
1. Unload the emp table from the edbuser schema to a CSV file, with column headers
2. Create a copyemp table with the same structure as the emp table
3. Load the CSV file (from step 1) into the copyemp table
Replication and
High Availability
Tools
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
Data Replication
Data Replication in Postgres
Streaming Replication and Architecture
Synchronous, Asynchronous and Cascaded Replication
Setup Streaming Replication
Logical Replication Architecture
Overview: EDB Postgres Distributed, EDB Failover Manager, Replication Server
and Replication Manager (repmgr)
Data Replication
Replication is the process of copying data and changes to a secondary location
for data safety and availability
Data loss can occur for several reasons
Replication aims to keep data available when the primary source goes offline
Data can be recovered from backup, but downtime is costly
Replication aims to lower downtime
Failover can be configured so that applications may not even notice that the
primary source is offline
Data Replication in Postgres
Data replication options:
Physical Streaming Replication
Logical Streaming Replication
EDB Postgres Distributed
Cluster management tools:
EDB Failover Manager
High Availability Routing for Postgres (HARP)
Patroni
Replication Manager (repmgr)
Streaming Replication
Streaming Replication (Hot Standby) is a major feature of Postgres
Replica connects to the primary node using REPLICATION protocol
WAL segments are streamed to replica server
No log shipping delays, stream WAL content across to replica immediately
Synchronous/Asynchronous options available
Supports cascaded streaming replication
Hot Streaming Architecture
Diagram: the WAL Sender process on the primary (production) database streams WAL
to the WAL Receiver process on the replica database; the replica can serve
read-only reporting queries.
Asynchronous Replication
Streaming replication is asynchronous by default but can be configured as
synchronous
Asynchronous
Disconnected architecture
A transaction is committed on the primary and flushed to a WAL segment
The transaction is later transmitted to the replica server(s) over the stream
Some data loss is possible
Replication using WAL Archive method is always asynchronous
Synchronous Replication
Synchronous Replication
A 2-safe replication method offering zero data loss
A transaction's commit on the primary waits until the synchronously replicated
replicas confirm they have received (and, depending on configuration, flushed or
applied) the changes
The user gets a commit message only after confirmation from both primary and replica
This will introduce a delay in committing transactions
Cascading Replication
Streaming replication supports a single primary (writable) node
Cascading replication can be used to share the replication overhead of the
primary with other replicas
A replica can stream changes on to other replicas
Helps minimize inter-site bandwidth overheads on the primary node
Asynchronous only
Diagram: the Primary streams to Replica 1, which cascades changes to Replica 2
and Replica 3.
Setup Streaming Replication
Primary Server Configuration
For Physical Streaming Replication:
Change WAL Content parameter:
wal_level = replica #Default is replica
Two options to allow streaming connection:
max_wal_senders
max_replication_slots
WAL files to be retained in pg_wal for streaming:
wal_keep_size
max_slot_wal_keep_size
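A minimal primary-side configuration sketch (illustrative values only; tune them to your workload):

```ini
# postgresql.conf on the primary
wal_level = replica                # default; use 'logical' for logical replication
max_wal_senders = 10               # concurrent streaming connections allowed
max_replication_slots = 10         # slots replicas may reserve
wal_keep_size = 1GB                # extra WAL kept in pg_wal for lagging replicas
max_slot_wal_keep_size = 10GB      # cap on WAL retained by replication slots
```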
Synchronous Streaming Replication Configuration
Default level of Streaming Replication is Asynchronous
Synchronous level can also be configured using additional parameters:
synchronous_commit=on
synchronous_standby_names
If the synchronous replica stops responding, then COMMITs will be blocked
forever until someone manually intervenes
Transactions can be configured not to wait for replication by setting the
synchronous_commit parameter to local or off
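A sketch of the synchronous settings (replica1 and replica2 are hypothetical names; each must match the application_name in that replica's primary_conninfo):

```ini
# postgresql.conf on the primary
synchronous_commit = on
# Wait for confirmation from one of the two named standbys
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
```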
Configure Authentication
Authentication setting on the primary server must allow replication connections
from the replica(s)
Provide a suitable entry or entries in pg_hba.conf with the database field set to
replication
Open pg_hba.conf of primary server:
host replication all 192.168.56.2/32 md5
Note - You will need to reload the primary server
Take a Full Backup of the Primary Server
Backup the Primary Server using pg_basebackup:
pg_basebackup -h localhost -U postgres -p 5444 -D /backup/data1 -R
-R option:
Creates a standby.signal file in the target data directory
Adds primary server connection info to postgresql.auto.conf
Replica Configuration
hot_standby - Set this parameter to on for a read-only replica
primary_conninfo - Connection string to connect to the primary or an upstream (cascading) replica
primary_slot_name - Replication slot name to be used for the connection
max_standby_streaming_delay - Duration for which the replica waits during query conflicts
wal_receiver_create_temp_slot - Allow the WAL receiver process to create a temporary replication slot
recovery_min_apply_delay - Parameter used for delayed replication
Replica Recovery Settings
Replica configuration settings must be set in postgresql.conf or
postgresql.auto.conf
Create a file named standby.signal in the data directory
standby.signal indicates the server should start as a replica
Start the replica using system services or pg_ctl
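Pulled together, the replica-side steps look roughly like this (a sketch assuming the base backup was restored to /backup/data1; the host and user in primary_conninfo are hypothetical):

```shell
# standby.signal tells the server to start as a replica
# (pg_basebackup -R creates it for you)
touch /backup/data1/standby.signal

# Connection back to the primary (pg_basebackup -R writes this
# into postgresql.auto.conf)
cat >> /backup/data1/postgresql.auto.conf <<'EOF'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
EOF

# Allow read-only queries on the replica
cat >> /backup/data1/postgresql.conf <<'EOF'
hot_standby = on
EOF

# Start the replica
pg_ctl -D /backup/data1 start
```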
Logical Replication
Logical Replication
Logical replication is a method of replicating selected data objects and
their changes
Based on publications and subscriptions
Can be used to consolidate data
Portable across hardware and software versions
Tables on the standby server which are part of a subscription must be treated as
read-only to avoid conflicts
Diagram: publications on the primary are streamed by the WAL sender to the
standby, where the logical replication launcher and workers feed the
subscriptions; typical uses are reporting, consolidation, and upgrades.
When to Use Logical Replication
Sending incremental changes in a single database or a subset of a
database to subscribers
Consolidating multiple databases into a single one
Replicating between different major versions of Postgres
Giving access to replicated data to different groups of users
Sharing a subset of the database between multiple databases
Setting Up Logical Replication
Change wal_level to logical in postgresql.conf
Add pg_hba.conf entry in each server to allow connection
Connect to database in publication instance
Create a publication using CREATE PUBLICATION statement
A published table must have a “replica identity” configured in order to be able
to replicate UPDATE and DELETE operations
Connect to database in subscription instance and create a subscription using
CREATE SUBSCRIPTION statement
Example – Logical Replication Setup
Initialize sample publication (primarypub) and subscription (primarysub)
instances
[postgres@localhost ~]$ initdb --version
[postgres@localhost ~]$ initdb -D primarypub -U pubdba
[postgres@localhost ~]$ initdb -D primarysub -U subdba
Edit postgresql.conf parameter for both instances
[postgres@localhost ~]$ vi primarypub/postgresql.conf
port=5420
wal_level=logical
[postgres@localhost ~]$ vi primarysub/postgresql.conf
port=5421
wal_level=logical
Example – HBA Entries and Starting Instances
Add pg_hba.conf entries for connections
[postgres@localhost ~]$ vi primarysub/pg_hba.conf
host all pubdba 192.168.56.101/32 md5
[postgres@localhost ~]$ vi primarypub/pg_hba.conf
host all subdba 192.168.56.101/32 md5
Start both instances
[postgres@localhost ~]$ pg_ctl -D primarypub/ start
[postgres@localhost ~]$ pg_ctl -D primarysub/ start
Example – Create Tables and Publication
Connect to default database in publication instance
[postgres@localhost ~]$ psql -p 5420 -U pubdba postgres
Create a sample table and publication
=# CREATE TABLE pubexample(id INT PRIMARY KEY,
name VARCHAR(30));
=# INSERT INTO pubexample
VALUES (generate_series(1,5000),'Test1');
=# SELECT count(*) FROM pubexample;
=# CREATE PUBLICATION testpub FOR TABLE pubexample;
Example – Create Tables and Subscription
Connect to default database in subscription instance
[postgres@localhost ~]$ psql -p 5421 -U subdba postgres
Create a sample table and subscription
=# CREATE TABLE pubexample(id INT PRIMARY KEY,
name VARCHAR(30));
=# CREATE SUBSCRIPTION testsub CONNECTION
'host=localhost port=5420 user=pubdba dbname=postgres'
PUBLICATION testpub;
=# SELECT count(*) FROM pubexample;
Example – Test Logical Replication
Add data to publication
[postgres@localhost ~]$ psql -p 5420 -U pubdba postgres
postgres=# INSERT INTO pubexample
VALUES (generate_series(5001,10000),'Test1');
postgres=# \q
Check changes on Subscription
[postgres@localhost ~]$ psql -p 5421 -U subdba postgres
postgres=# SELECT count(*) FROM pubexample;
Monitoring Basics
Monitoring Replication
pg_stat_replication
Shows connected replicas and their status on the primary
pg_stat_subscription
Shows the status of subscriptions when using logical replication
pg_stat_wal_receiver
Shows the WAL receiver process status on the replica
Recovery information functions:
pg_is_in_recovery()
pg_current_wal_lsn()
pg_last_wal_receive_lsn()
pg_last_xact_replay_timestamp()
Example - Monitoring Replication
Execute:
=# SELECT * FROM pg_stat_replication;
Find lag (bytes):
=# SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM
pg_stat_replication;
Find lag (seconds):
=# SELECT CASE WHEN pg_last_wal_receive_lsn() =
pg_last_wal_replay_lsn()
THEN 0 ELSE
EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS stream_delay;
Recovery Control Functions
Name Return Type Description
pg_is_wal_replay_paused() bool True if recovery is paused.
pg_wal_replay_pause() void Pauses recovery immediately.
pg_wal_replay_resume() void Restarts recovery if it was paused.
EDB Postgres Distributed -
Overview
EDB Postgres Distributed
The most advanced replication solution for Postgres
Maintain extreme high availability - Postgres clusters deployed with EDB Postgres
Distributed keep top-tier enterprise applications running
Upgrade with near zero downtime - Rolling upgrades of application and database
software eliminate the largest source of downtime
Choose the level of consistency - Robust capabilities provide the flexibility to
meet application data loss requirements
Always ON
Top-tier enterprise applications are critical to an organization’s success in all
regions where business is conducted, whether a single region or globally
The application represents a promise to its customers
The application must perform well for a good user experience
The availability of the application directly ties to revenue generation
The application data must always be current and available, or the user loses trust
More Than Bi-directional
Replication
MULTI-MASTER REPLICATION ENABLING HIGHLY AVAILABLE AND
GEOGRAPHICALLY DISTRIBUTED POSTGRES CLUSTERS
Logical replication of data and schema
enabled via standard Postgres extension
Data consistency options that span from
immediate to eventual consistency
Robust tooling to manage conflicts, monitor
performance, and validate consistency
Deploy natively to cloud, virtual, or bare
metal environments
Geo-fencing, allowing you to selectively replicate data for security compliance
and jurisdiction control
EDB Postgres Distributed Features
Multi-Master Synchronous or Asynchronous Replication for Postgres
Flexible Deployment Architectures
Always-ON DDL Replication
Row-Level Consistency
DDL and Row Filters
Parallel Apply
Auto Partitioning
Configurable Column-level Conflict Resolution
Conflict-free Replicated Data Types (CRDTs)
Database Rolling Upgrades
Transaction State Tracking Across Nodes
Subscriber-only Nodes
PGD CLI
Open Telemetry Integration
Next Generation Connection Routing and Failovers using PGD-Proxy
Deployment - Single Location
Locations = 1, local redundancy = 3,
nodes = 3, active locations = 1
Global group with single data group of
A1, A2 and A3
Lead Primary A1 receives all writes, but changes can also be received by A2 and A3
Shadow Primaries A2 and A3 can receive writes
Can be 3 data nodes (recommended)
Can be 2 data nodes and 1 witness that doesn't hold data (not depicted)
Example Deployment - Multiple Location
Locations = 2,
local redundancy = 3,
nodes = 3,
active locations = 1
Replication Server Overview
EDB Postgres Replication Server
Single Master Replication (SMR) for Reporting or Migration
Diagram: a read/write master (EDB Advanced Server, PostgreSQL, Oracle, or SQL
Server) replicates, with data filtering and scheduling options, to a read-only
replica (EDB Advanced Server, PostgreSQL, Oracle, or SQL Server).
Replication Server
REPLICATES BETWEEN POSTGRES AND NON-POSTGRES DATABASES
Integrate with Oracle or SQL Server
databases to offload reporting or to
feed data to legacy applications
Flexibility to replicate a subset of data
from the source database
Graphical user interface provides easy
configuration and management
Includes a utility to validate data consistency between the source and target databases
Diagram: replication from SQL Server or Oracle to Postgres
EDB Replication Server Features
Replicate Oracle or SQL Server data to EDB Postgres Advanced Server
Distributed multi-Publication/Subscription architecture
Synchronize data across geographies
Replicate tables and sequences
Controlled switchover and failover
Supports cascading replication
Trigger- and log-based replication
Snapshot and continuous modes
Define and apply row filters
Flexible replication scheduler
Replication History Viewer
Graphical Replication Console and CLI
Failover Manager Overview
Why Failover Manager
Ensure business continuity - monitor database health and identify failures quickly
Maintain high availability - meet your SLAs by switching over to the most recent standby
Upgrade with minimal downtime - switch over on demand to move the primary to a
standby for maintenance
EDB Postgres Failover Manager
AUTOMATICALLY DETECT FAILURES
Monitors database health - detects failures and takes action
Automatically fails over to the most current standby and reconfigures the others
Reconfigures load balancers on failover - integrates with Pgpool and others
Avoids "split brain" scenarios - prevents two nodes from thinking that each is primary
Diagram: client applications connect through a load balancer (e.g. Pgpool) to the
primary and its standbys, which are linked by streaming replication; an EDB
Postgres Failover Manager agent runs on each node, plus an optional witness node.
EFM Features
Multiple health checks for Primary & Replica nodes
Automatic Failover from Primary to Replica node
Controlled switchovers for planned events on primary
Configurable fencing operations
User configurable failure detection wait times
Witness node protects against ‘split brain’ scenarios
Support for multiple streaming replicas
Replica promotion based on WAL location and node priority
Real-time notifications to chat rooms, SNMP and SMTP for all
cluster status changes
Setup an EFM Cluster
Set up streaming replication between the two servers
Install EFM
Configure the efm.properties file
Start EFM
Add nodes to the EFM cluster
Monitor the EFM and database servers
Diagram: client connection pools and an application load balancer sit in front of
a primary and two replicas, each running an EFM agent and linked by streaming
replication.
Replication Manager Overview
Replication Manager (repmgr)
Cluster management tool for Postgres
Maintain high availability - automatic failover to a replica in a streaming
replication environment
Perform upgrades using switchovers - add/remove replicas and switch over the
primary instance
Open source - from EnterpriseDB, licensed under the GPL
repmgr Features
Open-source tool for managing replication and failover
Supports Postgres Streaming Replication
repmgr tool for setup:
Add/remove replicas
Perform switchovers
Promote a replica
repmgrd tool:
Monitor replication
Automatic failover detection with witness protection
Email notification
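Representative repmgr commands (a sketch; the host, user, and paths are hypothetical, and each node needs a repmgr.conf describing itself):

```shell
# Clone a standby from the primary and register it with repmgr
repmgr -h primary.example.com -U repmgr -d repmgr \
       -D /var/lib/pgsql/data standby clone
repmgr -f /etc/repmgr.conf standby register

# Check cluster state and perform a controlled switchover
repmgr -f /etc/repmgr.conf cluster show
repmgr -f /etc/repmgr.conf standby switchover
```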
repmgr Architecture
Diagram: a primary with two replicas connected by streaming replication; the
repmgr user and metadata are stored on the primary, and the repmgr tool and
repmgrd daemon run on every node.
Module Summary
Data Replication
Data Replication in Postgres
Streaming Replication and Architecture
Synchronous, Asynchronous and Cascaded Replication
Setup Streaming Replication
Logical Replication Architecture
Overview: EDB Postgres Distributed, EDB Failover Manager,
Replication Server and Replication Manager (repmgr)
Course Summary
Introduction and Architectural Overview Database Security
System Architecture Monitoring and Admin Tools Overview
Installation SQL Primer
User Tools - Command Line Interfaces Backup and Recovery
Database Clusters Routine Maintenance Tasks
Database Configuration Data Loading
Data Dictionary Data Replication and High Availability
Creating and Managing Database Objects
Next Steps
Certify your Postgres skills with EDB Certifications for Postgres
Continue your skills development with the following classes:
Advanced Database Administration
Monitoring and Alerting with Postgres Enterprise Manager
Tuning and Maintenance
See the Training Portal for the full library of Postgres training classes
Get familiar with the EDB Tools available as part of the EDB Postgres Platform
For any questions related to EDB Postgres Training and Certifications,
or for additional information, write to:
[email protected]
Thank you!
Please visit our Training Portal for
more courses and workshops!
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .