PostgreSQL Performance
Bruce Momjian
16th January 2003
POSTGRESQL is an object-relational database developed on the Internet by a group of developers spread across the
globe. It is an open-source alternative to commercial databases like Oracle and Informix.
POSTGRESQL was originally developed at the University of California at Berkeley. In 1996, a group began
development of the database on the Internet. They use email to share ideas and file servers to share code. POSTGRESQL
is now comparable to commercial databases in terms of features, performance, and reliability. It has transactions,
views, stored procedures, and referential integrity constraints. It supports a large number of programming interfaces,
including ODBC, Java (JDBC), TCL/TK, PHP, Perl, and Python. POSTGRESQL continues to improve at a tremendous pace
thanks to a talented pool of Internet developers.
Performance Concepts
There are two aspects of database performance tuning. One is improving the database’s use of the CPU, memory, and
disk drives in the computer. The second is optimizing the queries sent to the database. This article talks about the
hardware aspects of performance tuning. The optimization of queries is done using SQL commands like CREATE INDEX,
VACUUM, VACUUM ANALYZE, CLUSTER, and EXPLAIN. These are discussed in my book, PostgreSQL: Introduction and
Concepts at http://www.postgresql.org/docs/awbook.html .
To understand hardware performance issues, it is important to understand what is happening inside the computer.
For simplicity, a computer can be thought of as a central processing unit (CPU) surrounded by storage. On the same chip
with the CPU are several CPU registers which store intermediate results and various pointers and counters. Surrounding
this is the CPU cache which holds the most recently accessed information. Beyond the CPU cache is a large amount
of random-access main memory (RAM) which holds executing programs and data. Beyond this main memory are disk
drives, which store even larger amounts of information. Disk drives are the only permanent storage area, so anything
to be kept when the computer is turned off must be placed there. In summary, here are the storage areas surrounding
the CPU:
[Figure: storage areas surrounding the CPU: registers, CPU cache, kernel cache, disk drive]
Storage Area     Measured in
CPU registers    bytes
CPU cache        kilobytes
RAM              megabytes
disk drives      gigabytes
You can see that storage areas increase in size as they get farther from the CPU. Ideally, a huge amount of permanent
memory could be placed right next to the CPU, but this would be too slow and expensive. In practice, the most frequently
used information is stored next to the CPU, and less frequently accessed information is stored farther away and brought
to the CPU as needed.
POSTGRESQL does not directly change information on disk. Instead, it requests data be read into the POSTGRESQL
shared buffer cache. POSTGRESQL backends then read/write these blocks, and finally flush them back to disk.
Backends that need to access tables first look for needed blocks in this cache. If they are already there, they can
continue processing right away. If not, an operating system request is made to load the blocks. The blocks are loaded
either from the kernel disk buffer cache, or from disk. These can be expensive operations.
The default POSTGRESQL configuration allocates 64 shared buffers. Each buffer is 8 kilobytes. Increasing the
number of buffers makes it more likely backends will find the information they need in the cache, thus avoiding an
expensive operating system request. The change can be made with a postmaster command-line flag or by changing the
value of shared_buffers in postgresql.conf.
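As a rough check on these numbers, the memory taken by the shared buffer cache is the number of buffers times the 8-kilobyte buffer size. The helper below is a hypothetical sketch for illustration only; it is not part of POSTGRESQL and it ignores any bookkeeping overhead the server adds:

```python
BUFFER_SIZE_KB = 8  # each shared buffer is 8 kilobytes, as stated above


def shared_buffer_cache_kb(shared_buffers: int) -> int:
    """Return the approximate size of the shared buffer cache in kilobytes."""
    return shared_buffers * BUFFER_SIZE_KB


# The default of 64 buffers is only half a megabyte of cache.
print(shared_buffer_cache_kb(64))    # 512
# An illustrative larger setting of 4096 buffers is 32 megabytes.
print(shared_buffer_cache_kb(4096))  # 32768
```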
[Figure: Postgres backends]
You may think, “I will just give all my RAM to the POSTGRESQL shared buffer cache.” However, if you do that, there
will be no room for the kernel or for any programs to run. The proper size for the POSTGRESQL shared buffer cache is
the largest useful size that does not adversely affect other activity.
To understand adverse activity, you need to understand how UNIX operating systems manage memory. If there is
enough memory to hold all programs and data, little memory management is required. However, if everything doesn’t
fit in RAM, the kernel starts forcing memory pages to a disk area called swap. It moves pages that have not been used
recently. This operation is called a swap pageout. Pageouts are not a problem because they happen during periods of
inactivity. What is bad is when these pages have to be brought back in from swap, meaning an old page that was moved
out to swap has to be moved back into RAM. This is called a swap pagein. This is bad because while the page is moved
from swap, the program is suspended until the pagein completes.
Pagein activity is shown by system analysis tools like vmstat and sar and indicates there is not enough memory
available to function efficiently. Do not confuse swap pageins with ordinary pageins, which may include pages read from
the filesystem as part of normal system operation. If you can’t find swap pageins, a large number of pageouts is a good
indicator that you are also doing swap pageins.
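One way to watch for swap activity is to pick the swap columns out of vmstat output. The sketch below assumes the Linux vmstat layout, in which a header row names the columns and "si" and "so" report memory swapped in and out; other systems label these columns differently, so treat the column names as an assumption. The sample text is made up for illustration and is not real measurement data:

```python
def swap_activity(vmstat_text: str):
    """Return a list of (swap-in, swap-out) pairs, one per sample row."""
    columns = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # header row naming each column
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            row = dict(zip(columns, fields))
            samples.append((int(row["si"]), int(row["so"])))
    return samples


# Made-up sample in the assumed Linux layout.
sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  10240 204800    0    0     5    12  100  200  3  1 96  0  0
 2  1  51200  10240   2048  40960  310  450   900   700  800 1500 20 10 40 30  0
"""

for swapped_in, swapped_out in swap_activity(sample):
    if swapped_in > 0:
        print("swap pageins detected:", swapped_in)
```

Any row with a nonzero swap-in value suggests that programs are being suspended while pages return from swap.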
Effects of Cache Size
You may wonder why cache size is so important. First, imagine the POSTGRESQL shared buffer cache is large enough
to hold an entire table. Repeated sequential scans of the table will require no disk access because all the data is already
in the cache. Now imagine the cache is one block smaller than the table. A sequential scan of the table will load all
table blocks into the cache until the last one. When that block is needed, the oldest block is removed, which in this
case is the first block of the table. When another sequential scan happens, the first block is no longer in the cache, and
to load it in, the oldest block is removed, which in this case is now the second block in the table. This pushing out of
the next needed block continues to the end of the table. This is an extreme example, but you can see that a decrease
of one block can change the efficiency of the cache from 100% to 0%. It shows that finding the right cache size can
dramatically affect performance.
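The effect is easy to reproduce with a small simulation. This is a minimal sketch that assumes a plain least-recently-used cache, matching the "oldest block is removed" description above; it is not POSTGRESQL's actual buffer manager, and the function name and table sizes are made up for illustration:

```python
from collections import OrderedDict


def hit_rate(table_blocks: int, cache_blocks: int, scans: int) -> float:
    """Simulate repeated sequential scans through an LRU cache.

    Returns the fraction of block requests served from the cache,
    counting only the scans after the first (warm-up) scan.
    Requires scans >= 2.
    """
    cache = OrderedDict()  # keys are block numbers, oldest first
    hits = requests = 0
    for scan in range(scans):
        for block in range(table_blocks):
            if block in cache:
                cache.move_to_end(block)  # mark as most recently used
                hit = True
            else:
                if len(cache) >= cache_blocks:
                    cache.popitem(last=False)  # evict the oldest block
                cache[block] = True
                hit = False
            if scan > 0:  # skip the warm-up scan
                requests += 1
                hits += hit
    return hits / requests


print(hit_rate(100, 100, 3))  # 1.0: the whole table fits
print(hit_rate(100, 99, 3))   # 0.0: one block too small
```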
Keep in mind that the postmaster allocates all shared memory when it starts. This area stays the same size even if no
one is accessing the database. Some operating systems page out unreferenced shared memory, while others lock shared
memory into RAM. Locked shared memory is preferred. The POSTGRESQL administrator’s guide has information about
kernel configuration for various operating systems at http://developer.postgresql.org/docs/postgres/kernel-resources.html .
Disk Locality
The physical nature of disk drives makes their performance characteristics different from the other storage areas
mentioned in this article. The other storage areas can access any byte with equal speed. Disk drives, with their
spinning platters and moving heads, access data near the head’s current position much faster than data farther away.
Moving the disk head to another cylinder on the platter takes quite a bit of time. Unix kernel developers know
this. When storing a large file on disk, they try to place the pieces of the file near each other. For example, suppose
a file requires ten blocks on disk. The operating system may place blocks 1-5 on one cylinder and blocks 6-10 on
another cylinder. If the file is read from beginning to end, only two head movements are required — one to get to the
cylinder holding blocks 1-5, and another to get to blocks 6-10. However, if the file is read non-sequentially, e.g., in the
order 1, 6, 2, 7, 3, 8, 4, 9, 5, 10, ten head movements are required. As you can see, with disks, sequential access is much faster
than random access. This is why POSTGRESQL prefers sequential scans to index scans if a significant portion of the
table needs to be read. This also highlights the value of the cache.
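The head-movement counts in the ten-block example can be checked with a few lines of code. This is a simplified sketch that only counts changes of cylinder and assumes the head starts away from both cylinders; the function and variable names are made up for illustration:

```python
def head_movements(read_order, cylinder_of):
    """Count how many times the head must move to a different cylinder."""
    moves = 0
    current = None  # head position is unknown before the first read
    for block in read_order:
        cylinder = cylinder_of[block]
        if cylinder != current:
            moves += 1
            current = cylinder
    return moves


# Blocks 1-5 on one cylinder and blocks 6-10 on another, as in the example.
cylinder_of = {block: (0 if block <= 5 else 1) for block in range(1, 11)}

sequential = list(range(1, 11))
interleaved = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]

print(head_movements(sequential, cylinder_of))   # 2
print(head_movements(interleaved, cylinder_of))  # 10
```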
Moving Joins: Symbolic links allow the movement of joined tables to separate disks. If tables A and B are joined,
lookups of table A can be performed on one drive while lookups of table B can be done on a second drive.
Moving Write-Ahead Log: Symbolic links can be used to move the pg_xlog directory to a different disk drive. (The
pg_xlog directory exists in POSTGRESQL releases 7.1 and later.) Unlike other writes, POSTGRESQL log writes must be
flushed to disk before completing a transaction. The cache cannot be used to delay these writes. Having a separate disk
for log writes allows the disk head to stay on the current log cylinder so writes can be performed without head
movement delay. (You can use the postgres -F parameter to prevent log writes from being flushed to disk, but an
operating system crash may require a restore from backup.)
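The symbolic-link technique itself is just a move followed by a link. The sketch below shows the mechanics on ordinary directories; the function name and the example paths are hypothetical, and it assumes the database server is stopped while the directory is moved:

```python
import os
import shutil


def move_and_link(directory: str, new_parent: str) -> str:
    """Move `directory` under `new_parent` and leave a symbolic link behind."""
    target = os.path.join(new_parent, os.path.basename(directory))
    shutil.move(directory, target)  # relocate the files to the other drive
    os.symlink(target, directory)   # old path now points at the new location
    return target


# Hypothetical usage, with the server stopped:
# move_and_link("/usr/local/pgsql/data/pg_xlog", "/disk2")
```

Programs that open files under the old path follow the link and never notice that the data now lives on a different drive.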
Other options include the use of RAID disks to spread a single filesystem across several drives. Mirroring can slow
down database writes, but will speed database reads because data can be retrieved from either drive.
File Systems
Some operating systems support multiple disk file systems. In such cases, it can be difficult to know which file system
performs best. POSTGRESQL usually performs best on traditional Unix file systems like the BSD UFS/FFS filesystems,
which many operating systems support. The default 8K block size of UFS is the same as POSTGRESQL’s page size. You
can run on journaling and log-based file systems, but these cause extra overhead during fsyncs of the write-ahead log.
Older SvR3-based file systems become too fragmented to yield good performance.
File system choice is particularly difficult on Linux because there are so many file system choices, and none of
them are optimal: ext2 is not entirely crash-safe, ext3, XFS, and JFS are journal-based, and Reiser is optimized for small
files and does journalling. The journalling file systems can be significantly slower than ext2 but when crash recovery
is required, ext2 isn’t an option. If ext2 must be used, mount it with sync enabled. Some people recommend XFS or an
ext3 filesystem mounted with data=writeback.
NFS and other remote file systems are not recommended for use by POSTGRESQL. NFS does not have the same file
system semantics as a local file system, and these inconsistencies can cause data reliability or crash recovery problems.
Multiple CPUs
POSTGRESQL uses a multi-process model, meaning each database connection has its own Unix process. Because of
this, all multi-CPU operating systems can spread multiple database connections among the available CPUs. However,
if only a single database connection is active, it can only use one CPU. POSTGRESQL does not use multi-threading to
allow a single process to use multiple CPUs.
Checkpoints
When write-ahead log files fill up, a checkpoint is performed to force all dirty buffers to disk so the log file can be
recycled. Checkpoints are also performed automatically at certain intervals, usually every 5 minutes. If there is a lot of
database write activity, the write-ahead log segments can fill too quickly, causing excessive slowness as all dirty disk
buffers are flushed to disk.
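To see how often checkpoints are happening, enable timestamps in the server log with this postgresql.conf setting:
log_timestamp = true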
After you force the postmaster to recognize this postgresql.conf change with pg_ctl reload, you will see these lines in
the PostgreSQL server log file:
Measure the duration between checkpoints to determine if you are checkpointing too frequently. As you can see from
the example above, checkpoints are happening every forty seconds, which is too frequent for good performance. Do
not be alarmed if there are several log entries with identical timestamps, as shown above. Multiple messages are often
generated by a single checkpoint.
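Measuring the gaps by hand gets tedious, so a short script can do it. The sketch below takes the timestamps you have copied from the checkpoint log entries; the timestamp format and the sample values are assumptions made up for illustration, not real server output:

```python
from datetime import datetime


def checkpoint_gaps(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the seconds between successive distinct checkpoint timestamps."""
    # A single checkpoint may log several lines with the same timestamp,
    # so duplicates are collapsed before measuring.
    times = sorted({datetime.strptime(t, fmt) for t in timestamps})
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Made-up timestamps, including one duplicate entry.
stamps = [
    "2003-01-16 10:00:00",
    "2003-01-16 10:00:00",
    "2003-01-16 10:00:40",
    "2003-01-16 10:01:20",
]
print(checkpoint_gaps(stamps))  # [40.0, 40.0]
```

Gaps of well under a few minutes suggest that the write-ahead log segments are filling too quickly.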
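Checkpoints caused by full log segments can be made less frequent by raising this postgresql.conf parameter:
checkpoint_segments = 3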
The default value is three. Increase it until checkpoints happen only every few minutes. Another log message that
may appear is:
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
This message indicates that the wal_files parameter should be increased in postgresql.conf.
Conclusion
Fortunately, POSTGRESQL doesn’t require much tuning. Most parameters are automatically adjusted to maintain
optimum performance. Cache size and sort size are two parameters administrators can control to make better use of
available memory. Disk access can also be spread across drives. Other parameters may be set in share/postgresql.conf.sample.
You can copy this file to data/postgresql.conf to experiment with some of POSTGRESQL’s even more exotic parameters.