Tuning PostgreSQL for performance
July 3, 2003 Copyright 2003 Shridhar Daithankar and Josh Berkus.
Authorized for re-distribution only under the PostgreSQL license (see www.postgresql.org/license).
Table of Contents
1 Introduction
2 Some basic parameters
2.1 Shared buffers
2.2 Sort memory
2.3 Effective Cache Size
2.4 Fsync and the WAL files
3 Some less known parameters
3.1 random_page_cost
3.2 Vacuum_mem
3.3 max_fsm_pages
3.4 max_fsm_relations
3.5 wal_buffers
4 Other tips
4.1 Check your file system
4.2 Try the Auto Vacuum daemon
4.3 Try FreeBSD
5 The CONF Setting Guide
1 Introduction
This is a quick start guide for tuning PostgreSQL's settings for performance. It assumes minimal familiarity with PostgreSQL administration. There are two important things for any performance optimization:
- Decide what level of performance you want
If you don't know your expected level of performance, you will end up chasing a carrot that is always a couple of meters ahead of you. Performance tuning gives diminishing returns after a certain threshold; if you don't set that threshold beforehand, you will end up spending a lot of time for minuscule gains.
- Know your load
This document focuses entirely on tuning postgresql.conf to get the best out of your existing setup. This is not the end of performance tuning; after using this document to extract the maximum reasonable performance from your hardware, you should turn to the following areas, which are beyond the scope of this article.
- Hardware Selection and Setup
Databases are heavily bound by your system's disk I/O and memory usage. As such, the selection and configuration of disks, RAID arrays, RAM, and operating system, and the competition for these resources, will have a profound effect on how fast your database is. We hope to cover this topic in a later article.
- Efficient Application Design
Your application also needs to be designed to access data efficiently, through careful query writing, planned and tested indexing, good connection management, and avoiding the performance pitfalls particular to your version of PostgreSQL. Expect another guide someday to help with this, but really it takes several large books and years of experience to get it right ... or just a lot of time on the mailing lists.
2 Some basic parameters
2.1 Shared buffers
Shared buffers defines a block of memory that PostgreSQL will use to hold requests that are awaiting attention from the kernel buffer and CPU. The default value is quite low for any real-world workload and needs to be increased. However, unlike databases like Oracle, more is not always better; there is a threshold above which increasing this value can hurt performance. Some starting points (a conf example follows the list):
- Start at 4MB (512) for a workstation
- Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096)
- Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768)
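As a minimal sketch, here is what the relevant postgresql.conf line might look like for a hypothetical server in the middle of the ranges above; each buffer is one 8KB page, and the figure is illustrative, not a recommendation:

    # postgresql.conf -- shared_buffers is measured in 8KB pages
    shared_buffers = 8192        # 8192 pages x 8KB = 64MB

Note that larger values may require raising the kernel's shared memory limits (for example, SHMMAX on Linux) before the server will start.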
2.2 Sort memory
This parameter sets the maximum limit on memory that a database connection can use to perform sorts. If your queries have ORDER BY or GROUP BY clauses that require sorting a large data set, increasing this parameter helps. But beware: this parameter is per sort, per connection. Think twice before setting it too high on any database with many users. A recommended approach is to set this parameter per connection as and when required; that is, low for most simple queries and higher for large, complex queries and data dumps.
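A sketch of the per-connection approach, using a hypothetical reporting query and table; in 7.3/7.4 sort_mem is expressed in kilobytes:

    -- raise sort memory for this session only (value in KB)
    SET sort_mem = 65536;        -- 64MB for one large report
    SELECT customer_id, sum(amount)
      FROM orders                -- hypothetical table
     GROUP BY customer_id
     ORDER BY 2 DESC;
    RESET sort_mem;              -- drop back to the server default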
2.3 Effective Cache Size
This parameter allows PostgreSQL to make the best possible use of the RAM available on your server. It tells PostgreSQL the size of the operating system's disk cache, so that the planner can take it into account when choosing between execution plans.
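As a rough sketch, assume a Linux server where free reports about 512MB sitting in the OS cache; in 7.3/7.4 this setting is expressed in 8KB disk pages:

    # postgresql.conf -- tell the planner roughly how much the OS is caching
    # 512MB of OS cache / 8KB per page = 65536 pages
    effective_cache_size = 65536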
2.4 Fsync and the WAL files
This parameter sets whether or not data is written to disk as soon as it is committed, which is done through Write Ahead Logging (WAL). If you trust your hardware, your power company, and your battery backup enough, you can set this to false for an immediate boost in data write speed. But be very aware that any unexpected database shutdown will force you to restore the database from your last backup.
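If you do accept that risk, the change itself is a single line (7.3/7.4 boolean syntax):

    # postgresql.conf -- skip the per-commit flush; crash safety is reduced
    fsync = false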
3 Some less known parameters
3.1 random_page_cost
This parameter sets the cost of fetching a random tuple from the database, which influences the planner's choice between an index scan and a sequential table scan. The default is set to a high value, based on the expectation of slow disk access. If you have reasonably fast disks, such as SCSI or RAID, you can lower the cost to 2. You need to experiment to find out what works best for your setup by running a variety of queries and comparing execution times.
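One way to run that experiment from a psql session, using a hypothetical table and query:

    -- see the plan chosen with the default cost (4)
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    -- lower the cost for this session and compare
    SET random_page_cost = 2;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    -- once satisfied, make it permanent in postgresql.conf:
    --   random_page_cost = 2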
3.2 Vacuum_mem
This parameter sets the memory allocated to vacuum. Normally vacuum is a disk-intensive process, but raising this parameter will speed it up by allowing PostgreSQL to copy larger blocks into memory. Just don't set it so high that it takes significant memory away from normal database operation. Something between 16-32MB should be good enough for most setups.
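It can also be raised just for a maintenance session; a sketch, with a hypothetical table name and the value in kilobytes:

    SET vacuum_mem = 32768;      -- 32MB for this session only
    VACUUM ANALYZE orders;
    RESET vacuum_mem;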
3.3 max_fsm_pages
PostgreSQL records free space in each of its data pages. This information is used by vacuum to find out how many pages, and which ones, to look at when it frees up space. max_fsm_pages limits how many pages can be tracked in the free space map across the whole cluster; databases with heavy update and delete activity between vacuums need a higher value.
3.4 max_fsm_relations
This setting dictates how many relations (tables) will be tracked in the free space map. Again, this is a database cluster-wide setting, so set it accordingly. In version 7.3.3 and later, the default should be adequate. In older versions, bump it up to 300-1000.
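A minimal sketch of both free space map settings together; the figures are illustrative and should be sized to your own update/delete volume:

    # postgresql.conf -- free space map, shared across the whole cluster
    max_fsm_pages = 50000        # data pages whose free space is tracked
    max_fsm_relations = 1000     # relations tracked in the map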
3.5 wal_buffers
This setting determines the number of buffers the WAL (Write Ahead Log) can have. If your database has many write transactions, setting this value a bit higher than the default could result in better use of disk. Experiment and decide. A good start is around 32-64, corresponding to 256-512KB of memory.
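As a sketch, with each buffer being one 8KB page:

    # postgresql.conf -- WAL buffers, in 8KB pages
    wal_buffers = 32             # 32 x 8KB = 256KB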
4 Other tips
4.1 Check your file system
On an OS like Linux, which offers multiple file systems, you should be careful to choose the right one from a performance point of view. There is no agreement among PostgreSQL users about which one is best.
4.2 Try the Auto Vacuum daemon
There is a little-known module in the PostgreSQL contrib directory called pgavd. It works in conjunction with the statistics collector. It periodically connects to a database and checks whether enough operations have been performed since the last check; if so, it vacuums the database.
4.3 Try FreeBSD
Large updates, deletes, and vacuums in PostgreSQL are very disk-intensive processes. In particular, since vacuum gobbles up I/O bandwidth, the rest of the database's activity can be adversely affected while very large tables are being vacuumed. If you have not yet settled on an OS for your server platform, consider FreeBSD for this reason.
5 The CONF Setting Guide
Available here is an Annotated Guide to the PostgreSQL configuration file settings, in both OpenOffice.org and PDF format. This guide expands on the official documentation and may eventually be incorporated into it.
- The first column of the chart is the GUC setting in the postgresql.conf file.
- The second is the maximum range of the variable; note that the maximum range is often much larger than the practical range. For example, random_page_cost will accept any number between 0 and several billion, but all practical numbers are between 1 and 5.
- The third column contains an enumeration of RAM or disk space used by each unit of the parameter.
- The fourth column indicates whether or not the variable may be changed with SET from an interactive psql session. Most settings marked "no" can only be changed by restarting PostgreSQL.
- The fifth column quotes the official documentation available from the PostgreSQL web site.
- The last column is our notes on the setting, how to set it, resources it uses, etc. You'll notice some blank spaces, and should be warned as well that there is still strong disagreement on the value of many settings.
As noted in the worksheet, it covers PostgreSQL versions 7.3 and 7.4. If you are using an earlier version, you will not have access to all of these settings, and defaults and effects of some settings will be different.