Monday, September 18, 2006

PostgreSQL 8.0 Performance Checklist


This is a set of rules of thumb for setting up your PostgreSQL 8.0 server. Much of what follows is based on anecdotal evidence or practical scaling tests; there's a lot about database performance that we, and OSDL, are still working out. However, this should get you started. All information below is current as of January 12, 2005 and will likely be updated later. The discussions of settings below supersede the recommendations I've made on General Bits.

Five Hardware Principles of Setting Up Your PostgreSQL Server

  1. Disks > RAM > CPU
    If you're going to throw money at a PostgreSQL server, throw it at high-performance disk arrays and get average processors and adequate RAM. If you have a little more money, get more RAM. PostgreSQL, like other ACID-compliant RDBMSes, is very I/O intensive, and it's a rare application that taxes the CPU more than the SCSI card (some do exist, though). This applies to small servers as much as to eight-ways with NetApps: get the cost-effective CPU if it allows you to buy a high-end RAID card and many disks.
  2. More Spindles == Better
    Given multiple disks, PostgreSQL and most operating systems will parallelize read and write requests on the database. This makes an enormous difference on transaction-processing systems, and a significant improvement on any application where the entire database does not fit in RAM. Given today's minimum disk sizes (72GB), you might be tempted to use just one disk, or a single RAID 1 mirrored pair; however, you'll find that using 4, 6, or even 14 disks will yield performance boosts. Oh, and SCSI still delivers significantly higher DB throughput than IDE, even with Serial ATA.
  3. Separate the Transaction Log from the Database:
    Assuming that you've already forked out the money for a decent-sized array, there are more intelligent options than throwing everything on a single RAID. For one thing, putting the database transaction log (pg_xlog) on its own dedicated disk resource (an array or plain disk) makes as much as a 12% difference in performance on databases with high write activity. This is especially vital on small systems with slow SCSI or IDE disks: even in a two-disk server, you can put the transaction log onto the operating system disk and reap some benefits. (A sketch of the move follows this list.)
  4. RAID 1+0/0+1 > RAID 5:
    RAID 5 with 3 disks has become an unfortunate standard among large vendors' economy servers. This is possibly the slowest array configuration possible for PostgreSQL; you can expect as little as 50% of the query speed you would get from a plain SCSI disk. Instead, focus on RAID 1, 1+0, or 0+1 for any set of 2, 4 or 6 disks. Over 6 disks, RAID 5 starts to perform acceptably again, and the comparison depends a lot more on your individual controller. Perhaps more importantly, a cheap RAID card can be a liability; often it is better to use software RAID than the Adaptec built-in card that came with your server.
  5. Applications must play nice together:
    The other big mistake I see a lot of organizations making is putting PostgreSQL on a server together with several other applications which compete for the same resources. Worst among these is putting PostgreSQL and another RDBMS on the same machine; both database systems will fight over disk bandwidth and the OS disk cache, and both will perform poorly. Document servers and security logging programs are almost as bad. PostgreSQL can share a machine with applications which are mainly CPU-and-RAM-intensive, such as Apache, provided that there is enough RAM.
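
Expanding on principle 3: here's a minimal sketch of relocating pg_xlog to a dedicated disk on Linux/Unix. The mount point and data directory paths are illustrative, and the server must be stopped for the move:

pg_ctl stop -D /var/lib/pgsql/data
mv /var/lib/pgsql/data/pg_xlog /mnt/fastdisk/pg_xlog
ln -s /mnt/fastdisk/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl start -D /var/lib/pgsql/data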

Twelve Settings You'll Want To Adjust in Your PostgreSQL.Conf File

There's a truly frightening number of new options in the PostgreSQL.conf file. Even once-familiar options from the last 5 versions have changed names and parameter formats. All of this is intended to give you, the database administrator, more control, but it can take some getting used to.

What follows are the settings that most DBAs will want to change, focused more on performance than anything else. There are quite a few "specialty" settings which most users won't touch, but those that use them will find indispensable. For those, you'll have to wait for the book.

Remember: PostgreSQL.conf settings must be uncommented to take effect, but re-commenting them does not necessarily restore the default values!

Connecting

listen_addresses: Replaces both the tcp_ip and virtual_hosts settings from 7.4. Defaults to localhost in most installations, allowing only local connections. Many DBAs will want to set this to "*", meaning all available interfaces, after setting proper permissions in the pg_hba.conf file, in order to make PostgreSQL accessible to the network. As an improvement over previous versions, the "localhost" default does permit connections on the "loopback" interface, 127.0.0.1, enabling many browser-based server utilities.

max_connections: Exactly as in previous versions, this needs to be set to the actual number of simultaneous connections you expect to need. High settings will require more shared memory (shared_buffers). As the per-connection overhead, both from PostgreSQL and the host OS, can be quite high, it's important to use connection pooling if you need to service a large number of users. For example, 150 active connections on a medium-end single-processor 32-bit Linux server will consume significant system resources, and 600 is about the limit on that hardware. Of course, beefier hardware will allow more connections.
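
A network-accessible setup might look like the following sketch; the subnet in the pg_hba.conf line is illustrative, so adjust it to your own network:

# in postgresql.conf
listen_addresses = '*'    # listen on all interfaces
max_connections = 200     # what you actually expect to need

# in pg_hba.conf: allow md5-authenticated connections from one subnet
host  all  all  192.168.1.0/24  md5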

Memory

shared_buffers: As a reminder: this figure is NOT the total memory PostgreSQL has to work with. It is the block of dedicated memory PostgreSQL uses for active operations, and should be a minority of your total RAM on the machine, since PostgreSQL uses the OS disk cache as well. Unfortunately, the exact amount of shared buffers required is a complex calculation of total RAM, database size, number of connections, and query complexity. Thus it's better to go with some rules of thumb in allocating, and then monitor the server (particularly the pg_statio views) to determine adjustments.
On dedicated servers, useful values seem to be between 8MB and 400MB (between 1000 and 50,000 for 8K page size). Factors which raise the desired shared_buffers are larger active portions of the database, large complex queries, large numbers of simultaneous queries, long-running procedures or transactions, more available RAM, and faster/more CPUs; other applications competing for RAM on the machine, of course, lower it. Contrary to some expectations, allocating much too much shared_buffers can actually lower performance, due to the time required to scan the buffers. Here are some examples, based on anecdotes and TPC tests on Linux machines, given as RAM allocated/number of buffers:
  • Laptop, Celeron processor, 384MB RAM, 25MB database: 12MB/1500
  • Athlon server, 1GB RAM, 10GB decision-support database: 120MB/15000
  • Quad PIII server, 4GB RAM, 40GB, 150-connection heavy transaction processing database: 240MB/30000
  • Quad Xeon server, 8GB RAM, 200GB, 300-connection heavy transaction processing database: 400MB/50000
Please note that increasing shared_buffers, and a few other memory parameters, will require you to modify your operating system's System V memory parameters. See the main PostgreSQL documentation for instructions on this.
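
On Linux, for example, this means raising the SysV kernel parameters above the size of the shared memory segment; the figures below are illustrative for a 400MB shared_buffers allocation:

# /etc/sysctl.conf -- apply with sysctl -p
kernel.shmmax = 536870912    # maximum segment size: 512MB, in bytes
kernel.shmall = 131072       # total shared memory: 512MB, in 4K pages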

work_mem: used to be called sort_mem, but has been renamed since it now covers sorts, aggregates, and a few other operations. This is non-shared memory, allocated per operation (one to several times per query); the setting puts a ceiling on the amount of RAM any single operation can grab before being forced to disk. To estimate it, divide the available RAM (after applications and shared_buffers) by the expected maximum number of concurrent queries times the average number of memory-using operations per query.
Consideration should also be paid to the amount of work_mem needed by each query; processing large data sets requires more. Web database applications generally set this quite low, as the number of connections is high but queries are simple; 512K to 2048K generally suffices. Conversely, decision-support applications with their 160-line queries and 10 million-row aggregates often need quite a lot, as much as 500MB on a high-memory server. For mixed-use databases, this parameter can be set per connection, at query time, in order to give more RAM to specific queries.
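
To make the calculation concrete with a hypothetical example: a reporting server with 2GB of RAM left after applications and shared_buffers, expecting at most 10 concurrent queries with an average of 4 memory-using operations each, works out to 2048MB / (10 x 4), or roughly 50MB per operation. In 8.0 the setting is in kilobytes, and it can also be raised per connection for a single big query:

-- in postgresql.conf: work_mem = 51200 (50MB)
-- or per connection, just for one decision-support query:
SET work_mem = 262144;    -- 256MB, in KB, for this session only
-- ... run the query, then:
RESET work_mem;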

maintenance_work_mem: formerly called vacuum_mem, this is the quantity of RAM PostgreSQL uses for VACUUM, ANALYZE, CREATE INDEX, and adding foreign keys. The larger your database tables and the more RAM you have to spare, the higher you should set it, in order to make these operations as fast as possible. A setting of 50% to 75% of the on-disk size of your largest table or index is a good rule, or 32MB to 256MB where this can't be determined.
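
Pulling the memory settings together, a sketch of postgresql.conf for a dedicated transaction-processing server along the lines of the quad PIII example above (in 8.0, shared_buffers is in 8K pages and the other two are in KB):

shared_buffers = 30000           # 240MB
work_mem = 2048                  # 2MB per sort/hash operation
maintenance_work_mem = 131072    # 128MB for VACUUM, ANALYZE, CREATE INDEX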

Disk and WAL

checkpoint_segments: defines the on-disk cache size of the transaction log for write operations. You can ignore this for a mostly-read web database, but for transaction-processing databases or reporting databases involving large data loads, raising it is performance-critical. Depending on the volume of data, raise it to between 12 and 256 segments, starting conservatively and raising it if you start to see warning messages in the log. The space required on disk is equal to (checkpoint_segments * 2 + 1) * 16MB, so make sure you have enough disk space (32 means over 1GB).
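
For example, for a load-heavy database (the value itself is illustrative):

checkpoint_segments = 32    # (32 * 2 + 1) * 16MB = 1040MB of pg_xlog space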

max_fsm_pages: sizes the free space map, which tracks partially empty data pages for population with new data; if set right, it makes VACUUM faster and removes the need for VACUUM FULL or REINDEX. It should be slightly more than the total number of data pages which will be touched by updates and deletes between vacuums. The two ways to determine this number are to run VACUUM VERBOSE ANALYZE, or, if using autovacuum (see below), to set this according to the -V setting as a percentage of the total data pages used by your database. FSM pages require very little memory, so it's better to be generous here.
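
For instance, if a database-wide VACUUM VERBOSE run indicates on the order of 400,000 pages touched between vacuums (an illustrative figure), set it with some headroom:

max_fsm_pages = 500000    # somewhat above the measured requirement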

vacuum_cost_delay: If you have large tables and a significant amount of concurrent write activity, you may want to make use of a new feature which lowers the I/O burden of VACUUMs at the cost of making them take longer. As this is a very new feature, it's actually a set of 5 interdependent settings for which we have only a few performance tests. Increasing vacuum_cost_delay to a non-zero value turns the feature on; use a reasonable delay, somewhere between 50 and 200ms. For fine tuning, increasing vacuum_cost_page_hit and decreasing vacuum_cost_limit will soften the impact of vacuums and make them take longer; in Jan Wieck's tests on a transaction-processing benchmark, a delay of 200, page_hit of 6 and limit of 100 decreased the impact of vacuum by more than 80% while tripling the execution time.
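
Translated into postgresql.conf, the configuration from Jan Wieck's test looks like the following; treat it as a starting point for your own testing rather than a recommendation:

vacuum_cost_delay = 200       # ms to sleep when the cost limit is reached
vacuum_cost_page_hit = 6      # raised from the default of 1
vacuum_cost_limit = 100       # lowered from the default of 200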

Query Planner

These settings allow the query planner to make accurate estimates of operation costs, and thus pick the best possible query plan. There are two global settings worth bothering with:

effective_cache_size: tells the query planner the largest possible database object that could be expected to be cached. Generally should be set to about 2/3 of RAM, if on a dedicated server. On a mixed-use server, you'll have to estimate how much of the RAM and OS disk cache other applications will be using and subtract that.

random_page_cost: a variable which estimates the average cost of doing seeks for index-fetched data pages. On faster machines with faster disk arrays, this should be lowered, to 3.0, 2.5 or even 2.0. However, if the active portion of your database is many times larger than RAM, you will want to raise the factor back towards the default of 4.0. Alternatively, you can base adjustments on query performance: if the planner seems to be unfairly favoring sequential scans over index scans, lower it; if it's using slow indexes when it shouldn't, raise it. Make sure you test a variety of queries. Do not lower it below 2.0; if that seems necessary, you need to adjust in other areas, like planner statistics.
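
As an illustrative sketch for a dedicated 4GB server (effective_cache_size is in 8K pages in 8.0), along with a way to trial a planner change in a single session before committing it to postgresql.conf:

effective_cache_size = 350000    # ~2.7GB, about 2/3 of RAM
random_page_cost = 3.0

-- in psql, test first on a representative query (table is hypothetical):
SET random_page_cost = 2.5;
EXPLAIN ANALYZE SELECT * FROM orders WHERE order_date > '2005-01-01';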

Logging

log_destination: this replaces the unintuitive syslog setting in prior versions. Your choices are to use the OS's administrative log (syslog or eventlog) or a separate PostgreSQL log (stderr). The former is better for system monitoring; the latter, better for database troubleshooting and tuning.

redirect_stderr: If you decide to go with a separate PostgreSQL log, this setting allows you to log to a file using a native PostgreSQL utility instead of command-line redirection, allowing automated log rotation. Set it to true, and then set log_directory to tell it where to put the logs. The default settings for log_filename, log_rotation_size, and log_rotation_age are good for most people.
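
Put together, a typical file-logging setup looks like this, leaning on the defaults where they're sensible:

log_destination = 'stderr'
redirect_stderr = true
log_directory = 'pg_log'    # relative to the data directory; the default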

Autovacuum and You

As you tumble toward production on 8.0, you're going to want to set up a maintenance plan which includes VACUUMs and ANALYZEs. If your database involves a fairly steady flow of data writes, but does not require massive data loads and deletions or frequent restarts, this should mean setting up pg_autovacuum. It's better than time-scheduled vacuums because:

  • Tables are vacuumed based on their activity, avoiding vacuuming read-only tables.
  • The frequency of vacuums grows automatically with increasing database activity.
  • It's easier to calculate free space map requirements and avoid database bloat.

Setting up autovacuum requires an easy build of the module in the contrib/pg_autovacuum directory of your PostgreSQL source (Windows users should find autovacuum included in the PGInstaller package). Turn on the stats configuration settings detailed in the README, then start autovacuum as a separate process after PostgreSQL has started; it will shut down automatically when PostgreSQL shuts down.

The default settings for autovacuum are very conservative, though, and are more suitable for a very small database. I generally use something aggressive like:
-D -v 400 -V 0.4 -a 100 -A 0.3
This vacuums tables after 400 rows + 40% of the table has been updated or deleted, and analyzes after 100 rows + 30% of the table has been inserted, updated or deleted. The above configuration also lets me set my max_fsm_pages to 50% of the data pages in the database with confidence that that number won't be overrun, causing database bloat. We are currently testing various settings at OSDL and will have more hard figures on the above soon.
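
Put together with the daemon name from contrib (connection options omitted; see the README for the full flag list), the invocation is simply:

pg_autovacuum -D -v 400 -V 0.4 -a 100 -A 0.3

With those thresholds, a 10,000-row table gets vacuumed once 400 + 0.4 x 10,000 = 4,400 rows have been updated or deleted, and analyzed once 100 + 0.3 x 10,000 = 3,100 rows have changed.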

Note that you can also use autovacuum to set the Vacuum Delay options, instead of setting them in PostgreSQL.conf. Vacuum Delay can be vitally important for systems with very large tables or indexes; otherwise an untimely autovacuum call can halt an important operation.

There are, unfortunately, a couple of serious limitations to 8.0's autovacuum which will hopefully be eliminated in future versions:

  • Has no long-term memory: autovacuum forgets all activity tracking when you restart the database. So if you do regular restarts, you should do a full-database VACUUM ANALYZE right before or after.
  • Pays no attention to how busy the server is: there were plans to check system load before vacuuming, but that's not a current feature. So if you have extreme load peaks, autovacuum may not be for you.
