Tuesday 27 April 2010

Tuning Hot Standby

Focus for the last month has been on tuning the new PostgreSQL 9.0 features. The latest set of changes have been around Hot Standby.

Erik Rijkers published some good performance test results, very helpful, thanks. Mark Kirkwood also published some insightful results, also thanks.

After the latest set of patches the results show that we get equivalent performance between master and standby under heavy load of very fast indexes SELECTs. The peak reported transaction rate was 28,000 tps (transactions per second), showing that the internal mechanics of queries during Hot Standby are very efficient. This is great news and has taken much work from myself and the usual suspects.

What we have now is a very tight set of code that should provide good performance and scalability from all angles. Changes from the master are applied using minimal locking, so that a high rate of user queries should be possible even under a mixed workload of changes from master and many standby queries. What we don't yet have good independent performance tests to confirm that, or do further tuning.

Of course, there's no magic there. If we give the standby the same workload of changes that the master had and then we add a whole new workload we can probably bring a server to its knees. That's just CPU physics, so nothing surprising there. What we've done is relax the internal locking to allow multiple concurrent workloads to hit the server from both directions, and also allow PostgreSQL to scale well to high numbers of CPUs.

Anyway, bold words. I'm happy to hear from people that have got a good test environment, publish controlled tests and can answer detailed questions to help us isolate any strangeness in the tests. Particularly keen to have Dtrace installed, so either Solaris or SystemTap. pgbench is a good benchmarking tool for what we need, though other controlled testing is also welcome.

We need independent measurements of CPU overheads and processing rates of
* standby when no queries running, just replaying transactions
* standby when we have a mix of changes and queries
especially on multi-socket servers. It would also be useful to have comparisons of how moving from a single master to master plus slave can improve total cluster throughput.

Summary is that things are looking good so far. Is that the end? Doubt it. What I'm saying is that these new features can now be taken seriously enough to spend time on in-depth benchmarking of your workloads to prove them for yourselves and also to see if you can find some performance issues in the current code. Plan it now, but wait for beta, which is soon...

If you want to hear more about these types of issues, come to CHAR(10) conference:
http://www.char10.org/

No comments:

Post a Comment