Thursday, 24 June 2010

Hot Bugs & Cold Beer

2ndQuadrant is sponsoring a number of Bugs & Beer events at PostgreSQL user groups around the world.

The idea is to get together with your friends and see if we can shake out any more bugs in Hot Standby prior to release.

Particular focus on these areas:
* transactions holding locks before/during/after base backup taken
* prepared transactions
* correctness of results
* VACUUMs on master
* any other ways you can find to break it

Please write to me at if you'd like funding for a local event.

All the best,

Thursday, 17 June 2010

What's a VLF?

I read a good blog called

The funniest thing about it for me was the artful blogger talked about having 87,000 VLFs, taking up 1.5TB, though didn't mention what a VLF was.

Anyway, I checked - VLFs are WAL files in SQLServer. In case, like me, you didn't know, VLF stands for Virtual Log File. So it's a virtual physical thing. Got it. :-\

Which means the blogged-about related to 1.5TB worth of WAL files in the equivalent of pg_xlog. Ouch!

Ah! That reminds me: on both SQLServer and DB2 the log contains undo information, which means that the transaction log can't reuse files while transactions are still active. So long running write transactions causes the log to grow in size and can eventually fill the disk. This can be a real pain for DBAs and requires workarounds. That effect doesn't occur with PostgreSQL, so pg_xlog never overflows for that reason. With PostgreSQL, undo information is effectively stored in the database itself in the form of previous row versions. So a database grows as changes occur. Long running write statements can have a similar effect in PostgreSQL, but not long running transactions. So, one more problem that us Postgres-people get to miss out on.

And just to complete the circle, I will avoid explaining what a WAL file is here.

Smoothing replication

At first, streaming replication looked pretty raw though I've been doing a thorough review of the code. I guess that's probably how Hot Standby looked at first as well. Now, I'm getting quite excited at the way things are shaping up.

Originally the WALSender process waited for 200ms after sending each chunk of data. That's now been fixed so that WALSender will stream continually until no outstanding WAL data remains.

Also, the max chunk size has been reduced to 128kB, which is now the same default size used by DRBD.

On Monday, I noticed that the WALSender was sending WAL data before it had been fsynced. So in case of a crash that could cause a problem. That only really matters if you're using synchronous_commit = off or fsync = off. (Though fsync = off and replication just have no business being configured simultaneously).

Lots of smaller changes have also come through in recent months, so now we have the ability to get log messages when replication connects or disconnects. We also now have special messages if the replication connection is refused by the primary via the pg_hba.conf.

From here, it's looking like streaming replication is fast, efficient and low latency. It's also easy to use and easy to troubleshoot. It's an impressive community effort.

I'm also happy to say that almost all of the features of pg_standby have been assimilated into core now, so the only external module you'll need is something called pg_archivecleanup which is new in this release.

Next few months we're going to see designs emerge for synchronous replication. That is going to require some subtle coding to get good. It sounds like a couple of people are planning prototypes of how that should work.