Monday, 7 September 2009

Hot Standby

I'm currently working on the Hot Standby project, which kicked off design and development around July 2008 and was active through to February 2009. After a break to assist with the release of Postgres 8.4, the project began again in July 2009 as planned. It's definitely been a voyage of discovery.

Hot Standby seems to be a popular feature from two recent polls, so I guess that is why some people have been getting fairly anxious about the project recently. Anyway, what is it? Why is it important? Let me explain.

If you set up Log Shipping replication with PostgreSQL (called "Warm Standby" in the docs), you'll discover that the standby node can't run queries. All it does is keep processing changes so that it keeps mostly/nearly/eventually up to date with the Master - hence why its called a "warm" standby. Useful, but not enough, hence we move on to Hot Standby.

Why "Hot"? I've used the terms Cold/Warm/Hot similarly to the way they're used in other areas of High Availability computing. A Cold Standby/Spare is one that isn't available for immediate use and requires some work to get things configured. A Warm Standby/Spare is one that can be put into action fairly quickly, while a Hot Standby/Spare is one that can be used immediately without taking the system offline.

With Postgres, Hot Standby refers to the ability to

* Run read-only queries on a standby server that is being concurrently maintained by log shipping replication. We use full MVCC so read-only queries don't block incoming changes, nor do the changes lock out queries. There are some conflicts that can occur, discussed later.

* Switch from standby mode to normal/master-mode without stopping/restarting the server. So queries keep on running without interruption while the database server changes mode - hence it is "Hot". Many people are unaware of this second aspect of the feature.

The reasons for query conflicts are fairly detailed for a short blog, so I'll cover that in a later post. There'll be detailed docs with the patch when it's available around Sept 15.

The strange thing about Hot Standby is "it just works". When you run a few queries there's an initial buzz, but then you realise that it all works normally without much fuss. That's a shame really because this is probably the most technical patch to hit Postgres in years, involving many different parts of the internals. Seems fair though, after all, I admit I give scarcely no thought to all the people who spent time on the hardware and software that allows me to write this and for you to read it.

I'll be in Tokyo for the JPUG 10th Anniversary meeting later this year to present about Hot Standby. I'm very pleased to be attending and it would be a good place to learn more.

There's a practical course in London in late September if you want to work together while we cover the full details on Hot Standby and replication. Probably be more training dates and other events over the coming months also.

4 comments:

  1. hahaha, just works is a problem? i like those sort of problems!!! bring it on, would love to see this in action with postgres

    ReplyDelete
  2. One of the current issues with creating a Warm-Standby within PostgreSQL for a disaster recovery configuration, is the lack of the capability to easily switch back and forth from the master (primary) to the slave (alternate). This situation is desirable for several uses: disaster drills, network patching, OS patching, hardware updates, … If such a capability existed, then transactions against the primary would be stopped, the roles of primary and alternate are temporarily reversed (slave promotion), and the transactions are restarted against the new primary (previous alternate or slave). Then the roles can be reversed back.

    As this is not currently available within PostgreSQL, alternative (inferior replication) DR configurations have to be evaluated.

    My feeling is the improving of the Warm-Standby is more important than the creation of a Hot-Standby. Making PostgreSQL's Warm-Standby more akin to Oracle's Data Guard. Yes, Oracle also released a Hot-Standby configuration within its Data Guard technology referred to as Active Data Guard (with an additional cost).

    ReplyDelete
  3. Thanks for your comments, anon. I guess if it hurts, you're probably doing it wrong. It's true that there is no "switchback" command and that you have to re-issue the commands as if you were starting from scratch. The underlying technology is for you to choose, so many people choose tools like rsync which can significantly reduce the time to re-run the switchback operation by avoiding transferring data that is already the same. That's why most people think what you've described isn't a problem at all and that leaves Hot Standby as a popular feature. I'm not sure why some people overlook this, but they do, so you're not alone.

    ReplyDelete
  4. Another feature I am eager to see is read-only table/tablespace. Right now in my production database there are about 90% of data which are read-only. From time to time, autovacuum picks up these table to prevent XID wrap-around. If read-only table/tablespace is available, I hope I can put these tables in a filesystem that can shared among several database nodes. It will save a lot of disk space.

    ReplyDelete