Thursday 21 April 2011

Busy Times

It's been 6 months since I found time to blog, which I guess shows how much I had been concentrating on getting Sync Replication finished.

Sync Replication is the raison d'être for in-database replication. Only by bringing replication to the database layer can we control the replication process in a useful way. Did it have to be transaction log shipping replication? No, I guess it might have been possible to do sync rep using other mechanisms such as triggers or writesets, but the transaction log seemed the most natural way to go, at least initially.

Now it's done, I breathe a sigh of relief after 7 full years of work. The strange thing is that in order to fund such a task I needed to build a company, 2ndQuadrant. It's kind of like having to build the ramp up which the blocks of stone would travel for the pyramids of ancient Egypt. Anyway, it's a good thing because it's brought together many contributors and opened up funding mechanisms to do the things we want to do with PostgreSQL.

Now it's finished, I see all the other tasks still to do, so I'll be busy a while longer yet. Feature complete? No way.

I'm pleased that I got all the essential features into sync rep that I was looking for: transaction-controlled replication, minimal bandwidth usage, shared memory queues ordered by xlog pointers, avoidance of complex configuration details and, most importantly, an approach everyone agrees is robust.
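To make the "queues ordered by xlog pointers" idea a little more concrete, here is a toy Python sketch. It is not PostgreSQL's code: the class and method names (`WaitQueue`, `wait_for`, `standby_acked`) and the integer log positions are made up for illustration. It only shows the general shape of the idea, assuming committing sessions wait in log-position order and are released once the standby confirms it has reached their commit.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Waiter:
    commit_lsn: int                      # log position of the commit record (illustrative integer)
    session: str = field(compare=False)  # hypothetical session label, ignored when ordering


class WaitQueue:
    """Toy model of committing sessions waiting in log-position order."""

    def __init__(self):
        self._heap = []

    def wait_for(self, session, commit_lsn):
        # A session that has written its commit record queues up here.
        heapq.heappush(self._heap, Waiter(commit_lsn, session))

    def standby_acked(self, acked_lsn):
        # Release every waiter whose commit is at or before the confirmed position.
        released = []
        while self._heap and self._heap[0].commit_lsn <= acked_lsn:
            released.append(heapq.heappop(self._heap).session)
        return released

    def waiting(self):
        # Sessions still waiting, lowest log position first.
        return [w.session for w in sorted(self._heap)]


queue = WaitQueue()
queue.wait_for("A", 300)
queue.wait_for("B", 100)
queue.wait_for("C", 200)

released = queue.standby_acked(200)  # standby confirms everything up to position 200
still_waiting = queue.waiting()
```

Because the queue is ordered by log position, a single confirmation from the standby can release every earlier commit in one pass, without scanning sessions whose commits are still ahead of it.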

I hadn't realised it, but the sync rep implementation is actually better than MySQL's semi-synchronous replication. I don't think anybody set out to do that; as usual, the PostgreSQL approach to building things just seems to end up with a rigorous design and implementation.

I'm thinking about replication because I've just been assembling the talk proposals for CHAR(11), the conference on Clustering, High Availability and Replication. The Call for Papers for the CHAR(11) conference is now closed, and we have a very cool lineup of speakers. Even better than CHAR(10) last year.

As ever, more on all of the above another time.

5 comments:

  1. Is the subset you completed useable by others and will it be released?

  2. Not really sure what you mean by subset? Sync rep is part of the 9.1 release, about to go into beta.

    http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#SYNCHRONOUS-REPLICATION

  3. Why is PG synchronous replication better than MySQL's semi-sync? That is a big statement without any follow-up of why you think that.

  4. @Harrison: I was referring to the stricter implementation in the PostgreSQL version. With both implementations, we commit the transaction and then wait while the commit is passed to the standby before replying to the user. With PostgreSQL, a second session cannot see any of the changes made by the first transaction until the reply has been received. With MySQL, a second session can see changes before the reply has been received, making it possible to act on information before it is definitely safe. That's why MySQL is just "semi-sync" and not "sync". Might seem a minor point, but it makes the difference between zero data loss and non-zero data loss.

  5. I think the PG implementation is awesome. I haven't read the PG docs but I hope they are clear about the differences between sync and semi-sync.

    I worked with Wei Li on the first implementation of semi-sync while at Google. The official implementation from MySQL is similar to what we did.


    . I lurked on the lists early in the discussion and the differences
