Thursday, 28 April 2011

Brand New Kayak

I've just bought a new Dagger GT max kayak on eBay and collected it today.

It's shorter than my old Mountain Bat, with a flat bottom for surfing and rivers, and tramlines to allow you to edge through turns better. Lots of padding and a back rest that actually works. And it's bright yellow.

Like most things, it makes me think about databases in a new way.

First, the buying experience: I'm ecstatic, but I've not been in the water yet. Why am I ecstatic? Well, it's yellow and it has lots of features I'm interested in. And it's yellow. From that I conclude that look and feel matter in a new product, in addition to the real usability features. Uh, yeh, err... just like psql...

I realise that this might be the best I ever feel about the kayak. If it has shortcomings, then I'll be disappointed. Imagine a boat with very few issues, with footrests that can be adjusted to make it just right. That sounds like a boat I'd like, and a database too.

I also note that it's taken me 18 years to buy a new kayak. From that I learn that annoyances with products build up over time, and that useful new features are what finally prompt a change. But kayak salesmen need to be patient and respect the views and wishes of paddlers with prior experience of other craft.

What made me change? A friend bought one. Not just that - I watched him go down some whitewater that I'd had trouble on, but he edged it like it wasn't there. From that I learn that word of mouth and references are important, but demonstrations are even better.

Now back to my first thought: why did I buy the Dagger? It's been interesting to watch kayak development over the intervening years, with all sorts of specialist kayaks emerging: sea kayaks, river kayaks, whitewater boats and playboats. My feeling was that these were all too specialist. I wanted a boat I could use for short sea trips and for whitewater. This made me think about Michael Stonebraker's work in recent years. Why the fascination with all these specialist databases? They are good for some situations, no question. But how do you know the conditions you'll be facing? How can you be sure you haven't selected something too specialised?

What I'd really like is a comfortable canoe that can be configured according to the conditions I meet. I don't really want a sea canoe or a playboat, because then I'd need lots of different boats, all sitting around waiting for the right situation. I know I can't have a modifiable kayak because its hull is made of PBS. But I can have that with software, if it's configurable enough to meet my needs. Not hundreds of adjustments, just a few important parameters that let me tailor it to the major points of the current situation: speed, stability, comfort, safety and security.

So, some important lessons for databases: How do I make PostgreSQL bright yellow?

Tuesday, 26 April 2011

Feedback on the PostgreSQL development process

For the last few weeks, the PostgreSQL Hackers list has been discussing how to improve the PostgreSQL development process.

You might be forgiven for asking "Why? What is wrong with it?". Indeed, you might.

The process has changed many times down the years. Essentially, it revolves around a few key people with the knowledge and time to review the submitted patches. Each of those people has views on what's right and wrong with the current system.

What would be useful is to hear from people who:
* never submitted a patch because something specific stopped them
* submitted a patch but had it rejected
* wrote a first patch but were dissuaded from writing another

If you'd like to review patches for PostgreSQL, then we're short of manpower there. We're short of manpower because the PostgreSQL project believes that peer review is essential to producing good code. You'll need to spend some time getting to understand the review process and guidelines, and you may also need help with some technical aspects. Apart from that, reviews consist of asking questions like "Won't that break ALTER TABLE?" and observing "there aren't enough code comments here, and no docs".

If you have feedback, or you can help, please join the hackers list and speak out.

Thursday, 21 April 2011

Busy Times

It's been 6 months since I found time to blog, which I guess shows how much I had been concentrating on getting Sync Replication finished.

Sync Replication is the raison d'être for in-database replication. Only by bringing replication into the database layer can we control the replication process in a useful way. Did it have to be transaction log shipping replication? No, I guess it might have been possible to do sync rep using other mechanisms such as triggers or writesets, but the transaction log seemed the most natural way to go, at least initially.

Now it's done, I breathe a sigh of relief after 7 full years of work. The strange thing is that to fund such a task I needed to build a company, 2ndQuadrant. It's rather like having to build the ramp up which the blocks of stone travelled for the pyramids of ancient Egypt. Anyway, it's a good thing, because it has brought together many contributors and opened up funding mechanisms to do the things we want to do with PostgreSQL.

Now it's finished, I can see all the other tasks still to do, so I'll be busy a while longer yet. Feature complete? No way.

I'm pleased that I got all the essential features into sync rep that I was looking for. Transaction controlled replication, minimal bandwidth usage, shared memory queues ordered by xlog pointers, avoidance of complex configuration details and most importantly an approach everyone agrees is robust.
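To make "transaction controlled replication" and "shared memory queues ordered by xlog pointers" a little more concrete, here is a toy model in Python. It is a sketch under stated assumptions, not PostgreSQL's C implementation: the class and method names (`SyncRepQueue`, `commit`, `standby_reply`) are hypothetical, and WAL positions are simplified to plain integers. The idea shown is that each committing transaction waits in a queue sorted by the WAL position of its commit record, and a single reply from the standby saying "flushed up to here" releases every waiter at or below that point in one pass. The `synchronous` flag stands in for PostgreSQL's `synchronous_commit` setting, which can be changed per transaction; that per-transaction choice is what makes replication transaction controlled.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Waiter:
    # Waiters are ordered only by the WAL position of their commit record.
    commit_lsn: int
    xid: int = field(compare=False)


class SyncRepQueue:
    """Toy model (hypothetical names): commits wait, ordered by commit LSN,
    until the standby confirms it has flushed WAL at least that far."""

    def __init__(self) -> None:
        self._queue: list[Waiter] = []  # always kept sorted by commit_lsn
        self.flushed_lsn = 0            # highest position the standby has confirmed

    def commit(self, xid: int, commit_lsn: int, synchronous: bool = True) -> bool:
        """Return True if the commit can be acknowledged to the client immediately."""
        # An asynchronous transaction, or one already covered by the standby, never waits.
        if not synchronous or commit_lsn <= self.flushed_lsn:
            return True
        insort(self._queue, Waiter(commit_lsn, xid))
        return False

    def standby_reply(self, flush_lsn: int) -> list[int]:
        """Record the standby's flush position and release all waiters at or below it."""
        self.flushed_lsn = max(self.flushed_lsn, flush_lsn)
        released = []
        # Because the queue is sorted, we can stop at the first waiter still ahead of the standby.
        while self._queue and self._queue[0].commit_lsn <= self.flushed_lsn:
            released.append(self._queue.pop(0).xid)
        return released


q = SyncRepQueue()
q.commit(101, 500)                     # waits
q.commit(102, 300)                     # waits, queued ahead of 101
q.commit(103, 700, synchronous=False)  # returns at once
print(q.standby_reply(550))            # releases 102, then 101
```

Keeping the queue sorted means the walsender side only ever inspects the head of the queue, so the cost of a standby reply is proportional to the number of transactions it actually releases.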

I hadn't realised it, but the sync rep implementation is actually better than MySQL's semi-synchronous replication. I don't think anybody set out to do that; as usual, the PostgreSQL approach to building things seems to end up with a rigorous design and implementation.

I'm thinking about replication because I've just been assembling the talk proposals for CHAR(11), the conference on Clustering, High Availability and Replication. The Call for Papers for the CHAR(11) conference is now closed, and we have a very cool lineup of speakers, even better than CHAR(10) last year.

As ever, more on all of the above another time.