Wednesday, 17 August 2011

R is for Innovation

I'm pleased to note that Teradata just announced a plugin for the R language.

As many of you will know, PostgreSQL has supported server functions written in the R language for many years. So it's good that Teradata has seen the light at last and by doing so has validated the innovations that PostgreSQL has made.

That means the list of databases that have responded directly to innovations in PostgreSQL now extends to Oracle, Informix (Illustra), SQL Server, Sybase, DB2 and Teradata. Of course, MySQL have been trying to catch up for a long time.

That pretty much is the complete set. Cool. Well, almost.

I'm intrigued as to what NoSQL vendors think will happen next. If their core value is simplicity, then what new features can they add without going back on their core philosophy? Austerity isn't something you can have more of, is it? Let's wait and see what happens when the VC runs out.

PostgreSQL really is in a leadership position with regard to database innovation. And I'm happier than ever to be part of this phenomenon.

Tuesday, 19 July 2011

Cascading Replication

Cascading Replication is now part of PostgreSQL 9.2, thanks to Fujii Masao.

The idea is that a streaming replication standby can also stream data onto other standbys. This allows a complex network of interrelated servers to fulfil the roles of High Availability, High Durability, Distributed data access capacity and Reporting requirements.

You can set up chained configurations like A -> B -> C, or more complex arrangements.

This should make it much easier to reduce bandwidth for intercontinental replication.
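To make the bandwidth point concrete, here is a rough Python sketch that counts how many replication streams have to cross between sites, with and without cascading. The server names, site names and the `cross_site_streams` helper are all invented for illustration. None of this is PostgreSQL configuration, just a way of counting links.

```python
# Hypothetical model: each server maps to (site, upstream server or None).
# The names and sites are illustrative assumptions, not real configuration.

def cross_site_streams(topology):
    """Count replication streams whose two ends sit in different sites."""
    count = 0
    for server, (site, upstream) in topology.items():
        if upstream is None:
            continue  # the master has no upstream to stream from
        upstream_site = topology[upstream][0]
        if upstream_site != site:
            count += 1
    return count

# Without cascading: every standby streams directly from master A.
direct = {
    "A": ("europe", None),
    "B": ("asia", "A"),
    "C": ("asia", "A"),
    "D": ("asia", "A"),
}

# With cascading: B streams from A, then relays to C and D locally.
cascaded = {
    "A": ("europe", None),
    "B": ("asia", "A"),
    "C": ("asia", "B"),
    "D": ("asia", "B"),
}

print("direct:  ", cross_site_streams(direct), "intercontinental streams")
print("cascaded:", cross_site_streams(cascaded), "intercontinental streams")
```

In this toy layout the direct arrangement sends three copies of the data between continents, while the cascaded one sends a single copy and fans out locally.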

The nice thing is that Hot Standby feedback works across the whole cluster, so you can easily manage the interrelationships between servers.

CHAR(11) Conference Success

Finally recovered from attending CHAR(11) in Cambridge, UK. Two complete days of Clustering, High Availability and Replication talks from various experts.

We had 15 talks from 14 speakers from the US, Japan and 8 European countries, including the keynote from Jan Wieck. Attendees came from the US and all across Europe, many of whom could give detailed talks themselves. There's always next year...

The most amazing thing was the comments we received from attendees. Every talk was packed solid, and judging by the seats alone it seems almost everybody went to all the talks - for the whole talk. I don't recall a conference having such a good attendance rate, not even CHAR(10) last year.

Based on that, it looks pretty certain that we'll run CHAR(12) next year. We did discuss Japan for CHAR(12) but that's not going to be as easy as we'd hoped. Let's see how that goes.

I'm pleased with how everything ran, so a big thanks to the organising team.

Thanks very much to Koichi Suzuki for visiting again. The panel discussion between Postgres-XC, MGRID and Greenplum was very enlightening.

Thanks to all the speakers and attendees also.

Thursday, 16 June 2011

Five Nines

In High Availability we talk about "Five Nines", meaning 99.999% availability. I like to joke that a badly configured system has "Nine Fives" availability, or 55.5555555% availability.
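For anyone who wants the arithmetic behind the joke, here is a small Python sketch that turns an availability percentage into allowed downtime per year. The function name and the choice of a 365.25-day year are my own assumptions for the illustration.

```python
# Convert an availability percentage into allowed downtime per year.
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes per year a system may be unavailable."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR

five_nines = downtime_minutes_per_year(99.999)
nine_fives = downtime_minutes_per_year(55.5555555)

print(f"Five Nines: {five_nines:.2f} minutes of downtime per year")
print(f"Nine Fives: {nine_fives / (24 * 60):.0f} days of downtime per year")
```

"Five Nines" works out to a little over five minutes of downtime a year, while "Nine Fives" means being down for more than five months of it.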

With a sensible architecture and good operational controls, data can be made "Five Nines" safe with PostgreSQL 9.1.

I was reminded today that "Five Nines" had another meaning in an earlier age. Wilfred Owen's wartime poetry describes

And towards our distant rest began to trudge.
Men marched asleep. Many had lost their boots
But limped on, blood-shod. All went lame; all blind;
Drunk with fatigue; deaf even to the hoots
Of tired, outstripped Five-Nines that dropped behind.

meaning artillery shells falling away from the target of the front line troops.

The poem ends with an exhortation to learn from earlier mistakes

My friend, you would not tell with such high zest
To children ardent for some desperate glory,
The old Lie: Dulce et decorum est
Pro patria mori.

"How sweet and fitting it is to die for one's country"

I'm sure there's a modern message there, but I'll leave that up to you.

Wednesday, 4 May 2011

Gentlemen, Start your Engines

The racing season is upon us. We have both the Le Mans 24 hour race and the Indy 500 coming in the next month, both long distance, high speed motor racing events.

We also have the beta of PostgreSQL 9.1 and associated tools.

Just like motor sport, a 5 minute engine test proves very little. Only good solid usage at high levels of performance will prove whether the engine is good enough to be world class.

Just like a race, we have deadlines and we must remember we aren't the only people in the world producing database software. The deadline is more important this year because we are attempting to cut the time of the beta cycle down by weeks and months.

The PostgreSQL project needs you to start your engines. Start testing PostgreSQL 9.1 as soon as possible and take it to the very limits of durability and performance.

Make the tests run for 500 miles and/or 24 hours. Report the results, in detail.

Do it. Do it now.

Thursday, 28 April 2011

Brand New Kayak

I've just bought a new Dagger GT max kayak on eBay and collected it today.

It's shorter than my old Mountain Bat, with a flat bottom for surfing and rivers, and tramlines to allow you to edge through turns better. Lots of padding and a back rest that actually works. And it's bright yellow.

Like most things, it makes me think about databases in a new way.

First the buying experience: I'm ecstatic, but I've not been in the water yet. Why am I ecstatic? Well, it's yellow and has got lots of features I'm interested in. And it's yellow. From that I take it that look and feel is important with a new product in addition to the real usability features. Uh, yeh, err... just like psql...

I realise that this might be the best I ever feel about the kayak. If it has shortcomings, then I'll be disappointed. Imagine a boat with very few issues, with footrests that can be adjusted to make it just right. That sounds like a boat I'd like, and a database too.

I also note that it's taken me 18 years to buy a new kayak. From that I learn that annoyances with products do build up over a period of time and that useful new features are important in prompting a change. But kayak salesmen need to be patient and respect the views and wishes of paddlers with prior experience of other craft.

What made me change? A friend bought one. Not just that - I watched him go down some whitewater that I'd had trouble on, but he edged it like it wasn't there. From that I learn that word of mouth and references are important, but demonstrations are even better.

Now back to my first thought: why did I buy the Dagger? It's been interesting to watch kayak development over the intervening years, with all sorts of specialist kayaks emerging. Sea kayaks, river kayaks, whitewater and playboats. My feeling was that these were all too specialist. I wanted a boat I could use for short sea trips and whitewater. This made me think about Stonebraker's recent years. Why the fascination with all these specialist databases? They are good for some situations, no question. But how do you know the conditions you'll be facing? How can you trust you haven't selected something too specialised?

What I'd really like is a comfortable canoe that can be configured according to the conditions I meet. I don't really want a sea canoe or a playboat because then I'd need lots of different boats, all sitting waiting for the right situation. I know I can't have a modifiable kayak because its hull is made of PBS. But I can get that with software, if it's configurable enough to meet my needs. Not hundreds of adjustments, just a few important parameters to allow me to tailor it to the major points of the current solution. Speed, stability, comfort, safety and security.

So, some important lessons for databases: How do I make PostgreSQL bright yellow?

Tuesday, 26 April 2011

Feedback on the PostgreSQL development process

For the last few weeks, the PostgreSQL Hackers list has been discussing how to improve the PostgreSQL development process.

You might be forgiven for asking "Why? What is wrong with it?". Indeed, you might.

The process has changed many times down the years. Essentially, the process revolves around a few key people with the knowledge and time to contribute reviews of the submitted patches. All of those people have got views about what's right and wrong with the exact current system.

What would be useful is to hear from people who
* never submitted a patch for a definite blocking reason
* submitted a patch but had it rejected
* wrote a first patch but were dissuaded from doing that again

If you'd like to review patches for PostgreSQL then we're short of manpower there. We're short of manpower because PostgreSQL believes that peer review is an essential technique for producing good code. You'll need to spend some time getting to understand the review process and guidelines and you may also need assistance on some technical aspects. Apart from that, reviews consist of asking questions like "Won't that break ALTER TABLE?" and observing "there's not enough code comments here, and no docs".

If you have feedback, or you can help, please join the hackers list and speak out.

Thursday, 21 April 2011

Busy Times

It's been 6 months since I found time to blog, which I guess shows how much I had been concentrating on getting Sync Replication finished.

Sync Replication is the raison d'etre for in-database replication. Only by bringing replication to the database layer can we control the replication process in a useful way. Did it have to be transaction log shipping replication? No, I guess it might have been possible to do sync rep using other mechanisms such as triggers or writesets but the transaction log seemed the most natural way to go, at least initially.

Now it's done, I breathe a sigh of relief after 7 full years of work. The strange thing is that in order to fund such a task I needed to build a company, 2ndQuadrant. It's kind of like having to build the ramp up which the blocks of stone would travel for the pyramids of ancient Egypt. Anyway, it's a good thing because it's brought together many contributors and opened up funding mechanisms to do the things we want to do with PostgreSQL.

Now it's finished, I see all the other tasks still to do, so I'll be busy a while longer yet. Feature complete? No way.

I'm pleased that I got all the essential features into sync rep that I was looking for. Transaction controlled replication, minimal bandwidth usage, shared memory queues ordered by xlog pointers, avoidance of complex configuration details and most importantly an approach everyone agrees is robust.

I hadn't realised it, but the sync rep implementation is actually better than MySQL's semi-synchronous replication. I don't think anybody set out to do that; as usual, the PostgreSQL approach to building things just seems to end up with a rigorous design and implementation.

I'm thinking about replication because I've just been assembling the talk proposals for CHAR(11), the conference on Clustering, High Availability and Replication. The Call for Papers for the CHAR(11) conference is now closed, and we have a very cool lineup of speakers. Even better than CHAR(10) last year.

As ever, more on all of the above another time.