Thursday, 28 April 2011

Brand New Kayak

I've just bought a new Dagger GT max kayak on eBay and collected it today.

It's shorter than my old Mountain Bat, with a flat bottom for surfing and rivers, and tramlines to let you edge through turns better. Lots of padding and a back rest that actually works. And it's bright yellow.

Like most things, it makes me think about databases in a new way.

First the buying experience: I'm ecstatic, but I've not been in the water yet. Why am I ecstatic? Well, it's yellow and has got lots of features I'm interested in. And it's yellow. From that I take it that look and feel is important with a new product in addition to the real usability features. Uh, yeh, err... just like psql...

I realise that this might be the best I ever feel about the kayak. If it has shortcomings, then I'll be disappointed. Imagine a boat with very few issues, with footrests that can be adjusted to make it just right. That sounds like a boat I'd like, and a database too.

I also note that it's taken me 18 years to buy a new kayak. From that I learn that annoyances with products do build up over a period of time and that useful new features are important in changing. But kayak salesmen need to be patient and respect the views and wishes of paddlers with prior experience of other craft.

What made me change? A friend bought one. Not just that - I watched him go down some whitewater that I'd had trouble on, but he edged it like it wasn't there. From that I learn that word of mouth and references are important, but demonstrations are even better.

Now back to my first thought: why did I buy the Dagger? It's been interesting to watch kayak development over the intervening years, with all sorts of specialist kayaks emerging. Sea kayaks, river kayaks, whitewater and playboats. My feeling was that these were all too specialist. I wanted a boat I could use for short sea trips and whitewater. This made me think about Stonebraker's recent years. Why the fascination with all these specialist databases? They are good for some situations, no question. But how do you know the conditions you'll be facing? How can you trust you haven't selected something too specialised?

What I'd really like is a comfortable canoe that can be configured according to the conditions I meet. I don't really want a sea canoe or a playboat because then I'd need lots of different boats, all sitting waiting for the right situation. I know I can't have a modifiable kayak because its hull is made of PBS. But I can get that with software, if it's configurable enough to meet my needs. Not hundreds of adjustments, just a few important parameters to allow me to tailor it to the major points of the current solution. Speed, stability, comfort, safety and security.

So, some important lessons for databases: How do I make PostgreSQL bright yellow?


  1. "But I can get that with software, if it's configurable enough to meet my needs."

    Configuration is certainly important, but it's easy to go too far. Software can blur the line between "configuration" and "selecting an entirely different product".

    For instance, in my opinion, the choice between MyISAM and InnoDB is too close to the latter. In general, configuration that really changes the semantics often goes too far.

    I absolutely agree about the specialization. Any significant project will go through a wide range of demands and changes over time. Having 10 database systems (or one that offers a bunch of "configuration" options that radically change the behavior) isn't any more useful than having 10 kayaks at home.

  2. I agree the "storage engine" concept doesn't really work very well. It's too deep a change and not possible to do cleanly, like having a kayak with pluggable bow and stern - it would leak like crazy. That's a lesson learned, but for me the lesson is not "pluggable is bad", just that it needs to be done right.

  3. Simon and Jeff,

    Well said. This is one of the reasons I like PostgreSQL. It allows me to plug in new features without messing up what I've got going already. If I need hstore for something, I add it and it doesn't really affect the other stuff I've got coded. If I need to do scientific stats beyond what PostgreSQL provides, I add PL/R, and so forth.

    For example, with MyISAM vs. InnoDB, if you change from InnoDB to MyISAM you suddenly have to worry about differences in how SQL, constraints (or the lack thereof) and indexes are implemented, and about how the change breaks existing code that works on the tables you need to convert. Otherwise you have to create a sideline table and join. As far as I can see, the GraphDB storage engine has the same issue.

    For many use cases PostgreSQL LTree and MySQL oqgraph serve the same purpose, but LTree is much cleaner because my tree lives in the same table with my data as just another column. None of this having to keep track of a separate table. It's much easier from a trigger perspective to keep a column in sync than a parallel table, or at least I find it to be easier.
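
To make the "tree as just another column" idea concrete, here is a minimal sketch. It does not use ltree itself: it assumes an in-memory SQLite database and a plain text `path` column holding dot-separated labels, queried with `LIKE`. The table name `category`, the sample rows and the `descendants` helper are all made up for illustration. In PostgreSQL the column would be of type `ltree` and the subtree test would normally use an ltree operator such as `<@`.

```python
import sqlite3

# Hypothetical schema: the tree position is stored next to the data,
# as one extra column, rather than in a parallel table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        path TEXT NOT NULL UNIQUE   -- dot-separated labels, e.g. 'boats.kayaks'
    )
""")

# Made-up sample data for the illustration.
rows = [
    ("Boats", "boats"),
    ("Kayaks", "boats.kayaks"),
    ("Whitewater kayaks", "boats.kayaks.whitewater"),
    ("Sea kayaks", "boats.kayaks.sea"),
    ("Canoes", "boats.canoes"),
]
conn.executemany("INSERT INTO category (name, path) VALUES (?, ?)", rows)


def descendants(conn, path):
    """Return the names of all nodes strictly below `path`, ordered by path.

    Assumes labels contain no LIKE wildcards ('%' or '_').
    """
    cur = conn.execute(
        "SELECT name FROM category WHERE path LIKE ? ORDER BY path",
        (path + ".%",),
    )
    return [name for (name,) in cur]


kayak_subtree = descendants(conn, "boats.kayaks")
print(kayak_subtree)
```

Because the path lives on the row it describes, a subtree query touches only one table, and there is no second table to keep in step when a row changes.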