Thursday, 30 October 2014

Where lies the truth?

Ben Bradlee, the former editor of the Washington Post died recently. A famous speech of his from 1997 contains some words that mean something for me. It starts like this

"Newspapers don't tell the truth under many different, and occasionally innocent, scenarios. Mostly when they don't know the truth. Or when they quote someone who does not know the truth.

And more and more, when they quote someone who is spinning the truth, shaping it to some preconceived version of a story that is supposed to be somehow better than the truth, omitting details that could be embarrassing.

And finally, when they quote someone who is flat out lying...."

and summarises with

"Where lies the truth? That's the question that pulled us into this business, as it propelled Diogenes through the streets of Athens looking for an honest man. The more aggressive our search for the truth, the more people are offended by the press. The more complicated are the issues and the more sophisticated are the ways to disguise the truth, the more aggressive our search for the truth must be, and the more offensive we are sure to become to some. So be it."

before ending

"I take great strength from that now, knowing that in my experience the truth does emerge. It takes forever sometimes, but it does emerge. And that any relaxation by the press will be extremely costly to democracy."

Who would have thought that his words apply so well to PostgreSQL and the cost of data integrity? Yes, referential integrity does have an additional performance cost to make it work right, but how else can we be sure that we are passing valid data around? Surely the purpose of a database needs to be primarily a home for the truth, verified to be so by cross checks and constraints. (to be continued)

Wednesday, 23 January 2013

Reducing Contention with Foreign Key Locks

PostgreSQL's Foreign Key support underpins our ability to enforce data quality in the database.
A feature known as "Referential Integrity"allows us to link relational tables together to ensure the values in each table match.

Each lookup requires a lock to be applied to the row. In 9.2 and earlier that lock prevented writes to the row that was being checked. Which meant that UPDATEs against that row would be blocked, reducing performance in some applications. Worse, it also meant that deadlocks could occur in some cases.

The solution was a relatively simple idea: just lock the key value, not the whole row. That allows UPDATEs to proceed and application contention to be reduced in many cases.

Álvaro Herrera just committed a patch to make this happen. The simple idea required many new concepts and changes to make it work. The patch was so large it touched more than 100 files and thousands of changed lines.

Development took more than 2 years, with reviews from many different hackers to bring things to a state where it can be committed. So many people and companies were involved in this its hard to mention one without missing others. It's best to call this a true community effort in support of Álvaro and his work.

So in release 9.3, available in Sept 2013 we expect to see significant improvements in the performance of many applications. And it will make it easier to avoid those problems for new applications.

Thursday, 9 August 2012

PostgreSQL: The Multi-Model Database Server

I'd like to change the way we describe PostgreSQL.

Calling PostgreSQL and Object Relational database is misleading and years out of date. Yes, PostgreSQL is Relational and the project follows the SQL Standard very closely, but that's not all it does.

PostgreSQL supports all of the following:

* Relational
* Object Relational
* Nested Relational (record types)
* Array Store
* Key-Value Store (hstore)
* Document Store (XML, JSON)

and 9.2 adds

* Range Types

So what do we call it?

We support multiple models, so I guess we should call it a "Multi-Model Database".

The good thing here is that we support them all, in one platform, allowing you to join data together no matter what shape the data is held in.

Which means that PostgreSQL is a great General Purpose database and a great default choice for use by applications. Stonebraker has spoken out against the idea of a General Purpose database, but his interest is in bringing VC-funded startups to market, not in supporting production systems and catering to a range of business requirements with flexibility and speed. The reality is that if you pick a specialised database that fits your current requirements you're completely stuck when things change, like they always do.

Saturday, 23 June 2012

Alan Turing's Legacy

It's Alan Turing's 100th Anniversary today.

Interesting for me, since I live near Bletchley Park where he worked during the war. I recommend a visit there, its becoming something of a tourist attraction now after years of obscurity.

Born out of the needs of that war, Turing laid the ground work for both modern computation and cryptanalysis. Many other people helped that effort and have been forgotten, something not helped by people re-writing history in films like U-571 and other sensationalist material.

Originally, computers had just one use - mathematical computation. We might think he foresaw many things, but I don't think he even dreamed of the worlds of databases or graphics. The level and complexity of "persistent computing" we now have in PostgreSQL would astound him, though he'd probably not make the distinction between the different layers of software and hardware that we do, only see the whole system working together as being amazing.

I like to think he'd be impressed with what we've done, but I'm also thinking he might observe how much more there is yet to achieve.

On a related note, I saw something that pointed out that Blade Runner predicted flying cars yet all the cops did with them was land next to a static payphone. I guess imagining mobile phones was a stretch too far. Yet we have the mobile phones, but not the flying cars.

Predicting the future is easy. Building it is harder and slower than we like; funding the future is the hardest part.  Turing would have done nothing without a budget, and sheer persistence.