Thursday, 4 February 2010

Parallel Query (1)

I recently returned from a lunch meeting of the UK ex-Teradatans, where I saw old friends and colleagues. Some people know that I spent time with Teradata when it was in startup mode, which seems like a very long time ago now. Anyway, that left me with a good knowledge of, and interest in, parallel database systems. That's why I know Greenplum's Chief Architect, Chuck McDevitt, and hence why I've been using Greenplum on and off since 2005. Greenplum have also funded some of the developments I've done for PostgreSQL.

I'm disappointed we've not made much progress with parallel operations and partitioning in core Postgres in the last few releases. Recent Greenplum results show we have much work to do in improving things. http://community.greenplum.com/showthread.php?t=113
Some people may think that should make me sad, but the way I see it, Greenplum is very close to being PostgreSQL. It just happens to have some good performance enhancements that are of great use in Data Warehousing. A few other enhanced versions of Postgres also exist.

Some other recent results show that MonetDB and Infobright don't fare any better by comparison either.
http://community.greenplum.com/showthread.php?t=111

Having seen the above results, I'm now thinking about projects for the next release. Does anybody want to fund some additional Data Warehousing features in Postgres core? I'm determined that in the next release we will get Bitmap Indexes into core, at least.

There's more to discuss on parallel query, such as "How does this all relate to Hot Standby?", so I'll follow up later with another blog post.

9 comments:

  1. Will on-disk bitmap indexes be contributed by Greenplum, or?

  2. Greenplum contributed their code to the PostgreSQL community some time ago. There have been some showstoppers in that code that have prevented its acceptance into Postgres core, and a couple of developers have wrestled with it with little success. That was nothing to do with Greenplum, who were keen to make the contribution to the community. I think I probably need to put both feet in and take sole responsibility for getting it in, since it needs some heavy lifting and also a committer interested in review-and-commit. That will be in 9.1 now, though!

  3. I vote for parallel query execution. :D

  4. I vote for parallel query execution too.

  5. Very interesting test results. We are exploring the possibility of using PostgreSQL for OLAP workloads. Is it possible to bring PostgreSQL to the same performance level? That would be great.

  6. If you use summary tables or materialized views, yes. But that presumes you know in advance the questions that will be asked. If you do, then you also need the developer resources to set them up and enough spare capacity to maintain them. The crossover point comes when that burden becomes unmanageable and you need to switch to a database capable of answering ad-hoc queries quickly. Automating materialized view maintenance will help with that, but Postgres doesn't have that yet.

  7. Two things that I think would be more helpful than on-disk bitmap indexes are Index Ordered Tables (to use the MSSQL term) and Column Based Storage. The latter is probably a multi-release project, but I think step 1 would be to fully separate logical from physical storage. Some smaller projects along that path would be to implement self-optimizing column storage (i.e. automagically pack fixed-length columns at the start of the row) and to allow columns to be reordered via ALTER TABLE. Both are highly desired features, and probably doable in a single release.

  8. "More helpful" needs a defined context so we understand the use case. Discussions like this most often end with people who have different use cases needlessly disagreeing. We also need to be clear about the use case so that we can identify sponsors for this work.

    Index Ordered Tables have been studied for a while now and were proposed in the form of Grouped Item Indexes. Heikki never finished that patch, but it's worth revisiting.

    Column based storage was discussed on a previous blog, so I won't re-visit that here.

  9. "Index Ordered Tables": is that the same as an automatically maintained clustered index?
