Database Explorer: Parallel Query (1)

Thursday, 4 February 2010

Parallel Query (1)

I recently returned from a lunch meeting of the UK ex-Teradatans to see old friends and colleagues. Some people know that I spent time with Teradata when it was in startup mode, what seems like a very long time ago now. Anyway, that's left me with good knowledge and interest in parallel database systems. And that's why I know Greenplum's Chief Architect Chuck McDevitt and hence why I've been using Greenplum on and off since 2005. Greenplum have also funded some of the developments I've done for PostgreSQL.

I'm disappointed we've not made much progress with parallel operations and partitioning in core Postgres in the last few releases. Recent Greenplum results show we have much work to do in improving things:
http://community.greenplum.com/showthread.php?t=113
Some people may think I should be sad at that, though the way I see it, Greenplum is very close to being PostgreSQL. It just happens to have some good performance enhancements of great use in Data Warehousing. A few other enhanced versions of Postgres exist also.

Other recent results show that MonetDB and Infobright don't fare any better by comparison:
http://community.greenplum.com/showthread.php?t=111

Having seen the above results, I'm now thinking about projects for the next release. Anybody want to fund some additional Data Warehousing features in Postgres core? I'm determined that in the next release we will get Bitmap Indexes in core, at least.

There's more to discuss on parallel query, such as "How does this all relate to Hot Standby?", so I'll follow up later with another blog post.

9 comments:

Devrim Gündüz, 4 February 2010 at 09:12
Will on-disk bitmap indexes be contributed by Greenplum, or?
Simon Riggs, 4 February 2010 at 09:35
Greenplum contributed their code to the PostgreSQL community some time ago. There have been some showstoppers in that code that have prevented its acceptance into Postgres core, meaning a couple of developers have wrestled with it with little success. That was nothing to do with Greenplum, since they were keen to make the contribution to the community. I think I probably need to put both feet in and take sole responsibility for getting it in, since it needs some heavy lifting and also a committer interested in review-and-commit. In 9.1 though now!
Mark Wong, 4 February 2010 at 11:29
I vote for parallel query execution. :D
Hugo, 4 February 2010 at 13:24
I vote for parallel query execution too
mike, 5 February 2010 at 08:28
Very interesting test results. We are exploring the possibility of using PostgreSQL for OLAP workloads. Is it possible to bring PostgreSQL to the same performance level? That would be great.
Simon Riggs, 5 February 2010 at 08:42
If you use summary tables or materialized views, yes. But that presumes you know in advance the questions that will be asked. If you do, then you also need the developer resources to set that up for you and sufficient additional performance to maintain them. The crossover comes when that burden becomes unmanageable and you need to switch to a database capable of addressing ad-hoc queries quickly. Automating materialized view maintenance will help with that, but Postgres doesn't have that yet.
Robert Treat, 6 February 2010 at 05:43
Two things that I think would be more helpful than on-disk bitmap indexes would be Index Ordered Tables (using the mssql term) and Column Based Storage. The latter is probably a multi-release sized project, but I think step 1 would be to fully separate out logical from physical storage; some smaller projects along that path would be to implement self-optimizing column storage (i.e. pack fixed-length columns at row start automagically), and to allow column re-order operations via alter table. Both are highly desired features, and probably doable in a single release.
Simon Riggs, 7 February 2010 at 00:35
"more helpful" needs to have a defined context so we understand the use case. Most often discussions like this end up with people with different use cases needlessly disagreeing. We also need to be clear about use case so that we can identify sponsors for this work.

Index Ordered Tables have been studied for a while now and proposed in the form of Grouped Item Indexes. That patch was never finished by Heikki, but it's worth revisiting.

Column based storage was discussed on a previous blog, so I won't re-visit that here.
Anonymous, 9 February 2010 at 13:31
"Index Ordered Tables": is that the same as an automatically maintained clustered index?