tag:blogger.com,1999:blog-90535365296947845632023-11-15T05:06:57.255-08:00Database ExplorerSimon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.comBlogger35125tag:blogger.com,1999:blog-9053536529694784563.post-395443417092085882014-10-30T02:28:00.001-07:002014-10-30T02:32:57.972-07:00Where lies the truth?<p>
Ben Bradlee, the former editor of the Washington Post died recently. A famous speech of his from 1997 contains some words that mean something for me. It starts like this
</p>
<p>
"Newspapers don't tell the truth under many different, and occasionally innocent, scenarios. Mostly when they don't know the truth. Or when they quote someone who does not know the truth.
</p>
<p>
And more and more, when they quote someone who is spinning the truth, shaping it to some preconceived version of a story that is supposed to be somehow better than the truth, omitting details that could be embarrassing.
<p>
And finally, when they quote someone who is flat out lying...."
<p>
and summarises with
<p>
"Where lies the truth? That's the question that pulled us into this business, as it propelled Diogenes through the streets of Athens looking for an honest man. The more aggressive our search for the truth, the more people are offended by the press. The more complicated are the issues and the more sophisticated are the ways to disguise the truth, the more aggressive our search for the truth must be, and the more offensive we are sure to become to some. So be it."
<p>
before ending
<p>
"I take great strength from that now, knowing that in my experience the truth does emerge. It takes forever sometimes, but it does emerge. And that any relaxation by the press will be extremely costly to democracy."
<p>
Who would have thought that his words apply so well to PostgreSQL and the cost of data integrity? Yes, referential integrity does have an additional performance cost to make it work right, but how else can we be sure that we are passing valid data around? Surely the purpose of a database needs to be primarily a home for the truth, verified to be so by cross checks and constraints.
(to be continued)
Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com1tag:blogger.com,1999:blog-9053536529694784563.post-1510655096474674222013-01-23T08:50:00.002-08:002013-01-23T08:50:41.276-08:00Reducing Contention with Foreign Key LocksPostgreSQL's Foreign Key support underpins our ability to enforce data quality in the database.<div>
A feature known as "Referential Integrity"allows us to link relational tables together to ensure the values in each table match.</div>
<div>
<br /></div>
<div>
Each lookup requires a lock to be applied to the row. In 9.2 and earlier that lock prevented writes to the row that was being checked. Which meant that UPDATEs against that row would be blocked, reducing performance in some applications. Worse, it also meant that deadlocks could occur in some cases.</div>
<div>
<br /></div>
<div>
The solution was a relatively simple idea: just lock the key value, not the whole row. That allows UPDATEs to proceed and application contention to be reduced in many cases.</div>
<div>
<br /></div>
<div>
Álvaro Herrera just committed a patch to make this happen. The simple idea required many new concepts and changes to make it work. The patch was so large it touched more than 100 files and thousands of changed lines.</div>
<div>
<br /></div>
<div>
Development took more than 2 years, with reviews from many different hackers to bring things to a state where it can be committed. So many people and companies were involved in this its hard to mention one without missing others. It's best to call this a true community effort in support of Álvaro and his work.</div>
<div>
<br /></div>
<div>
So in release 9.3, available in Sept 2013 we expect to see significant improvements in the performance of many applications. And it will make it easier to avoid those problems for new applications.</div>
<div>
<br /><br /></div>
Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com1tag:blogger.com,1999:blog-9053536529694784563.post-91022004625385247332012-08-09T11:53:00.000-07:002012-08-09T11:57:25.057-07:00PostgreSQL: The Multi-Model Database ServerI'd like to change the way we describe PostgreSQL.<br />
<br />
Calling PostgreSQL and Object Relational database is misleading and years out of date. Yes, PostgreSQL is Relational and the project follows the SQL Standard very closely, but that's not all it does.<br />
<br />
PostgreSQL supports all of the following:<br />
<br />
* Relational<br />
* Object Relational<br />
* Nested Relational (record types)<br />
* Array Store<br />
* Key-Value Store (hstore)<br />
* Document Store (XML, JSON)<br />
<br />
and 9.2 adds<br />
<br />
* Range Types<br />
<br />
So what do we call it?<br />
<div>
<br /></div>
<div>
We support multiple models, so I guess we should call it a "Multi-Model Database".</div>
<div>
<br /></div>
<div>
The good thing here is that we support them all, in one platform, allowing you to join data together no matter what shape the data is held in.</div>
<div>
<br /></div>
<div>
Which means that PostgreSQL is a great General Purpose database and a great default choice for use by applications. Stonebraker has spoken out against the idea of a General Purpose database, but his interest is in bringing VC-funded startups to market, not in supporting production systems and catering to a range of business requirements with flexibility and speed. The reality is that if you pick a specialised database that fits your current requirements you're completely stuck when things change, like they always do.</div>Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com3tag:blogger.com,1999:blog-9053536529694784563.post-71921388255660369582012-06-23T01:58:00.000-07:002012-06-23T07:21:41.611-07:00Alan Turing's LegacyIt's Alan Turing's 100th Anniversary today.<br />
<br />
Interesting for me, since I live near Bletchley Park where he worked during the war. I recommend a visit there, its becoming something of a tourist attraction now after years of obscurity.<br />
<br />
Born out of the needs of that war, Turing laid the ground work for both modern computation and cryptanalysis. Many other people helped that effort and have been forgotten, something not helped by people re-writing history in films like U-571 and other sensationalist material.<br />
<br />
Originally, computers had just one use - mathematical computation. We might think he foresaw many things, but I don't think he even dreamed of the worlds of databases or graphics. The level and complexity of "persistent computing" we now have in PostgreSQL would astound him, though he'd probably not make the distinction between the different layers of software and hardware that we do, only see the whole system working together as being amazing.<br />
<br />
I like to think he'd be impressed with what we've done, but I'm also thinking he might observe how much more there is yet to achieve.<br />
<br />
On a related note, I saw something that pointed out that Blade Runner predicted flying cars yet all the cops did with them was land next to a static payphone. I guess imagining mobile phones was a stretch too far. Yet we have the mobile phones, but not the flying cars.<br />
<br />
Predicting the future is easy. Building it is harder and slower than we like; funding the future is the hardest part. Turing would have done nothing without a budget, and sheer persistence.<br />
<br />Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-77244395750362673902012-03-05T07:57:00.003-08:002012-03-05T08:05:02.006-08:00Replication Test ServersAs various people have already noted, IBM has set up some servers that will allow us to test software for robustness and performance and have kindly donated those to us for project use. Thanks very much to Mark Wong for arranging that as well as various IBM folk for assistance.<br /><br />The general plan is for Rob Haas to get access to one server for performance testing, while I have access to 3 smaller virtual servers for testing replication. It's a pretty rough plan as yet and full access is still being fully configured.<br /><br />This is good because it will allow us access right at the start of PostgreSQL 9.3 development cycle.<br /><br />I'll post again when we've finished the 9.2 CommitFest and have had the chance to set these up in a useful way.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-60243171977185912362011-08-17T14:57:00.000-07:002011-08-17T15:10:12.077-07:00R is for InnovationI'm pleased to note that Teradata just announced a plugin for the R language.
<br />
<br />As many of you will know, PostgreSQL has supported server functions written in the R language for many years. So its good that Teradata has seen the light at last and by doing so has validated the innovations that PostgreSQL has made.
<br />
<br />That means the list of databases that have responded directly to innovations in PostgreSQL, now extends to Oracle, Informix(Illustra), SQLServer, Sybase, DB2, Teradata. Of course, MySQL have been trying to catch up for a long time,
<br />
<br />That pretty much is the complete set. Cool. Well, almost.
<br />
<br />I'm intrigued as to what NoSQL vendors think will happen next. If their core values are simplicity then what new features can they add without going back on their core philosophy. Austerity isn't something you can have more of, is it? Let's wait and see what happens when the VC runs out.
<br />
<br />PostgreSQL really is in a leadership position with regards to database innovation. And I'm happier than ever to be part of this phenomenon.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com4tag:blogger.com,1999:blog-9053536529694784563.post-7176255471315496032011-07-19T08:25:00.000-07:002011-07-19T08:45:00.096-07:00Cascading ReplicationCascading Replication is now part of PostgreSQL 9.2, thanks to Fujii Masao.<br /><br />The idea is that a streaming replication standby can also stream data onto other standbys. This allows a complex network of interrelated servers to fulfil the roles of High Availability, High Durability, Distributed data access capacity and Reporting requirements.<br /><br />You can set up chained configurations like A -> B -> C.<br /><br />or more complex arrangements like<br />......A<br />....B....E<br />..C.D...F.G<br /><br />This should make it much easier to reduce bandwidth for intercontinental replication.<br /><br />Nice thing is that Hot Standby feedback works across the whole cluster, so you easily manage the interrelationships between servers.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com5tag:blogger.com,1999:blog-9053536529694784563.post-25484306445924233092011-07-19T08:12:00.001-07:002011-07-19T08:25:05.784-07:00CHAR(11) Conference SuccessFinally recovered from attending CHAR(11) in Cambridge, UK. 2 complete days of Clustering, High Availability and Replication talks from various experts.<br /><br />We had 15 talks from 14 speakers from US, Japan and from 8 European countries, including the keynote from Jan Wieck. Attendees came from US and all across Europe, many of whom could give detailed talks themselves. There's always next year...<br /><br />The most amazing thing were the comments we received from attendees. Every talk was packed solid, and judging by the seats alone it seems almost everybody went to all the talks - for the whole talk. I don't recall a conference having such a good attendee rate, not even CHAR(10) last year.<br /><br />Based on that, it looks pretty certain that we'll run CHAR(12) next year. We did discuss Japan for CHAR(12) but that's not going to be as easy as we'd hoped. Let's see how that goes.<br /><br />I'm pleased with how everything ran, so a big thanks to the organising team.<br /><br />Thanks very much to Koichi Suzuki for visiting again. The panel discussion between Postgres-XC, MGRID and Greenplum was very enlightening.<br /><br />Thanks to all the speakers and attendees also.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-68010478671364235322011-06-16T03:18:00.000-07:002011-06-16T03:46:53.314-07:00Five NinesIn High Availability we talk about "Five Nines" meaning 99.999% availability. I like to joke that a badly configured system has "Nine Fives" availability or 55.555555% availability.<br /><br />With a sensible architecture and good operational controls, data can be made "Five Nines" safe with PostgreSQL 9.1.<br /><br />I was reminded today that "Five Nines" had another meaning in an earlier age. Wilfrid Owen's wartime poetry describes <br /><br />And towards our distant rest began to trudge. <br />Men marched asleep. Many had lost their boots<br />But limped on, blood-shod. All went lame; all blind;<br />Drunk with fatigue; deaf even to the hoots<br />Of tired, outstripped Five-Nines that dropped behind.<br /><br />meaning artillery shells falling away from the target of the front line troops.<br /><br />The poem ends with an exhortation to learn from earlier mistakes<br /><br />My friend, you would not tell with such high zest<br />To children ardent for some desperate glory,<br />The old Lie: Dulce et decorum est<br />Pro patria mori.<br /><br />"How sweet and fitting it is to die for one's country"<br /><br />I'm sure there's a modern message there, but I'll leave that up to you.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com1tag:blogger.com,1999:blog-9053536529694784563.post-50169689507717976642011-05-04T03:03:00.000-07:002011-05-04T03:21:10.961-07:00Gentlemen, Start your EnginesThe racing season is upon us. We have both the Le Mans 24 hour race and the Indy 500 coming in the next month, both long distance, high speed motor racing events.<br />http://www.indianapolismotorspeedway.com/indy500/<br />http://www.lemans.org/<br /><br />We also have the beta of PostgreSQL 9.1 and associated tools.<br /><br />Just like motor sport, a 5 minute engine test proves very little. Only good solid usage at high levels of performance will prove whether the engine is good enough to be world class.<br /><br />Just like a race, we have deadlines and we must remember we aren't the only people in the world producing database software. The deadline is more important this year because we are attempting to cut the time of the beta cycle down by weeks and months.<br /><br />The PostgreSQL project needs you to start your engines. Start testing PostgreSQL 9.1 as soon as possible and take it to the very limits of durability and performance.<br /><br />Make the tests run for 500 miles and/or 24 hours. Report the results, in detail.<br /><br />Do it. Do it <span style="font-style:italic;">now</span>.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com1tag:blogger.com,1999:blog-9053536529694784563.post-8280480626921214262011-04-28T10:00:00.000-07:002011-04-28T10:53:30.885-07:00Brand New KayakI've just bought a new Dagger GT max kayak on eBay, just collected it today.<br /><br />It's shorter than my old Mountain Bat, with a flat bottom for surfing and rivers, and tramlines to allow you edge through turns better. Lots of padding and a back rest that actually works. And its bright yellow.<br /><br />Like most things, it makes me think about databases in a new way.<br /><br />First the buying experience: I'm ecstatic, but I've not been in the water yet. Why am I ecstatic? Well, its yellow and has got lots of features I'm interested in. And its yellow. From that I take it that look and feel is important with a new product in addition to the real usability features. Uh, yeh, err... just like psql...<br /><br />I realise that this might be the best I ever feel about the kayak. If it has shortcomings, then I'll be disappointed. Imagine a boat with very few issues, with footrests that can be adjusted to make it just right. That sounds like I boat I'd like, and a database too.<br /><br />I also note that it's taken me 18 years to buy a new kayak. From that I learn that annoyances with products do build up over a period of time and that useful new features are important in changing. But kayak salesmen need to be patient and respect the views and wishes of paddlers with prior experience of other craft.<br /><br />What made me change? A friend bought one. Not just that - I watched him go down some whitewater that I'd had trouble on, but he edged it like it wasn't there. From that I learn that word of mouth and references are important, but demonstrations are even better.<br /><br />Now back to my first thought: why did I buy the Dagger? It's been interesting to watch kayak development over the intervening years, with all sorts of specialist kayaks emerging. Sea kayaks, river kayaks, whitewater and playboats. My feeling was that these were all too specialist. I wanted a boat I could use for short sea trips and whitewater. This made me think about Stonebraker's recent years. Why the fascination with all these specialist databases? They are good for some situations, no question. But how do you know the conditions you'll be facing? How can you trust you haven't selected something too specialised?<br /><br />What I'd really like is a comfortable canoe that can be configured according to the conditions I meet. I don't really want a seacanoe or a playboat because then I'd need lots of different boats, all sitting waiting for the right situation. I know I can't have a modifiable kayak because its hull is made of PBS. But I can get that with software, if its configurable enough to meet my needs. Not hundreds of adjustments, just a few important parameters to allow me to tailor it to the major points of the current solution. Speed, stability, comfort, safety and security.<br /><br />So, some important lessons for databases: How do I make PostgreSQL bright yellow?Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com3tag:blogger.com,1999:blog-9053536529694784563.post-16314210025651496282011-04-26T00:39:00.000-07:002011-04-26T00:54:07.477-07:00Feedback on the PostgreSQL development processFor the last few weeks, the PostgreSQL Hackers list has been discussing how to improve the PostgreSQL development process.<br /><br />You might be forgiven for asking "Why? What is wrong with it?". Indeed, you might.<br /><br />The process has changed many times down the years. Essentially, the process revolves around a few key people with the knowledge and time to contribute reviews of the submitted patches. All of those people have got views about what's right and wrong with the exact current system.<br /><br />What would be useful is to hear from people who<br />* never submitted a patch for a definite blocking reason<br />* submitted a patch but had it rejected<br />* wrote a first patch but were dissuaded from doing that again<br /><br />If you'd like to review patches for PostgreSQL then we're short of manpower there. We're short of manpower because PostgreSQL believes that peer review is an essential technique to producing good code. You'll need to spend some time getting to understand the review process and guidelines and you may also need assistance on some technical aspects. Apart from that, reviews consist of asking questions like "Won't that break ALTER TABLE?" and observing "there's not enough code comments here, and no docs".<br /><br />If you have feedback, or you can help, please join the hackers list and speak out.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com3tag:blogger.com,1999:blog-9053536529694784563.post-77842648921531070412011-04-21T12:12:00.000-07:002011-04-21T12:34:48.040-07:00Busy TimesIt's been 6 months since I found time to blog, which I guess shows how much I had been concentrating on getting Sync Replication finished.<br /><br />Sync Replication is the raison d'etre for in-database replication. Only by bringing replication to the database layer can we control the replication process in a useful way. Did it have to be transaction log shipping replication? No, I guess it might have been possible to do sync rep using other mechanisms such as triggers or writesets but the transaction log seemed the most natural way to go, at least initially.<br /><br />Now its done, I breathe a sigh of relief after 7 full years of work. The strange thing is that in order to fund such a task I needed to build a company, 2ndQuadrant. It's kind of like having to build the ramp up which the blocks of stone would travel for the pyramids of ancient Egypt. Anyway, its a good thing because it's brought together many contributors and opened up funding mechanisms to do the things we want to do with PostgreSQL.<br /><br />Now it's finished, I see all the other tasks still to do, so I'll be busy a while longer yet. Feature complete, no way.<br /><br />I'm pleased that I got all the essential features into sync rep that I was looking for. Transaction controlled replication, minimal bandwidth usage, shared memory queues ordered by xlog pointers, avoidance of complex configuration details and most importantly an approach everyone agrees is robust.<br /><br />I hadn't realised it, but the sync rep implementation is actually better than MySQL's semi-synchronous replication. Don't think anybody set out to do that, just as usual the PostgreSQL approach to building things seems to end up with a rigorous design and implementation.<br /><br />I'm thinking about replication because I've just been assembling the talk proposals for CHAR(11), the conference on Clustering, High Availability and Replication. The Call for Papers for the CHAR(11) conference is now closed, though we have a very cool lineup of speakers. Even better than CHAR(10) last year.<br /><br />As ever, more on all of the above another time.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com5tag:blogger.com,1999:blog-9053536529694784563.post-49914929657318583262010-10-20T07:44:00.000-07:002010-10-20T13:52:50.443-07:00Extensions in 9.1+One of the most exciting developments in 9.1 will be the new CREATE EXTENSION feature that Dimitri Fontaine has been designing. Dimitri has been working on them for some time now, proposing them in detail at the 2010 Developer Meeting. Recently he's been coding them and the first patch arrived a few days ago. Reviews should happen in November, more than enough time to happen for the next release.<br /><br />What do extensions do? Make it easier to add and remove plugins, datatypes, functions etc with minimum fuss and without needing to compile things yourself. It turns out that there's a great many aspects to this and much more complex than I've made it sound.<br /><br />Dimitri has himself designed a number of very cool add-ons for PostgreSQL and its from those that he's gained insight into what's required. I've been trying to follow it myself, but I'm a bit lost on some aspects of it just yet.<br /><br />Anyway, looks like there will be a few websites and repositories from which you can download extensions, so the hope is that it will be much easier to both obtain and install add-in components in future versions of PostgreSQL.<br /><br />We're especially lucky that funding has been made available from the 4CaaST project, a Cloud Computing project funded by the European Union (under FP7). More on that another time.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-32631915236190146242010-10-05T18:48:00.000-07:002010-10-05T19:00:45.976-07:00Cleaning your archiveWhether you're thinking of using PostgreSQL 9.0 yet or not, you maybe interested in a small new utility that is designed to help you clean out old WAL archive files.<br /><br />pg_archivecleanup is a standalone program which simply removes all WAL files older than a certain filename. You could use it like this:<br /><br />pg_archivecleanup /my/archive 000000010000DEAD0000BEEF<br /><br />The good thing here is that the tool works just fine for any release of PostgreSQL, not just 9.0. Mostly useful for bash or Windows scripts, since people will no doubt confirm it's trivial to do this in Python or perl etc..<br /><br />The utility was designed as a dual purpose tool, since with 9.0 pg_standby is no longer required, yet you still need some way to clear down an archive. So you can also include it in the archive_cleanup_command, like so<br /><br />archive_cleanup_command = 'pg_archivecleanup /my/archive %r'<br /><br />which gets called regularly with changing values of %r to clean out your archive directories.<br /><br />Not big, but its useful.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-407508430523869802010-10-01T01:15:00.000-07:002010-10-01T01:32:53.245-07:00PostgreSQL Admin Book: Last EditI've been working on a PostgreSQL Administration book for some time now and am very happy to report that I'm done with the last edit on the last chapter. Hannu finished his a few days earlier.<br /><br />Phew!<br /><br />The full title of the book is "PostgreSQL 9 Administration Cookbook" and has been written by Hannu Krosing and myself. We've written it together over the course of 9 months or so. I guess it's no surprise to some, but these things always take more time than you expect.<br /><br />The book is in the form of short, punchy recipes that tell you what you need to know on a topic quickly with examples, then we describe how it works later. Some of the recipes are really basic to make sure we address the frequent questions, but around half are advanced topics. It's certainly taken some time to express both the basic and the advanced topics succinctly.<br /><br />An advanced database system needs advanced books to show it off in the best light. I very much hope it is up to the high standards of the PostgreSQL community and moves us all forwards.<br /><br />I'll write some more about it later. All I can really express now is relief!<br /><br />You can pre-order the book here:<br /><br /><A herf="http://www.2ndquadrant.com/postgresql-books/">http://www.2ndquadrant.com/postgresql-books/</A>Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com5tag:blogger.com,1999:blog-9053536529694784563.post-7038383887091010042010-06-24T13:26:00.000-07:002010-06-25T05:56:10.633-07:00Hot Bugs & Cold Beer2ndQuadrant is sponsoring a number of Bugs & Beer events at PostgreSQL user groups around the world.<br /><br />The idea is to get together with your friends and see if we can shake out any more bugs in Hot Standby prior to release.<br /><br />Particular focus on these areas:<br />* transactions holding locks before/during/after base backup taken<br />* prepared transactions<br />* correctness of results<br />* VACUUMs on master<br />* any other ways you can find to break it<br /><br />Please write to me at simon@2ndQuadrant.com if you'd like funding for a local event.<br /><br />All the best,Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com1tag:blogger.com,1999:blog-9053536529694784563.post-8319685094421048482010-06-17T14:34:00.000-07:002010-06-17T14:58:56.306-07:00What's a VLF?I read a good blog called http://sqlfool.com/<br /><br />The funniest thing about it for me was the artful blogger talked about having 87,000 VLFs, taking up 1.5TB, though didn't mention what a VLF was.<br /><br />Anyway, I checked - VLFs are WAL files in SQLServer. In case, like me, you didn't know, VLF stands for Virtual Log File. So it's a virtual physical thing. Got it. :-\<br /><br />Which means the blogged-about related to 1.5TB worth of WAL files in the equivalent of pg_xlog. Ouch!<br /><br />Ah! That reminds me: on both SQLServer and DB2 the log contains undo information, which means that the transaction log can't reuse files while transactions are still active. So long running write transactions causes the log to grow in size and can eventually fill the disk. This can be a real pain for DBAs and requires workarounds. That effect doesn't occur with PostgreSQL, so pg_xlog never overflows for that reason. With PostgreSQL, undo information is effectively stored in the database itself in the form of previous row versions. So a database grows as changes occur. Long running write statements can have a similar effect in PostgreSQL, but not long running transactions. So, one more problem that us Postgres-people get to miss out on.<br /><br />And just to complete the circle, I will avoid explaining what a WAL file is here.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com2tag:blogger.com,1999:blog-9053536529694784563.post-60077854072553648902010-06-17T13:49:00.000-07:002010-06-17T14:06:30.287-07:00Smoothing replicationAt first, streaming replication looked pretty raw though I've been doing a thorough review of the code. I guess that's probably how Hot Standby looked at first as well. Now, I'm getting quite excited at the way things are shaping up.<br /><br />Originally the WALSender process waited for 200ms after sending each chunk of data. That's now been fixed so that WALSender will stream continually until no outstanding WAL data remains.<br /><br />Also, the max chunk size has been reduced to 128kB, which is now the same default size used by DRBD.<br /><br />On Monday, I noticed that the WALSender was sending WAL data before it had been fsynced. So in case of a crash that could cause a problem. That only really matters if you're using synchronous_commit = off or fsync = off. (Though fsync = off and replication just have no business being configured simultaneously).<br /><br />Lots of smaller changes have also come through in recent months, so now we have the ability to get log messages when replication connects or disconnects. We also now have special messages if the replication connection is refused by the primary via the pg_hba.conf.<br /><br />From here, it's looking like streaming replication is fast, efficient and low latency. It's also easy to use and easy to troubleshoot. It's an impressive community effort.<br /><br />I'm also happy to say that almost all of the features of pg_standby have been assimilated into core now, so the only external module you'll need is something called pg_archivecleanup which is new in this release.<br /><br />Next few months we're going to see designs emerge for synchronous replication. That is going to require some subtle coding to get good. It sounds like a couple of people are planning prototypes of how that should work.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-35747384094582389372010-05-17T11:33:00.000-07:002010-05-17T14:20:22.890-07:00Bollywood FeaturesPostgreSQL 9.0 is in the can and is coming to the end of post-production. Still looking forward to the usual last minute special effects by ILM (Industrial Lane & Magic) to pull some extra zing into performance.<br /><br />So what next? Well, three genres attract my attention: High performance ("OLTP"), Replication and Data Warehousing. The first two are action features, while the last one is more of a romance for me. Some of the stunts for the first two are similar, so we'll be looking to reduce production costs with some clever thinking.<br /><br />No blockbusters this year, though hopefully some worthwhile features.<br /><br />High performance stuff we hope for in 9.1 will be<br /><br />* Group commit<br />* MERGE related stuff - production costs stalled the first attempt to bring this to the big screen, though some Euro sponsorship should make this viable. Hoping that Greg Smith will take the lead role on this cos its a big performance critical project, with some hard bits.<br />* Some surprise features!<br /><br />Replication<br /><br />* Synchronous replication<br />* Relay replication<br />* Fast switchover - an art house flick with a small, yet good following<br /><br />Data Warehousing features for 9.1 will be<br /><br />* Bit map indexes - has had its script rewritten a few times and the actors have changed as well, though it really needs to happen this time around cos its just so cool.<br />I also expect to keep my eye on Partitioning features to ensure it actually works for the use cases we care about, which is Big Data.<br /><br />Dancing is a key element of Bollywood and the best bits are when everyone gets involved in huge Busby Berkeley numbers. Expecting the dancing to be particularly intense for Synchronous Replication though the dancers are also fairly well trained, so we're hoping for a happy ending with a cool soundtrack.<br /><br />Unsure if its a promise I can keep, but no more features with the word "Hot" in the title, especially since the dance routines are so long.<br /><br />Not going to Cannes this year, but I will be appearing at the Brits: <a href="http://www.char10.org">www.char10.org</a>Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com3tag:blogger.com,1999:blog-9053536529694784563.post-77587191809118730822010-04-27T23:43:00.000-07:002010-04-28T11:25:11.306-07:00Tuning Hot StandbyFocus for the last month has been on tuning the new PostgreSQL 9.0 features. The latest set of changes have been around Hot Standby.<br /><br />Erik Rijkers published some good performance test results, very helpful, thanks. Mark Kirkwood also published some insightful results, also thanks.<br /><br />After the latest set of patches the results show that we get equivalent performance between master and standby under heavy load of very fast indexes SELECTs. The peak reported transaction rate was 28,000 tps (transactions per second), showing that the internal mechanics of queries during Hot Standby are very efficient. This is great news and has taken much work from myself and the usual suspects.<br /><br />What we have now is a very tight set of code that should provide good performance and scalability from all angles. Changes from the master are applied using minimal locking, so that a high rate of user queries should be possible even under a mixed workload of changes from master and many standby queries. What we don't yet have good independent performance tests to confirm that, or do further tuning.<br /><br />Of course, there's no magic there. If we give the standby the same workload of changes that the master had and then we add a whole new workload we can probably bring a server to its knees. That's just CPU physics, so nothing surprising there. What we've done is relax the internal locking to allow multiple concurrent workloads to hit the server from both directions, and also allow PostgreSQL to scale well to high numbers of CPUs.<br /><br />Anyway, bold words. I'm happy to hear from people that have got a good test environment, publish controlled tests and can answer detailed questions to help us isolate any strangeness in the tests. Particularly keen to have Dtrace installed, so either Solaris or SystemTap. pgbench is a good benchmarking tool for what we need, though other controlled testing is also welcome.<br /><br />We need independent measurements of CPU overheads and processing rates of<br />* standby when no queries running, just replaying transactions<br />* standby when we have a mix of changes and queries<br />especially on multi-socket servers. It would also be useful to have comparisons of how moving from a single master to master plus slave can improve total cluster throughput.<br /><br />Summary is that things are looking good so far. Is that the end? Doubt it. What I'm saying is that these new features can now be taken seriously enough to spend time on in-depth benchmarking of your workloads to prove them for yourselves and also to see if you can find some performance issues in the current code. Plan it now, but wait for beta, which is soon...<br /><br />If you want to hear more about these types of issues, come to CHAR(10) conference:<br /><a href="http://www.char10.org/">http://www.char10.org/</a>Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com0tag:blogger.com,1999:blog-9053536529694784563.post-5690877742484381882010-04-01T00:16:00.000-07:002010-04-01T00:23:41.265-07:00The Empty Set?April 1st, so time to goof around a little.<br /><br />I discovered recently that you can have a table called " ". That's one space character or ascii #32. Just make sure you enclose it in quotes. You can also have a table with no columns. So try<br /><br />CREATE TABLE " " ();<br /><br />which works fine. What does that look like?<br /><br />postgres=# select * from " ";<br />--<br />(0 rows)<br /><br />Can you have rows in this almost non-existent table?<br /><br />INSERT INTO " " VALUES ();<br /><br />fails, but let's try<br /><br />INSERT INTO " " DEFAULT VALUES;<br /><br />Works! And in fact you can insert multiple rows this way.<br /><br />How about an index? Surely not?? Yep, we can build an index on it, even if it has no columns, as long as it references an immutable function with constant input<br /><br />CREATE INDEX ON " " (exp(1));<br /><br />The name of the index is " _exp_idx"! And for my coup de grace, yes, it can be a unique index<br /><br />CREATE UNIQUE INDEX ON " " (exp(1));<br /><br />which then enforces that there can only be a single row in our table (called " "). I've tried and failed to create an index that would enforce that our empty table called " " with no columns should have zero rows. Any takers?<br /><br />So how about a Schema? Surely not? Yep.<br /><br />CREATE SCHEMA " ";<br /><br />and a table called " " in the " " schema? Oh yes.<br /><br />CREATE TABLE " "." " ();<br /><br />I also tried VACUUM VERBOSE " "." ";<br />INFO: vacuuming " . "<br />INFO: " ": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages<br />DETAIL: 0 dead row versions cannot be removed yet.<br />There were 0 unused item pointers.<br />0 pages are entirely empty.<br />CPU 0.00s/0.00u sec elapsed 0.00 sec.<br />VACUUM<br /><br />Nothing practical comes out of that at all, just some fun. Needless to say, we only cover practical and useful things on 2ndQuadrant's courses...serious stuff.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com4tag:blogger.com,1999:blog-9053536529694784563.post-16786548642802519552010-03-26T07:19:00.001-07:002010-03-26T07:30:39.398-07:00Tuning ReplicationPerformance of streaming replication had been bothering me for some time. I'd noticed that the CPU utilisation was extremely high, so that the WALSender process was guzzling CPU even when nothing was happening.<br /><br />When we think we have a clue about something and we were right we call that "intuition" and it sounds all mystical and clever. When we are wrong about something we call those "preconceptions". Strange difference of wording. Anyway, I had an idea that turned out to be a preconception. I started by looking for a smoking gun: one part of the system that would turn out to be using more CPU than another. There was no smoking gun, so that trail was a deadend. So next idea was just to look at the bgwriter code to see if there were any clear differences.<br /><br />Anyway, the real culprit was something trivial. WALSender was supposed to sleep every 200ms by default. What it was actually doing was sleeping for 200 microseconds and then seeing if there was anything to send. So all it took to improve performance was to multiply by 1000 for the call to pg_usleep(). If only tuning databases was ever that simple!<br /><br />That leaves me to say that the streaming replication now has a clean bill of health from an overall performance perspective. As far as I can see, YMMV, caveat emptor etc..Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com2tag:blogger.com,1999:blog-9053536529694784563.post-65401582614343805752010-03-14T13:29:00.000-07:002010-03-14T14:18:38.279-07:003.14159265358979I'm told today is Pi day, in celebration of our friendly transcendental constant and because today is the 14th March, which is written as 3.14 in some locales....<br /><br />So I thought I'd celebrate with a look at Postgres' mathematical musculature.<br /><br />postgres=# select pi();<br /> pi <br />------------------<br />3.14159265358979<br /><br />Cool! That matches on all the digits, so we're rocking. So what datatype is this?<br /><br />postgres=# select pg_typeof(pi());<br /> pg_typeof <br />------------------<br />double precision<br /><br />and as the manual says, we support at least 15 digits for this datatype. Even better. So let's flex those long dormant trigonometry muscles:<br /><br />postgres=# select sin(90);<br /> sin <br />-------------------<br />0.893996663600558<br /><br />Oh no! Surely sin() of 90 degrees is 1.0? Perhaps the default is radians and I just forgot. Of course, doh! The manual doesn't actually say, which seems like a flaw. Here comes a doc patch. So lets supply radians instead.<br /><br />postgres=# select sin(radians(90));<br />sin<br />-----<br /> 1<br /><br />Phew! That works. Now lets try some advanced stuff. There are 2*pi() radians in a circle, so pi() radians is half way round. So the sin() of that should be 0, and the cos() should be 1.<br /><br />postgres=# select sin(pi()), cos(pi());<br /> sin | cos<br />----------------------+-----<br />1.22464679914735e-16 | -1<br /><br />Hmmm. That is somewhat strange, but I guess that pi is transcendental so any representation with a fixed number of digits will always be slightly wrong by exactly that number of digits. That makes sense, but I guess I was expecting sin(pi()) to return 0.<br /><br />Just to put this in perspective, in comparison to the diameter of the earth that is a precision of about one millionth of a millimetre. So that is very accurate for most applications.<br /><br />Let's quickly check "e", another well known transcendental. We don't provide a function for e as we do for pi, but its easy to calculate.<br /><br />postgres=# select exp(1);<br /> exp <br />------------------<br />2.71828182845905<br /><br />That looks good. So lets play some games with that also.<br /><br />postgres=# select ln(exp(1));<br />ln<br />----<br /> 1<br /><br />Thank goodness for that. All good. I notice in the docs that exp() takes either a double precision or a numeric input, so which one did we just try there? Let's see<br /><br />postgres=# select ln(exp(1::float));<br />ln<br />----<br /> 1<br /><br />and lets also try<br /><br />postgres=# select ln(exp(1)::numeric);<br /> ln <br />--------------------<br />1.0000000000000018<br /><br />Hmmm. Looks like Postgres is being irrational. Or should that be insufficiently irrational? It's been a long day. I can't wait for "sqrt(2) day".Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com2tag:blogger.com,1999:blog-9053536529694784563.post-52751625613232542222010-02-04T06:18:00.000-08:002010-02-04T06:24:38.703-08:00Parallel Query (1)I recently returned from a lunch meeting of the UK ex-Teradatans to see old friends and colleagues. Some people know that I spent time with Teradata when it was in startup mode, what seems like a very long time ago now. Anyway, that's left me with good knowledge and interest in parallel database systems. And that's why I know Greenplum's Chief Architect Chuck McDevitt and hence why I've been using Greenplum on and off since 2005. Greenplum have also funded some of the developments I've done for PostgreSQL.<br /><br />I'm disappointed we've not made much progress with parallel operations and partitioning in core Postgres in last few releases. Recent Greenplum results show we have much work to do in improving things. <a href="http://community.greenplum.com/showthread.php?t=113">http://community.greenplum.com/showthread.php?t=113</a><br />Some people may think I should be sad at that, though the way I see it, Greenplum is very close to being PostgreSQL. It just happens to have some good performance enhancements of great use in Data Warehousing. A few other enhanced versions of Postgres exist also.<br /><br />Some other recent results also show that MonetDB and Infobright don't fare any better by comparison either.<br /><a href="http://community.greenplum.com/showthread.php?t=111">http://community.greenplum.com/showthread.php?t=111</a><br /><br />Having seen the above results I'm thinking about projects for next release now. Anybody want to fund some additional Data Warehousing features in Postgres core? I'm determined that next release we will get Bitmap Indexes in core, at least.<br /><br />There's some more to discuss on parallel query, such as "How does this all relate to Hot Standby?", so I'll follow up later with another blog.Simon Riggshttp://www.blogger.com/profile/06017750505968534813noreply@blogger.com9