"New Breed" Database Extensibility

At present, I can think of at least 5 "new breed" database vendors that you allow you to extend their SQL language in some form or another:

Greenplum (I think)

Of course, the old guard is well-represented in this category as well - Oracle, DB2, SQL Server, Teradata, etc. all allow language extension via custom functions, plug-ins, etc.

I don't know enough about it yet to know, but I think that Greenplum and AsterData might also belong on this list due to their support for Map/Reduce.

It wasn't all that long ago (3 or 4 years) that most of these vendors didn't even support the full SQL standard, never mind compiled-code extensions to their SQL language. Oh, the possibilities.

Viewing the Execution Plan of a Running Query in SQL Server

Or, Simple SQL Server DBA Tricks

Here's a trick I wish I'd discovered long ago... given a long-running query started by session ID of 42, you can get the handle to the cached execution plan for the query by running:

SELECT plan_handle FROM sys.dm_exec_requests WHERE session_id = 42

Given that plan handle, you can get an XML representation of the query plan by running:

SELECT query_plan FROM sys.dm_exec_query_plan (0x06001D009FE38431400399D4000000000000000000000000);

(The hex value there is the value returned by the previous query.)

Now, save the output of that query to a file with a .sqlplan extension, and open it with Management Studio. Viola! A graphical representation of the query plan for the problem query as it was actually executed.

Grabbing the query and generating a plan for it will usually give you the same result, I believe. It has always bugged me that there was a potential difference between the plan generated for a problem query at run-time vs. the plan generated at problem diagnosis time though. This relieves that little concern and provides other options for working with the query plan as well (it is XML, after all).

Belated Thoughts on the DATAllegro Acquisition

Ok, so, my previously announced comeback may have been a bit premature. Live and learn. But that doesn't mean that I haven't been thinking or reading about the acquisition of DATAllegro by Microsoft.

In short, I think this acquisition was a master stroke by Microsoft. Not only does it put MS in a much better position - specifically versus Oracle - it keeps DATAllegro out of the hands of Oracle. Given the modularity of the DATAllegro system, I think they probably represented the fastest way for Oracle to make the leap to MPP. And unless the folks at DATAllegro have gone out of their way to mislead me, I expect the DATAllegro/MS integration will happen fast enough to cause Oracle some serious heartburn.

It seems, then, that MS has killed two birds with one stone.

No, actually, I think they may have killed three birds. Not only has MS put itself in a much better position (1 bird) while simultaneously hurting Oracle (2 birds), it has left everyone else with no other options to boot (3 birds). I for one don't see consolidation in the database industry as a given, and I especially don't believe that it's "imminent", as others have predicted. So the icing on the cake may be that by taking DATAllegro off the market, MS may have left the other vendors with nowhere to turn.

Why? Tune in tomorrow for my thoughts on that. :-)

Microsoft to Acquire DATAllegro

As announced here and here, Microsoft is acquiring DATAllegro. Guess I called that one right... though I 'spose my time frame was a bit off.

One key quote:

“Integrating DATAllegro’s nonproprietary hardware platform and flexible software architecture into Microsoft SQL Server will provide customers with the strongest offering in the market,” said Stuart Frost, CEO of DATAllegro.

I look forward to seeing what "integrating into Microsoft SQL Server" means.

More on the Vertica Niche

Professor David DeWitt posted the following comments in reply to my Netezza/Vertica predictions. I've reproduced them here for visibility, and so I can respond to them more properly.

It's important not to misinterpret Mike Stonebraker's comments about database systems and "one size no longer fitting all applications." Commercial data processing needs inspired early database design, and current relational products from mainstream vendors, such as Oracle and Microsoft, reflect this heritage. Furthermore, the development of commercial relational database systems in the late 1980s and early 1990s was driven almost entirely by the debit-credit benchmark (TPC/A and TPC/B). Consequently, traditional commercial offerings do an excellent job at transaction processing and a relatively poor job of handling ad hoc, decision support queries on multi-terabyte database systems (a space that Teradata has dominated at the high end). While the plummeting cost of hardware has enabled more and more organizations to mine their multi-terabyte databases, many companies find Teradata solutions to be out of their price range. As a result, vendors such as Vertica, Greenplum, and Netezza have found traction in the data warehousing space.

Mike's vision of the future in built upon the belief that: 1) the market will trend towards superior performance through specialization, and 2) that different data sets -- or maybe even the same data set -- will be managed by different types of database systems depending on the target application. For example, a stock tick stream might be managed initially by a stream database system in order to provide real time answers to continuous queries, but then it will be loaded into a data warehouse where it will be managed by a system like Vertica or Netezza for historical analysis.

It is a mistake to label a system that is tuned for a specific function such as data warehousing as serving a "niche" market. Just as Oracle and SQL Server are designed for transaction processing, Vertica, Netezza, Greenplum, and Teradata are built specifically for handling ad hoc queries on multi-terabyte data warehouses. And even within this latter space, the different products are specialized in different ways. Rather than relying on the indices used extensively by database systems targeted for the OLTP marketplace, Netezza focuses on very fast sequential scan performance using custom hardware. Vertica takes this specialization to the next performance level by observing that a database system based on a column-oriented architecture is much better for OLAP applications than one that uses a row-oriented architecture -- an architecture designed originally for OLTP applications.
[David DeWitt, professor of computer sciences at the University of Wisconsin and Advisor to Vertica Systems, Inc.]

Different database systems being used for different tasks is already the reality. (And not one that people are very happy about, I might add.) That's why there is a data warehousing market, as David points out - Oracle, SQL Server, etc. can't provide those capabilities acceptably. So I already agree with Mike Stonebraker's assertions, because they're already true.

I am confident that the dominance of the database market by traditional OLTP databases will continue to decrease, but I do not believe they they will ever be in the minority. Further, I think that it is the very splintering of how databases are used that will cause both Netezza and Vertica to end up in niches. (And the other emerging-DB vendors to end up out of business, at that...) As someone focused on (ok, obsessed with) reporting and analytics, I'd love to see Oracle and SQL Server relegated to the 'dinosaur' category that mainframes now enjoy. Unfortunately, despite my personal preferences, I just don't see it happening.

There are five factors that I think will lead to a trend toward niches:
• Inertia - Everybody starts with a traditional, OLTP-oriented database, and most of them are inclined to stay there. (Ok, ok, not everybody, I know, but the vast majority...) Remember, nobody gets fired for buying IBM (or Oracle, in this case).
• Dataset size - Only a certain number of datasets warrant a Netezza or Vertica. Granted, that number will grow, but with hardware and software improvements, Oracle, SQL Server, etc. will keep up to some degree.
• Cost - With SQL Server being on par with Oracle these days, you can get a good database cheap. The price gap between the traditional and emerging database systems can be so large that people are mentally, not financially, unwilling to make the jump. ("Yes, I get 50x the performance, but it costs me seven figures... I can get a hell of an Oracle system for that money!")

Thus, between human tendencies, database size and system cost, Netezza, Greenplum, Vertica, et al are immediately relegated to a small, but admittedly growing, portion of the market. But on with the list.

• ROI - Not everyone needs ad hoc or even flexible reporting. When your reporting needs are simple, performance generally isn't that much of an issue. Even if you are feeling adventurous, have lots of data and lots of money, moving to a non-traditional database system may sometimes not be worth the hassle.
• Competition - If the factors above create a specialized reporting/analytics database niche (psssst - one already exists), then multiple players within that niche that can substantially differentiate themselves will create smaller niches. Netezza is a category all its own. Paraccel and Dataupia are essentially the same thing. [Ed: This turns out to be very not true - see here] But Vertica... Vertica is notably different, on a variety of levels. And that makes for at least three sub-niches already.

Thus Vertica will be better for some things (ad hoc reporting comes to mind) while Netezza will be better for others. That suitability for different things within an already-small segment of the market will lead to specialization... which is just another word for niche.

Now, I will come right out and say that I can see the potential holes in my argument. Let me call out some of those now and save the rest of you the hassle. :-)

• The emerging databases will eventually get cheaper. Even so, Oracle, Teradata, etc. have pretty deep pockets and pretty healthy margins, so I don't see serious differentiation on price lasting forever.
• Data sets will continue to grow. But I don't think that means that smaller datasets will stop growing in quantity either, so the relative market percentages will likely stay about the same.
• Vertica seems to have mastered both scaling up and scaling down, so they may be able to attract a broader audience than Netezza (which does not scale down well), thus broadening their niche (and maybe breaking out of niche status altogether).
• Oracle or Microsoft is likely to come out with (aka buy) something to be competitive in this space, which will likely change things so drastically as to make my theory completely invalid.

So, what else am I missing?