And Now, I Have Seen It All

So I'm probably the last one to see this, but I just found a Dataupia commercial on YouTube. A rather funny one, I have to admit.

Yes, I am indeed enough of a database dork to search YouTube for database company names.

Thoughts on Dataupia

1. Contrary to what you may have read here, Dataupia is not a software-only offering. (Vertica and Greenplum are still the only new DB players I'm aware of that are available sans hardware, OSS projects aside.)

2. Yesterday's announcement of the Dynamic Aggregation Engine is philosophically very interesting to me. Why? Because it makes Dataupia's mental model clear: nothing revolutionary, nothing necessarily even fancy, just useful combinations of established tools. I find that refreshing, to be honest. New ways of doing things are often exciting, but it's good to remember that sometimes what you need is already right in front of you.

If that doesn't make much sense to anyone but me... well, it wouldn't be the first time. :-)

Netezza Quietly Separates Itself from the Pack

It has been a great year to be a database geek. With new products like Vertica and Dataupia's Satori Server arriving, how could one not be excited?

Throughout this year, however, my confidence in Netezza has been slipping. I've been a huge Netezza fan since I first learned about them almost three years ago, and I spent quite a bit of time telling everyone I could find that they needed to get on the Netezza bandwagon or get run over by it. So my sense that the database world was running away and leaving Netezza behind was downright depressing.

Turns out I was right. I just shouldn't have been depressed about it.

I think my concern about Netezza's future was somewhat justified, as this year's assault on Netezza has been three-fold:

It occurred to me a few days ago, however, that the seemingly innocuous things I've heard from Netezza lately actually add up to some faith-restoring conclusions. The fact that they exhibited at SC07, for example, lends some credence to my theory that they aren't just a database vendor anymore. Now granted, I wasn't at SC07, so maybe I'm seeing what I want to see. Could be.

But what really soothed my soul was the discovery that SPUBoxes are available to Netezza Development Network members. SPUBoxes have long been a reality inside Netezza, and I never understood why they weren't a reality outside Netezza. I guess I still don't have an answer, but the important thing is that they are available outside Netezza.

Now, as a software guy, I'm not usually moved by hardware, but I think this is key. One of Netezza's long-standing issues has been how to get the system into the hands of developers, partners, etc. so that they can push the platform forward. Loaning people multi-million dollar systems just isn't a scalable approach, after all. The SPUBox finally solves that problem. That should mean explosive growth in the number of people familiar with the system as well as in the number of applications developed for it.

Or maybe I should say with it. SPUBoxes are specifically aimed at developers who want to use the new UDF functionality to push work down to the SPUs. That seems like an odd goal for a database company, even one with proprietary hardware.

Unless, of course, Netezza doesn't really consider itself a database company anymore.

And so, we've come full circle. I think the database world is running away from Netezza, but not because Netezza is standing still. I think they're just running in a totally different direction.

I wonder how long it'll take the world to catch up with them this time...

More on the Vertica Niche

Professor David DeWitt posted the following comments in reply to my Netezza/Vertica predictions. I've reproduced them here for visibility, and so that I can respond to them properly.

It's important not to misinterpret Mike Stonebraker's comments about database systems and "one size no longer fitting all applications." Commercial data processing needs inspired early database design, and current relational products from mainstream vendors, such as Oracle and Microsoft, reflect this heritage. Furthermore, the development of commercial relational database systems in the late 1980s and early 1990s was driven almost entirely by the debit-credit benchmark (TPC/A and TPC/B). Consequently, traditional commercial offerings do an excellent job at transaction processing and a relatively poor job of handling ad hoc, decision support queries on multi-terabyte database systems (a space that Teradata has dominated at the high end). While the plummeting cost of hardware has enabled more and more organizations to mine their multi-terabyte databases, many companies find Teradata solutions to be out of their price range. As a result, vendors such as Vertica, Greenplum, and Netezza have found traction in the data warehousing space.

Mike's vision of the future is built upon the belief that: 1) the market will trend towards superior performance through specialization, and 2) that different data sets -- or maybe even the same data set -- will be managed by different types of database systems depending on the target application. For example, a stock tick stream might be managed initially by a stream database system in order to provide real time answers to continuous queries, but then it will be loaded into a data warehouse where it will be managed by a system like Vertica or Netezza for historical analysis.

It is a mistake to label a system that is tuned for a specific function such as data warehousing as serving a "niche" market. Just as Oracle and SQL Server are designed for transaction processing, Vertica, Netezza, Greenplum, and Teradata are built specifically for handling ad hoc queries on multi-terabyte data warehouses. And even within this latter space, the different products are specialized in different ways. Rather than relying on the indices used extensively by database systems targeted for the OLTP marketplace, Netezza focuses on very fast sequential scan performance using custom hardware. Vertica takes this specialization to the next performance level by observing that a database system based on a column-oriented architecture is much better for OLAP applications than one that uses a row-oriented architecture -- an architecture designed originally for OLTP applications.
[David DeWitt, professor of computer sciences at the University of Wisconsin and Advisor to Vertica Systems, Inc.]
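DeWitt's row-versus-column point can be illustrated with a toy model. This is a minimal sketch, assuming in-memory Python lists stand in for disk pages and using "fields touched" as a rough proxy for I/O; the `Trade` record and all function names are hypothetical, and this is not how Vertica or any other product actually lays out data.

```python
# Toy model of row- vs column-oriented layouts for an OLAP-style aggregate.
# Hypothetical names throughout; "fields touched" is a crude stand-in for I/O.
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


@dataclass
class Trade:
    # A "wide" fact row; the aggregate below only needs `price`.
    symbol: str
    price: float
    volume: int
    exchange: str


def row_store_sum(rows: List[Trade]) -> Tuple[float, int]:
    """Sum prices from a row layout. Each whole row is fetched to read one value."""
    width = len(fields(Trade))  # number of fields per row
    total = 0.0
    touched = 0
    for row in rows:
        touched += width  # the entire row comes along for the ride
        total += row.price
    return total, touched


def to_columns(rows: List[Trade]) -> Dict[str, list]:
    """Pivot rows into one list per column (a column-oriented layout)."""
    return {f.name: [getattr(r, f.name) for r in rows] for f in fields(Trade)}


def column_store_sum(columns: Dict[str, list]) -> Tuple[float, int]:
    """Sum prices from a column layout. Only the `price` column is scanned."""
    prices = columns["price"]
    return sum(prices), len(prices)


trades = [
    Trade("ACME", 10.0, 100, "NYSE"),
    Trade("ACME", 10.5, 200, "NYSE"),
    Trade("INIT", 3.25, 50, "NASDAQ"),
]
print(row_store_sum(trades))                 # (23.75, 12)
print(column_store_sum(to_columns(trades)))  # (23.75, 3)
```

Both layouts give the same answer, but the row layout drags every field along to reach one value, which is why aggregating a single column of a wide table favors columnar storage. The flip side, not modeled here, is that reassembling an entire row for an OLTP-style lookup is cheaper in the row layout.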

Different database systems being used for different tasks is already the reality. (And not one that people are very happy about, I might add.) That's why there is a data warehousing market, as David points out: Oracle, SQL Server, etc. can't provide those capabilities acceptably. So I agree with Mike Stonebraker's assertions, because they're already true.

I am confident that the dominance of the database market by traditional OLTP databases will continue to decrease, but I do not believe they will ever be in the minority. Further, I think that it is the very splintering of how databases are used that will cause both Netezza and Vertica to end up in niches. (And the other emerging-DB vendors to end up out of business, at that...) As someone focused on (ok, obsessed with) reporting and analytics, I'd love to see Oracle and SQL Server relegated to the 'dinosaur' category that mainframes now enjoy. Unfortunately, despite my personal preferences, I just don't see it happening.

There are five factors that I think will lead to a trend toward niches:
• Inertia - Everybody starts with a traditional, OLTP-oriented database, and most of them are inclined to stay there. (Ok, ok, not everybody, I know, but the vast majority...) Remember, nobody gets fired for buying IBM (or Oracle, in this case).
• Dataset size - Only a certain number of datasets warrant a Netezza or Vertica. Granted, that number will grow, but with hardware and software improvements, Oracle, SQL Server, etc. will keep up to some degree.
• Cost - With SQL Server being on par with Oracle these days, you can get a good database cheap. The price gap between the traditional and emerging database systems can be so large that people are mentally, not financially, unwilling to make the jump. ("Yes, I get 50x the performance, but it costs me seven figures... I can get a hell of an Oracle system for that money!")

Thus, between human tendencies, database size and system cost, Netezza, Greenplum, Vertica, et al. are immediately relegated to a small, but admittedly growing, portion of the market. But on with the list.

• ROI - Not everyone needs ad hoc or even flexible reporting. When your reporting needs are simple, performance generally isn't that much of an issue. Even if you are feeling adventurous and have lots of data and lots of money, moving to a non-traditional database system may not be worth the hassle.
• Competition - If the factors above create a specialized reporting/analytics database niche (psssst - one already exists), then multiple players within that niche that can substantially differentiate themselves will create smaller niches. Netezza is a category all its own. ParAccel and Dataupia are essentially the same thing. [Ed: This turns out to be very not true - see here] But Vertica... Vertica is notably different, on a variety of levels. And that makes for at least three sub-niches already.

Thus Vertica will be better for some things (ad hoc reporting comes to mind) while Netezza will be better for others. That suitability for different things within an already-small segment of the market will lead to specialization... which is just another word for niche.

Now, I will come right out and say that I can see the potential holes in my argument. Let me call out some of those now and save the rest of you the hassle. :-)

• The emerging databases will eventually get cheaper. Even so, Oracle, Teradata, etc. have pretty deep pockets and pretty healthy margins, so I don't see serious differentiation on price lasting forever.
• Data sets will continue to grow. But I don't think that means smaller datasets will stop growing in number either, so the relative market percentages will likely stay about the same.
• Vertica seems to have mastered both scaling up and scaling down, so they may be able to attract a broader audience than Netezza (which does not scale down well), thus broadening their niche (and maybe breaking out of niche status altogether).
• Oracle or Microsoft is likely to come out with (aka buy) something to be competitive in this space, which could change things so drastically as to make my theory completely invalid.

So, what else am I missing?

The Transparent One-Way Door

I just stumbled upon this post by Andy Hayler talking about Dataupia's new MPP database appliance. (To learn more about Dataupia, see here - link stolen from Andy's comments).

I guess my thoughts have less to do with Andy's comments than with Dataupia itself, though maybe they answer his question:

assuming that the new product delivers on its promise

Given what I know of the system and the people involved, I think that it will ultimately deliver. There's one hitch, though, at least as of a few months ago: the system is only transparent for reads. You have to load the data into the Dataupia system directly; from there it becomes visible via Oracle, DB2, whatever.

Long term, I think they plan to allow data to be loaded through the host DB, but for now at least, this transparent one-way door will make it difficult to use the Satori server in a lot of places where I think it would otherwise shine.
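To make the "transparent one-way door" concrete, here is a minimal sketch, assuming a host database that federates reads to an attached appliance but has no path for writing into it. Everything named here (`Appliance`, `HostDatabase`, `bulk_load`, `select`, `insert`) is hypothetical and is not Dataupia's actual interface; it only models the read-yes, write-no behavior described above.

```python
# Hypothetical model of read-only transparency between a host DB and an appliance.
# None of these names come from Dataupia, Oracle, or DB2.
from typing import Dict, List


class Appliance:
    """Stands in for the external MPP box; its data must be loaded here directly."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def bulk_load(self, table: str, rows: List[dict]) -> None:
        self.tables.setdefault(table, []).extend(rows)

    def scan(self, table: str) -> List[dict]:
        return list(self.tables.get(table, []))


class HostDatabase:
    """Stands in for the host RDBMS: owns local tables and exposes
    appliance tables through the same query call, read-only."""

    def __init__(self, appliance: Appliance) -> None:
        self.local: Dict[str, List[dict]] = {}
        self.appliance = appliance

    def select(self, table: str) -> List[dict]:
        # Reads are transparent: the caller doesn't care where the table lives.
        if table in self.local:
            return list(self.local[table])
        return self.appliance.scan(table)

    def insert(self, table: str, row: dict) -> None:
        # Writes are the one-way part: appliance tables can't be loaded via the host.
        if table in self.appliance.tables:
            raise PermissionError(f"{table} lives on the appliance; load it there directly")
        self.local.setdefault(table, []).append(row)


box = Appliance()
host = HostDatabase(box)
box.bulk_load("sales", [{"id": 1, "amount": 9.5}])
print(host.select("sales"))  # [{'id': 1, 'amount': 9.5}]
try:
    host.insert("sales", {"id": 2, "amount": 4.0})
except PermissionError as err:
    print(err)  # sales lives on the appliance; load it there directly
```

The sketch shows why the limitation matters: any application that writes through the host (ETL jobs, OLTP code paths) has to be rerouted to the appliance's own loader, even though its reports can keep querying the host unchanged.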