It's important not to misinterpret Mike Stonebraker's comments about database systems and "one size no longer fitting all applications." Commercial data processing needs inspired early database design, and current relational products from mainstream vendors, such as Oracle and Microsoft, reflect this heritage. Furthermore, the development of commercial relational database systems in the late 1980s and early 1990s was driven almost entirely by the debit-credit benchmark (TPC/A and TPC/B). Consequently, traditional commercial offerings do an excellent job at transaction processing and a relatively poor job of handling ad hoc, decision support queries on multi-terabyte database systems (a space that Teradata has dominated at the high end). While the plummeting cost of hardware has enabled more and more organizations to mine their multi-terabyte databases, many companies find Teradata solutions to be out of their price range. As a result, vendors such as Vertica, Greenplum, and Netezza have found traction in the data warehousing space.
Mike's vision of the future is built upon the belief that 1) the market will trend towards superior performance through specialization, and 2) different data sets -- or maybe even the same data set -- will be managed by different types of database systems depending on the target application. For example, a stock tick stream might be managed initially by a stream database system in order to provide real time answers to continuous queries, but then it will be loaded into a data warehouse where it will be managed by a system like Vertica or Netezza for historical analysis.
It is a mistake to label a system that is tuned for a specific function such as data warehousing as serving a "niche" market. Just as Oracle and SQL Server are designed for transaction processing, Vertica, Netezza, Greenplum, and Teradata are built specifically for handling ad hoc queries on multi-terabyte data warehouses. And even within this latter space, the different products are specialized in different ways. Rather than relying on the indices used extensively by database systems targeted for the OLTP marketplace, Netezza focuses on very fast sequential scan performance using custom hardware. Vertica takes this specialization to the next performance level by observing that a database system based on a column-oriented architecture is much better for OLAP applications than one that uses a row-oriented architecture -- an architecture designed originally for OLTP applications.
[David DeWitt, professor of computer sciences at the University of Wisconsin and Advisor to Vertica Systems, Inc.]
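To make the row-versus-column point in David's comment a bit more concrete, here is a toy Python sketch. It is purely illustrative and does not reflect how Vertica, Netezza, or any other product is actually implemented. The tick data, the function names, and the "fields touched" counter are all my own assumptions. It just shows why an aggregate over one field reads less data when each field is stored contiguously.

```python
from array import array

# Hypothetical stock-tick records: (symbol, price, volume). Made-up data.
TICKS = [
    ("AAA", 10.0, 100),
    ("BBB", 20.0, 200),
    ("AAA", 11.0, 150),
    ("CCC", 30.0, 50),
]


def row_store(ticks):
    """Row layout: all fields of a record are stored together."""
    return [tuple(t) for t in ticks]


def column_store(ticks):
    """Column layout: each field lives in its own contiguous sequence."""
    return {
        "symbol": [t[0] for t in ticks],
        "price": array("d", (t[1] for t in ticks)),
        "volume": array("q", (t[2] for t in ticks)),
    }


def total_volume_rows(rows):
    # Each whole record is fetched even though only one field is needed.
    return sum(r[2] for r in rows)


def total_volume_columns(cols):
    # Only the 'volume' column is read; the other columns are never touched.
    return sum(cols["volume"])


def fields_touched_rows(rows):
    # Crude proxy for I/O: a row scan pulls in every field of every record.
    return len(rows) * len(rows[0])


def fields_touched_columns(cols, needed):
    # Crude proxy for I/O: a column scan pulls in only the requested columns.
    return sum(len(cols[name]) for name in needed)


rows = row_store(TICKS)
cols = column_store(TICKS)
print(total_volume_rows(rows), fields_touched_rows(rows))
print(total_volume_columns(cols), fields_touched_columns(cols, ["volume"]))
```

Both layouts give the same answer. The difference is how much data gets pulled in to compute it, and that gap widens as tables get wider and queries touch fewer columns. The flip side is that inserting or fetching a single complete record is the natural operation for the row layout, which is roughly the OLTP case.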
Different database systems being used for different tasks is already the reality. (And not one that people are very happy about, I might add.) That's why there is a data warehousing market, as David points out - Oracle, SQL Server, etc. can't provide those capabilities acceptably. So I already agree with Mike Stonebraker's assertions, because they're already true.
I am confident that the dominance of the database market by traditional OLTP databases will continue to decrease, but I do not believe that they will ever be in the minority. Further, I think that it is the very splintering of how databases are used that will cause both Netezza and Vertica to end up in niches. (And the other emerging-DB vendors to end up out of business, at that...) As someone focused on (ok, obsessed with) reporting and analytics, I'd love to see Oracle and SQL Server relegated to the 'dinosaur' category that mainframes now enjoy. Unfortunately, despite my personal preferences, I just don't see it happening.
There are five factors that I think will lead to a trend toward niches:
• Inertia - Everybody starts with a traditional, OLTP-oriented database, and most of them are inclined to stay there. (Ok, ok, not everybody, I know, but the vast majority...) Remember, nobody gets fired for buying IBM (or Oracle, in this case).
• Dataset size - Only a certain number of datasets warrant a Netezza or Vertica. Granted, that number will grow, but with hardware and software improvements, Oracle, SQL Server, etc. will keep up to some degree.
• Cost - With SQL Server being on par with Oracle these days, you can get a good database cheap. The price gap between the traditional and emerging database systems can be so large that people are mentally, not financially, unwilling to make the jump. ("Yes, I get 50x the performance, but it costs me seven figures... I can get a hell of an Oracle system for that money!")
Thus, between human tendencies, database size, and system cost, Netezza, Greenplum, Vertica, et al. are immediately relegated to a small, but admittedly growing, portion of the market. But on with the list.
• ROI - Not everyone needs ad hoc or even flexible reporting. When your reporting needs are simple, performance generally isn't that much of an issue. Even if you are feeling adventurous, have lots of data and lots of money, moving to a non-traditional database system may sometimes not be worth the hassle.
• Competition - If the factors above create a specialized reporting/analytics database niche (psssst - one already exists), then multiple players within that niche that can substantially differentiate themselves will create smaller niches. Netezza is a category all its own. ParAccel and Dataupia are essentially the same thing. [Ed: This turns out to be very not true - see here] But Vertica... Vertica is notably different, on a variety of levels. And that makes for at least three sub-niches already.
Thus Vertica will be better for some things (ad hoc reporting comes to mind) while Netezza will be better for others. That suitability for different things within an already-small segment of the market will lead to specialization... which is just another word for niche.
Now, I will come right out and say that I can see the potential holes in my argument. Let me call out some of those now and save the rest of you the hassle.
• The emerging databases will eventually get cheaper. Even so, Oracle, Teradata, etc. have pretty deep pockets and pretty healthy margins, so I don't see serious differentiation on price lasting forever.
• Data sets will continue to grow. But I don't think that means that smaller datasets will stop growing in quantity either, so the relative market percentages will likely stay about the same.
• Vertica seems to have mastered both scaling up and scaling down, so they may be able to attract a broader audience than Netezza (which does not scale down well), thus broadening their niche (and maybe breaking out of niche status altogether).
• Oracle or Microsoft is likely to come out with (aka buy) something to be competitive in this space, which will likely change things so drastically as to make my theory completely invalid.
So, what else am I missing?