"New Breed" Database Extensibility

At present, I can think of at least 5 "new breed" database vendors that you allow you to extend their SQL language in some form or another:

Netezza
ParAccel
Kognitio
Greenplum (I think)
DATAllegro/Microsoft

Of course, the old guard is well-represented in this category as well - Oracle, DB2, SQL Server, Teradata, etc. all allow language extension via custom functions, plug-ins, etc.

I don't know enough about it yet to know, but I think that Greenplum and AsterData might also belong on this list due to their support for Map/Reduce.

It wasn't all that long ago (3 or 4 years) that most of these vendors didn't even support the full SQL standard, never mind compiled-code extensions to their SQL language. Oh, the possibilities.

Belated Thoughts on the DATAllegro Acquisition

Ok, so, my previously announced comeback may have been a bit premature. Live and learn. But that doesn't mean that I haven't been thinking or reading about the acquisition of DATAllegro by Microsoft.

In short, I think this acquisition was a master stroke by Microsoft. Not only does it put MS in a much better position - specifically versus Oracle - it keeps DATAllegro out of the hands of Oracle. Given the modularity of the DATAllegro system, I think they probably represented the fastest way for Oracle to make the leap to MPP. And unless the folks at DATAllegro have gone out of their way to mislead me, I expect the DATAllegro/MS integration will happen fast enough to cause Oracle some serious heartburn.

It seems, then, that MS has killed two birds with one stone.

No, actually, I think they may have killed three birds. Not only has MS put itself in a much better position (1 bird) while simultaneously hurting Oracle (2 birds), it has left everyone else with no other options to boot (3 birds). I for one don't see consolidation in the database industry as a given, and I especially don't believe that it's "imminent", as others have predicted. So the icing on the cake may be that by taking DATAllegro off the market, MS may have left the other vendors with nowhere to turn.

Why? Tune in tomorrow for my thoughts on that. :-)

Book Review: Snappy Interviews: 100 Questions to Ask Oracle DBAs

Even those familiar with Oracle can't always think of a good interview question offhand. Thinking of enough questions to effectively interview someone is even harder. Snappy Interviews: 100 Questions to Ask Oracle DBAs solves that problem by providing a very handy list of exactly those types of questions.

Put together by Chris Lawson, author of The Art and Science of Oracle Performance Tuning, it presents questions grouped into four categories; three based on difficulty (junor-, mid- and senior-level questions) and one focused on performance tuning. This makes it really easy to choose the right mix of questions for the particular interview you're conducting. Within each category the questions cover numerous topics ranging from database structure to security to recovery to SQL to Oracle architecture to... well, you name it, it's covered. The questions are precise enough, and the subject areas broad enough, to allow the interviewer to easily separate the wheat from the chaff.

For interviewers, this book is a great tool, and it will definitely take some of the stress out of preparing to interview a DBA candidate. But it will be a great tool for interviewees as well - running through a list of questions is a great way to put yourself into the proper frame of mind before an interview. For that matter, it will likely make a good quick reference for the working DBA too - if you don't often deal with certain aspects of an Oracle system (performance tuning, for example) this may be a great way to get answers to common questions in that area quickly.

Bottom line: A welcome addition to the world of Oracle books. Well worth the investment on either side of the interview table, and potentially useful on the DBA's desk as well.



RAM Storage Is Not The Answer

Or, Missing the Point 101

A colleague recently ran some tests with Oracle using only RAM storage in order to improve performance of a DSS application. If I/O is the bottleneck for reporting apps, then why not make the I/O infinitely fast, right? Who cares if there's a hard limit on how much RAM you can put in a box, or if all the data will be lost when the system is shut down?! Heck, this idea can't really be too crazy - there's a whole book on the subject by well-respected authors, after all. And there is at least one company making their business providing large-scale RAM storage with an eye toward database performance, for that matter, so it's not like this is the first time anyone's tried this.

My incredulity aside, the results of the tests were as first surprising but ultimately what I should have expected. Bottom line: in this particular experiment, storing everything (data, temp, undo, rollback, etc.) in RAM provided only a 5-6x performance improvement.

5-6x improvement is nothing to sneeze at, mind you, but it isn't the 10-100x improvement that Netezza has been claiming for years. Nor is it close to what I've seen from Vertica. But how can that be? Isn't I/O the bottleneck, and RAM infinitely fast?

Well, yes, on both accounts. But faster I/O doesn't solve the real problem when running DSS queries against large datasets, which is simply the amount of data that must be touched. Traditional, row-oriented systems must move and manipulate a lot of data in order to satisfy a typical DSS query. Even with infinitely fast I/O, therefore, they must still do an awful lot of work in order to satisfy that query.

In contrast, the new breed of databases - e.g. Netezza, Vertica, etc. - have been designed for reporting and analytics against large datasets. These new systems provide impressive performance improvements over older, transaction-oriented systems for one simple reason: they minimize data movement and manipulation. As a result, for DSS-type queries, they will always be able to outperform Oracle and other traditional DBMSs, even when I/O is infinitely fast.

Netezza and Vertica make great examples of this because they both do it but they go about it in very different ways. Netezza uses a minimalist data access approach that allows large datasets to be processed in small pieces with the least possible effort spent on each piece. Vertica uses a compressed column store that allows aggregate queries to be satisfied with mind-bogglingly small amounts of I/O. In both cases, the key is storing and/or accessing the least amount of data required to satisfy the query, and in a manner that scales as data size grows.

In short: RAM storage is not the answer. Moving and manipulating a lot of data really fast still requires moving and manipulating a lot of data. That will never be as fast as people want or need it to be. As a result, traditional databases will never satisfy reporting and analytics needs for large datasets as well as the newer systems designed for that purpose. Not even if I/O is free.



Coolest Database-Related Developments Ever

A while ago, the question of the coolest database developments ever came up. After some pondering, here's my list.

Disclaimer 1: The definition of the word "cool" is entirely subjective, and is not intended to imply significance, market impact, or anything other than what normal people generally understand "coolness" to be.

Disclaimer 2: I'm an RDBMS guy, and a reporting RDBMS guy at that. Hierarchical and object databases may be cooler than relational databases, but I don't know anything about them yet. I'm sure there are some cool OLTP-related developments I'm not thinking of too. Please don't complain about my lack of scope - it's a known deficiency. :-)

Without further ado...

  1. Netezza - A massively parallel database is cool enough. But a specialized hardware appliance that I can plug my own functions into? Please. How do you top that?

    Yes, yes, I know, Teradata has been doing the whole MPP database things for years. But NZ made it more affordable and a pleasure to work with. Can't say either of those things about Teradata.

  2. Vertica - Sybase has been doing the column oriented thing for a while, but much like Netezza made MPP practical, Vertica makes column orientation practical. Doing it with intelligent compression and whitebox hardware, in an MPP arrangement no less, is icing on an already tasty cake. If this were the 'most significant reporting developments list' this would probably be at the top. Even so, the sheer power and elegance of the things makes it incredibly cool.

  3. SQLite - A full featured, embeddable, easily extensible database engine in a 500k library? Free?! Unbelievable. I've gotten more use out of SQLite than just about any tool except my text editor, I think.

  4. The CASE function - If I had a nickel for every problem I've solved using the CASE function, I could probably put my kids through college with the money.

  5. Oracle for Linux - Oracle's support for Linux may have done more for the credibility of Linux in the corporate world than anything else. Plus, as a long-time Linux user, the availability of Oracle on Linux put me one very large step further away from Windows, which made me pretty happy.

What's on your list? What did I overlook? I'd love to hear others' opinions and extend/refine this list, so please, post a comment or send me an email.