A colleague recently ran some tests with Oracle using only RAM storage in order to improve performance of a DSS application. If I/O is the bottleneck for reporting apps, then why not make the I/O infinitely fast, right? Who cares if there's a hard limit on how much RAM you can put in a box, or if all the data will be lost when the system is shut down?! Heck, this idea can't really be too crazy - there's a whole book on the subject by well-respected authors, after all. And at least one company has made a business of providing large-scale RAM storage with an eye toward database performance, so it's not like this is the first time anyone's tried it.
My incredulity aside, the results of the tests were at first surprising but ultimately what I should have expected. Bottom line: in this particular experiment, storing everything (data, temp, undo, rollback, etc.) in RAM provided only a 5-6x performance improvement.
A 5-6x improvement is nothing to sneeze at, mind you, but it isn't the 10-100x improvement that Netezza has been claiming for years. Nor is it close to what I've seen from Vertica. But how can that be? Isn't I/O the bottleneck, and RAM infinitely fast?
Well, yes, on both counts. But faster I/O doesn't solve the real problem when running DSS queries against large datasets, which is the sheer amount of data that must be touched. Traditional, row-oriented systems must move and manipulate a lot of data in order to satisfy a typical DSS query. Even with infinitely fast I/O, therefore, they still do an awful lot of work to satisfy that query.
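To put rough numbers on it, here's a back-of-envelope sketch. Every figure in it (row count, table width, column sizes) is made up but plausible; none of them come from the test above.

```python
# Hypothetical numbers: a typical DSS aggregate touches a handful of
# columns, but a row store still reads and steps over every column of
# every row it scans.

ROWS          = 1_000_000_000  # billion-row fact table (assumption)
COLUMNS       = 100            # wide, denormalized schema (assumption)
BYTES_PER_COL = 8              # average column width (assumption)
COLS_IN_QUERY = 2              # e.g. SUM(amount) ... GROUP BY region

row_store_bytes = ROWS * COLUMNS * BYTES_PER_COL        # whole rows
col_store_bytes = ROWS * COLS_IN_QUERY * BYTES_PER_COL  # needed columns only

print(f"row store touches:    {row_store_bytes / 1e9:,.0f} GB")   # 800 GB
print(f"column store touches: {col_store_bytes / 1e9:,.0f} GB")   # 16 GB
print(f"that's {row_store_bytes // col_store_bytes}x less data")  # 50x
```

Making the 800 GB arrive infinitely fast doesn't change the fact that the CPU still has to chew through 800 GB.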
In contrast, the new breed of databases - Netezza, Vertica, and the like - has been designed for reporting and analytics against large datasets. These new systems provide impressive performance improvements over older, transaction-oriented systems for one simple reason: they minimize data movement and manipulation. As a result, for DSS-type queries, they will always be able to outperform Oracle and other traditional DBMSs, even when I/O is infinitely fast.
Netezza and Vertica make great examples here because they both minimize data movement, yet they go about it in very different ways. Netezza uses a minimalist data access approach that allows large datasets to be processed in small pieces with the least possible effort spent on each piece. Vertica uses a compressed column store that allows aggregate queries to be satisfied with mind-bogglingly small amounts of I/O. In both cases, the key is storing and/or accessing the least amount of data required to satisfy the query, and in a manner that scales as data size grows. Both ideas are simple enough to sketch in a few lines of code.
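First, block skipping in the spirit of Netezza's minimalist access: keep a little metadata (a min and max per block of rows, roughly what Netezza calls zone maps) and skip whole blocks that can't possibly match the predicate. The data and block size here are purely illustrative.

```python
# Toy block-skipping sketch: track min/max per block so entire blocks
# can be rejected without examining a single row.

def build_blocks(values, block_size):
    """Split a column into blocks, each tagged with its min and max."""
    return [
        (min(chunk), max(chunk), chunk)
        for chunk in (values[i:i + block_size]
                      for i in range(0, len(values), block_size))
    ]

def scan_range(blocks, lo, hi):
    """Return matching values and how many values were actually examined."""
    hits, touched = [], 0
    for bmin, bmax, rows in blocks:
        if bmax < lo or bmin > hi:
            continue  # whole block skipped: zero effort spent on it
        touched += len(rows)
        hits.extend(v for v in rows if lo <= v <= hi)
    return hits, touched

# Data that loads in roughly sorted order (dates, sequence numbers)
# prunes especially well under range predicates.
column = list(range(100))
hits, touched = scan_range(build_blocks(column, block_size=10), 42, 45)
print(hits)     # [42, 43, 44, 45]
print(touched)  # 10 -- only one block out of 100 values was examined
```

Second, run-length encoding in the spirit of Vertica's compressed column store: a sorted column collapses into (value, run length) pairs, and an aggregate like COUNT(*) ... GROUP BY can be answered from the runs alone, never touching the raw values. (Vertica's actual encodings and execution are far more sophisticated; this is just the core idea.)

```python
from itertools import groupby

def rle_encode(sorted_column):
    """Collapse a sorted column into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(sorted_column)]

def count_by_value(runs):
    """SELECT value, COUNT(*) ... GROUP BY value, computed over runs only."""
    return {value: length for value, length in runs}

region = ["east"] * 400_000 + ["north"] * 250_000 + ["west"] * 350_000
runs = rle_encode(region)

print(runs)                  # 3 runs stand in for a million values
print(count_by_value(runs))  # {'east': 400000, 'north': 250000, 'west': 350000}
```

In both sketches the win is the same: the work done scales with the data actually needed, not with the total size of the table.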
In short: RAM storage is not the answer. Moving and manipulating a lot of data really fast still requires moving and manipulating a lot of data. That will never be as fast as people want or need it to be. As a result, traditional databases will never satisfy reporting and analytics needs for large datasets as well as the newer systems designed for that purpose. Not even if I/O is free.