Highlights from New England Database Day '08

Today's New England Database Day event was a real treat. The venue was good, the presentations were great and the free lunch was tasty. How much more fun could a database geek have?!

The morning began with - in the words of Mike Stonebraker - "commercial messages" by the event's sponsors, Vertica and Netezza. The comments made by Vertica's Andy Palmer and Netezza's John Metzger were brief, sanguine and to their credit not at all sales-pitchy, which boded well for the rest of the day. Humorously, they both used paraphrased forms of my "it's a great time to be a database geek" phrase, which I found both flattering and irritating - just quote me verbatim and give me credit already. ;-)

But enough of my megalomaniac delusions - here are the highlights of the day, in the order presentations were made.

•Dr. David DeWitt began the formal presentations with an overview of his Clustera project. Intentionally or otherwise this largely devolved into a discussion of Why MapReduce Is Evil, but Clustera looks interesting nonetheless. Maybe most interestingly it's a research project at least partially predicated on using commodity components rather than building everything from scratch. And here I thought that university research projects were for some reason obligated to continually reinvent the wheel... ya learn something new every day.

Philosophical considerations aside, however, I found Clustera very interesting not for what it does but what it resembles - Netezza. Aside from the pull vs. push nature of the compute nodes, what DeWitt described could very well have been a high-level description of Netezza if he'd changed a few labels. But then again I don't really think of Netezza as a database vendor anymore anyway.

•Gerome Miklau's presentation on Managing Historical Retention in Database Systems was probably not good for the paranoid privacy freak in me, but it was an interesting perspective on something we don't tend to think much about - what happens (or doesn't) when you delete data. This is something that's clearly a major concern for some companies in some situations, and I don't know that I'd have realized that without Gerome's presentation.

The first of the day's many mentions of SQLite also appeared in Gerome's talk. (Note to Gerome, if someone knows how to contact him: was auto-vacuum turned on when you tested SQLite?) I'm apparently not the only big fan of SQLite, which is both comforting and inspiring.
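For anyone wondering why I asked about auto-vacuum, here is a minimal sketch using Python's built-in sqlite3 module. It is not Gerome's test setup (I don't know what that was). The table name, payload size and helper function are all made up for illustration. It only shows that, with auto-vacuum off, pages freed by a DELETE stay inside the database file on a free list, while with auto-vacuum set to FULL they are released at commit.

```python
import os
import sqlite3
import tempfile


def free_pages_after_delete(auto_vacuum: str) -> int:
    """Fill a table, delete every row, and return SQLite's free-page count.

    `auto_vacuum` is "NONE" or "FULL". The schema and sizes are hypothetical.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        try:
            # auto_vacuum has to be chosen before the first table is created.
            conn.execute(f"PRAGMA auto_vacuum = {auto_vacuum}")
            conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")
            conn.executemany(
                "INSERT INTO t (payload) VALUES (?)",
                [(b"x" * 1000,) for _ in range(1000)],
            )
            conn.commit()

            # Delete everything, then ask how many pages sit unused in the file.
            conn.execute("DELETE FROM t")
            conn.commit()
            return conn.execute("PRAGMA freelist_count").fetchone()[0]
        finally:
            conn.close()


if __name__ == "__main__":
    print("free pages, auto_vacuum=NONE:", free_pages_after_delete("NONE"))
    print("free pages, auto_vacuum=FULL:", free_pages_after_delete("FULL"))
```

Note that auto-vacuum is about reclaiming space, not scrubbing it. Whether the old bytes are actually overwritten is a separate matter (SQLite has a secure_delete pragma for that), so treat this as a way to poke at the question rather than an answer to it.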

•The presentation on Requirements Engineering Databases by Brian Berenbach was another "huh, I would never have thought of that" type of presentation. It was also clearly the only presentation made by a non-academic - that isn't necessarily better or worse, just different.

•One of the talks I was most looking forward to was the one by Stavros Harizopoulos on Designing a Next-generation OLTP Engine. Though I didn't really get what I was expecting, it was a very useful presentation nonetheless. Stavros discussed a project to progressively remove pieces from the Shore Storage Manager, which has apparently been proven to be similar enough to a database to count as one, in order to assess how much of a performance improvement might be possible for OLTP systems. The answer: a lot. I honestly don't think that needed to be proven. :-) Nor am I sure the approach he described is valid enough to take his results at face value. But I think the general idea is sound, even if the exact potential for improvement isn't quite as high as he suggests.

•The presentation that surprised me most may have been that by Ryan Johnson of CMU about sharing work to maximize throughput. (More information available here, I think. Not 100% positive.) What surprised me was that it may have been the only presentation I've ever seen that essentially shows that the main idea being discussed is actually a bad thing in most situations. Not necessarily all situations, but still... only in academia.

The highlight of the entire day may have come at the end of Ryan's presentation, when Mike Stonebraker rather tartly suggested that the results would have been dramatically different had the tests been based on a column store rather than a row store. I was surprised and somewhat impressed to hear Ryan return a volley about certain tables that don't work well with column stores and how confident he is that there's a place in the world for column stores. That one 10 second exchange had a very palpable tension to it... kinda gave me goosebumps.

After the four morning sessions we had about an hour to eat the very tasty lunch, during which I had a very enjoyable conversation with a Brown student about a project he's working on. Andy, if you're listening: avoid MySQL. Please. Please please. I know you don't know me from a hole in the wall, but please, trust me on this one.

•Following lunch came an overview of the SASE+ event processing system built by Yanlei Diao et al. I didn't know anything about event processing before this talk, and I still don't know much, but to Yanlei's credit I feel as though I got the basic idea of both event processing and the SASE+ project by the end of it. I at least understand enough to know why people are so excited about CEP these days.

•One of the more humorous talks of the day came from Daniel Abadi, who suggested that one could build a column store in a week (even if it didn't work very well). For those who understand column stores there wasn't really any news here, but it was entertaining. The concepts put forth would go a long way to educating those who aren't familiar with column stores about them, for that matter, so I hope the slides are made available somewhere.

This talk also gave rise to one of the many "must read about that later" topics, namely the Star Schema Benchmark. Comments to come in the near future, I hope...
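For readers who aren't familiar with column stores, here is a toy sketch of the basic idea in Python. To be clear, this is not Daniel's design or anything from his slides. The schema (order_id, region, amount) and the function names are invented for illustration. It just shows the same data laid out row-by-row and column-by-column, and that an aggregate over one attribute only needs to touch one column in the second layout.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (order_id, region, amount)
Row = Tuple[int, str, float]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot a list of row tuples into one list per attribute."""
    return {
        "order_id": [r[0] for r in rows],
        "region": [r[1] for r in rows],
        "amount": [r[2] for r in rows],
    }


def total_amount_rows(rows: List[Row]) -> float:
    # Row layout: every tuple is visited even though only one field is needed.
    return sum(r[2] for r in rows)


def total_amount_columns(columns: Dict[str, list]) -> float:
    # Column layout: only the "amount" list is read.
    return sum(columns["amount"])


if __name__ == "__main__":
    rows = [(1, "east", 10.0), (2, "west", 20.5), (3, "east", 5.0)]
    columns = to_columns(rows)
    print(total_amount_rows(rows), total_amount_columns(columns))
```

A real column store adds compression, late materialization and much more on top of this, which is presumably where the "even if it didn't work very well" caveat comes in.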

•Liuba Shrira followed Daniel and talked about Split Snapshots: A New Approach to Old State Storage. The audience seemed entirely uninterested through almost the entire talk, which was a first for the day and rather sad. I have to admit that I wasn't really immune, unfortunately. Suddenly near the end of the presentation everything clicked, however, and I saw how elegant and useful the solution she'd described really was. I hope everyone else got it too, because it was pretty cool once it clicked.

•The final presentation I saw was by John Corwin about NanoDB, a modular and configurable database project being developed by the Yale Database Group. Corwin described a system comprised of pluggable components centered around a micro-kernel, such that individual modules could be replaced or reconfigured in isolation. This strikes me as a very powerful concept on a number of levels ranging from performance to pedagogy, so this is a project I intend to follow.

This session also brought out the only question of the day from the Netezza contingent. Whether that's because they think it's cool or scary I dunno.

Unfortunately I had to leave the event at this point due to family obligations, so I missed the final presentation of the day and the poster session that followed. Even so, this was a fabulous event full of interesting people and ideas, and I'm extremely glad I went. I think we owe Sam Madden and Mike Stonebraker many thanks for organizing the event, and Vertica and Netezza our gratitude for sponsoring it. There are few better ways for a database geek to spend a day, I think, and I hope this event becomes a regular occurrence.

It would just be one more reason why it's a great time to be a database geek. :-P
Trackbacks

As mentioned a few days ago, Daniel Abadi gave a great talk on how (or how not) to go about building a column store. He has graciously made the slides from that presentation available here. I believe that the talk, entitled "How to Create a New Column
At HPTS this past October, Michael Stonebreaker delivered a presentation called It's Time for a Complete Rewrite. The main point seems to be that the general purpose relational database management system has outlived its usefulness after 30-40 years, and the market needs specialized database management systems to meet current application requirements. A lot of these topics were covered in an interview published in the Sept/Oct issue of ACM Queue. However, Mike stops short of describing some of his new proposals for these specialist databases. Last Monday a lot of this was discussed at the New England Database Day session at MIT, where Michael now teaches. It looked to me as if about 100 people showed up, and I believe they said a majority were from industry. The presentations were very interesting. A good summary can be found here. A highlight was certainly Dave DeWitt's presentation on Clustera. Despite the...
I got an email yesterday announcing New England Database Day '09. Needless to say I was psyched, as I enjoyed last year's quite a bit, so I've already registered. If you're in the Boston area and interested in database technology I would heartily recomm
Comments
As you requested, my slides can be found at:

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-nedbday.pdf