Getting to Know DATAllegro, Part I
To me, DATAllegro has always been the black sheep of the new database vendors. They've never done anything sexy (that I'm aware of), they're on the west coast (yes, that is a sin) and they don't really make any noise about performance. Actually, they don't seem to make any noise at all, save comments by CEO Stuart Frost on various blogs.
So when the opportunity to learn more about DATAllegro popped up a few weeks ago I was both thrilled and apprehensive. On the one hand, I knew nothing about them, so I was glad to have the opportunity to fill in one of the (very few and small
It turns out that even though DATAllegro's not really making much noise, what they are doing is very, very interesting.
This post, along with one or two future posts, will cover what I learned about DATAllegro, why I think it's interesting and what I think it will mean long-term. As always, if you see anything amiss or a-missing, please let me know.
The Company (from 10,000 feet)
DATAllegro, based in Orange County, CA, was founded in 2003 and first released a product 3 years ago. The current "V3" release came out about a year and a half ago. They're still a fairly small company in the grand scheme of things, and like most other new database startups they're well funded. (They closed their fourth round of funding after I spoke with them, in fact.)
DATAllegro has important partnerships with EMC, Cisco and Dell. Who doesn't though, right? There's a bit more to it with DATAllegro though, as we'll see later.
The Product (also from 10,000 feet)
DATAllegro's V3 system is an MPP, shared-nothing, incrementally-scalable data warehouse appliance geared toward mixed workloads and not necessarily analytics and reporting. This is markedly different than the other new database vendors, whose systems are designed for and focused on analytics (not that that's a bad thing). Offhand I can think of only one or maybe two other vendors (namely Dataupia and maybe Netezza) who can make this claim, so this by itself sets DATAllegro apart.
DATAllegro systems are available in two main flavors: the SRA and the MRA.
The SRA, or Single Rack Appliance, is a single-rack, "whole kit" appliance designed to handle 12TB or less. This is a rather convenient all-in-one system ideal for smaller data warehouses, as a sandbox or a spoke in a hub/spoke architecture (more on that later). The SRA lists at $500k, which is competitive.
The MRA, or Multiple Rack Appliance, is the extensible version of the SRA. The MRA can be scaled in 15 or 25TB increments across multiple racks, and after a fee for the base system is priced at $15k/TB. The MRA also includes a "landing zone" node which provides a convenient way to load data quickly.
In addition, both the SRA and MRA systems offer optional (but always included) backup nodes.
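For a rough sense of how the MRA pricing adds up, here's a back-of-the-envelope sketch in Python. The $15k/TB rate and the 15 or 25TB increment sizes come from the description above; the base system fee isn't stated here, so `base_fee` is a placeholder that defaults to zero, and the function name is made up for illustration.

```python
# Rough list-price arithmetic for an MRA, based only on the figures quoted above.
# ASSUMPTION: base_fee is unknown, so it is a caller-supplied placeholder.

ALLOWED_INCREMENTS_TB = (15, 25)  # increment sizes mentioned in the post
PRICE_PER_TB = 15_000             # $15k/TB, as quoted in the post


def mra_list_price(increments_tb, base_fee=0):
    """Return the list price in dollars for a set of capacity increments."""
    for size in increments_tb:
        if size not in ALLOWED_INCREMENTS_TB:
            raise ValueError(f"increments must be 15 or 25 TB, got {size}")
    capacity_tb = sum(increments_tb)
    return base_fee + capacity_tb * PRICE_PER_TB


if __name__ == "__main__":
    # One 15TB increment plus one 25TB increment, with the base fee left at zero.
    print(mra_list_price([15, 25]))
```

With one 15TB and one 25TB increment that comes to 40TB, or $600k before whatever the base system costs.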
What's really interesting about DATAllegro's offering, however, is not the packaging but the contents. The system is built using only high-end commodity hardware - namely EMC CX3s for storage, Dell compute nodes and Cisco InfiniBand switches - which is why DATAllegro frequently uses the phrase "enterprise-class platform". This approach has a number of advantages for both producer and consumer:
- First and probably foremost, it makes sales easier. Customers aren't afraid of top-end hardware.
- Interesting partnership opportunities are possible. In Europe, for example, DATAllegro partners with Bull, who substitutes their systems for the Dell compute nodes.
- Development systems are easier/cheaper to produce, as dev systems don't take production systems out of service.
The even more interesting implication of using commodity hardware, however, is that the MPP magic is all in the software. Even here DATAllegro leverages non-proprietary components - namely SuSE Linux and Ingres - to provide most of the necessary functionality. But on top of those pieces sits DATAllegro's IP, tying the hardware and software components together to turn it all into a data warehouse appliance. That's simple and elegant, quite frankly, and simple and elegant is cool. It's hard not to like it.
Especially appealing is the idea that, as a result of this modularity, any component can be changed, including the database software. This appeals to the software guy in me, but I think it has some interesting potential implications that I'll talk about in future posts.
The Performance (still from 10,000 feet)
The downside to commodity hardware and software is that it limits how clever you can be about accessing data. As a result, the performance of a DATAllegro system boils down to scan rates. This isn't necessarily a bad approach; good hardware plus intelligent I/O planning can make for some pretty impressive scan rates - 0.5 to 10.5TB/minute, according to their web site, though I wonder if those speeds are practical or theoretical. Combine that with multi-level (hash plus value) partitioning and simple scans make for a simple but very fast system. (I think Netezza has already proven that...)
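To make the "hash plus value" idea concrete, here's a toy sketch of two-level partitioning in general. It is not DATAllegro's actual implementation. It assumes rows are spread across nodes by hashing a key, then grouped on each node by a value (a month, in this case), so a query that filters on that value only scans the matching partition on each node. The node count, the orders table and every name in it are made up for illustration.

```python
import zlib
from collections import defaultdict
from datetime import date

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Level 1: choose a node by hashing the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def partition_for(day: date) -> str:
    """Level 2: choose a partition on that node by value (here, the month)."""
    return f"{day.year}-{day.month:02d}"


class ToyCluster:
    def __init__(self):
        # Each node maps partition name -> list of rows.
        self.nodes = [defaultdict(list) for _ in range(NUM_NODES)]

    def insert(self, customer_id: str, order_date: date, amount: int) -> None:
        node = self.nodes[node_for(customer_id)]
        node[partition_for(order_date)].append((customer_id, order_date, amount))

    def total_for_month(self, year: int, month: int):
        """Scan only the matching partition on every node; return (total, rows scanned)."""
        wanted = f"{year}-{month:02d}"
        total = 0
        rows_scanned = 0
        for node in self.nodes:  # on a real MPP system these scans run in parallel
            rows = node.get(wanted, [])
            rows_scanned += len(rows)
            total += sum(amount for _, _, amount in rows)
        return total, rows_scanned


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.insert("alice", date(2008, 1, 5), 100)
    cluster.insert("bob", date(2008, 1, 20), 50)
    cluster.insert("alice", date(2008, 2, 1), 70)
    print(cluster.total_for_month(2008, 1))  # (150, 2): the February row is never touched
```

The point is that nothing clever happens at query time. Each node just scans a smaller pile of data, which is why raw scan rate ends up being the number that matters.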
On the other side of the performance equation, DATAllegro claims, quite simply, to "load faster than anybody". With reported load speeds of 1.2TB/hour or faster while queries are running, that's a legitimate claim. I believe that this may only be possible when using the 'landing zone' to load data rather than an external bulk loader, however. Nonetheless that's pretty impressive, and is an important part of the hub-and-spoke grid that I'll write more about in the future.
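To put the quoted rates in perspective, here's a quick bit of arithmetic as a Python sketch. The 1.2TB/hour load rate and the 0.5 to 10.5TB/minute scan range are the figures reported above; the function names are mine, and the calculation assumes the rates hold steady for the whole job, which real systems rarely manage.

```python
# Back-of-the-envelope timing based on the vendor-reported rates quoted above.
# ASSUMPTION: rates are sustained and constant for the whole operation.

def hours_to_load(data_tb: float, load_tb_per_hour: float = 1.2) -> float:
    """Hours needed to load data_tb terabytes at the given load rate."""
    return data_tb / load_tb_per_hour


def minutes_to_scan(data_tb: float, scan_tb_per_minute: float) -> float:
    """Minutes needed to scan data_tb terabytes at the given scan rate."""
    return data_tb / scan_tb_per_minute


if __name__ == "__main__":
    sra_capacity_tb = 12  # the SRA's stated upper limit
    print(f"load: {hours_to_load(sra_capacity_tb):.1f} hours")
    print(f"scan at 0.5 TB/min:  {minutes_to_scan(sra_capacity_tb, 0.5):.1f} minutes")
    print(f"scan at 10.5 TB/min: {minutes_to_scan(sra_capacity_tb, 10.5):.1f} minutes")
```

At those rates, filling a 12TB SRA works out to roughly ten hours, and a full scan of the same 12TB would take about 24 minutes at the low end of the scan range and a little over a minute at the high end.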
In summary: DATAllegro doesn't make a tremendous amount of noise about system performance, and to be honest I'm not sure that maximum possible raw performance is their ultimate goal. Given their focus on mixed workloads, consistency and predictability seem more important. (If operational systems and data warehouses merge, as I expect they ultimately will, that will certainly be the case.) Performance levels are pretty impressive nonetheless and are likely more than sufficient unless raw performance is your most important criterion.
Looking Ahead
That's all from the high level. In my next post about DATAllegro I plan to dive more into architectural details as well as talk about some of the other interesting things they're doing. Stay tuned.
UPDATE
The following corrections were made after this was originally posted:
- DATAllegro was founded in 2003, not 2000.
- There is a base cost for the MRA, plus a per-TB fee for each 15 or 25TB increment after that.
- The SRA does not include a "landing zone" node; the master node suffices in a single-rack system.

