2013-05-22

Why Energy-Sucking Data Centers Are Not The Future Of The Cloud

Wasteful, expensive, and overbuilt, major data centers are an inelegant solution to remote storage--not to mention risky, in the all-eggs-in-one-basket sense. The secret to achieving "indestructible data" with minimal energy costs may lie in the way nature has distributed our own DNA. A small company called Space Monkey, raising money now on Kickstarter, thinks it has created the digital equivalent.



Anyone who has tried to swear off local storage can tell you that it's expensive--not to mention impractical. Offline availability is sketchy even with robust products like Dropbox, and the cost of storing a terabyte of data can approach thousands of dollars a year. By comparison, auxiliary hard drives from stores like Best Buy get cheaper and more capacious every month.

Space Monkey is a startup built by two former Mozy engineers who think they might have an answer to the cloud conundrum: give each user an auxiliary drive that also holds redundant copies of other users' data, much the way nature gives every organism in a species its own copy of the genetic code. We talked to Utah-based founders Clint Gordon-Carroll and Alen Peacock about creating a truly distributed network of "indestructible" information.

What's the problem you're solving?

Clint: The storage problem. Your generated content is growing, and at some point it starts to grow exponentially. A terabyte stored in the cloud at low-end prices--talking Amazon Web Services or Google--adds up to about $800 a year. On the high end, if Apple's iCloud even offered a terabyte of storage, it would cost about $2,400 a year. These are prices that are just out of reach for the average consumer who is wondering where to safely store photos and videos. I think today most people end up buying an external hard drive and giving up a lot of the cloudlike features because they can't afford them.
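
The arithmetic behind those figures is easy to sanity-check. The sketch below uses assumed per-gigabyte rates chosen only to illustrate how a terabyte adds up over a year; they are not published prices from Amazon, Google, or Apple.

```python
# Back-of-the-envelope annual cost of storing one terabyte in the cloud.
# The per-gigabyte rates below are illustrative assumptions chosen to land
# near the figures cited above; they are not quoted AWS, Google, or Apple prices.

TERABYTE_GB = 1000  # decimal terabyte

rates = {
    "low end (bulk cloud)": 0.065,     # assumed $/GB per month
    "high end (consumer sync)": 0.20,  # assumed $/GB per month
}

for label, rate_per_gb_month in rates.items():
    annual_cost = TERABYTE_GB * rate_per_gb_month * 12
    print(f"{label}: about ${annual_cost:,.0f} per year for 1 TB")

# low end:  ~$780/year   (close to the ~$800 figure)
# high end: ~$2,400/year (close to the iCloud-style figure)
```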

Where did the Space Monkey project originate?

Clint: Alen and I met at Mozy, the online backup company, where we started doing work together. Alen built a back-end distributed storage system in the data center, and I worked with him and other engineers as a product manager. That's where we met and sparked a relationship. Prior [to this] . . . Alen was at MIT Lincoln Laboratory in Massachusetts, taking graduate courses at MIT in distributed computing, where he started working on a small project. It was really a peer-to-peer backup project where they were thinking about this key problem of building indestructible data. At some point at Mozy, we started recognizing that this was one of the key problems with cloud computing.

What's wrong with using data centers for cloud storage?

Alen: One of the things we noticed when we started out at Mozy was that we had an unlimited backup offering for $5 a month. The company was making money doing that because the average consumer was only backing up about 30 gigs worth of data. Then smartphones exploded, high-def video cameras were everywhere, and we noticed a sharp increase in demand. It was actually making the business unprofitable. That trend, converging with the falling price of consumer storage hardware, suddenly made it look like maybe we could combine a software approach with a peer-to-peer system on top of custom hardware and actually make something work. As Clint said, at MIT I took a distributed systems course [and] did an open-source project, which I later called FLUD. We actually used that system to do some early prototypes.
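
Peacock's point about the unlimited plan comes down to a flat fee meeting a cost that scales with stored data. The sketch below uses an assumed per-gigabyte storage cost, not an actual Mozy figure, to show where the math flips.

```python
# Why a flat-fee "unlimited" backup plan breaks as average backups grow.
# The $0.05/GB-month fully loaded storage cost is an assumption for
# illustration only, not a figure from Mozy.

MONTHLY_FEE = 5.00        # flat unlimited-backup price, $ per month
COST_PER_GB_MONTH = 0.05  # assumed all-in cost to store one gigabyte for a month

for avg_backup_gb in (30, 100, 250):
    cost = avg_backup_gb * COST_PER_GB_MONTH
    margin = MONTHLY_FEE - cost
    print(f"{avg_backup_gb:>3} GB average backup: cost ${cost:.2f}/mo, margin {margin:+.2f}/mo")

# At a 30 GB average the plan makes money; once smartphones and HD video push
# the average toward a few hundred gigabytes, the same $5 plan loses money.
```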

This article branches off a larger story we're tracking: The Death Of The File System.

"Indestructibility" is sort of the point of the Internet, isn't it?

Alen: Yes. The idea was to create a system that could survive multiple nuclear strikes and still work. You can't design a system like that if you centralize all of the assets, and that's exactly what we do with cloud [data centers]. Some companies do try to set up two or three different data centers and peer the data between them. As we've seen, even providers who do that have outages. The reason is that it wasn't designed with that original principle in mind--distributing the system as much as you can to the very edge of the network. That's what we're trying to recapture: Move as much of the system to the very, very edge of the network as we can, which gives it this insane amount of geographic diversity. That allows us to store data in a way that, as long as we do it right, can't be destroyed. It'll survive multiple natural disasters, power outages, ISP outages. None of that matters; it should still work.

How does the Space Monkey solution work; what are the real technical challenges?

Alen: The environment that we're working in is quite challenging. First of all, creating a distributed storage system even in a data center, where you control all the assets, is a challenging problem and a specialized skill that you'll only find at some of the bigger companies Clint mentioned. That's the first specialization. The second is that you then want to take it out of the data center and put it in people's homes, where they have physical possession of the devices. There are all sorts of malicious-user and attacker models that you have to deal with there. There's also the challenge of the network environment itself. In people's homes you have a wide range of bandwidth characteristics to deal with, as well as firewalls and NATs that you have to traverse to make these things talk to each other. The fact that these devices are out on the very edge of the network, where you don't control them, also means there's a different reliability model than in the data center. For example, we can't control when [devices] are powered on and when they're powered off, and consumer ISPs can be flaky: Even if the device is on, it may not be online. Building a system that can survive all of that and keep performing well in the face of it has been an ongoing challenge. We've had to build models of pools of users with different bandwidth characteristics--how do you make it so that the fastest users aren't bottlenecked by slower storage nodes? I think we've tackled most of those problems, but we are definitely still pushing performance up as we get closer to launch.

Clint: Remote management is a unique piece of Space Monkey. There are a lot of systems and tools for managing machines inside the data center, where you can go in and fix or update them, so we had to roll our own remote management interface so that if there's any type of issue with a device, we can [push] updates to it and fix things. Then there's also development on ARM processors--we're really breaking ground on the kind of system we're building on top of the ARM architecture on these small mass-market devices. It's a minefield.

What else can you tell us about the device itself?

Alen: We're providing a terabyte of storage to people on devices where they obviously don't have a terabyte of space. I'm sitting in front of a MacBook Air with only about 120 gigs of space. Providing a view of your files that gives you access to that full terabyte without actually having a copy of those files locally--obviously that's a thing others have done, and it's not exactly a new technology. But being able to do that in a way that doesn't surprise the user is something we focused a lot on as well. It actually has a really interesting intersection with this debate about what the file system is and what view users should have of their files.

Where do you come down in that debate about the future of the file system?

Alen: Certainly as you go mobile, the file system starts to disappear a little bit. That's great, but we also noticed that on the desktop it's really challenging, because most desktop apps are in fact written with the assumption that data is immediately available--that they don't have to go over a network to get data when they open a file. The most commonly used apps made by Apple or Microsoft expect that they can just read data, and they can block the UI while they're reading it because [local storage] is so fast. We've done a lot of work there trying to make it so that even those applications work well with our system.

How do people experience this Space Monkey drive as a drive on their home network?

Alen: It's much like Dropbox: a special folder on your file system with all of your files in it. In the first version we will probably also provide a sort of network-drive access via SMB or AFP or even DLNA, so that different devices can access that data, especially home media centers.

What was it like getting into making a physical product?

Clint: We fought hard with each other about whether to add this hardware component. In the early days, we thought, We're software guys; why would we even consider using hardware? But when we modeled it out, there were dozens of factors: How much storage do you need in order to make the data redundant, and how much of the time can you [access] the users' machines? We just couldn't make it happen using local drives. You'd need a terabyte of storage on a laptop to make it redundant enough, and we couldn't get those reliability numbers without a piece of hardware. In the early days, before we really funded it, we went to Best Buy and bought a Seagate drive and a couple of other drives. We rooted them and ran our prototype software on top of them, and it was this moment of wow! You can see they're actually powerful enough. The chips they're running on are getting faster and are now dual-core. This is actually a possibility. Even then, we were still shying away from building our own product or hardware.

Kickstarter contributors aside, how do investors look upon a startup with a hardware component like this?

Clint: When we started pitching investors, they clearly didn't want to be a part of a hardware startup. One of our investors, an ex-Google executive, looked at us and said, "What do you think the BOM of your device would be?" We had no idea what the acronym BOM meant. We just looked at each other with blank stares, like, "Oh boy. We've got a lot to learn about hardware." We not only learned a lot ourselves, but we also hired--somewhat by accident--a guy who likes making software but is a hardware engineer by trade. Between the three of us and some outside help, we learned a lot about the hardware business. One of the key things to remember is that we're not reinventing the wheel on the hardware. Once you get that ARM chipset down, everything is really pretty standard. It's just a case of software guys having to learn what those standards are. It's been a fun experience. We definitely designed our device to be friendly and certainly unique--something you can put on a media center that becomes a talking point. Well, what is that? Oh, it's my Space Monkey device. Oh, it streams music from iTunes. Oh, great, it stores photos and videos. I can share folders.

How do you know you're not making the same mistake as Mozy, in terms of your pricing strategy?

Alen: The key insight really is knowledge of what costs are like in the data center. Data center costs are not driven by the hardware. They're not driven by the hard drives that you have to buy to put in the racks. They're driven by the ongoing operational costs: power and cooling and bandwidth, the backup generators and network operations centers and 24/7 staffing, and on and on. Some of those costs have actually decreased; others keep going up. Energy costs are an ongoing concern in the data center, and every year they become a bigger part of the total. You shed all of those costs by moving the device into the consumer's home. It's not that we're just pushing those costs onto the consumer. We're doing that a little bit, but it's vastly cheaper to run a hard drive in a consumer's home than in a data center, because you don't have to force cooled air through it. Our device doesn't have a fan in it; it's ambiently cooled by the cooling that's already there in the home. The power use is actually really low in terms of impact on the consumer. We measured it out: It's going to average under a dollar per month.
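
That under-a-dollar figure is plausible on a napkin. The sketch below assumes a device wattage and an electricity rate for illustration; neither number is a published Space Monkey measurement.

```python
# Rough monthly electricity cost of an always-on, fanless storage appliance.
# The wattage and electricity price are assumptions for illustration; they
# are not published Space Monkey figures.

DEVICE_WATTS = 8          # assumed average draw of a small ARM-based device
HOURS_PER_MONTH = 24 * 30
PRICE_PER_KWH = 0.12      # assumed residential electricity price, $/kWh

kwh_per_month = DEVICE_WATTS * HOURS_PER_MONTH / 1000
monthly_cost = kwh_per_month * PRICE_PER_KWH
print(f"{kwh_per_month:.2f} kWh/month -> about ${monthly_cost:.2f}/month")
# 5.76 kWh/month -> about $0.69/month, consistent with "under a dollar"
```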

Clint: At Mozy, we even looked at changing how we do things in order to decrease costs, and found that only about 10% to 15% of the data being stored is duplicated across consumers. You still have this huge overhead of unique data that has to be stored. With a lot of the things we tried at Mozy to be more efficient, we just couldn't catch up--not at the pace of consumer demand.

How did we end up doing everything in data centers?

Alen: Well, it's easier to create a system that runs in the data center. Right now, that's what all the offerings are, and it makes sense that that's what people are doing today.

Clint: I think an example would be content that is hot--something that is going viral, that thousands of people are watching, this home video that's really funny. That's an example of something that works well in data centers today. But even in that scenario, in the future you could have hundreds of thousands of these small devices out in our network. You have CDNs--content delivery networks--that could swarm thousands of views from small nodes. The future really is distributed. Over the next five to 10 years, the cloud will become distributed. When you look at Google Fiber and what's happening there--more and more homes being connected at faster and faster speeds--it invites more of the distributed system to come in.

At some point, will you guys be able to phase out the independent Space Monkey device and just have enough space on the user’s hard drive or do you see their data production going up so quickly that that will never reach that point?

Alen: That's possible. The main question there really is the availability of those individual devices online. As Clint was mentioning, when you design a network like this, one of the big inputs is what percentage of devices are online at any given time. If that percentage is low, you have to pay for it by making more copies of the data across the network. A lot of people have actually tried this in the past. A small handful of companies have done software-only peer-to-peer distributed storage systems, and almost all of them have eventually run back to the big cloud. The reason is that it's just not a very good value prop for consumers to say, I'm going to give up 10 gigs of space for every gig of space that I actually want to use, because the network needs that extra redundancy to deal with nodes being offline. Perhaps with smartphones that will change--they aren't really ever "off." Perhaps you could build a network like this on top of mobile devices, [if you can solve] power consumption issues.
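
The redundancy trade-off Peacock describes can be put in rough numbers with a generic k-of-n erasure-coding model, in which any k of n stored pieces can reconstruct a file and each node is independently online with some probability. This is an illustration of the general trade-off, not Space Monkey's actual encoding scheme.

```python
# Generic model of why node availability drives redundancy in a peer-to-peer
# store: a file is split into n pieces, any k of which can reconstruct it
# (k-of-n erasure coding), and each piece's node is independently online with
# probability p. Illustrative only; not Space Monkey's actual encoding scheme.

from math import comb

def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of the n pieces are currently reachable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Laptops online ~30% of the time need roughly the 10x expansion mentioned
# above (store 100 pieces, need any 10) to keep data reliably reachable:
print(availability(n=100, k=10, p=0.30))  # ~0.99999
# Always-on appliances online ~95% of the time get there with ~1.5x expansion:
print(availability(n=15, k=10, p=0.95))   # ~0.99995
```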

Clint: Routers! We could build our software to run on different platforms, so that you may never have a Space Monkey device, but there are other devices that are Space Monkey enabled. Those are all possibilities and things we're already beginning to work on.

As you were talking, I was thinking: I have one of those FreedomPop modems. You put a 4G radio on that Space Monkey thing and you're killing two birds with one stone.

Alen: Absolutely. We're very interested in modems and wireless routers. Those are great places where we could snap a few things together, and it's one device the user has to have to make that happen. DVR set-top boxes make a lot of sense as well. We've talked to some ISPs who are really interested from the angle that we can peer consumers on the same ISP together, which provides huge cost savings for them, because they don't pay for bandwidth inside their own networks--only for bandwidth that goes off their networks. There are a lot of angles that we think we can hit long-term with this as well.

In your view, is security an issue if a business or enterprise customer wanted to use a distributed system like this?

Alen: There's definitely a perception issue there. It's not much different from the original perception problems that the cloud had. We saw this early on at Mozy, where people would blog and email us saying, "I'll never trust some company with some server in some data center to have a backup of my stuff." We found over time that people just said, well, it works and it's fine. We've just got to be very vigilant about our security model. We don't have the shortcut that a lot of providers in the cloud have had with security. If you followed the fiasco with Dropbox and the FTC, and some of the promises they made about security that weren't true, that scenario is not uncommon in the cloud at all. It's like any other software project: There are deadlines, and there are trade-offs you have to make when you're developing the software, and you say, we can get to that part of the security model later, because we have to deliver this product today, and we'll make up for it by the fact that we've got all the servers behind this gate inside the data center--it can't be that vulnerable, right? Our "data center" is out in the most malicious users' hands. We've got to make our security model airtight from day one. We think that's actually an advantage. There will be user-education issues that we'll have to tackle over time.

Clint: To your point about small businesses and enterprises, we've definitely targeted the consumer market as the initial beachhead. We've already had small businesses pledge on our Kickstarter campaign so that they can have a device. I think small businesses in general have a higher risk tolerance than medium-size businesses and enterprises. Can we move into the enterprise? Yeah, there are things we could do there to reduce risk, like a private peered network, but that's not something we're focused on today.

When do the first Space Monkey devices ship to Kickstarter backers?

Clint: Our Kickstarter just finished, and we reached our goal, which we're excited about. The first hundred devices will go out early next month, and the next 5,000 will start shipping in July.

Need a little context around this story? Read our previous coverage on The Death Of The File System.

[Image: Flickr user Jeffrey]



