2013-08-19

Centralized Cloud Companies Are Betting Against History

One of the most important features of the Internet is the reliable transfer of data from one place to another. It may seem paradoxical, then, that the best way to achieve that goal is to make sure the network itself isn’t centrally responsible for reliability.

The creators of the Internet learned a counterintuitive lesson from the early Arpanet, the primary predecessor of today’s Internet. Like the Internet, Arpanet was composed of multiple interconnected computing devices. Getting a message from point A to point B required passing that message through several intermediate network hosts. Each Arpanet node guaranteed delivery by holding onto messages until the next node acknowledged receipt.

At first, centralizing these guarantees in the network itself seemed like a really good idea: A sender simply had to fire off a message and trust that the network would deliver it. In practice, message delivery was unreliable. In retrospect it’s not hard to see why: As the network grew, the number of intermediaries increased--and so did the probability that any one of them would fail, breaking the entire chain of message hand-offs.
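A rough way to see why longer chains fail more often is sketched below. It assumes each hand-off succeeds independently with the same probability; the 99 percent figure and the function name are illustrative assumptions, not measurements from Arpanet.

```python
def chain_success_probability(per_hop_success: float, hops: int) -> float:
    """Probability that every hand-off in a chain of `hops` succeeds,
    assuming each hop succeeds independently with the same probability."""
    if not 0.0 <= per_hop_success <= 1.0:
        raise ValueError("per_hop_success must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_success ** hops


if __name__ == "__main__":
    # Hypothetical per-hop reliability of 99 percent.
    for hops in (1, 5, 20, 50):
        p = chain_success_probability(0.99, hops)
        print(f"{hops:>2} hops: {p:.3f} chance the whole chain holds")
```

Under these assumptions the chance of an unbroken chain shrinks geometrically with every intermediary added, which matches the intuition in the paragraph above.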

The solution? Decentralize delivery. Make the endpoints in the network responsible for delivery across unreliable nodes, make the sender responsible for retransmitting data until it arrives where it needs to go, make the recipient responsible for acknowledging that it got the message, and let the network itself be blissfully ignorant of point-to-point success or failure. The result of this understanding is TCP/IP, the protocol that serves as the workhorse of today’s Internet.
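The division of labor described above can be sketched as a toy simulation. This is not how TCP is implemented; the class names, the drop rate, and the retry limit are all assumptions chosen for illustration. The network drops packets and remembers nothing, the sender retransmits until it sees an acknowledgment, and the receiver acknowledges whatever arrives.

```python
import random


class LossyNetwork:
    """A hypothetical network that silently drops packets and keeps no delivery state."""

    def __init__(self, drop_rate: float, seed: int = 0):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def carry(self, packet):
        # Return the packet if it survives the trip, or None if it was dropped.
        return None if self.rng.random() < self.drop_rate else packet


class Receiver:
    """Endpoint that records what it receives and acknowledges each message."""

    def __init__(self):
        self.delivered = {}

    def receive(self, seq, payload):
        # Duplicates from retransmission are harmless: keep the first copy.
        self.delivered.setdefault(seq, payload)
        return ("ACK", seq)


def send_reliably(network, receiver, seq, payload, max_attempts=50):
    """Retransmit until an acknowledgment arrives; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        packet = network.carry((seq, payload))
        if packet is None:
            continue  # data lost in transit; the sender simply tries again
        ack = network.carry(receiver.receive(*packet))
        if ack == ("ACK", seq):
            return attempt
        # The acknowledgment was lost, so the sender cannot tell and resends.
    raise TimeoutError(f"no acknowledgment for message {seq}")


if __name__ == "__main__":
    network = LossyNetwork(drop_rate=0.3, seed=1)
    receiver = Receiver()
    for seq, word in enumerate(["hello", "from", "the", "edge"]):
        attempts = send_reliably(network, receiver, seq, word)
        print(f"message {seq} acknowledged after {attempts} attempt(s)")
```

All of the reliability logic lives in `send_reliably` and `Receiver`; `LossyNetwork` never learns whether a message got through.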

The protocol employs what has become known as the End-to-End Principle, a distributed system design principle that offloads functionality from intermediaries to the endpoints of a network: Senders and receivers. Because TCP/IP uses this decentralized model, the Internet can successfully function even when fiber optic cables are severed, core routers malfunction, ISPs drop service to their end customers, servers blow up, and entire datacenters cease to function. In fact, the modern Internet has never fully “gone down.”

Over the years, many Internet services have employed this same decentralizing pattern. The World Wide Web itself could be viewed as a decentralized version of pre-Internet services like AOL, CompuServe, and Prodigy. Today, no central authority controls the creation of a web page or a website, which is in direct contrast to the gatekeeper role traditionally performed by those services. The World Wide Web spawned an entire decentralized publishing ecosystem, within which thousands of companies have flourished and from which a new era of democratized communication has changed our world.

Similarly, the Domain Name System (DNS), which serves as a sort of “phone book of the Internet” by mapping human-readable names to IP addresses, began as a highly centralized service. In the very early days, a single computer at SRI International contained the canonical HOSTS.TXT file--a full global listing of hostname mappings--which all participating computers on the network would download and use to resolve hostnames. As the network grew, this centralized lookup mechanism became untenable; with each new Internet participant, the HOSTS.TXT file grew larger and more unwieldy. And because distribution from SRI became a bottleneck, nodes on the Internet did not always keep up to date with the latest revisions, resulting in conflicting entries and stale information that interfered with the correct functioning of the network.

Modern DNS was designed to address these problems. It decentralized HOSTS.TXT by delegating responsibility for segments of the domain name lookup space to interested parties. Any participant on the network can cache parts of this lookup space, and anyone can run a DNS server on the Internet without permission or authority. The administration of DNS is still hierarchical in nature, but the implementation has moved steadily toward decentralization since its inception.

At the top of the DNS hierarchy are Root Name Servers. Initially, there were 13 of these, and they were all located in the United States and administered by one central authority. As of February 2013, there were 359 root servers spanning the globe, operated by 13 independent organizations. These root name servers are vital to the correct operation of domain name lookup on the Internet, but because of the decentralized caching design of the system, lookups of entries that are already cached would continue to succeed even if all of the root servers were to go offline temporarily.
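The combination of delegation and caching can be illustrated with a toy model. This is a simplified sketch, not the real DNS protocol: the class names are hypothetical, the addresses come from the range reserved for documentation, and the cache never expires, whereas real resolvers discard cached answers after a time-to-live (which is why the outage can only be temporary).

```python
class ZoneServer:
    """Holds the records for one delegated slice of the namespace."""

    def __init__(self, records):
        self.records = dict(records)

    def lookup(self, name):
        return self.records[name]


class RootServer:
    """Knows only which zone server is responsible for each top-level domain."""

    def __init__(self, delegations):
        self.delegations = dict(delegations)
        self.online = True

    def delegate(self, name):
        if not self.online:
            raise ConnectionError("root server unreachable")
        top_level = name.rsplit(".", 1)[-1]
        return self.delegations[top_level]


class CachingResolver:
    """Asks the root only for names it has not already resolved."""

    def __init__(self, root):
        self.root = root
        self.cache = {}

    def resolve(self, name):
        if name not in self.cache:
            zone = self.root.delegate(name)
            self.cache[name] = zone.lookup(name)
        return self.cache[name]


if __name__ == "__main__":
    root = RootServer({
        "com": ZoneServer({"example.com": "192.0.2.10"}),
        "org": ZoneServer({"example.org": "192.0.2.20"}),
    })
    resolver = CachingResolver(root)
    print(resolver.resolve("example.com"))  # first lookup goes through the root

    root.online = False  # simulate every root server going offline
    print(resolver.resolve("example.com"))  # still answered, from the cache
    try:
        resolver.resolve("example.org")  # never looked up before
    except ConnectionError as error:
        print("uncached lookup failed:", error)
```

Names resolved before the outage keep working, while names that were never looked up cannot be found until the root returns.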

The history of email reveals a similar pattern. Email existed prior to the Internet as we know it today, with different organizations using a number of different systems for exchanging electronic mail. Most of these were completely incompatible with each other. Some relied on centralized services or authorities for correct operation. To glue these systems together, Jon Postel proposed the Simple Mail Transfer Protocol (SMTP). SMTP defines no central email authority, no central service that hands out the right to provide email. Anyone can add an email server, anywhere. Because of this, email as a whole never fails. Yes, email servers may disintegrate, network outages may temporarily prevent email from being delivered along specific routes, and individual email providers might disappear. However, the Internet’s email system as a whole has never sent out a “service unavailable” message.

Internet history says most services improve when they are decentralized, especially where reliability, efficiency, and robustness are concerned. In many ways, however, the cloud as it exists today runs counter to these decades of learning. Most cloud services are pleading with us to re-centralize our data and services, to trust them to a single location in a datacenter. Some companies--among them BitTorrent, Spotify, and our company, Space Monkey--are trying to prove that commercial services can succeed with decentralized architectures. As the cloud continues to mature, we will again start to see the trade-offs and benefits of decentralization become apparent. Our bet is with history.

Clint Gordon-Carroll and Alen Peacock are the cofounders of Space Monkey.

[Image: Flickr user Steews4]