2013-07-22

Inside The Tech Stack Digg Used To Replace Google Reader

Digg’s dev team had a few months to build a capable replacement for Google Reader. Digg's CTO gave us a peek under the hood at the development tools they used to do it.



When the team at Digg learned about the impending demise of Google Reader, they knew they had to act fast--and build something killer. The resulting void would leave millions of potential users on the hunt for something new, and the recently Betaworks-acquired Digg was unusually well-positioned to build a product dedicated to reading things. But with a limited time frame in which to create Digg Reader, the team had to limit the product's scope and choose its tech tools wisely.

Leading the charge on the technical side at Digg is CTO Mike Young, who gave us a glimpse behind the scenes at the tech stack and development frameworks used to replace a beloved, eight-year-old service in a matter of months.

Keeping Things Scalable With Amazon's Cloud

In 2013, rolling one's own infrastructure might be a laudatory technical feat, but for most, not worth it when super-reliable cloud hosting is so readily available. Not surprisingly, Digg is leaning on Amazon Web Services (AWS) for its hosting and a few additional infrastructural needs.

"We currently have a mix of infrastructure/services that we built ourselves over the last year for Digg.com as well as some of the AWS-hosted services for storing data (DynamoDB) and queueing (SQS)," says Young. "Since we had such a short time frame to build Digg Reader we had to lean heavily on some of the hosted AWS services, like DynamoDB, versus rolling our own."

Here's a breakdown of what they're using:

  • Amazon Web Services for hosting and content delivery. Young says they're fully hosted on AWS, running most of Digg Reader off of instances of Amazon's Elastic Compute Cloud (EC2).
  • DynamoDB. Young's team uses Amazon's NoSQL database solution rather than building their own "since we had such a short time frame to build Digg Reader," he says.
  • Amazon’s Simple Queue Service (SQS) is used for queuing messages between components of Digg's Amazon-powered backend. For those unfamiliar with message queues, Wikipedia offers a pretty thorough primer.
  • Amazon Route 53 is used for DNS management. The service doesn't offer domain names, but rather lets users control the DNS settings then map the domains and subdomains to specific IP addresses and other standard DNS settings.
  • The AWS Elastic Load Balancer is necessary for sites expecting as many sudden visitors as Digg Reader was. It smartly distributes all that inbound traffic among Digg's EC2 instances for maximum fault tolerance and stability.
  • Amazon's Simple Storage Service (S3) and Cloudfront CDN are used for storage and delivery of images, JavaScript, and CSS.

"I don't think we would have been able to pull this off in the time frame we had without AWS and the ability to scale up a large number of machines in short period of time," says Young.

More Fun On The Backend of Digg Reader

"One of the things that was so amazing about Google Reader was how fast it was, both in serving up content when you loaded the page or paginated through feeds, but also on the feed aggregation/crawling side," Young says. "We spent some time talking with the original Google Reader team when we first started the project, and they were kind enough to give us some great insight into the product and infrastructure."

In addition to the Amazon's hosting and services, Young's team cobbled together a number of other backend engineering tools and techniques to get Digg Reader to run as quickly and responsively as possible.

"The backend is written in Python and is pretty standard in terms of the stack that we use," he says.

Here's a breakdown of the other backend tools they're using:

  • Python is the backend programming language of choice at Digg. The general purpose scripting language is used by everyone from Dropbox to NASA and is generally quite popular, so this is no surprise.
  • Memcached, redis, and twemproxy are used for caching. Redis and memcached are popular open source key-value stores designed to speed up web apps by lightening the load on databases. Twemproxy is a light-weight proxy for memcached and redis created by developers at Twitter.
  • MongoDB. In addition to Amazon's DynamoDB, Digg uses the uber-popular NoSQL database system MongoDB.
  • Tornado is the Python Web framework and networking library of choice at Digg Reader. Originally developed by FriendFreed, Tornado "can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user."
  • Beanstalkd. To keep processes moving in the background, Digg uses the work queue Beanstalkd to run tasks asynchronously and keep the user experience as speedy and interruption-free as possible.

"For code deployment and server monitoring, we are using a mix of tools right now: fabric, statsd + graphite, Amazon's Cloudwatch, sentry, munin, nagios, and pager duty," says Young. "And Chartbeat, of course."

"Gilad Lotan, the Betaworks Data Scientist, has built a system that allows us to score any URL based on a number of signals like tweets, Facebook shares, Diggs, etc.," Young explains. "This is currently shown in the list of 'Popular' stories in Digg Reader. He uses a mix of redis, memcache, zeromq, and hypertable."

Styles, Libraries and Frameworks: Making the Front-End Shine

The front-end of Digg Reader is, as Young puts it, "built with Javascript, CSS, and a lot of love."

"I'm really proud of what [design director] Justin Van Slembrouck and the dev team pulled off in such a short amount of time," says Young. "There is a lot left to do, but it's really exciting. We are really just getting started with this."

One of the team's biggest challenges was ensuring the Reader experience was as fast and responsive as possible. This isn't easy when you're aggregating and storing more than 8 million feeds and doing all kinds of heavy-lifting on the backend to make everything feel effortless. Thankfully, for the front-end guys, there's no shortage of frameworks and tools to help patch together something that meets users' not-always-patient demands.

"Jon Ferrer and Kevin Barnett have done a great job in making the site feel very responsive," says Young. "One of our big goals for launch was to make the "All" feed feel very fast for users. We wanted users to be able to scroll through page after page of their "All" feed and have it feel very smooth. Jon is using some tricks in terms of loading and then showing the data to make sure the scolling and pagination of the All feed feels smooth."

Here's a breakdown of what powers the front-end:

jQuery, the wildly popular JavaScript framework, is used by the Digg Reader team to simplify scripting tasks and just generally make life easier.

Backbone.js is huge. The RESTful JSON Javascript library is used on a wide variety of huge sites and web apps from Pandora and Hulu to LinkedIn and Pinterest. Digg joins the long list of prominent web properties that use it to make script-wrangling easier.

Require.js and r.js optimizer are used to load and manage dependencies between JavaScript modules.

Uglify.js is described as a "parser / compressor / minifier / beautifier" for JavaScript code. In other words, it takes often unwieldy-looking JavaScript and crunches it down into something cleaner and lighter weight.

Lo-Dash is a low-level utility library for JavaScript developers that enables customization, improves performance, and offers additional features. It's similar to underscore.js, which Digg also uses for JavaScript tempting.

LESS is a compiler that helps developers extend what's possible with Cascading Style Sheets (CSS), bringing a programmatic flavor to the code that powers how websites look and feel.

Node.js and Git pre-commit hooks are used to build out Digg's production CSS and JavaScript from their source file.

"Other than that, it's pretty much just GitHub, Asana, Hipchat, and code editors," says Young. "The old guys (like me) use Vi while the younger guys use fancy new tools like CodeKit and Sublime Text that I think pretty much write the code for you! Kids!"

[Image: Flickr user Johann Snyman]




Add New Comment

3 Comments