2014-04-28

Co.Labs

The Five Best Libraries For Building Data Visualizations

Starting out in data visualization can be difficult. Here’s what you need to know.



An explosion in the number of available data sources and data-processing tools means that more people than ever are jumping into the world of data visualization. But with so much to learn, it can be intimidating to know just where to start. So which library is best, and what advice do the pros have? Read on and find out.

D3

Just as you can’t tell the history of personal computers without mentioning Steve Jobs, it’s impossible to talk about data visualization without talking about D3. Arguably the most dominant and important programming library in the field, D3 (short for Data-Driven Documents) is an open source JavaScript library usually used to generate SVG graphics. SVG is a vector image format long supported by web browsers, but historically underutilized.

D3’s popularity owes a lot to web designers’ sudden interest in SVG, largely because vector graphics look superb on the high-resolution screens (think Apple's Retina display) that are becoming increasingly common.

"Let’s face it, for SVG-based data visualization, there is no other library that comes even close," says Moritz Stefaner, an independent data visualization authority and owner of Truth & Beauty. "There are also many interesting projects building on top of D3, such as NVD3, which provides ready-made, but very customizable standard chart components—or Crossfilter, an impressive data filtering engine.

Scott Murray, a code artist and author of the book Interactive Data Visualization for the Web, agrees with him. "D3 is extremely powerful, and its strength is that it leverages all that new browsers have to offer," he says. "The flip side of that is that if a browser doesn't support something—like three-dimensional rendering with WebGL—then D3 won’t support it either."

D3 is a generalist library, but it’s not perfect for everything. "The main downside to D3, if you can call it that, is that it doesn't dictate or even recommend any particular visual representation," Murray continues. "It really is a tool for loading data into the browser, and then generating DOM elements based on that data."
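
Murray's description of generating elements from data can be illustrated without D3 itself. The following is a minimal sketch in Python, not D3's API: it maps each value in a made-up dataset to one SVG rectangle, which is roughly the data-to-element mapping that D3 performs in the browser. The function name bars_to_svg, its parameters, and the sample numbers are all assumptions chosen for illustration.

```python
def bars_to_svg(values, bar_width=20, gap=5, height=100):
    """Return SVG markup containing one <rect> element per data value."""
    peak = max(values, default=0)
    rects = []
    for index, value in enumerate(values):
        # Scale each value so that the largest bar fills the full height.
        bar_height = round(height * value / peak) if peak else 0
        x = index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points downward
        rects.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}"/>'
        )
    width = max(0, len(values) * (bar_width + gap) - gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )


# Hypothetical dataset: six arbitrary numbers.
svg = bars_to_svg([4, 8, 15, 16, 23, 42])
print(svg)
```

Note that nothing in the sketch decides that bars are the right representation; that choice is left to whoever writes the mapping, which is the point Murray makes about D3.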

Vega

Although D3 is a powerful tool for custom visuals, if you want to make a standard chart without thinking too much about the visual design aspect, a tool like Vega could be for you. As a framework built on top of D3, Vega provides an alternate syntax for defining chart elements. With Vega you can describe data visualizations in a JSON format instead of writing D3/JavaScript code, and then generate interactive views using either HTML5 Canvas or SVG. This greatly simplifies the code involved, so your "time-to-chart" is much shorter—if that’s an issue that concerns you. It also makes visualizations far more reusable and shareable, while greatly improving platform flexibility.
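
To give a feel for what describing a chart as JSON looks like, here is a hedged sketch that builds a bar-chart specification as a Python dictionary and serializes it. The layout (data, scales, axes, and marks with an encode block) follows the bar-chart examples in later Vega releases; property names differ between Vega versions, so treat them as assumptions to check against the version you use. The category and amount fields and their values are invented.

```python
import json

# Hypothetical data: four categories with made-up amounts.
rows = [
    {"category": "A", "amount": 28},
    {"category": "B", "amount": 55},
    {"category": "C", "amount": 43},
    {"category": "D", "amount": 91},
]

spec = {
    "width": 400,
    "height": 200,
    "data": [{"name": "table", "values": rows}],
    "scales": [
        # Band scale: one evenly sized slot per category along the x axis.
        {"name": "xscale", "type": "band", "range": "width", "padding": 0.05,
         "domain": {"data": "table", "field": "category"}},
        # Quantitative scale mapping amounts onto the chart height.
        {"name": "yscale", "range": "height", "nice": True,
         "domain": {"data": "table", "field": "amount"}},
    ],
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    "marks": [{
        "type": "rect",
        "from": {"data": "table"},
        "encode": {"enter": {
            "x": {"scale": "xscale", "field": "category"},
            "width": {"scale": "xscale", "band": 1},
            "y": {"scale": "yscale", "field": "amount"},
            "y2": {"scale": "yscale", "value": 0},
        }},
    }],
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The script only produces the JSON text; rendering it still requires a Vega runtime. Because the whole chart is plain data, the same text can be stored, shared, or generated by any language that can write JSON.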

"The potential downside is that highly customized visualizations may be more difficult to achieve with Vega or many other frameworks," says Scott Murray. "For non-standard chart types, it may still be best to write D3 from scratch."

Processing

Processing has been around since 2001. It’s easy to get started with, since it can simply be downloaded and installed on any major desktop platform. The language itself is also very easy to learn: with only one line of code you’ll already have something visible on-screen.

"Processing is a programming language, development environment, and online community which makes it a wonderful environment for writing generative, interactive, animated applications," says Benjamin Wiederkehr, partner at design and technology studio Interactive Things, and editor at Datavisualization.ch.

An offshoot of Processing is Processing.js, a sister project that creates data visualizations using web standards, without the need for any plug-ins. "You don't need to know JavaScript in order to get started with Processing, because Processing has its own language," says Netherlands-based data visualization expert Jan Willem Tulp. As the user, you simply write code in the Processing language, include it on your web page, and let Processing.js take care of the rest.

"A negative point might be that once you start doing more complex projects, the IDE is a bit limited," Tulp continues.

That said, Processing’s innate simplicity, along with a sizable community of users who are willing to lend a hand if you get stuck, more than makes up for this weakness, and makes Processing one of the most approachable tools for data visualization.

Gephi

If D3 and Processing are general tools that can be applied to a range of different types of data visualization, Gephi has a more specific purpose. It’s the number one free and open source tool for network visualization. Within this, however, is a world of possibilities. Whether you’re looking to model the relationships between individuals within a company, or passes during a football game, Gephi can help visualize how the nodes in a network are connected.

Like Processing, Gephi is easy to install, and once installed it’s straightforward to import your data, clean it up, and start visualizing it. "The network visualizations can [also] be exported and embedded in any web document to be used and shared by your audience," says Benjamin Wiederkehr.
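
Gephi's spreadsheet import reads CSV edge lists, conventionally with Source and Target columns and an optional Weight column. As a sketch of preparing data for import, the following Python collapses a made-up list of passes between players into weighted edges and returns them as CSV text. The helper name passes_to_edge_csv and the player labels are hypothetical, and the column names should be checked against the Gephi version you have installed.

```python
import csv
import io
from collections import Counter


def passes_to_edge_csv(passes):
    """Collapse (passer, receiver) pairs into a weighted edge list as CSV text."""
    weights = Counter(passes)  # number of times each directed pass occurred
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical passes recorded during a game.
passes = [
    ("Keeper", "Defender"),
    ("Defender", "Midfielder"),
    ("Midfielder", "Striker"),
    ("Defender", "Midfielder"),
]
edge_csv = passes_to_edge_csv(passes)
print(edge_csv)
```

Each row becomes one connection between two nodes, and repeated passes between the same pair are merged into a single heavier edge rather than listed twice.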

Dygraphs

A fast and flexible open source JavaScript charting library, Dygraphs lets you explore and interpret incredibly dense data sets. Unlike Vega, it’s highly customizable, and it has the added plus of working in all major browsers. Finally, it’s interactive out of the box: features like zoom, pan, and mouseover are on by default, while pinch-to-zoom on mobile devices is simply icing on the cake.

Consider The Fundamentals

Even with suggestions like these, however, the data visualization world can be a daunting one for the newcomer. So what advice do the experts have?

"My first tip would be to learn as much as you can about existing tools which help you create standard charts quickly," says Moritz Stefaner. "Especially at the beginning of a project, it is super-important to generate a lot of charts quickly, in order to explore the scope, depth and ‘texture’ of the data and find interesting stories to highlight. Personally, I use Tableau and Gephi quite a bit—but there’s also CartoDB which is amazing for maps, and the just released RAW, [which is] an amazing open source tools for easily generating interesting graphs."

Also be sure the library you pick is optimized for the information you want to display.

"It’s important to look at the format of your data and ask exactly what it is that you’re working with," says Scott Murray. "Are you interested in visualizing a time period? Is it categorical data? All of this might influence your decision. Certain libraries like D3 are generalists, which can work with a range of types of data. Others are more more data-type specific, such as Gephi or Sigma.js, which is an open source tool designed for network data visualization. If you know from the beginning what it is that you’re working with, it’s important to choose the tool that works best with that data."

The Benefits Of Community

For people coming fresh to data visualization, an important consideration might be selecting a library that has an enthusiastic (and helpful!) community attached to it.

"For people just starting out, I'd suggest starting either with Processing or D3," says Jan Willem Tulp. "Both of them have a very large user base and a great number of examples you can learn from."

The online data visualization community can be great for answering questions, but it also serves to highlight one of the central paradoxes at the heart of the subject. Because of the different backgrounds of its members, some data visualizers will approach the field from an aesthetic fine arts perspective, while others might come at it from more of a statistics background. Statisticians may know data, but still be learning the basics of design. Designers may know how to create something that’s aesthetically pleasing, but still be learning about statistics.

"It’s all about a contract of sorts with your audience," says data visualizer and creative coder Erik Cunningham. "If people are expecting scientific rigor then you need to be very, very careful about the way [you choose to visualize your data]. If someone’s expecting something visually spectacular that will blow their socks off—but not be as useful for analysis—that may also be a valid contract."

So long as the data presented is accurate and truthfully visualized, neither approach is wrong, but seeking out different examples will make you examine both your own sensibilities and those of the audience you’re presenting to. It’s not just about the trade-off between using an off-the-shelf tool to create a standard visualization and using the likes of D3 to create something no one’s ever seen before. Data visualization is both an art and a science, and working out that balance for yourself can only make you better.

"I would suggest [users] learn more about visualization itself," says Jan Willem Tulp. "Look at good examples, read books about it, and following some online classes. Because even though the tools help you to create good visualizations, in the end it is you who designs a visualization."

"The tools are in that sense merely a means to an end."

[Image: Flickr user Cory M. Grenier]




2 Comments

  • merrily

    One easy way to get started is to use a library with your processed data. This is a great option if you're already storing your data in the cloud. Libraries like ZingChart (http://www.zingchart.com) offer the flexibility in styling of D3 but the ease of CSS-like syntax.

    Stand alone libraries offer the power of dataviz without the cost of storage and processing that might come along with something like Tableau.

  • Bálint Seres

    Hi, how come you didn't mention OpenFrameworks? It's basically Processing, with a lot more horsepower and a lot more to offer. Cinder is also worth mentioning. C++ really is a powerful language...