An explosion in the number of available data sources and data-processing tools means that more people than ever are jumping into the world of data visualization. But with so much to learn, it can be intimidating to know just where to start. So which library is best, and what advice do the pros have? Read on and find out.
D3’s popularity owes a lot to the sudden interest in SVG among web designers--largely because vector graphics look superb on the kind of high-resolution screens (think Apple's Retina display) that are becoming increasingly common.
“Let’s face it, for SVG-based data visualization, there is no other library that comes even close,” says Moritz Stefaner, an independent data visualization authority and owner of Truth & Beauty. “There are also many interesting projects building on top of D3, such as NVD3, which provides ready-made, but very customizable standard chart components--or Crossfilter, an impressive data filtering engine.”
Scott Murray, a code artist and author of the book Interactive Data Visualization for the Web, agrees with him. “D3 is extremely powerful, and its strength is that it leverages all that new browsers have to offer,” he says. “The flip side of that is that if a browser doesn't support something--like three-dimensional rendering with WebGL--then D3 won’t support it either.”
D3 may be a generalist library, but it’s not perfect for everything. “The main downside to D3--if you can call it that--is that it doesn't dictate or even recommend any particular visual representation,” Murray continues. “It really is a tool for loading data into the browser, and then generating DOM elements based on that data.”
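To make Murray's description concrete, here is a minimal sketch of the underlying idea: one markup element is generated per data point, and the visual mapping is entirely up to the author. It is written in plain TypeScript and deliberately does not use D3's API. The `Datum` type, the sample values, and the `barChartSvg` helper are all hypothetical names invented for illustration.

```typescript
// Hypothetical data: the article does not supply a dataset.
interface Datum {
  label: string;
  value: number;
}

const data: Datum[] = [
  { label: "A", value: 30 },
  { label: "B", value: 80 },
  { label: "C", value: 45 },
];

// Build one <rect> per datum and return the chart as an SVG string.
// Assumes non-negative values. Mapping value to bar height is our own
// choice here; nothing about the data dictates a bar chart.
function barChartSvg(rows: Datum[], width: number, height: number): string {
  if (rows.length === 0) {
    return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}"></svg>`;
  }
  // Fall back to 1 so an all-zero dataset does not divide by zero.
  const max = Math.max(...rows.map((d) => d.value)) || 1;
  const barWidth = width / rows.length;
  const rects = rows.map((d, i) => {
    const h = (d.value / max) * height; // scale the tallest bar to full height
    const x = i * barWidth;
    const y = height - h; // SVG's y axis points down, so bars hang from the top
    return `<rect x="${x}" y="${y}" width="${barWidth - 2}" height="${h}"><title>${d.label}</title></rect>`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${rects.join("")}</svg>`;
}

const svg: string = barChartSvg(data, 300, 100);
console.log(svg);
```

Saving the printed string as an `.svg` file and opening it in a browser should show three bars. Swapping the `<rect>` for a `<circle>` or a `<path>` changes the chart type without touching the data, which is the freedom (and the extra work) Murray is pointing at.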
“The potential downside is that highly customized visualizations may be more difficult to achieve with Vega or many other frameworks,” says Scott Murray. “For non-standard chart types, it may still be best to write D3 from scratch.”
Processing has been around for a few years already. It’s easy to get started with, since it can simply be downloaded and installed on any platform. The language itself is also very easy to learn: with only one line of code you’ll already have something visible on-screen.
“Processing is a programming language, development environment, and online community which makes it a wonderful environment for writing generative, interactive, animated applications,” says Benjamin Wiederkehr, partner at design and technology studio Interactive Things, and editor at Datavisualization.ch.
“A negative point might be that once you start doing more complex projects, the IDE is a bit limited,” says Jan Willem Tulp.
That said, Processing’s innate simplicity (along with its sizable user community, whose members are willing to lend a hand if you get stuck) more than makes up for this weakness, and means that Processing is one of the most approachable tools when it comes to data visualization.
If both D3 and Processing are general tools that can be applied to a range of different types of data visualization, then Gephi has a more specific purpose. It’s the number one free and open source tool for network visualization. Within this, however, is a world of possibilities. Whether you’re looking to model the relationships between individuals within a company, or passes during a football game, Gephi can help visualize how nodes are connected.
Like Processing, Gephi is easy to install--and once installed it’s straightforward to import your data, clean it up, and start visualizing it. “The network visualizations can [also] be exported and embedded in any web document to be used and shared by your audience,” says Benjamin Wiederkehr.
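For readers wondering what network data looks like before it reaches a tool like Gephi, the football example can be sketched as a list of weighted edges. The snippet below is a rough illustration in plain TypeScript: the pass log, the player names, and the `toEdgeCsv` helper are hypothetical, and the `Source,Target,Weight` header is an assumption about the column names Gephi's spreadsheet import expects, so check it against the Gephi documentation before relying on it.

```typescript
// Hypothetical pass log: each entry is [passer, receiver].
type Pass = [string, string];

const passes: Pass[] = [
  ["Keeper", "Defender"],
  ["Defender", "Midfielder"],
  ["Midfielder", "Striker"],
  ["Defender", "Midfielder"],
];

// Collapse repeated passes into one weighted edge per (passer, receiver) pair
// and return the result as CSV text.
// Assumes names contain no commas, so no CSV quoting is attempted.
function toEdgeCsv(log: Pass[]): string {
  const weights = new Map<string, number>();
  for (const pass of log) {
    const key = `${pass[0]},${pass[1]}`;
    weights.set(key, (weights.get(key) ?? 0) + 1);
  }
  // Header row is an assumed convention, not confirmed by the article.
  const lines: string[] = ["Source,Target,Weight"];
  // Map iterates in insertion order, so edges appear in first-seen order.
  weights.forEach((weight, key) => {
    lines.push(`${key},${weight}`);
  });
  return lines.join("\n");
}

const edgeCsv: string = toEdgeCsv(passes);
console.log(edgeCsv);
```

Each player becomes a node and each row an edge, with the weight recording how often that pass occurred. The same shape works for the company example: replace passes with emails or reporting lines.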
Even with suggestions like these, however, the data visualization world can be a daunting one for the newcomer. So what advice do the experts have?
“My first tip would be to learn as much as you can about existing tools which help you create standard charts quickly,” says Moritz Stefaner. “Especially at the beginning of a project, it is super-important to generate a lot of charts quickly, in order to explore the scope, depth and ‘texture’ of the data and find interesting stories to highlight. Personally, I use Tableau and Gephi quite a bit--but there’s also CartoDB which is amazing for maps, and the just-released RAW, [which is] an amazing open source tool for easily generating interesting graphs.”
Also be sure the library you pick is optimized for the information you want to display.
“It’s important to look at the format of your data and ask exactly what it is that you’re working with,” says Scott Murray. “Are you interested in visualizing a time period? Is it categorical data? All of this might influence your decision. Certain libraries like D3 are generalists, which can work with a range of types of data. Others are more data-type specific, such as Gephi or Sigma.js, which is an open source tool designed for network data visualization. If you know from the beginning what it is that you’re working with, it’s important to choose the tool that works best with that data.”
For people coming fresh to data visualization, an important consideration might be selecting a library that has an enthusiastic (and helpful!) community attached to it.
“For people just starting out, I'd suggest starting either with Processing or D3,” says Jan Willem Tulp. “Both of them have a very large user base and a great number of examples you can learn from.”
The online data visualization community can be great for answering questions, but it also serves to highlight one of the central paradoxes at the heart of the subject. Because of the different backgrounds of its members, some data visualizers will approach the field from an aesthetic fine arts perspective, while others might come at it from more of a statistics background. Statisticians may know data, but still be learning the basics of design. Designers may know how to create something that’s aesthetically pleasing, but still be learning about statistics.
“It’s all about a contract of sorts with your audience,” says data visualizer and creative coder Erik Cunningham. “If people are expecting scientific rigor then you need to be very, very careful about the way [you choose to visualize your data]. If someone’s expecting something visually spectacular that will blow their socks off--but not be as useful for analysis--that may also be a valid contract.”
So long as the data presented is accurate and truthfully visualized, neither approach is wrong--but seeking out different examples will make you examine both your own sensibilities, and those of the audience you’re presenting to. It’s not just about the trade-off between using an off-the-shelf tool to create a standard visualization, versus using the likes of D3 to create something no one’s ever seen before. Data visualization is both an art and a science--and working out that balance for yourself can only make you better.
“I would suggest [users] learn more about visualization itself,” says Jan Willem Tulp. “Look at good examples, read books about it, and follow some online classes. Because even though the tools help you to create good visualizations, in the end it is you who designs a visualization.”
“The tools are in that sense merely a means to an end.”
[Image: Flickr user Cory M. Grenier]