Who internally uses the projects you’re building?
Our entire Explore team is constantly chomping at this data and our analytics team is just in love with this. I hooked up our geographic data yesterday to DMAs [designated market areas] and they were thrilled. I also hooked up our venue data to get a better sense of the U.S. metros.
What is your role, and how many people do you have working on maps here at Foursquare?
I'm the primary developer of a lot of our geographic infrastructure. There are three people who touch geo here. Everyone has a slightly different way that they think about how the geographic data works. We're also working with a well-regarded open source cartography guy, who is helping us make sense of the world. I've learned a lot about how the U.S. fits together, but his brain is able to hold how the whole world fits together. He understands that there was a law passed in England that changed the way that cities and counties and states fit together 10 years ago and how this influences the way that we're going to draw and model cities.
A lot of your geographic infrastructure consists of things you built and maintain internally. But you of course can’t do it all. What are some outside tools you use to build out Foursquare’s maps and geolocation functionality?
We use a really awesome open source library from Google called S2. It's just a really great geographic spherical geometry library. It's very good at drawing these boxes. What S2 lets us do is it overlays a multi-level grid on the world. This is a really great abstraction. Even just tiling the whole world at one constant level is a nice way to divide up the world. You don't need polygon data to work with S2. It just covers the whole world. We've built tooling on top of it. S2Map is a site that I wrote so we could go visualize coverings and visualize the data. A lot of our Explore technology is currently rooted in S2. We can also do things like take all your check-ins and build an S2 covering of that. Now we have a rough approximation of where you've been. We can name each of these cells. They're easier to save in the database. We use S2 quite a bit.
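To make the multi-level grid idea concrete, here is a minimal sketch in Python. It is not S2 itself: S2 projects the sphere onto the faces of a cube and orders cells along a space-filling curve, while this stand-in simply splits latitude and longitude into quadrants, one digit per level. The function names (cell_id, parent, covering) are hypothetical and chosen for illustration only.

```python
def cell_id(lat, lng, level):
    """Name the grid cell containing a point, one digit (0-3) per level.

    Simplified lat/lng quadtree used as a stand-in for S2 cells; each
    extra level splits the current cell into four children.
    """
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(level):
        mid_lat = (south + north) / 2
        mid_lng = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # upper half of the current cell
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        if lng >= mid_lng:      # right half of the current cell
            quadrant += 1
            west = mid_lng
        else:
            east = mid_lng
        digits.append(str(quadrant))
    return "".join(digits)


def parent(token, level):
    """A coarser cell is just a prefix of the finer cell's name."""
    return token[:level]


def covering(points, level):
    """Approximate where a set of points lies as a set of cell names."""
    return {cell_id(lat, lng, level) for lat, lng in points}
```

Calling covering on a list of (lat, lng) check-ins returns a small set of strings, which is the sense in which cells can be named and are easy to save in a database. Because a parent cell is a prefix of its children, the same data can be viewed at a coarser level without recomputing anything from coordinates.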
Check-ins from around the whole world are a pretty big data set. How do you break it down?
With MapReduce, you can break up a big task into small tasks. This is a really great way to break up any world-sized task into small tasks. Each reducer is one bucket and you're seeing the buckets. Each of these cells you can think of as a bucket. This is actually how we do some geographic processing right now. We take the polygon, we do this S2 covering, and then we put those into our reduce buckets. And then we take check-ins or venues and we take the S2 covering of those and we put them into the same buckets. And now I have matched up venues and polygons in my reduce bucket. S2 is a great, efficient library. It's very high level. It's solely for building geographic databases.
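The bucketing described above can be sketched as a small in-memory join. This is an illustration under stated assumptions, not Foursquare's pipeline: regions are plain bounding boxes rather than real polygons, the cells are a fixed one-degree grid rather than S2 cells, and the map, shuffle and reduce steps are ordinary Python functions with hypothetical names.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # assumed cell size in degrees; a stand-in for an S2 level


def cell_of(lat, lng):
    """Grid cell (row, column) containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def cover_box(south, west, north, east):
    """All grid cells touched by a bounding box: its covering."""
    r0, c0 = cell_of(south, west)
    r1, c1 = cell_of(north, east)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def map_phase(regions, venues):
    """Emit (cell, record) pairs so regions and venues share bucket keys."""
    for name, box in regions.items():
        for cell in cover_box(*box):
            yield cell, ("region", name, box)
    for name, (lat, lng) in venues.items():
        yield cell_of(lat, lng), ("venue", name, (lat, lng))


def shuffle(pairs):
    """Group records by cell; each group plays the role of one reducer."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_bucket(values):
    """Within one bucket, run the exact containment test."""
    regions = [(n, b) for kind, n, b in values if kind == "region"]
    venues = [(n, p) for kind, n, p in values if kind == "venue"]
    for venue_name, (lat, lng) in venues:
        for region_name, (s, w, n, e) in regions:
            if s <= lat <= n and w <= lng <= e:
                yield venue_name, region_name


def join(regions, venues):
    """Match every venue to the regions that contain it."""
    buckets = shuffle(map_phase(regions, venues))
    return sorted(p for vals in buckets.values() for p in reduce_bucket(vals))
```

Each venue is only ever compared against the regions that landed in its own bucket, so the expensive containment test never runs across the whole world at once. In a real job each bucket would go to a separate reducer instead of a dictionary entry.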
I imagine you get geographic data from a variety of sources. Which ones are the easiest to work with?
The U.S. government is really amazing. It gives out this TIGER project, which is part of the census. It's basically a full digital map of the United States. It's every road, every city, metros, these things that are sort of like ZIP codes, just a ton of data. If [Foursquare] was just the U.S. our job would be done. There are other open geographic data repositories. There's Natural Earth, a project that grew out of some cartography that happened at the Washington Post. Honestly, every geographic visualization you see, it's likely that the outlines of the countries you're seeing are the Natural Earth countries. They're a set of country and state polygons, basically for printing, at the largest, at wall size.
What programming languages do you guys use?
Foursquare is a Scala shop. Scala has a really great benefit of being built on the JVM [Java Virtual Machine] so we get to pull in all the Java libraries that we want. And there are some really great Java libraries for geographic work as well. The Java Topology Suite is a standard one. Most of what we do is in Scala and Java. The PostGIS database is a really popular tool in a lot of geo work. We're migrating away from it in favor of code that we've written. The project we're working on to create city polygons for the world is all built on PostGIS. It's an easy way to take geographic data, stick it in a database, query it with geographic primitives. When you start with OpenStreetMap, you download planet.osm and then the first thing you do is you spend 12 hours just putting it into a PostGIS database. That lets you treat geographic data as a database. And then we use the GeoNames project, and we use our own homegrown suite of tools: our geocoder, our reverse geocoder, and then the various MapReduces and other data infrastructure we built on that.
What else do you use to build on PostGIS?
We have a geographic data repository. Before you build your geocoder you want to get all of your data into a nice, standardized format that things like a MapReduce can run over. So we've built MapReduces on top of all of our geographic data as well. Here’s another fun tool we built that we'll probably open source when we have some time: There’s a standard interchange format for geographic data called the Shapefile. This [program] reads any Shapefile and then writes back to a Shapefile, which means you can also visualize it in the standard suite of geographic tools. That suite is called Quantum GIS (QGIS). It's *the* open source suite for working with Shapefiles, which is how most geographic data is exchanged.
Most of this data is going toward building Foursquare's geographic infrastructure as it exists today. What are some other interesting things you wind up doing with that data?
The more we have the data, the longer the data sticks around, the more use we find for it. It just starts finding its way into every database and we have all these tables of aggregations: what cities our users have been to, what venues they've been to, what artists they've seen. One of the fundamental things has been how we build the user geo aggregation. This is letting us do that. We can start thinking about cities you've been to before and places you’ve been to before and say, “Hey, you should explore this other place today.” That'd be great, but we need the understanding first.
What kind of experimental apps are you thinking of building internally?
I've been thinking: Can we build a tool that lets us crowdsource neighborhoods? Because neighborhoods are so weird. Let's just ask people. Let's just build a phone app. We're thinking about web UIs for it. We're thinking about doing a cut of neighborhoods with Foursquare and Flickr data and then seeding those and then people hopefully over time can refine it. I just was looking at an open source map toolkit online that's pretty good at drawing polygons. This is basically all I need. I just need to ask people, do you want to help me with Williamsburg? And then just start drawing. The MapBox guys have this huge commitment to satellite data. And satellite data is cool. It's super-open. A lot of OpenStreetMap is drawn on top of satellite data. So I was thinking this morning, I should call up the MapBox folks, we'll run this on top of MapBox satellite and start drawing neighborhood polygons.
It sounds like you need the help of the hive-mind. How else can the community pitch in?
Lots of people think about this. There's Livehoods, which tries to figure out not necessarily the neighborhoods that you and I think of, but they're taking Foursquare check-ins and trying to figure out the neighborhoods that people live in. Where are things clustered? You might think the Upper West Side is like this, but really it bleeds into Inwood. They're looking at where people go between Foursquare check-ins. There's a project out of Boston that's doing crowdsourced neighborhoods totally separate from Foursquare. I would love more hours in my day to both be building out our geographic data and the open geographic data. The best we can do is help each other. It's too big of a problem. I need people in Russia and France thinking about places in France and Russia. I just can't hold every city in the world in my head and how they fit together.