2013-07-11

Inside Google's Infinite Music Intelligence Machine

In May, Google launched a music service that will challenge Spotify and Pandora for radio domination. We asked Google research scientist Doug Eck how it works.



The Beatles have a way of keeping Doug Eck up at night. Specifically, the research scientist at Google grapples with one of the core problems of automated music discovery: With a band whose catalog is as evolutionary and nuanced as The Beatles', how can computers truly understand the artist and recommend relevant music to fans? After all, not everybody who loves A Hard Day's Night necessarily has a soft spot for the weirdest moments on The White Album. For humans, detecting the difference is easy. For machines, it's not so simple.

Solving problems like these resolves only part of a larger, more complex equation at play when Google attempts to help its users discover music. The company's ability to do so is now more important than ever, having recently launched a Spotify-style music subscription service with Pandora-esque Internet radio layered on top. The awkwardly named Google Play Music All Access has only been live for eight weeks and naturally has some catching up to do with digital music incumbents. Music discovery is a crucial piece of that puzzle and one that's notoriously challenging to lock into place.

In taking its own stab at music recommendation, Google blends technical solutions like machine listening and collaborative filtering with good, old-fashioned human intuition. Employing both engineers and music editors, the service continually tries to understand what people are listening to, why they enjoy it, and what they might want to hear next.

How Google Music Intelligence Works

Eck's team is focused on the technical side of this equation, relying on a dual-sided machine learning methodology. One component of that is collaborative filtering of the variety employed by Netflix and Amazon to recommend horror flicks and toasters. The other involves machine listening. That is, computers "listen" to the audio and try to pick out specific qualities and details within each song.

Since All Access is an on-demand music subscription service, users can listen to it all day without ever encountering the fruits of Google's music discovery engine. But once they venture into the "Explore" tab, tap "Instant Mix," or start an artist or song-based radio station, the secret sauce kicks in. By that point, the system understands at least something about the person's taste in music. That's because the minute they start using All Access, their activity becomes part of the algorithm that helps Google understand who they are and what music they enjoy.

"We use a kind of neural network backend to do this," says Eck. "You're sort of living numerically in the same space as tracks, artists, and albums. Albums know that they're made up of tracks and artists are made up of albums and tracks. The more we understand you, the more we're able to pick up these subtle distinctions about, for example, what kind of Beatles tracks you like."

Collaborative filtering works wonders for the Amazons of the world. But since this type of If-you-like-that-you'll-also-like-this logic works better for kitchen appliances than it does for art, the system needs a way to learn more about the music itself. To teach it, Eck's team leverages Google's robust infrastructure and machine-listening technology to pick apart the granular qualities of each song.
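
For readers unfamiliar with the "if you like that, you'll also like this" logic, here is a minimal sketch of one simple form of collaborative filtering, based on counting which artists share listeners. The listening histories and the function names `co_occurrence` and `also_liked` are hypothetical; neither Google nor Amazon has said its system works this way.

```python
from collections import Counter
from itertools import combinations

# Made-up listening histories: each set holds the artists one user played.
histories = [
    {"The Beatles", "The Kinks", "The Zombies"},
    {"The Beatles", "The Kinks"},
    {"The Beatles", "Tame Impala"},
    {"The Kinks", "The Zombies"},
]

def co_occurrence(histories):
    """Count how many users played both artists, for every ordered pair."""
    counts = Counter()
    for played in histories:
        for a, b in combinations(sorted(played), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def also_liked(artist, counts, top_n=3):
    """Artists most often played by the same users as `artist`."""
    related = {b: n for (a, b), n in counts.items() if a == artist}
    # Sort by shared-listener count (descending), then by name for stable ties.
    return sorted(related, key=lambda b: (-related[b], b))[:top_n]

counts = co_occurrence(histories)
suggestions = also_liked("The Beatles", counts)
print(suggestions)
```

Note that nothing in this calculation looks at the audio. It only knows who listened to what, which is why the article's next step, machine listening, matters.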

"By and large, audio-based models are very good at timbre," says Eck. "So they're very good at recognizing instruments, very good at recognizing things like distorted guitars, very good at recognizing things like whether it's a male or female vocalist."

These are precisely the kinds of details that Pandora relies on human, professionally trained musicians to figure out. The Internet radio pioneer has long employed musicologists to listen to songs and help build out a multipoint, descriptive data set designed to place each track into a broader context and appropriately relate it to other music. For Pandora, the results have been extremely valuable, but mapping out this musical intelligence manually doesn't scale infinitely. Thankfully, machine listening has come a long way in recent years. Much like Google indexes the Web, the company is able to index a massive database of audio, mapping the musical qualities found within. Since it's automated, this part of Google's music recommendation technology can be scaled to a much larger set of data.

"If the four of us decided we were going to record a jazz quartet right here and now and we uploaded it to Play Music, our system will be aware that were talking about that," explains Eck. "By pulling these audio features out of every track that we work with, it gives us a kind of musical vocabulary that we can work with for doing recommendation even if it’s a very long tail."

Indeed, when it comes to music, the tail has never been longer. The world's selection of recorded music was never finite, but today creating and distributing new songs is virtually devoid of friction and financial cost. However much human intelligence Pandora feeds into its algorithm, its Music Genome Project will never be able to keep up and understand everything. That's where machine learning gets a leg up.

The Limits Of Machine Listening

Still, there's a reason Pandora has more than 70 million active listeners and continues to increase its share of overall radio listening time. Its music discovery engine is very good. It might not know about my friend's band on a small Georgia-based record label, but the underlying map of data that Pandora uses to create stations is still incredibly detailed. When I start a radio station based on Squarepusher, an acclaimed but not particularly popular electronic music artist, the songs it plays are spun for very specific reasons. It plays a track by Aphex Twin because it features "similar electronica roots, funk influences, headnodic beats, the use of chordal patterning, and acoustic drum samples." Then, when I skip to the next song, it informs me that, "We're playing this track because it features rock influences, meter complexity, unsyncopated ensemble rhythms, triple meter style, and use of modal harmonies."

Pandora knows this much about these tracks thanks to those aforementioned music experts who sat down and taught it. Automated machine listening, by comparison, can't get quite as specific. At least, not yet.

"It’s very hard and we haven’t solved the problem with a capital S," says Eck, whose has an academic background in automated music analysis. "Nor has anybody else."

Computers might be able to pick out details about timbre, instruments used, rhythm, and other on-the-surface sonic qualities, but they can only dig so deep.

"You can learn a lot from one second of audio. Certainly you can tell if there’s a female voice there or if there’s distorted guitar there. What about when we stretch out and we look what our musical phrase is. What’s happening melodically? Where’s this song going? As we move out and have longer time scale stretches that we’re trying to outline, it becomes very hard to use machines alone to get the answer."

Thanks Algorithms, But The Humans Can Take It From Here.

That's where the good, old-fashioned human beings come in. To help flesh out the music discovery and radio experiences in All Access, Google employs music editors who have an intuition that computers have yet to successfully mimic. Heading up this editor-powered side of the equation is Tim Quirk, a veteran of the online music industry who worked at the now-defunct Listen.com before Napster was a household name.

"Algorithms can tell you what are the most popular tracks in any genre, but an algorithm might not know that "You Don't Miss Your Water" was sort of the first classic, Southern soul ballad in that particular time signature and that it became the template for a decade's worth of people doing the same thing," says Quirk. "That’s sort of arcane human knowledge."

Google's blend of human and machine intelligence is markedly different from Pandora's. Rather than hand-feeding tons of advanced musical knowledge directly into its algorithms, Google mostly keeps the human-curated stuff in its own distinct areas, allowing the computers to do the heavy lifting elsewhere. Quirk and his team of music editors are the ones who define the most important artists, songs and albums in a given genre (of which there are hundreds in Google Play Music).

"We have a lot of experts in every genre whose responsibility is to say 'Okay. If you want to understand the history of the genre and how its sound evolved, these are the 10 to 25 or 50 tracks," Quirk explains. "You have to listen to them in order to understand that particular genre and the albums that really define that genre. Then Doug's team can take those singles and say "Okay, if these 25 artists and songs define classic soul, then I now know more about classic soul than I knew before and I can go do amazing things with that.'"

Quirk's team also creates curated playlists and makes specific, hand-picked music recommendations. To the extent that these manually curated parts of the service influence its users' listening behavior, the human intelligence does find its way back into the algorithms. It just loops back around and takes a longer road to get there.

Google's employees aren't the only people feeding intelligence into this semiautomated music machine. Google is also constantly learning from its users. Like Pandora and its many copycats, Google Play Music's Radio feature has thumbs up and thumbs down buttons, which help inform the way the radio stations work over time. In fact, those buttons are found next to every on-demand track on the service, whether it's coming from a personalized radio station or not. The more I tap those buttons, the more Google knows what I'm truly into. The hope is, of course, that millions of people will flock to this service and feed it ever greater troves of data - through plays, thumbs-upping and uploading their own collections to the service (a feature that truly sets it apart from the incumbents).
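
How a thumbs-up or thumbs-down might shift a listener's profile can be sketched with a simple update rule: nudge the listener's taste vector toward tracks they approve of and away from those they reject. This is an illustrative assumption, not a description of Google's implementation, and the two-number vectors, the `rate` parameter, and the `apply_feedback` name are invented.

```python
def apply_feedback(taste, track, thumbs_up, rate=0.2):
    """Move the taste vector a fraction `rate` toward (or away from) a track."""
    sign = 1.0 if thumbs_up else -1.0
    return [t + sign * rate * (x - t) for t, x in zip(taste, track)]

# Made-up starting profile and tracks.
taste = [0.5, 0.5]
liked_track = [1.0, 0.0]
disliked_track = [0.0, 1.0]

# A thumbs-up pulls the profile toward the liked track.
taste = apply_feedback(taste, liked_track, thumbs_up=True)
# A thumbs-down pushes it away from the disliked track.
taste = apply_feedback(taste, disliked_track, thumbs_up=False)
print(taste)
```

Each tap moves the profile only a little, which fits the article's point that the picture of a listener sharpens gradually as more feedback accumulates.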

Pandora versus Google’s Internet Radio

The radio part of Google Play Music All Access is pretty good, as Internet radio services go. At times, though, the inherent limitations of collaborative filtering are on full display. As an example, I made a station based on "Mayonaise" by The Smashing Pumpkins. It's a slow-paced shoegaze-y song with heavy fuzz on the guitars. It sounds more like something off of an album by My Bloody Valentine than any of the grunge rock acts with whom The Smashing Pumpkins shared the airwaves in the mid-'90s. Yet, the "Mayonaise" radio station presents a predictable list of popular radio bands from the '90s, including Nirvana, Foo Fighters, Alice in Chains and Rage Against the Machine. The list makes sense to some extent: These are indeed related artists, but some of the songs feel truly mismatched. "Mayonaise" sounds nothing like "Down Rodeo" by Rage Against the Machine, an angry, more hyped-up rap-rock track with an entirely different vibe. I don't mind the song, but it wasn't what I had in mind when I was listening to "Mayonaise" and decided "I want to hear more music like this." I would have been better off digging into my own catalog and making a playlist.

To test things further, I went through and made radio stations off of different tracks by The Smashing Pumpkins. Some loud and aggressive. Some slow and spaced out. They all yielded songs from the same list of about 15 alternative rock bands from the '90s.

To be fair, Pandora serves up similarly predictable results for its own "Mayonaise"-based station. It feels like it has less to do with the sonic qualities of each song than with what others who listen to The Smashing Pumpkins also listen to. They were a huge radio band from the '90s. People who listen to their stuff also listen to lots of other popular alternative rock bands from the same era. Virtually all of those people are familiar with Nirvana and Rage Against the Machine. A much smaller number know who My Bloody Valentine is, even if the band was a big influence on The Smashing Pumpkins. Large-scale collaborative filtering has a way of missing fine details like that.
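
The popularity effect described above can be shown with a few made-up numbers. In the sketch below, ranking by raw shared-listener counts favors the hugely popular band, while dividing by the square root of the product of each artist's audience size (a common cosine-style normalization) favors the smaller but more closely tied one. All counts are invented, and nothing here reflects how Google or Pandora actually scores related artists.

```python
import math

seed = "The Smashing Pumpkins"

# Made-up audience sizes (number of listeners per artist).
listeners = {
    "The Smashing Pumpkins": 10,
    "Nirvana": 50,
    "My Bloody Valentine": 3,
}

# Made-up number of listeners each artist shares with the seed artist.
shared_with_seed = {"Nirvana": 8, "My Bloody Valentine": 3}

def raw_score(artist):
    """Plain overlap: how many seed listeners also play this artist."""
    return shared_with_seed[artist]

def normalized_score(artist):
    """Overlap scaled down by the audience sizes of both artists."""
    return shared_with_seed[artist] / math.sqrt(listeners[seed] * listeners[artist])

by_raw = max(shared_with_seed, key=raw_score)
by_normalized = max(shared_with_seed, key=normalized_score)
print(by_raw, by_normalized)
```

Normalization of this kind is only one possible remedy, and it has its own trade-off: it can over-reward obscure artists whose tiny audiences happen to overlap with the seed.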

Google is aware of the "related artists" problem. Like The Beatles issue, it's something they think hard about.

"There’s lots of different signals that my team can feed back into Doug’s team," says Quirk, referring to his small army of music editors. "At the artist level, that has to do with saying "Okay, this is the artist, this is who influenced them. This is who they influence and this is who their contemporaries are." And knowing in which context you want to favor one or another pieces of that metadata. Not all similar artists are created alike."

In general, Pandora still seems to do a better job of creating song-based radio stations than Google. There's something to be said for hiring living, breathing experts to sit down and describe music in a way that teaches machines how to better DJ for us. At the same time, by letting the machines do the listening, Google has the potential to solve the scalability problem. And by letting people curate a few parts of its music experience, Google can indirectly infuse the code-driven part with human smarts over time, especially as more people use the service.

Some day, computers will be better able to understand not just that I like The Beatles, but why I like The Beatles. They'll know that John is my favorite, which songs on Revolver I skip, and that of all the heavily Beatles-influenced bands in the world, I love Tame Impala, but loathe Oasis. The machines will get smarter, much like a toddler does: by watching and listening. As users spend time with the product and tap buttons, the nuanced details will become more obvious.

Meanwhile, the ability of computers to actually hear what's contained within each song can only improve over time, just as voice recognition continues to do. The promise of Google Play Music is the same thing that made Google successful to begin with: its ability to use massive data to understand who we are and what we want. If anybody can crack the notoriously hard nut of music discovery in a hugely scalable fashion, it's them. Just don't expect them to do it with machines alone.




4 Comments

  • jasonhrt

    A couple comments...
    1. The Beatles are not even on Google Music...yet.
    2. The recommendation feature is fantastic, I've been using it since May or June
    3. Google Music scans all of my mp3s on any computer I use when I log in to grab all content not natively on their servers and delivers them for the appropriate stations. I just heard a bootleg song that I recorded myself.
    4. I'm excited where this is headed next. 

  • AB

    "Some day, computers will be better able to understand not just that I like The Beatles, but why I like The Beatles."

    Huh, that's exactly what Pandora does.

    "Pandora knows this much about these tracks thanks to those aforementioned music experts who sat down and taught it."

    Experts ? For millions songs ? It's an algorithm, no expert.

    "Automated machine listening, by comparison, can't get quite as specific. At least, not yet."

    Yes it does. because it's actually a machine, not "experts".

  • Venkat Subramanian

    Brilliant article. Music discovery is a tricky space. It is not so much about listing a group of artists/bands based on a search string but feeding listeners with the kind of music they like. That's the reason very few have mastered it. Great insight overall
    http://www.iapploud.com

  • Greg Bailey

    This article offers a great comparison between the two services and shows the complexity involved in determining musical tastes.  I don't pretend to know every facet of the process, but Pandora does a credible job of playing songs that pique my curiosity.  I haven't explored Google's service, but I'll try it as well.  It's heartening to know humans are still required to hone the patterns.

    My only dissent?  Paul is my favorite. ;-)