2014-04-07

Co.Labs

Why Facebook Invented A New PHP-Derived Language Called "Hack"

Instead of throwing out years of legacy code, Facebook built a new branch of the language that originally underpinned TheFacebook.com. Here's the story behind a two-year labor of love.



When Mark Zuckerberg’s Harvard classmates first logged in to TheFacebook in February 2004, the site’s servers ran PHP, which had beaten out Perl to become the hottest language on the web.

Using a now-popular framework like Ruby on Rails or Django wasn’t an option—Rails’ first public release was a few months later, and Django wasn’t unveiled until the following year. A decade later, PHP’s been widely derided for having a sprawling library of inconsistently named and defined built-in functions, syntax and semantics just different enough from related languages to confuse multilingual programmers, and a history of design decisions that made it easy to write insecure code.

“Every PHP programmer is familiar with day-to-day tasks that can be tricky or cumbersome,” Facebook developers Julien Verlaguet and Alok Menghrajani recently wrote on the company’s engineering blog.

But PHP hasn’t gone away—Facebook and other big organizations and projects have millions of lines of code written in the language, and programmers still appreciate it for rapid development and deployment, even as they try to steer clear of its messier features.

To ease the pain of PHP programmers without making them abandon the language and years of software development, Facebook developed Hack, a new, PHP-derived language that’s largely compatible with existing code and augmented with new safety features derived from functional programming languages and academic research.

“It has been specifically designed to interoperate seamlessly with PHP,” says Verlaguet, the technical lead on the Hack project, whose background includes a mix of formal academic study of programming languages and industry experience. Facebook’s been using and developing Hack internally for about two years, and has recently made the project open source and scheduled a public “developer day” for April 9.

"What we're doing is basically making Hack available out there to hopefully gather feedback from the community, and work with the open source community to make Hack a good experience for people outside Facebook," says Verlaguet.

Perhaps chief among Hack’s innovations is the introduction of automatic type inference, a concept familiar to users of more esoteric programming languages such as Haskell and ML but less common in more mainstream languages.

Traditional PHP is dynamically typed, meaning the basic nature of a variable used in code—that is, whether it’s a number or a text string or some other type of data—isn’t specified until the program’s actually running. Programmers enjoy that flexibility, but it creates room for errors that aren't possible in statically typed languages like Java or C, where the type of each variable is explicitly defined when code is written.
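To make the trade-off concrete, here is a minimal sketch in Python, another dynamically typed language, standing in for PHP. The function names are hypothetical, and PHP's own coercion rules differ in the details, but the failure mode is the same: nothing checks the arguments until the code actually runs.

```python
def total_price(quantity, unit_price):
    # No types are declared, so nothing validates the arguments ahead of time.
    return quantity * unit_price


def try_total(quantity, unit_price):
    # Wrap the call so the runtime failure can be inspected instead of crashing.
    try:
        return total_price(quantity, unit_price)
    except TypeError as exc:
        return f"runtime error: {exc}"


if __name__ == "__main__":
    print(total_price(3, 2.5))   # numbers: behaves as intended
    print(total_price("3", 2))   # a string slips in: silently repeats the text
    print(try_total("3", 2.5))   # only fails once this line actually executes
```

A statically typed language would reject the second and third calls before the program ever ran; here the first mistake produces a wrong answer and the second surfaces only when that line is reached.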

Hack takes a middle road: It lets programmers specify the types of some variables in their code and uses logic to infer the rest based on how variables are used together, issuing an error if the code’s logically inconsistent. That concept itself isn’t new, but it’s previously been used in compiled languages, where programmers are used to waiting for their source code to translate into a form executable by the machine, and not in languages like PHP where programmers expect their code to be executable as soon as they hit save, says Verlaguet.
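The idea of mixing declared and inferred types can be sketched with a toy checker. This is not Hack's algorithm, only an illustration written in Python under simplifying assumptions: the mini-language, the tuple encoding, and the names `infer` and `check` are all invented for this example, and only integer and string literals are supported.

```python
# Each statement is (variable_name, declared_type_or_None, expression).
# Expressions are ("lit", value), ("var", name), or ("add", left, right).

def infer(expr, env):
    """Work out the type of an expression from the types already known."""
    kind = expr[0]
    if kind == "lit":
        return "int" if isinstance(expr[1], int) else "string"
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        left, right = infer(expr[1], env), infer(expr[2], env)
        if left != "int" or right != "int":
            raise TypeError(f"cannot add {left} and {right}")
        return "int"
    raise ValueError(f"unknown expression kind: {kind}")


def check(program):
    """Return a list of type errors without ever running the program."""
    env, errors = {}, []
    for name, declared, expr in program:
        try:
            inferred = infer(expr, env)
        except TypeError as exc:
            errors.append(f"{name}: {exc}")
            env[name] = declared or "unknown"
            continue
        if declared is not None and declared != inferred:
            errors.append(f"{name}: declared {declared} but value is {inferred}")
        # A declared type wins; otherwise the inferred one is recorded.
        env[name] = declared or inferred
    return errors


if __name__ == "__main__":
    program = [
        ("count", "int", ("lit", 3)),                            # annotated
        ("label", None, ("lit", "items")),                       # inferred: string
        ("total", None, ("add", ("var", "count"), ("lit", 4))),  # inferred: int
        ("bad", None, ("add", ("var", "count"), ("var", "label"))),
    ]
    print(check(program))
```

Only `count` carries an annotation. The checker works out the other types from how the variables are used and flags the last statement, which tries to add an integer to a string.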

“The solution lies in building the type checker as a daemon,” he says, referring to a background process that runs on a developer’s computer. Instead of waiting for the programmer to explicitly run a compiler, the type checker asks the operating system to notify it when source code files have changed, similar to how services like Dropbox get signaled when a synced file needs an update.

"The kernel event that says that a file has changed is the starting point," he says. “Then, the new file is processed, and once the new file is processed, the two versions are compared to deduce what must be rechecked at a very fine-grained level: at the method level, not at the file level.”

Individual methods that have changed are re-examined by the type checker, which makes sure they’re still consistent with what it already knows about the rest of the code. Looking only at what’s actually changed makes the type-checking process fast enough that programmers don’t have to wait for it to run, even when they switch to new branches in a version control system like Git, Verlaguet says.
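The "compare two versions, recheck only what changed" step might look something like the following sketch. It is an assumption-laden illustration rather than Facebook's implementation: it uses Python source and Python's standard `ast` module in place of Hack, the function names are hypothetical, and it leaves out both the file-change notification and the rechecking of other code that depends on a changed method.

```python
import ast
import hashlib


def method_fingerprints(source):
    """Map each function name in the source to a hash of its syntax tree."""
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line numbers by default, so merely moving a
            # function within the file does not change its fingerprint.
            dumped = ast.dump(node)
            fingerprints[node.name] = hashlib.sha256(dumped.encode("utf-8")).hexdigest()
    return fingerprints


def methods_to_recheck(old_source, new_source):
    """Names of functions that were added, removed, or edited."""
    old = method_fingerprints(old_source)
    new = method_fingerprints(new_source)
    changed = {name for name, digest in new.items() if old.get(name) != digest}
    removed = set(old) - set(new)
    return sorted(changed | removed)


if __name__ == "__main__":
    before = "def a():\n    return 1\n\ndef b():\n    return 2\n"
    after = "def a():\n    return 1\n\n\ndef b():\n    return 3\n\ndef c():\n    return 4\n"
    print(methods_to_recheck(before, after))
```

In this example `a` is untouched, so only `b` (edited) and `c` (new) would be handed back to the checker. One simplification to keep in mind: functions are keyed by bare name, so two methods with the same name in different classes would collide.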

Hack also introduces other new features, like enhanced collection types such as vectors and sets to augment the PHP array and better support for short, anonymous functions used in functional programming. The new language lets Facebook gradually update its existing PHP codebase and still benefit from its longtime investment in PHP, says Ed Smith, the technical lead on Facebook’s HHVM runtime engine, which will now support both Hack and PHP.

“Hack enables us to dynamically convert our code one file at a time,” Smith says. “Switching to another language would be a lot more difficult.”

With the project only just made open source, it’s too soon to say which other companies and projects will jump on the Hack bandwagon, Verlaguet says, though he notes the reception so far has been positive.

[Image: Flickr user Bull3t Hughes]



