In the latest post to the HHVM (HipHop VM) blog, Sara Golemon recounts the journey of a thousand bytecodes: the process of decomposing a PHP file and optimizing it for execution in the HHVM environment.
Compilers are fun. They take nice, human-readable languages like PHP or Hack and turn them into lean, mean, CPU executin’ Turing machines. Some of these are simple enough that a CS student can write one up in a weekend; some are the products of decades of fine tuning and careful architecting. Somewhere in that proud tradition stands HHVM; in fact, it’s several compilers stacked in an ever-growing chain of logic manipulation and abstractions. This article will attempt to take the reader through the HHVM compilation process from PHP script to x86 machine code, one step at a time.
The process is broken down into seven steps, each with a description and some code examples where relevant:
- Lexing the PHP to get its tokens
- Parsing the token results into an AST (and optimizing it along the way)
- Compilation to Bytecode
- HHBBC Optimization
- Intermediate Representation
- Virtual Assembly
- Emitting machine code
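
To make the first three steps in the list above more concrete, here is a minimal Python sketch of a toy pipeline that lexes an expression, parses it into an AST (folding constants along the way), and compiles the result to stack bytecode. This is not HHVM's implementation: the function names, the token and AST shapes, and the opcode names (`PUSH_INT`, `LOAD_VAR`, `ADD`, `MUL`) are all hypothetical and chosen only for illustration, and the toy language supports nothing beyond integers, `$`-prefixed variables, `+`, `*`, and parentheses.

```python
import re

# One token per match: an integer, a $-prefixed variable, or any single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\$[A-Za-z_]\w*)|(.))")

def lex(source):
    """Step 1: split source text into (kind, value) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", int(number)))
        elif name:
            tokens.append(("VAR", name))
        elif op.strip():  # skip stray whitespace
            tokens.append(("OP", op))
    return tokens

def fold(node):
    """Collapse a binary node into a constant when both operands are constants."""
    _, op, left, right = node
    if left[0] == "NUM" and right[0] == "NUM":
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("NUM", value)
    return node

def parse(tokens):
    """Step 2: build an AST with recursive descent, folding constants as we go."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        kind, value = advance()
        if kind in ("NUM", "VAR"):
            return (kind, value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def binary(operand, operators):
        node = operand()
        while peek()[0] == "OP" and peek()[1] in operators:
            op = advance()[1]
            node = fold(("BIN", op, node, operand()))
        return node

    def term():
        return binary(atom, ("*",))  # '*' binds tighter than '+'

    def expr():
        return binary(term, ("+",))

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

def compile_ast(node, code=None):
    """Step 3: emit stack bytecode with a post-order walk of the AST."""
    code = [] if code is None else code
    if node[0] == "NUM":
        code.append(("PUSH_INT", node[1]))
    elif node[0] == "VAR":
        code.append(("LOAD_VAR", node[1]))
    else:
        compile_ast(node[2], code)
        compile_ast(node[3], code)
        code.append(("ADD",) if node[1] == "+" else ("MUL",))
    return code

if __name__ == "__main__":
    for instruction in compile_ast(parse(lex("$x + 2 * 3"))):
        print(instruction)
```

For the input `$x + 2 * 3`, the parser folds `2 * 3` into the constant `6` before any bytecode is emitted, so the compiled program is three instructions long: load `$x`, push `6`, add. The later steps in the list (bytecode-level optimization, intermediate representation, virtual assembly, and machine code) are not covered by this sketch.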