PHPDeveloper: PHP News, Views and Community


Joe Watkins:
Hacking PHP 7

by Chris Cornutt Mar 16, 2016 @ 15:16:38

In this post to his site, PHP core developer Joe Watkins talks about "hacking PHP 7", based on two screencasts he's made on the subject.

Writing extensions is fun, but it's not as fun as hacking PHP. So, we're going to focus on hacking, we're going to imagine that we are introducing some new language feature, by RFC.
Without focusing on the RFC process itself, you need to know which are the relevant parts of PHP you need to change, in order to introduce new language features. You also need to know how PHP 7 works, about each stage of turning text into Zend opcodes.

After sharing some of his thoughts on (and troubles with) screencasting in general, he looks at "The Beginning" of PHP's translation from text to functionality: lexing. He introduces the basic concept of how a lexer works and how it converts the source text into tokens. He then moves on to the parsing of these tokens and, finally, the AST (abstract syntax tree) that results when the lexer and parser are run against a piece of code.

With that out of the way, he gets into the "hack": a hipster expression that only works with strings and throws an exception otherwise. He shows the pieces he had to edit to create this new expression and its matching token/AST node.

Anthony Ferrara:
Prefix Trees and Parsers

by Chris Cornutt May 19, 2015 @ 15:13:18

Anthony Ferrara has a new post, following up on his previous look at tries and lexers, that continues along the path of applying what he learned to an HTTP routing system.

In my last post, Tries and Lexers, I talked about an experiment I was doing related to parsing of JavaScript code. By the end of the post I had shifted to wanting to build a HTTP router using the techniques that I learned. Let's continue where we left off...

He starts off with the idea of lexing and parsing the routes into their respective tokens instead of breaking them up as many routers do (i.e. splitting on the slashes). He shows the results of this lexing and some parser code that handles these results and turns them into something useful. He finds that this setup causes a lot of overhead (255 new states per character), so he optimizes the processing with a "default" trie, but it is still pretty intensive.

He decided to go a different way at this point, opting for the radix tree structure instead. He includes the implementation of this tree for parsing the routes and his matching lexer updates. Finally he shows how to apply code generation to the results of these changes and how coming back to the "slash splitting" could help...
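To give a feel for the radix tree idea mentioned above, here is a minimal sketch (in Python rather than the PHP used in the original post). It stores static route paths only, with no placeholders or parameters, and every name in it (`Node`, `insert`, `lookup`, the sample routes) is an assumption made for this example rather than code from Anthony's post. Each edge carries a whole string fragment instead of a single character, and an edge is split only when a new route diverges partway through it.

```python
class Node:
    def __init__(self, handler=None):
        # Maps the first character of an edge label to (label, child node).
        self.edges = {}
        self.handler = handler


def common_prefix_len(a, b):
    """Return how many leading characters a and b share."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, path, handler):
    """Add a static route, splitting an existing edge where paths diverge."""
    node = root
    while path:
        entry = node.edges.get(path[0])
        if entry is None:
            # No edge starts with this character: hang the rest off this node.
            node.edges[path[0]] = (path, Node(handler))
            return
        label, child = entry
        n = common_prefix_len(label, path)
        if n < len(label):
            # Keep the shared prefix on this edge, push the remainder down.
            middle = Node()
            middle.edges[label[n]] = (label[n:], child)
            node.edges[path[0]] = (label[:n], middle)
            child = middle
        node, path = child, path[n:]
    node.handler = handler


def lookup(root, path):
    """Return the handler registered for an exact path, or None."""
    node = root
    while path:
        entry = node.edges.get(path[0])
        if entry is None or not path.startswith(entry[0]):
            return None
        label, node = entry
        path = path[len(label):]
    return node.handler


# Hypothetical routes, purely for illustration.
routes = Node()
insert(routes, "/user", "show_user")
insert(routes, "/users/list", "list_users")
insert(routes, "/about", "about_page")
```

With these three routes the root ends up with a single `/` edge whose child branches into `user` and `about`, so a lookup compares a few string fragments instead of stepping through one state per character.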

tagged: lexer parser example prefix tree radixtree route matching slashes

Link: http://blog.ircmaxell.com/2015/05/prefix-trees-and-parsers.html

Anthony Ferrara:
Tries and Lexers

by Chris Cornutt May 18, 2015 @ 14:47:32

Anthony Ferrara has an interesting new post to his site talking about tries and lexers, two pieces of a puzzle that are used during script execution. In this case, he's tried his hand at writing a parser which, naturally, led to needing a lexer.

Lately I have been playing around with a few experimental projects. The current one started when I tried to make a templating engine. Not just an ordinary one, but one that understood the context of a variable so it could encode/escape it properly. [...] So, while working on the templating engine, I needed to build a parser. Well, actually, I needed to build 4 parsers. [...] I decided to hand write this dual-mode parser. It went a lot easier than I expected. In a few hours, I had the prototype built which could fully parse Twig-style syntax (or a subset of it) including a more-or-less standards-compliant HTML parser. [...] But I ran into a problem. I didn't have a lexer...

He starts with a brief description of what a lexer is and provides a simple example of an expression and how it would be parsed into its tokens. He then talks about the trie, a method for "walking" the input and representing the results in a tree structure. He shows a simple implementation of it in PHP, iterating over a set of tokens and the array results it produces. He then takes this and expands it out a bit into a "lex" function that iterates over the string and compiles the found tokens.
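As a rough illustration of the trie approach described above (sketched in Python rather than the PHP used in the original post), the snippet below builds a nested-dictionary trie from a small token table and then walks it to find the longest matching token at each position. The token names and the `build_trie` and `lex` helpers are assumptions made for this example, not code from Anthony's post.

```python
def build_trie(tokens):
    """Build a nested-dict trie; the "" key marks the end of a complete token."""
    root = {}
    for name, text in tokens.items():
        node = root
        for ch in text:
            node = node.setdefault(ch, {})
        node[""] = name
    return root


def lex(source, trie):
    """Walk the trie from each position and emit the longest matching token."""
    pos, out = 0, []
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        node, best, i = trie, None, pos
        while i < len(source) and source[i] in node:
            node = node[source[i]]
            i += 1
            if "" in node:
                # Remember the longest complete token seen so far.
                best = (node[""], source[pos:i])
        if best is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        out.append(best)
        pos += len(best[1])
    return out


# Hypothetical token table, purely for illustration.
TOKENS = {"PLUS": "+", "INCREMENT": "++", "ASSIGN": "=", "EQUALS": "==", "ONE": "1"}
trie = build_trie(TOKENS)
result = lex("1 ++ + == =", trie)
```

Because the walk keeps going while characters still match, `++` comes out as a single `INCREMENT` token rather than two `PLUS` tokens.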

From there he comes back to the subject of JavaScript, pointing out that it's a lot looser than PHP, even in how it allows numbers to be defined. His testing showed a major issue though: memory consumption. He found that a regular expression method consumed too much memory and tried compiling out to classes instead (and found it much faster once the process was going).

tagged: lexer parser example javascript tries tree data structure

Link: http://blog.ircmaxell.com/2015/05/tries-and-lexers.html

Michael Nitschinger's Blog:
Writing a simple lexer in PHP

by Chris Cornutt May 10, 2012 @ 17:57:00

In this new post to his blog Michael Nitschinger shows you how to create a simple lexer to parse incoming content (like custom configuration files or anything that uses its own domain-specific language).

A lot of developers avoid writing parsers because they think it's pretty hard to do so. Writing an efficient parser for a general purpose language (like PHP, Ruby, Java,...) is hard, but fortunately, most of the time we don't need that much complexity. Typically we just want to parse input coming from config files or from a specific problem domain (expressed through DSLs). DSLs (Domain Specific Languages) are pretty cool, because they allow you to express logic and flow in a very specific and convenient way for a limited set of tasks.

He illustrates with an example based on the Lithium framework's routing engine and how it could parse a text file that relates a route to a controller/action combination. He creates a "Lexer" class that defines a few regular expressions to match things like whitespace, URLs and identifiers (words) in the incoming text and returns each match in the lexer's output.
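A regular-expression lexer of the kind described above can be sketched in a few lines. This version is in Python rather than PHP, and the token names, the patterns and the sample route line are all assumptions made for illustration, not Michael's actual code or Lithium's real route file format. Each pattern is tried in order at the current offset and the first one that matches produces a token.

```python
import re


class Lexer:
    # (pattern, token name) pairs, tried in order at the current offset.
    TERMINALS = [
        (re.compile(r"\s+"), "T_WHITESPACE"),
        (re.compile(r"/[A-Za-z0-9_/\-]*"), "T_URL"),
        (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), "T_IDENTIFIER"),
    ]

    @classmethod
    def run(cls, lines):
        """Tokenize a list of source lines and return a flat token list."""
        tokens = []
        for number, line in enumerate(lines, start=1):
            offset = 0
            while offset < len(line):
                for pattern, name in cls.TERMINALS:
                    match = pattern.match(line, offset)
                    if match:
                        tokens.append(
                            {"token": name, "match": match.group(), "line": number}
                        )
                        offset = match.end()
                        break
                else:
                    # No pattern matched at this offset.
                    raise ValueError(
                        f"unable to parse line {number} at offset {offset}"
                    )
        return tokens


# Hypothetical route definition: URL, controller, action.
tokens = Lexer.run(["/blog/view Posts view"])
```

Keeping whitespace as its own token mirrors the summary above; a parser built on top of this would usually filter those tokens out before matching routes to controller/action pairs.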

tagged: lexer parse configuration regularexpression tutorial


Sameer Borate's Blog:
Building a simple Parser and Lexer in PHP

by Chris Cornutt Nov 17, 2011 @ 17:57:59

In a new post to his blog Sameer Borate shows how to create a lexer and parser in PHP to work directly with the tokens of a PHP script.

After looking around for a while [for a good resource on compilers] I settled for Terence Parr's Language Implementation Patterns. This is exactly what I needed – bit sized patterns on compiler and parser design with working code. The book provides a recipe style approach, gradually moving from simple to complex compiler/parser design issues. As I primarily work with PHP, I thought of porting some code to PHP to see how it works.

He gives examples of his custom tool producing basic lexer output for a list, along with a complete listing of the code involved. Ultimately, though, he finds that PHP isn't overly suited to the task: anything more than his simple example could be more trouble than it's worth.

tagged: lexer parser tutorial language implement token


Erling Alf Ellingsen's Blog:
PHP Must Die

by Chris Cornutt Jan 11, 2010 @ 19:49:41

In a (slightly inflammatory) post to his blog today, Erling Alf Ellingsen shares why he thinks that "PHP must die", mostly due to some of the inconsistencies it has compared with other languages.

His examples include:

String vs. numeric handling
That PHP supports octal numbers "by accident"
A lexer bug with hex values
A parser bug involving the ternary operator

Comments on the post include those supporting the "die" opinion (that PHP just doesn't have it together like other languages) and those taking a somewhat more balanced view of PHP's strengths and weaknesses.

tagged: opinion lexer parser octal ternary


Abhinav Singh's Blog:
PHP tokens & opcodes: 3 useful extensions for understanding the Zend Engine

by Chris Cornutt Nov 24, 2009 @ 17:32:31

Abhinav Singh has a recent post to his blog looking at three extensions that you can use to help understand the inner workings of the core Zend Engine.

“PHP tokens and opcodes” – When a PHP script is executed it goes through a number of processes, before the final result is displayed. These processes are namely: Lexing, Parsing, Compiling and Executing. In this blog post, I will walk you through all these processes with a sample example. In the end I will list some useful PHP extensions, which can be used to analyze results of every intermediate process.

He touches on the steps the average PHP script takes in its processing - lexing, parsing/compiling and the actual execution of the opcodes. The tokenizer, parsekit and VLD (Vulcan Logic Disassembler) extensions can help you get down into the nuts and bolts of the language and the engine that makes it work.

tagged: zendengine extension lexer compile opcode


Wez Furlong's Blog:
parser and lexer generators for PHP

by Chris Cornutt Nov 27, 2006 @ 15:34:00

Finding himself in need of a parser and lexer, Wez Furlong decided to work up PHP-based versions modeled on the popular lemon parser and JLex lexer.

From time to time, I find that I need to put a parser together. Most of the time I find that I need to do this in C for performance, but other times I just want something convenient, like PHP, and have been out of luck.

The result is two new packages, lemon-php and JLexPHP (under a BSDish license), that you can download and compile on your own system.

Also, if you'll remember a while back, Greg Beaver had wanted something similar (as mentioned in the comments) and created his own lexer/generator as well.

tagged: parser lexer jlex lemon port download compile java


Greg Beaver's Blog:
PHP_ParserGenerator and PHP_LexerGenerator

by Chris Cornutt Jun 25, 2006 @ 22:00:41

Greg Beaver has blogged today with more about the port he's been working on of the Lemon parser generator to PHP5, this time discussing the creation of two packages: PHP_ParserGenerator and PHP_LexerGenerator.

Last week, I blogged about completing a port of the Lemon parser generator to PHP 5, which I thought was pretty cool. However, in an email, Alex Merz pointed out that without a lexer generator to accompany lemon, it's pretty difficult to write a decent parser.

After Alex's email, I started thinking about what it would take to write a lexer generator. Basically, a lexer generator requires parsing and compiling regular expressions, then scanning the source one character at a time to find matches. So, it occurred to me that perhaps simply combining regular expressions with sub-patterns could accomplish this task quite easily.

He goes on to explain this process, showing how a simple regular expression call (and a look at its return arguments) could create a simple, easy solution. Since the re2c format is still unsupported in PHP (without a goto to go to), he opts to stick with the regular expressions and creates a "lex2php" format instead.
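The "combine regular expressions with sub-patterns" idea can be sketched as follows. This is a Python illustration of the general technique, not Greg's lex2php format or his PHP code, and the rule names and patterns are assumptions made for the example. Each token rule becomes one named group in a single alternation, and after a match the group that fired tells the lexer which token it found.

```python
import re

# Hypothetical token rules, purely for illustration. Order matters: earlier
# alternatives win when more than one could match at the same position.
RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# One master pattern: every rule is wrapped in its own named sub-pattern.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def lex(source):
    """Return (token name, text) pairs, dropping whitespace."""
    pos, tokens = 0, []
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        # lastgroup names the sub-pattern that matched.
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = lex("total = 42 + x1")
```

The regular expression engine does the character-by-character scanning, so the lexer itself only has to look at which sub-pattern matched and advance past it.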

He's packaged up both halves of this setup and has already posted proposals for them to the PEAR site.

tagged: pear lexer generator parser package lemon port php5
