In this new post to his blog Michael Nitschinger shows you how to create a simple lexer to parse incoming content (like custom configuration files or anything that uses its own domain-specific language).
A lot of developers avoid writing parsers because they think it's pretty hard to do so. Writing an efficient parser for a general purpose language (like PHP, Ruby, Java,...) is hard, but fortunately, most of the time we don't need that much complexity. Typically we just want to parse input coming from config files or from a specific problem domain (expressed through DSLs). DSLs (Domain Specific Languages) are pretty cool, because they allow you to express logic and flow in a very specific and convenient way for a limited set of tasks.
He illustrates with an example based on the Lithium framework's routing engine and how it could parse a text file that relates a route to a controller/action combination. He creates a "Lexer" class that defines a few regular expressions to parse the incoming text strings for matches on things like whitespace, URLs and identifiers (words) and return each in the lexer's output.