
David's Blog:
PHP, XML, and Unicode
Mar 01, 2006 @ 18:22:42

David has posted the results from some of his testing with the Unicode support that's available in PHP, specifically in the context of XML.

A couple of weeks ago, Tim Bray posted about PHP and received a firestorm of comments.

As Tim updated his posting with comments, he linked to a two-year-old posting by Steve Minutillo about PHP4's inability to detect character encodings in XML files and other Unicode bugs. That caught me by surprise — after all, PHP uses the venerable Expat as its XML parsing engine (the same engine used in most programming environments other than Java), and if Expat wasn’t getting things right, then the PHP people must have gone way out of their way to misconfigure it.

So, in the rest of the post, he sets about testing these results for himself, using PHP versions 4.4.0 and 5.0.5. He shows the code that he used to create the tests - it produces UTF-8 encoded text regardless of what type it's output as. He ran into some issues, but some of them are simply due to how ambiguous XML creation and handling is in PHP.

tagged: xml unicode utf-8 test support version 4.4.0 5.0.5