<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Tue, 21 May 2013 05:23:05 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Ahmed Shreef's Blog: iconv misunderstands UTF-16 strings with no BOM]]></title>
      <guid>http://www.phpdeveloper.org/news/15035</guid>
      <link>http://www.phpdeveloper.org/news/15035</link>
      <description><![CDATA[<p>
<i>Ahmed Shreef</i> has <a href="http://shreef.com/2010/08/iconv-misunderstands-utf-16-strings-with-no-bom/">a recent post</a> on his blog about a problem he ran into when converting UTF-16 strings to UTF-8 with PHP's <a href="http://php.net/iconv">iconv</a> functions. Specifically, he ended up with "rubbish unreadable characters" after the conversion.
</p>
<blockquote>
I had a problem last week with converting UTF-16 encoded strings to UTF-8 using PHP's iconv library on a Linux server. my code worked fine on my machine but the same code resulted in a rubbish unreadable characters on our production server.
</blockquote>
<p>
In his example (a basic "Hello World" in Arabic), he notes that the string has no <a href="http://en.wikipedia.org/wiki/Byte-order_mark">byte order mark</a>, so iconv has to assume whether it is big-endian or little-endian. That assumption differs from machine to machine, which explains the inconsistent results he saw. The fix is to state the byte order explicitly in the source encoding (for example "UTF-16LE" or "UTF-16BE" rather than plain "UTF-16") instead of leaving iconv to guess.
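</p>
<p>
As a rough illustration of the idea (sketched in Python rather than PHP, and not Ahmed's own code), the snippet below never guesses: a leading BOM decides the byte order when one is present, and otherwise the caller must name it. The function name <code>utf16_to_utf8</code> and its <code>byte_order</code> parameter are hypothetical.
</p>

```python
import codecs


def utf16_to_utf8(data: bytes, byte_order: str) -> bytes:
    """Convert UTF-16 bytes to UTF-8 without guessing the byte order.

    A leading BOM, if present, decides the byte order and is dropped.
    Otherwise the caller-supplied byte_order ("le" or "be") is used.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return data[2:].decode("utf-16-le").encode("utf-8")
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return data[2:].decode("utf-16-be").encode("utf-8")
    if byte_order not in ("le", "be"):
        raise ValueError("byte_order must be 'le' or 'be'")
    return data.decode("utf-16-" + byte_order).encode("utf-8")


# "Hello World" in Arabic, encoded as little-endian UTF-16 with no BOM.
text = "مرحبا بالعالم"
raw = text.encode("utf-16-le")

print(utf16_to_utf8(raw, "le").decode("utf-8") == text)  # True
# Reading the same bytes with the wrong byte order yields unrelated characters.
print(raw.decode("utf-16-be") == text)  # False
```

<p>
Requiring the caller to state the byte order is deliberate: it turns a silent, machine-dependent guess into an explicit decision that behaves the same on every server.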
</p>]]></description>
      <pubDate>Fri, 27 Aug 2010 13:36:56 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Danne Lundqvist's Blog: Detecting UTF BOM - byte order mark]]></title>
      <guid>http://www.phpdeveloper.org/news/14435</guid>
      <link>http://www.phpdeveloper.org/news/14435</link>
      <description><![CDATA[<p>
In a new post to his blog, <i>Danne Lundqvist</i> looks at <a href="http://www.dotvoid.com/2010/04/detecting-utf-bom-byte-order-mark/">a common pitfall</a> that can trip you up if you're not careful with your UTF-8 data: failing to check for the UTF byte order mark (BOM), the short byte sequence at the start of the content that signals it is UTF-encoded.
</p>
<blockquote>
One such thing is the occurrence of the UTF byte order mark, or BOM. [...] For UTF-8, especially on Windows, it has become more and more common to use it to indicate that the file is indeed UTF. Most text editors handle this well and you won't ever see these bytes. As it should be.
</blockquote>
<p>
He points out where this causes trouble: a BOM at the start of a string can throw off comparisons made with <a href="http://php.net/strcmp">strcmp</a> and offsets used with <a href="http://php.net/substr">substr</a>. The problem can be avoided by checking for those first three bytes (<code>EF BB BF</code> in UTF-8) and removing them when present. He includes a snippet of code that does just that.
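</p>
<p>
A minimal sketch of that check, written in Python rather than PHP and not taken from Danne's post, might look like this; <code>strip_utf8_bom</code> is a hypothetical helper name.
</p>

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A file saved "as UTF-8 with BOM" starts with three extra bytes.
raw = codecs.BOM_UTF8 + b"id,name"

print(raw == b"id,name")                  # False: the BOM breaks the comparison
print(raw[:2])                            # b'\xef\xbb', not b'id'
print(strip_utf8_bom(raw) == b"id,name")  # True
```

<p>
Python's built-in <code>utf-8-sig</code> codec drops the BOM automatically when decoding bytes to text, so an explicit byte check like this one mainly matters when working with raw bytes.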
</p>]]></description>
      <pubDate>Thu, 29 Apr 2010 11:47:03 -0500</pubDate>
    </item>
  </channel>
</rss>
