Text formats use character encodings to map characters to their binary representation. When using only characters in the ASCII range (US English), character encodings seem to 'just work'. You may never even be aware of the character encoding you are using. This is because the ASCII characters are represented the same way in all of the popular character encodings used on the Web, so if you never need a foreign character you won't ever encounter a problem. However, once you deviate from this common denominator of ASCII characters and start using characters from foreign languages, their representation in binary form may depend on the character encoding used, and if you get the encoding confused you can end up with invalid characters.
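To make that concrete, here is a minimal sketch in Python (not taken from the article; the sample strings and the `encoded_forms` helper are made up for illustration). It encodes an ASCII-only word and an accented word under a few common encodings, then decodes UTF-8 bytes with the wrong encoding.

```python
# Compare how the same text is stored under three common encodings.
# The encoding list and sample strings are illustrative assumptions.
ENCODINGS = ["ascii", "latin-1", "utf-8"]


def encoded_forms(text):
    """Return {encoding: bytes} for every encoding that can represent text."""
    forms = {}
    for name in ENCODINGS:
        try:
            forms[name] = text.encode(name)
        except UnicodeEncodeError:
            # This encoding has no representation for some character.
            pass
    return forms


# ASCII-only text: every encoding yields the same bytes.
ascii_forms = encoded_forms("cafe")

# Text with an accented character: the bytes now depend on the encoding,
# and plain ASCII cannot represent it at all.
accent_forms = encoded_forms("café")

# Getting the encoding confused: UTF-8 bytes read back as Latin-1.
garbled = "café".encode("utf-8").decode("latin-1")

print(ascii_forms)
print(accent_forms)
print(garbled)
```

The ASCII-only word comes out byte-for-byte identical every time, while the accented word differs between Latin-1 and UTF-8, and decoding with the wrong one turns the single "é" into two unrelated characters.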
In a new post on Dynamically Typed today, Thomas Rutter explains some of the difficulties that differing character sets can cause in your applications. Fortunately, as he notes, PHP's mbstring extension can be enabled to handle multibyte strings and convert between character encodings.
I'm going to have to agree with his last suggestion, though: using UTF-8 throughout can really make things a lot easier and is much more flexible when it comes to the output from your scripts.
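Moving existing data over to UTF-8 usually comes down to decoding the bytes with the encoding they were actually written in and re-encoding them. The sketch below shows the idea in Python; it is not code from the article, and the `convert_encoding` helper and the sample bytes are hypothetical. In PHP, mbstring's `mb_convert_encoding()` plays the same role.

```python
def convert_encoding(data: bytes, from_encoding: str, to_encoding: str) -> bytes:
    """Re-encode bytes from one character encoding to another.

    Hypothetical helper for illustration. It assumes the caller knows the
    real source encoding; a wrong guess produces garbled text or an error.
    """
    text = data.decode(from_encoding)  # bytes -> characters
    return text.encode(to_encoding)    # characters -> bytes


# "café" as it would be stored in a Latin-1 (ISO-8859-1) file.
latin1_bytes = b"caf\xe9"

# Convert to UTF-8 so it can be sent out alongside other UTF-8 output.
utf8_bytes = convert_encoding(latin1_bytes, "latin-1", "utf-8")

print(utf8_bytes)
```

The hard part in practice is knowing the source encoding for certain, which is a large part of why sticking to UTF-8 from the start is the easier path.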




