News Archive

PHPMaster.com:
Working with Multibyte Strings
July 18, 2013 @ 10:12:55

On PHPMaster.com there's a tutorial posted that helps you understand how to work with multibyte strings in PHP. Multibyte strings typically hold characters from non-English languages, where a single character can take more than one byte, so they have to be handled differently from normal strings, for example with the mbstring functions.

A written language, whether it's English, Japanese, or whatever else, consists of a number of characters, so an essential problem when working with a language digitally is to find a way to represent each character in a digital manner. Back in the day we only needed to represent English characters, but it's a whole different ball game today and the result is a bewildering number of character encoding schemes used to represent the characters of many different languages. How does PHP relate to and deal with these different schemes?

He starts with an introduction to multibyte strings: how they're represented internally, character encoding schemes and Unicode. He also covers PHP's support for them, noting that PHP's string functions aren't built to handle them by default, and the two extensions you might use instead: iconv and mbstring. He shows how to enable the latter and introduces some of the most common functions you'll use with it (complete with code examples).
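To make the byte/character distinction concrete, here is a small sketch. It is written in Python rather than PHP, and the helper names byte_length and char_length are hypothetical: the first counts bytes, as PHP's strlen() does, and the second counts characters (code points), roughly as mb_strlen($s, 'UTF-8') does.

```python
# Hypothetical helpers contrasting byte counts with character counts.


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Bytes needed to store `text` in `encoding` (what strlen() reports)."""
    return len(text.encode(encoding))


def char_length(text: str) -> int:
    """Unicode code points in `text` (roughly what mb_strlen() reports for UTF-8)."""
    return len(text)


for sample in ("PHP", "café", "日本語"):
    print(f"{sample}: {byte_length(sample)} bytes, {char_length(sample)} characters")
```

For "日本語" this reports 9 bytes but 3 characters, since each of those characters takes three bytes in UTF-8.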

multibyte strings tutorial mbstring introduction unicode

Link: http://phpmaster.com/working-with-multibyte-strings

Project:
Patchwork-UTF8 - UTF8 Support for PHP
January 27, 2012 @ 11:38:40

Nicolas Grekas has pulled another tool out of his "Patchwork" framework and made it stand-alone: the Patchwork-UTF8 helper, which provides counterparts to the functions PHP already has for regular strings, made smarter so they handle UTF-8 correctly.

The Patchwork\Utf8 class implements the quasi complete set of string functions that need UTF-8 grapheme clusters awareness. These functions are all static methods of the Patchwork\Utf8 class. The best way to use them is to add a use Patchwork\Utf8 as u; at the beginning of your files, then when UTF-8 awareness is required, prefix by u:: when calling them.

In the README for the tool he lists the functions in the current release that match PHP's string functions, as well as some additional methods like "isUtf8", "bestFit" and "strtocasefold". It relies on the mbstring, iconv and intl extensions, and if they aren't installed, it falls back to alternative implementations (a list of those methods is included).
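The README's emphasis on grapheme clusters and case folding is easier to see with an example. The following Python sketch is not Patchwork-UTF8 code; it only uses the standard unicodedata module to show that one visible character can be two code points, and that case folding differs from lowercasing. The caseless_equal helper is hypothetical and follows the Unicode canonical caseless-matching recipe (NFD, then case fold, then NFD again).

```python
import unicodedata

# One visible "é" written as "e" plus COMBINING ACUTE ACCENT (U+0301).
decomposed = "e\u0301"
# NFC normalization merges it into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1

# Lowercasing leaves the German sharp s alone; case folding expands it to "ss".
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse


def caseless_equal(a: str, b: str) -> bool:
    """Compare strings ignoring case: NFD, case fold, then NFD again."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


print(caseless_equal("STRASSE", "Straße"))   # True
print(caseless_equal("Cafe\u0301", "CAFÉ"))  # True
```

Normalization only helps where a precomposed character exists; many grapheme clusters have none, which is why full grapheme handling needs a segmentation library such as ICU, the library behind PHP's intl extension.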

utf8 support string patchwork framework helper mbstring iconv intl


Yannick's Blog:
mbstring vs iconv benchmarking
October 06, 2008 @ 12:50:20

In a recent post on his blog, Yannick benchmarks mbstring against iconv on the PHP 5.2.4 release.

Following up on my previous post about the differences between the mbstring and iconv international characters libraries (which resulted in a tentative conclusion that nobody knew anything about those differences), and particularly the comments by Nicola, we have combined forces (mostly efforts from Nicola, actually) to provide you with a little benchmarking, if that can help you decide.

The code for his test script is included (so you can gather your own results), along with a full listing of his results comparing the effects of possible caching over runs of up to ten executions. The text file he ran the script on is also available for download.

mbstring iconv benchmark php5 text file statistic


Vinu Thomas' Blog:
mbstring Functions by default in PHP
July 18, 2008 @ 07:57:16

In a new post to his blog, Vinu Thomas talks about a set of functions that can make your life easier when handling Unicode strings: the mb_* functions of the mbstring extension.

When dealing with multiple languages and internationalization in PHP, some of the default functions in PHP end up mangling up the unicode characters in PHP. This is evident when you have a lot of funny looking characters coming up on your web page instead of the actual characters. [...] There is an extension called mbstring which you can install in PHP which gives you a set of functions which are unicode ( actually multibyte ) ready.

He mentions some of the replacements, like mb_send_mail instead of mail and mb_strlen instead of the usual strlen. Thankfully, there's a simple way to use these functions without replacing a lot of code: a php.ini setting (mbstring.func_overload) that makes the standard functions call their mb_* counterparts behind the scenes.
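One common source of the "funny looking characters" Vinu describes is a byte-oriented function cutting through the middle of a multibyte sequence. The Python sketch below imitates that difference with two hypothetical helpers: truncate_bytes behaves like substr() on a UTF-8 string, and truncate_chars like mb_substr($s, 0, $n, 'UTF-8'). It illustrates the problem only; it is not how func_overload works internally.

```python
def truncate_bytes(text: str, n: int) -> str:
    """Keep the first n bytes, like substr(): may split a UTF-8 sequence."""
    cut = text.encode("utf-8")[:n]
    # A dangling partial sequence is shown as U+FFFD, the replacement character.
    return cut.decode("utf-8", errors="replace")


def truncate_chars(text: str, n: int) -> str:
    """Keep the first n characters, like mb_substr() with UTF-8."""
    return text[:n]


word = "Ñandú"  # 5 characters, 7 bytes in UTF-8 ("Ñ" and "ú" take 2 bytes each)
print(truncate_bytes(word, 6))  # "Ñand" followed by U+FFFD: "ú" was cut in half
print(truncate_chars(word, 4))  # Ñand
```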

mbstring function utf8 unicode multibyte replace


Dokeos Blog:
mbstring vs iconv
April 24, 2008 @ 11:18:08

In this post on the Dokeos blog, there's a comparison of the mbstring extension and the iconv library for working with multibyte strings.

I was wondering today why use mbstring rather than iconv in Dokeos, and honestly I didn't remember exactly why I had chosen mbstring in the past, but finding information about the *differences* between the two. [...] Searching a bit more, I found a PPT presentation from Carlos Hoyos on Google.

Essentially, it comes down to how each library is integrated: mbstring is bundled with PHP, while iconv depends on an external library. So, if you're looking for maximum portability, he recommends mbstring.

mbstring iconv multibyte character string compare internal external


Alessandro Crugnola's Blog:
AMFPHP and mbstring
October 12, 2007 @ 09:23:00

Alessandro Crugnola was struggling with an application he was developing (with Flex and PHP) where his local PHP installation worked just fine but his remote system errored on the same code:

Connecting to the service browser I was receiving the "Channel.Ping.Failed" error and investigating a bit more in the fault message I discovered that the source error was: "The class {Amf3Broker} could not be found under the class path {/var/htdocs/amfphp/services/amfphp/Amf3Broker.php}" and the Amf3Broker php class does not exist anywhere in amfphp!

Even after looking into some default settings, though, things still weren't loading correctly. Finally, he found the culprit: mbstring. One server had string function overloading (mbstring.func_overload) enabled and the other didn't, resulting in corrupted data being returned from the amfphp stream.
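Why would overloading corrupt a binary stream? A likely mechanism, assumed here rather than taken from Alessandro's post, is that serialization code computes length prefixes with strlen() expecting bytes, while with func_overload enabled strlen() returns characters. The Python sketch below uses a simplified, hypothetical length-prefixed format (not the real AMF wire format) to show how a character count throws the reader off.

```python
import struct


def frame_string(text: str, length_of) -> bytes:
    """Encode `text` as a 2-byte big-endian length prefix plus UTF-8 payload.

    `length_of` computes the prefix, so a byte count (correct) can be
    compared with a character count (what an overloaded strlen() returns).
    """
    return struct.pack(">H", length_of(text)) + text.encode("utf-8")


def read_frame(data: bytes):
    """Return (payload, leftover) using the length declared in the prefix."""
    (length,) = struct.unpack(">H", data[:2])
    return data[2:2 + length], data[2 + length:]


def byte_count(s: str) -> int:
    return len(s.encode("utf-8"))


good = frame_string("héllo", byte_count)  # prefix says 6 (bytes)
bad = frame_string("héllo", len)          # prefix says 5 (characters)

print(read_frame(good))  # (b'h\xc3\xa9llo', b'')
print(read_frame(bad))   # (b'h\xc3\xa9ll', b'o'): the "o" leaks into the next field
```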

amfphp mbstring flex application error


SitePoint PHP Blog:
Hot PHP UTF-8 tips
August 10, 2006 @ 14:50:03

Following up on some of his previous posts to the SitePoint PHP Blog, Harry Fuecks has posted this quick guide with some "hot UTF-8 tips" to share with the community.

As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned - may be boring, depending on your perspective).

He mainly talks about using PHP's native functionality to avoid the issues that can come with mbstring, by restricting locale behavior and by using a fast case conversion function to handle strings correctly. The other tip covers delivering content to clients that can't receive UTF-8: checking which character set they accept and responding accordingly.
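A fast, locale-independent case conversion can lean on a property of UTF-8: every byte of a multibyte sequence is 0x80 or higher, so an ASCII-only lowercasing pass (roughly what strtolower() does under the "C" locale) can never damage multibyte characters. This Python sketch uses a hypothetical ascii_lower helper built on a byte translation table; it is not Marek Gayer's or Harry's code, just an illustration of the principle.

```python
# Translation table that maps A-Z to a-z and leaves the other 230 byte values alone.
_ASCII_LOWER = bytes.maketrans(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                               b"abcdefghijklmnopqrstuvwxyz")


def ascii_lower(data: bytes) -> bytes:
    """Lowercase ASCII letters in UTF-8 bytes without touching multibyte sequences."""
    return data.translate(_ASCII_LOWER)


raw = "ÄPFEL und STRASSE".encode("utf-8")
print(ascii_lower(raw).decode("utf-8"))  # Äpfel und strasse
```

The trade-off shows in the output: "Ä" stays uppercase, so non-ASCII letters still need a lookup table or a Unicode-aware function.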

utf8 tips mbstring native locale behavior case conversion character set


Matthew Weir O'Phinney's Blog:
mbstring comes to the rescue
May 17, 2006 @ 05:49:23

Character encodings in PHP, especially when dealing with XML, can be a pain, to say the least. Matthew Weir O'Phinney found this out first-hand when a script he was working with had mixed character sets in one of its strings, which caused problems for SimpleXML's XML parser.

I tried a number of solutions, hoping actually to automate it via mbstring INI settings; these schemes all failed. iconv didn't work properly. The only thing that did work was to convert the encoding to latin1 -- but this wreaked havoc with actual UTF-8 characters.

Then, through a series of trial-and-error, all-or-nothing shots, I stumbled on a simple solution.

The discovery was to detect the encoding of the string itself (not really the content) and convert everything in it to that encoding. How, you might ask? With the handy mb_detect_encoding and mb_convert_encoding functions. Of course, the mbstring extension has to be compiled into PHP for this, but it's well worth it if it's exactly what you need.
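The detect-then-convert idea can be sketched in Python as follows. The to_utf8 helper and its candidate list are hypothetical; it tries strict UTF-8 first and falls back to ISO-8859-1, loosely mirroring mb_detect_encoding($s, 'UTF-8, ISO-8859-1', true) followed by mb_convert_encoding(). It is not Matthew's code.

```python
def to_utf8(raw: bytes, candidates=("utf-8", "iso-8859-1")) -> bytes:
    """Detect the first candidate encoding that decodes `raw`, then re-encode as UTF-8.

    ISO-8859-1 maps every byte to a character, so it always succeeds and
    acts as the fallback when the input is not valid UTF-8.
    """
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    raise ValueError("none of the candidate encodings matched")


print(to_utf8(b"caf\xe9"))              # b'caf\xc3\xa9' (Latin-1 input converted)
print(to_utf8("café".encode("utf-8")))  # b'caf\xc3\xa9' (already UTF-8, unchanged)
```

A string that mixes both encodings fails the UTF-8 check as a whole and is treated entirely as Latin-1, which double-encodes its genuine UTF-8 characters (the same havoc Matthew describes), so truly mixed input may need per-segment handling.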

mbstring xml simplexml encoding utf-8 detect convert


SitePoint PHP Blog:
PHP UTF-8 0.1
February 28, 2006 @ 06:54:57

In this post from the SitePoint PHP Blog, Harry Fuecks talks about a new software package he's put together to make it possible for PHP to handle UTF-8 encoded strings: PHP UTF-8.

Been messing around with bits of this code for a long time, in fact since first really getting to grips with Dokuwiki, but finally got the first release out.

PHP UTF-8 is intended to make it possible to handle UTF-8 encoded strings in PHP, without requiring the mbstring extension (although it uses mbstring if it's available). In short, it provides versions of PHP's string functions (pretty much everything you'll find on this list), prefixed with utf8_ and aware of UTF-8 encoding (that 1 character >= 1 byte). It also gives you some tools to help check UTF-8 strings for "well formedness", strip bad sequences and some "ASCII helpers".
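The well-formedness check and bad-sequence stripping mentioned above can be approximated in Python with the strict UTF-8 decoder. The helper names is_valid_utf8 and strip_invalid_utf8 are hypothetical, and this is not PHP UTF-8's implementation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if `data` is well-formed UTF-8 (the strict codec rejects overlong forms and surrogates)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop malformed byte sequences and keep everything that decodes cleanly."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


print(is_valid_utf8("naïve".encode("utf-8")))  # True
print(is_valid_utf8(b"na\xefve"))              # False: 0xEF starts a sequence that never completes
print(strip_invalid_utf8(b"na\xefve"))         # b'nave'
```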

He continues the post, mentioning where some of the code came from and noting that documentation exists but is scarce. He also includes a warning: don't use it "blindly"; only use it where you need it, not as a replacement for all of PHP's standard str_* functions.

sitepoint utf-8 mbstring handle string encoded


SitePoint PHP Blog:
Living Dangerously with PHP and UTF-8
December 07, 2005 @ 07:45:38

In this new post on the SitePoint PHP Blog today, Harry looks at why it's "living dangerously" to use PHP with UTF-8.

Quick one - knocked up a list of "dangerous" functions and functionality in PHP, in relation to the use of UTF-8, available at http://www.phpwact.org/php/i18n/utf-8. These are for a "default" PHP setup without the mbstring overloading or PHP6 (where charset problems "magically vanish" ;) ).

This follows on from (unfinished) stuff here on charsets (tending towards UTF-8), which should help explain some of this.

He also notes that you can't rely on mbstring to be there, so he offers an alternative...

utf-8 mbstring functions




All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework