Efficient Chinese Search with Elasticsearch
Dec 19, 2014 @ 17:56:41

On the SitePoint PHP blog a new tutorial has been posted showing you how to search Chinese content effectively with Elasticsearch. Elasticsearch is a "powerful open source search and analytics engine that makes data easy to explore" and works well with PHP through its JSON-based query format.

If you have played with Elasticsearch, you already know that analyzing and tokenization are the most important steps while indexing content, and without them your pertinency is going to be bad, your users unhappy and your results poorly sorted. Even with English content you can lose pertinence with a bad stemming, miss some documents when not performing proper elision and so on. And that’s worse if you are indexing another language; the default analyzers are not all-purpose. When dealing with Chinese documents, everything is even more complex, even by considering only Mandarin which is the official language in China and the most spoken worldwide.

He starts by explaining exactly what makes searching Chinese content difficult, including the fact that a word is often a combination of two or more characters, each of which can be a word in its own right. He then lists a few plugins and tools that can be integrated with Elasticsearch to help analyze the content, going through each of them with instructions on installation and usage. He ends the post with sample results for a set of three search terms, comparing the matches each one found.
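To make the multi-character word problem concrete, here is a minimal Python sketch. It is not taken from the tutorial and does not reproduce what any of the plugins it covers actually do; the function names `unigrams` and `bigrams` and the sample text are assumptions chosen for illustration. It contrasts per-character tokens with naive overlapping two-character tokens.

```python
def unigrams(text):
    """One token per character: every character is treated as a word."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text):
    """Overlapping two-character tokens, so adjacent pairs stay together."""
    chars = unigrams(text)
    if len(chars) < 2:
        return chars
    return ["".join(pair) for pair in zip(chars, chars[1:])]


if __name__ == "__main__":
    # Hypothetical sample: "手机" (mobile phone) is written with the
    # characters for "hand" and "machine".
    sample = "我的手机"
    print(unigrams(sample))  # ['我', '的', '手', '机']
    print(bigrams(sample))   # ['我的', '的手', '手机']
```

With per-character tokens, a search for 手机 also matches any text that merely contains 手 or 机. The naive bigram approach keeps 手机 together but also emits pairs such as 的手 that are not words, which is the kind of trade-off a dedicated analyzer is meant to handle better.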

tagged: chinese search elasticsearch tutorial tokenization analysis

Link: http://www.sitepoint.com/efficient-chinese-search-elasticsearch/

The Northclick Blog:
A comma is a comma is a comma...or is it?
Sep 20, 2007 @ 14:32:00

Internationalizing a website can bring all sorts of challenges, as Markus Wolff found out when working on a recent project:

When you're building international websites, there's always something new to learn. Especially if one of the languages your website is available in uses a character set different from anything you're used to. For jimdo.com, the greatest challenge as of yet is the chinese version.

His focus isn't so much on the content of the page as on one small character that caused him headaches: the comma. Chinese text uses its own Unicode comma characters rather than the ASCII comma, and his regular expression did not split on them correctly. The fix was simple, though: adding the "u" modifier after the expression made it Unicode-aware, and the information was split correctly.
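The post's fix is specific to PHP's preg functions, but the underlying difference between matching bytes and matching characters can be sketched in Python. This is only an analogy under stated assumptions: the sample string, the choice of comma characters, and all names below are hypothetical and not taken from the original post. The byte-wise pattern imitates a regex engine that reads a multibyte comma as a set of separate bytes, while the character-wise pattern treats each comma as a single code point.

```python
import re

# Hypothetical input: four tags separated by an ASCII comma, a fullwidth
# comma (U+FF0C) and an ideographic comma (U+3001).
TAGS = "一,二\uFF0C三\u3001四"

# Character-wise pattern: each comma in the class is one code point.
CHAR_COMMAS = re.compile("[,\uFF0C\u3001]")

# Byte-wise pattern: the same commas as raw UTF-8 bytes, so the class
# matches any single byte among 2C, EF BC 8C and E3 80 81.
BYTE_COMMAS = re.compile(b"[,\xef\xbc\x8c\xe3\x80\x81]")


def split_chars(text):
    """Split on whole comma characters."""
    return CHAR_COMMAS.split(text)


def split_bytes(text):
    """Split the UTF-8 encoding byte by byte (the failure mode)."""
    return BYTE_COMMAS.split(text.encode("utf-8"))


def is_valid_utf8(chunk):
    """Report whether a byte chunk still decodes as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(split_chars(TAGS))  # ['一', '二', '三', '四']
    # Some chunks are cut in the middle of a character and no longer decode.
    print([is_valid_utf8(chunk) for chunk in split_bytes(TAGS)])
```

The byte-wise split goes wrong because 一 is encoded as E4 B8 80 and 二 as E4 BA 8C, and their final bytes also appear in the encodings of the two Chinese commas, so the pattern cuts both characters apart.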

tagged: comma unicode support regularexpression preg chinese

Link:

Rob Allen's Blog:
Simplified Chinese Version of my Zend Auth Tutorial
Apr 09, 2007 @ 13:22:00

Rob Allen points out a new Simplified Chinese version of his Zend Framework tutorial today:

Just a quick heads up to let you know that Jason Qi has published the Simplified Chinese version of my Getting Started with Zend Auth tutorial. You can find the PDF on his download site.

He also mentions an update that's been made to it (correcting a few issues) to bring its version up to 1.0.4.

tagged: zendframework zendauth tutorial simplified chinese

Link:

Evans Data Corporation:
More Developers Are Using AJAX & PHP in Emerging Markets Than North America
Aug 29, 2006 @ 12:42:45

According to this new release from the Evans Data Corporation, certain technologies, PHP among them, are more widely used in some regions outside North America.

There is a greater disparity with PHP use in Emerging Markets. The largest number of developers using the language can be found in Eastern Europe (39%). India and Brazil have similar usage at a little over 31%. In contrast, only 21% of Chinese developers are utilizing PHP. In North America, 35% of developers are using PHP.

They also talk about the usage rates of a few other technologies, including Flash, Ajax, Windows, and Linux.

Cal Evans also mentions these trends in this post on the Zend Developer Zone as well as linking to the full EDC report.

tagged: usage north america europe india brazil chinese emerging market

Link:
