Stefan Koopmanschap's Blog:
PHP Hidden Gem similar_text()
July 13, 2009 @ 09:37:50

Stefan Koopmanschap has written about a hidden gem he discovered in PHP for spotting blocks of text, from one or more sources, that are nearly identical - similar_text.

I am working on a hobby project where I aggregate feeds from several different sources. With the blogs I work with right now, it often happens that an author posts the same post to a few different sites. However, because of site formats and sometimes also quick edits an author makes on one site but not on the other, the article contents are usually not identical strings. So I needed something that would help me figure out whether or not two strings are nearly identical.

After Googling around and finding things like the xdiff extension and soundex, he discovered the two functions he needed - levenshtein and similar_text.
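As a rough illustration of how the two functions differ, here is a minimal sketch. The sample strings are made up for this example and do not come from the original post. levenshtein() reports an edit distance (lower means more alike), while similar_text() reports the number of matching characters and can also fill in a similarity percentage.

```php
<?php
// Illustrative strings only; not taken from the original post.
$a = "World";
$b = "Word";

// levenshtein() returns the minimum number of single-character
// insertions, deletions, and replacements needed to turn $a into $b.
$distance = levenshtein($a, $b);

// similar_text() returns the number of matching characters and writes
// a similarity percentage into the third argument (passed by reference).
$matching = similar_text($a, $b, $percent);

printf("distance=%d matching=%d percent=%.1f\n", $distance, $matching, $percent);
```

For a duplicate check across posts of different lengths, the percentage from similar_text() is probably the easier value to compare against a fixed cutoff, since a raw edit distance grows with the length of the text.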

I am still trying to figure out which percentage will catch the duplicates but not catch too many posts which are only similar but not actually duplicates, but with the above 75% I seem to catch quite a few duplicates so far.
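The threshold check described above could look something like the following sketch. The function name isLikelyDuplicate() and its parameters are hypothetical, chosen for this example; the original post's code is not shown here, so this is one possible shape rather than the author's implementation. Only the 75% figure comes from the quote.

```php
<?php
/**
 * Hypothetical helper: treats two strings as likely duplicates when
 * similar_text() reports a similarity above the given percentage.
 * The default of 75.0 mirrors the "above 75%" figure in the quote.
 */
function isLikelyDuplicate(string $first, string $second, float $threshold = 75.0): bool
{
    // similar_text() fills $percent by reference with a 0-100 value.
    similar_text($first, $second, $percent);

    return $percent > $threshold;
}

// Made-up sample content standing in for two feed entries.
$postA = "The quick brown fox jumps";
$postB = "The quick brown fox jumped";

var_dump(isLikelyDuplicate($postA, $postB));
var_dump(isLikelyDuplicate("abc", "xyz"));
```

Keeping the threshold as a parameter makes it easy to experiment with different cutoffs while tuning how many near-duplicates are caught.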
All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework