
Marco Tabini's Blog:
BeebleX is Crawling
Aug 16, 2005 @ 10:44:16

Marco Tabini has a quick post on his weblog today with more details on the development of the BeebleX search engine.

Over the weekend, I wrote a multi-threaded crawler for BeebleX so that it can index those resources that do not warrant a dedicated indexer (either because they don’t have a proper RSS interface or because it’s just more convenient to access them as web pages).

The big task ahead right now consists of writing extraction routines for each resource that needs to be crawled. The idea is that, instead of just indexing web pages wholesale and then poisoning the database with misplaced terms, we want to index the actual content of the pages itself.
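To make the idea of per-resource extraction routines concrete, here is a minimal sketch in Python. It is not BeebleX code: the hostnames, the element ids, and the names `EXTRACTORS`, `extract_by_id`, and `extract_content` are all hypothetical, and it assumes each crawled site wraps its real content in an element with a known id.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ElementText(HTMLParser):
    """Collect the text found inside an element with a given id."""

    # Void elements have no closing tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_by_id(html, element_id):
    """Return the whitespace-normalized text inside the element with this id."""
    parser = ElementText(element_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical registry: one extraction routine per crawled resource.
EXTRACTORS = {
    "blog.example.com": lambda html: extract_by_id(html, "post"),
    "docs.example.com": lambda html: extract_by_id(html, "manual"),
}


def extract_content(url, html):
    """Return the indexable text for a page, or None if no routine is registered."""
    extractor = EXTRACTORS.get(urlparse(url).hostname)
    return extractor(html) if extractor else None
```

With a registry like this, navigation bars and footers never reach the index, and pages from sites without a registered routine are skipped rather than indexed wholesale.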

With improvements landing almost daily, the BeebleX PHP search engine is shaping up to be a useful resource for PHP developers.
