
Tobias Schlitt's Blog:
Indexing XML - What Would You Do?
Aug 01, 2005 @ 14:26:56

Tobias Schlitt has an interesting post over on his weblog today that asks the question "indexing XML - what would you do?" (I know, it's not specifically PHP, but it's interesting in a webdev kind of way.)

In a recent project, one of the duties is to make a huge number of XML documents searchable. I have dealt with XML in several situations and used parts of XPath, XSL and Co., but this is a new challenge. By now I have made up my mind and thought about what to do, with no clear result, so I'm trying to ask the public for opinions.

Let me introduce the situation a bit more. I'm dealing with (basically) 5 different XML structures, of which about 10,000 to 100,000 files and above have to become searchable. For that, I mainly need fulltext indexes on the contents of single tags and/or multiple tags at once. Search should include phrase and boolean expressions. The XML files may be updated about once a day (only small parts per update).

So, after setting it all up for us, he lays out four different options that would make life just a bit easier on him, but they all have their issues:

  • Using an XML database
  • Parsing the XML into a DB (same structure)
  • Parsing the XML into a DB (tree structure)
  • Using a real search engine
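
To make the requirements a bit more concrete, here is a minimal sketch in Python of the kind of per-tag fulltext index the options above would have to provide, with boolean and phrase search. It is not taken from Tobias's post and does not implement any one of his four options. The `TagIndex` class and its method names are hypothetical, it keeps everything in memory, and it only indexes the text that sits directly inside an element, so treat it as an illustration of the problem rather than a solution for 100,000 files.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TagIndex:
    """Hypothetical in-memory inverted index, keyed by (tag name, term)."""

    def __init__(self):
        # (tag, term) -> {doc_id: [token positions within that tag]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, xml_text):
        """Parse one XML document and index the direct text of every element."""
        root = ET.fromstring(xml_text)
        counters = defaultdict(int)  # running token position per tag name
        for elem in root.iter():
            for term in tokenize(elem.text or ""):
                self.postings[(elem.tag, term)][doc_id].append(counters[elem.tag])
                counters[elem.tag] += 1
            # Leave a gap so a phrase cannot match across two elements.
            counters[elem.tag] += 1

    def search_all(self, tag, *terms):
        """Boolean AND: documents whose <tag> text contains every term."""
        sets = [set(self.postings.get((tag, t.lower()), {})) for t in terms]
        return set.intersection(*sets) if sets else set()

    def search_any(self, tag, *terms):
        """Boolean OR: documents whose <tag> text contains at least one term."""
        hits = set()
        for t in terms:
            hits |= set(self.postings.get((tag, t.lower()), {}))
        return hits

    def search_phrase(self, tag, phrase):
        """Phrase search: the terms must appear consecutively inside <tag>."""
        terms = tokenize(phrase)
        if not terms:
            return set()
        hits = set()
        # Only documents containing all terms can contain the phrase.
        for doc_id in self.search_all(tag, *terms):
            for start in self.postings[(tag, terms[0])][doc_id]:
                if all(start + i in self.postings[(tag, t)][doc_id]
                       for i, t in enumerate(terms)):
                    hits.add(doc_id)
                    break
        return hits
```

With an index built via `add("a.xml", ...)`, a call such as `search_phrase("title", "indexing xml")` returns the set of document IDs whose `title` text contains that exact phrase. Handling the daily updates would additionally need a way to drop a document's old postings before re-adding it, which this sketch leaves out.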

All of the ideas have potential, but there are problems down each path. He's definitely open to suggestions, and I'd like to see what your responses are to this one. XML is becoming more and more popular all the time, and indexing all of these documents is going to become a larger and larger pain in the neck...
