<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Tue, 08 Jul 2008 23:47:09 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Markus Wolff's Blog: Fulltext search as a webservice]]></title>
      <guid>http://www.phpdeveloper.org/news/10134</guid>
      <link>http://www.phpdeveloper.org/news/10134</link>
      <description><![CDATA[<p>
In a <a href="http://blog.wolff-hamburg.de/archives/22-Fulltext-search-as-a-webservice.html">recent blog entry</a>, <i>Markus Wolff</i> writes about a fulltext searching solution he hacked together in a few hours with Zend_Search_Lucene:
</p>
<blockquote>
While working at some really old code that provided a fulltext search feature, I was at one point incredibly pissed rather unsatisfied due to the fact that said code resisted all attempts to debug it. This lead to the decision to sit down on a rainy weekend to try if I couldn't come up with something more useful, and most importantly, scalable.
</blockquote>
<p>
<a href="http://blog.wolff-hamburg.de/archives/22-Fulltext-search-as-a-webservice.html">His method</a> allows for separation between the indexing and the main app, and he describes how he changed some of his methods when he learned that <a href="http://lucene.apache.org/solr/">Solr</a> did something very similar. He also lays out some example XML content and how it's handled in his script (via a SimpleXML object).
</p>]]></description>
      <pubDate>Wed, 07 May 2008 12:57:47 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Maarten Balliauw's Blog: Indexing Word 2007 (docx) files with Zend_Search_Lucene]]></title>
      <guid>http://www.phpdeveloper.org/news/9569</guid>
      <link>http://www.phpdeveloper.org/news/9569</link>
      <description><![CDATA[<p>
<i>Maarten Balliauw</i> has <a href="http://blog.maartenballiauw.be/post/2008/02/Indexing-Word-2007-(docx)-files-with-Zend_Search_Lucene.aspx">written about</a> a method he's developed to convince the Zend_Search_Lucene component of the <a href="http://framework.zend.com">Zend Framework</a> to index the contents of a Word 2007 document.
</p>
<blockquote>
Lucene basically is an indexing and search technology, providing an easy-to-use API to create any type of application that has to do with indexing and searching. If you provide the right methods to extract data from any type of document, Lucene can index it. [...] Sounds like a challenge!
</blockquote>
<p>
He works through the three-step process of getting the search working, the key being his readDocXContents() function, which goes through the Word file and returns all the text it can find. This text is passed back out so the Zend Framework component can index it and search it (his example searches for the string "Code Access Security").
</p>
<p>
You can grab <a href="http://examples.maartenballiauw.be/LuceneIndexingDOCX/LuceneIndexingDOCX.zip">the full code here</a>.
</p>]]></description>
      <pubDate>Tue, 05 Feb 2008 10:24:00 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Kapustabrothers.com: Indexing PDF Documents with Zend_Search_Lucene]]></title>
      <guid>http://www.phpdeveloper.org/news/9472</guid>
      <link>http://www.phpdeveloper.org/news/9472</link>
      <description><![CDATA[<p>
As <a href="http://devzone.zend.com/article/3000-kapustabrothers.com---Indexing-PDF-Documents-with-Zend_Search_Lucene">mentioned</a> on the Zend Developer Zone, there's a <a href="http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/">new post</a> on kapustabrothers.com about a method for indexing all of those PDF files your site uses with the help of the Zend Framework's Zend_Search_Lucene component.
</p>
<blockquote>
along with many others have been trying and asking how to index and search PDF files. Once Zend released its Framework, which is a port of Java Lucene to PHP, I decided to jump on board and find a way to index and search PDF files.
</blockquote>
<p>
He uses the <a href="http://www.foolabs.com/xpdf/">XPDF</a> software to parse the PDF files and the ZF component to do the actual indexing and searching. XPDF extracts key information from the PDF and writes it out to a new file where Zend_Search_Lucene can read it. Example code is included to show how these details are generated automatically and how to add them to the component's index.
</p>]]></description>
      <pubDate>Wed, 23 Jan 2008 07:58:00 -0600</pubDate>
    </item>
  </channel>
</rss>
