<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the-Minute PHP News, Views and Community</description>
    <language>en-us</language>
    <pubDate>Fri, 04 Jul 2008 17:26:50 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Developer Tutorials Blog: Extracting text from Word Documents via PHP and COM]]></title>
      <guid>http://www.phpdeveloper.org/news/9861</guid>
      <link>http://www.phpdeveloper.org/news/9861</link>
      <description><![CDATA[<p>
In a <a href="http://www.developertutorials.com/blog/php/extracting-text-from-word-documents-via-php-and-com-81/">recent blog post</a>, <i>Akash Mehta</i> showed how to open a Microsoft Word document from a PHP script and pull out the text inside.
</p>
<blockquote>
Communicating via COM in PHP is easy as ever; especially for people coming from a VB background where executing complex tasks in MS-applications is a piece of cake, you will feel right at home in PHP. In fact, VB COM calls can be converted to PHP COM calls in just a few simple search and replaces.
</blockquote>
<p>
He shows how to use the COM extension in a (Windows) PHP installation to access the text inside the document and manipulate the contents however you'd like (even writing them back out to another Word file).
</p>]]></description>
      <pubDate>Wed, 26 Mar 2008 12:02:06 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Maarten Balliauw's Blog: Indexing Word 2007 (docx) files with Zend_Search_Lucene]]></title>
      <guid>http://www.phpdeveloper.org/news/9569</guid>
      <link>http://www.phpdeveloper.org/news/9569</link>
      <description><![CDATA[<p>
<i>Maarten Balliauw</i> has <a href="http://blog.maartenballiauw.be/post/2008/02/Indexing-Word-2007-(docx)-files-with-Zend_Search_Lucene.aspx">written about</a> a method he's developed to convince the Zend_Search_Lucene component of the <a href="http://framework.zend.com">Zend Framework</a> to index the contents of a Word 2007 document.
</p>
<blockquote>
Lucene basically is an indexing and search technology, providing an easy-to-use API to create any type of application that has to do with indexing and searching. If you provide the right methods to extract data from any type of document, Lucene can index it. [...] Sounds like a challenge!
</blockquote>
<p>
He works through the three-step process for getting search working. The key is his readDocXContents() function, which goes through the Word file and returns all the text it can find. That text is then handed to the Zend Framework component so it can be indexed and searched (his example searches for the string "Code Access Security").
</p>
<p>
You can grab <a href="http://examples.maartenballiauw.be/LuceneIndexingDOCX/LuceneIndexingDOCX.zip">the full code here</a>.
</p>]]></description>
      <pubDate>Tue, 05 Feb 2008 10:24:00 -0600</pubDate>
    </item>
  </channel>
</rss>
