<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Thu, 04 Dec 2008 15:07:42 -0600</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Raphael Stolt's Blog: Scraping websites with Zend_Dom_Query]]></title>
      <guid>http://www.phpdeveloper.org/news/11239</guid>
      <link>http://www.phpdeveloper.org/news/11239</link>
      <description><![CDATA[<p>
<i>Raphael Stolt</i> has a <a href="http://raphaelstolt.blogspot.com/2008/10/scraping-websites-with-zenddomquery.html">new blog post</a> today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.
</p>
<blockquote>
Today I stumbled upon an interesting and reportable scenario where I had to extract information of the weekly published Drum and Bass <a href="http://www.bbc.co.uk/1xtra/drumbass/chart/">charts</a> provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example a RSS feed, I had to go that scraping route but didn't want to hustle with a regex approach. Since version 1.6.0 the <a href="http://framework.zend.com/manual/en/zend.dom.query.html">Zend_Dom_Query</a> component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!
</blockquote>
<p>
He includes the code for the Bbc_DnbCharts_Scraper class he created, showing how the chart page is fetched (via cURL) and handed to an object for parsing.
</p>]]></description>
      <pubDate>Fri, 17 Oct 2008 14:31:34 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Developer Tutorials Blog: Easy Screen Scraping in PHP with the Simple HTML DOM Library]]></title>
      <guid>http://www.phpdeveloper.org/news/10771</guid>
      <link>http://www.phpdeveloper.org/news/10771</link>
      <description><![CDATA[<p>
On the Developer Tutorials blog today <i>Akash Mehta</i> shows how a <a href="http://simplehtmldom.sourceforge.net/">simple DOM library</a> can make screen scraping easy.
</p>
<blockquote>
In PHP, regular expressions tend to get rather messy, DOM calls can be confusing and verbose, and often the string functions just aren't enough. In this tutorial, I'll show you how to use the middle ground - the open source PHP Simple HTML DOM Parser library, which provides jQuery-grade awesomeness for easy screen scraping without messy regular expressions.
</blockquote>
<p>
In his <a href="http://www.developertutorials.com/blog/php/easy-screen-scraping-in-php-simple-html-dom-library-simplehtmldom-398/">examples</a> he shows how to grab the contents of a page (google.com), find and replace information in its HTML, and query the parsed document (for example, pulling each news story from the front page of slashdot.org).
</p>]]></description>
      <pubDate>Wed, 06 Aug 2008 08:40:15 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Matthew Turland's Blog: How-To (and How-Not-To) on Web Scraping]]></title>
      <guid>http://www.phpdeveloper.org/news/9798</guid>
      <link>http://www.phpdeveloper.org/news/9798</link>
      <description><![CDATA[<p>
<i>Matthew Turland</i>, author of an earlier <a href="http://www.phparch.com">php|architect</a> article on the same topic, has a few things to say on <a href="http://ishouldbecoding.com/2008/03/12/scraping-html-with-dom">his blog today</a> about web scraping (and <a href="http://php.dzone.com/news/writing-website-scrapers-php">recent articles</a> covering it):
</p>
<blockquote>
A friend of mine who shall remain nameless pointed a <a href="http://php.dzone.com/news/writing-website-scrapers-php">post</a> out to me on the <a href="http://php.dzone.com/">PHP DZone</a> web site recently. Noting that the article's content was misinformed at best and downright ignorant at worst, even when examining it sheerly from the author's knowledge of PHP as a language, this friend asked that I set the author straight.
</blockquote>
<p>
He points to his <a href="http://php.dzone.com/news/writing-website-scrapers-php#comment-1497">comments</a> on the post, which correct the author on several points, as well as a more "clued in" <a href="http://www.xml.lt/Blog/2008/03/11/Scraping+html+with+DOM">post</a> on the xml.lt website about using PHP's DOM functionality instead.
</p>]]></description>
      <pubDate>Fri, 14 Mar 2008 11:18:44 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[DeveloperTutorials.com: Scraping Links With PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/9414</guid>
      <link>http://www.phpdeveloper.org/news/9414</link>
      <description><![CDATA[<p>
The Developer Tutorials site has posted a new article <a href="http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05/page1.html">covering the creation</a> of a small application that can help you scrape content from a remote page and pull it into your script.
</p>
<blockquote>
In this tutorial you will learn how to build a PHP script that <a href="http://en.wikipedia.org/wiki/Web_scraping">scrapes</a> links from any web page.
</blockquote>
<p>
You'll learn to use cURL, the DOM functions, XPath and a bit of MySQL to get the job done. It's nice to see that they also include <a href="http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05/page9.html">a section</a> on one of the touchier aspects of web scraping: "is it legal?"
</p>]]></description>
      <pubDate>Mon, 14 Jan 2008 08:44:00 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Matthew Turland's Blog: Web Scraping Article Published]]></title>
      <guid>http://www.phpdeveloper.org/news/9296</guid>
      <link>http://www.phpdeveloper.org/news/9296</link>
      <description><![CDATA[<p>
<i>Matthew Turland</i> has <a href="http://ishouldbecoding.com/2007/12/20/web-scraping-article-published/">announced his article</a> on web scraping in the latest issue of <a href="http://www.phparch.com">php|architect magazine</a> (the <a href="http://www.phparch.com/c/magazine/issue/63">December 2007 issue</a>).
</p>
<blockquote>
Just a quick post to announce (albeit a little late) that the <a href="http://www.phparch.com/c/magazine/issue/63">December 2007 issue of php|architect</a>, which includes my article on web scraping. Please buy a copy, give it a read, and feel free to post comments on the <a href="http://forum.phparch.com/421">forum thread</a> for the article. I'd love to hear some reader feedback!
</blockquote>
<p>
Other articles in <a href="http://www.phparch.com/c/magazine/issue/63">the issue</a> include "Writing an Object/Relational Mapper in PHP", "Understanding CLI" and the cover story "Going native with mysqlnd".
</p>]]></description>
      <pubDate>Fri, 21 Dec 2007 10:22:00 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Zend Developer Zone: PHP Abstract Episode 22: Screen Scraping]]></title>
      <guid>http://www.phpdeveloper.org/news/8863</guid>
      <link>http://www.phpdeveloper.org/news/8863</link>
      <description><![CDATA[<p>
The Zend Developer Zone has <a href="http://devzone.zend.com/article/2631-PHP-Abstract-Episode-22-Screen-Scraping">posted the latest episode</a> of their PHP Abstract podcast series, this time focusing on screen scraping (recorded by <i>Matthew Turland</i>).
</p>
<blockquote>
Matt has been building PHP applications since 2002. He joined the ranks of PHP-Oracle developers in 2006 when he became a programmer for METHODS Technology Solutions. He contributes to open source projects such as the Zend and Forkr frameworks. [...] We are going to veer off the beaten path today as Matthew talks to us about Screen Scraping.
</blockquote>
<p>
You can download the <a href="http://s3.amazonaws.com/php_abstract_2007/php_abstract_episode_022.mp3">mp3 directly</a> or <a href="http://feeds.feedburner.com/phpabstract">subscribe to their feed</a> to get this and the most recent episodes.
</p>]]></description>
      <pubDate>Thu, 18 Oct 2007 18:49:36 -0500</pubDate>
    </item>
  </channel>
</rss>
