<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Fri, 24 May 2013 18:29:16 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Raphael Stolt's Blog: Scraping websites with Zend_Dom_Query]]></title>
      <guid>http://www.phpdeveloper.org/news/11239</guid>
      <link>http://www.phpdeveloper.org/news/11239</link>
      <description><![CDATA[<p>
<i>Raphael Stolt</i> has a <a href="http://raphaelstolt.blogspot.com/2008/10/scraping-websites-with-zenddomquery.html">new blog post</a> today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.
</p>
<blockquote>
Today I stumbled upon an interesting and reportable scenario where I had to extract information from the weekly published Drum and Bass <a href="http://www.bbc.co.uk/1xtra/drumbass/chart/">charts</a> provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example an RSS feed, I had to go that scraping route but didn't want to hassle with a regex approach. Since version 1.6.0 the <a href="http://framework.zend.com/manual/en/zend.dom.query.html">Zend_Dom_Query</a> component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!
</blockquote>
<p>
He includes the code for the Bbc_DnbCharts_Scraper class he created, showing how the chart page is fetched (via curl) and then loaded into an object for parsing.
</p>]]></description>
      <pubDate>Fri, 17 Oct 2008 14:31:34 -0500</pubDate>
    </item>
  </channel>
</rss>
