Raphael Stolt has a new blog post today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.
Today I stumbled upon an interesting and reportable scenario were I had to extract information of the weekly published Drum and Bass charts provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example a RSS feed, I had to go that scraping route but didn't want to hustle with a regex approach. Since version 1.6.0 the Zend_Dom_Query component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!
He includes the code for his Bbc_DnbCharts_Scraper class he's created to show how the data is pulled in (via curl) and pushed into an object to be parsed.