<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Thu, 23 May 2013 10:40:16 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Robert Basic's Blog: Book review - Guide to Web Scraping with PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/16411</guid>
      <link>http://www.phpdeveloper.org/news/16411</link>
      <description><![CDATA[<p>
In <a href="http://robertbasic.com/blog/book-review-guide-to-web-scraping-with-php/">this new post to his blog</a> <i>Robert Basic</i> has a review of a book from php|architect (by <i>Matthew Turland</i>), "Guide to Web Scraping with PHP".
</p>
<blockquote>
It took me a while to grab myself a copy of <a href="http://matthewturland.com/">Matthew Turland</a>'s "<a href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">Guide to Web Scraping with PHP</a>", but a few weeks ago a copy finally arrived and I had the pleasure of reading it. [...] My overall impression of the book is that it was worth the time and I'm really glad that I bought it. Matthew did a great job explaining all the tools we have at our disposal for writing web scrapers and how to use them.
</blockquote>
<p>
He talks about the content of a few specific chapters (the HTTP protocol, client libraries you can use and how to prepare documents for parsing) and notes that he can find little to fault in the book:
</p>
<blockquote>
It is a guide, clear and straight-to-the-point, explaining what tools are there, which one to use and how for writing scrapers and that's exactly what I wanted to know.
</blockquote>]]></description>
      <pubDate>Wed, 01 Jun 2011 09:28:42 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Matthew Turland's Blog: "Web Scraping with PHP" Now Available in Print!]]></title>
      <guid>http://www.phpdeveloper.org/news/15157</guid>
      <link>http://www.phpdeveloper.org/news/15157</link>
      <description><![CDATA[<p>
If you've been waiting for the print edition of <i>Matthew Turland</i>'s "Web Scraping with PHP" book (from php|architect Press), your <a href="http://matthewturland.com/2010/09/18/web-scraping-with-php-now-available-in-print/">wait is over</a>. According to a new post on his blog, the print version is now <a href="http://www.phparch.com/wp-content/themes/phpa/helpers/book/buy.php?book_web_scraping">available for order</a>.
</p>
<blockquote>
I know a number of my readers have been waiting for this announcement: my book, Web Scraping with PHP, is now available for sale in hard copy form! That's right, you can now <a href="http://www.phparch.com/wp-content/themes/phpa/helpers/book/buy.php?book_web_scraping">finally order your very own print edition copy</a>. [...] To those who felt forced into buying the PDF edition to get access to the content because a print edition was not available until now, you have my most sincere and profound apologies. 
</blockquote>
<p>
His web scraping book covers topics like understanding HTTP requests at a basic level, working with several HTTP clients (cURL, pecl_http and Zend_Http_Client) and analyzing the remote page's information with tools like SimpleXML, the DOM functions and the XMLReader extension. If the print version's not your thing, you can still <a href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">get the PDF</a> from the php|architect store too.
</p>]]></description>
      <pubDate>Mon, 20 Sep 2010 12:03:49 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Community News: php|architect Releases "Guide to Web Scraping"]]></title>
      <guid>http://www.phpdeveloper.org/news/14390</guid>
      <link>http://www.phpdeveloper.org/news/14390</link>
      <description><![CDATA[<p>
php|architect has officially released one of their latest guides - this time it's <i>Matthew Turland</i>'s "<a href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">Guide to Web Scraping</a>".
</p>
<p>
<i>Matthew</i> talks a bit about it in his <a href="http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/">latest blog entry</a>:
</p>
<blockquote>
What I'm announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small of a niche or simply wasn't marketable. <a href="http://www.phparch.com/books/">php|architect Press</a> respectfully disagreed with them and decided to publish what is now a book written by me that you can purchase.
</blockquote>
<p>
The book covers all things related to pulling content from remote pages including an understanding of HTTP codes, a look at tools you can use (including cURL, pecl_http and Zend_Http_Client) and how to use technologies like DOM, SimpleXML and regular expressions to match content.
</p>]]></description>
      <pubDate>Thu, 22 Apr 2010 08:25:36 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Sameer Borate's Blog: Web scraping tutorial]]></title>
      <guid>http://www.phpdeveloper.org/news/12088</guid>
      <link>http://www.phpdeveloper.org/news/12088</link>
      <description><![CDATA[<p>
In <a href="http://www.codediesel.com/php/web-scraping-in-php-tutorial/">a new tutorial</a> on his blog today, <i>Sameer</i> shows a library that you can use (<a href="http://simplehtmldom.sourceforge.net/">simplehtmldom</a>) to parse remote sites and pull out just the information you need (aka "web scraping").
</p>
<blockquote>
There are three ways to access a website data. One is through a browser, the other is using a API (if the site provides one) and the last by parsing the web pages through code. The last one also known as Web Scraping is a technique of extracting information from websites using specially coded programs. In this post we will take a quick look at writing a simple scraper using the <a href="http://simplehtmldom.sourceforge.net/">simplehtmldom</a> library.
</blockquote>
<p>
His three (really more) step process guides you through installing the library, installing Firebug and writing some example code to create your first scraper - an example that pulls some of the "Featured Links" from the Google search results sidebar. The second example illustrates grabbing the table of contents from the most recent issue of <a href="http://wired.com">Wired</a>.
</p>]]></description>
      <pubDate>Mon, 09 Mar 2009 07:52:43 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Juozas Kaziukenas' Blog: Web scraping with PHP and XPath]]></title>
      <guid>http://www.phpdeveloper.org/news/11972</guid>
      <link>http://www.phpdeveloper.org/news/11972</link>
      <description><![CDATA[<p>
In <a href="http://dev.juokaz.com/php/web-scraping-with-php-and-xpath">this new post</a> to his blog <i>Juozas Kaziukenas</i> takes a look at one method for getting the information out of a remote page - parsing it with PHP and XPath (assuming the page is correctly formatted).
</p>
<blockquote>
When I was writing about <a href="http://dev.juokaz.com/php/web-scraping-easy-way-to-monitor-market">how I use web scraping</a>, I was still hadn't tried using Xpath (shame on me). [...] It turned out, that using Xpath is extremely easy, really. When you master it, you can do everything in seconds. Yes, you need to know how XML works and how to write correct Xpath queries (brief explanation of Xpath syntax is available at <a href="http://www.w3schools.com/XPath/xpath_syntax.asp">W3Schools</a>), but hey - these topics are in 1st year of university. 
</blockquote>
<p>
He includes both some sample code (to fetch titles and prices for cameras from bhphotovideo.com) and a link to an <a href="https://addons.mozilla.org/en-US/firefox/addon/1095">XPath checker</a> you can use to ensure that your query is correctly formatted. It's good that he also includes a quick reminder about the ethical issue with web scraping - it could be considered stealing depending on where the information comes from and who is providing it.
</p>]]></description>
      <pubDate>Wed, 18 Feb 2009 10:28:08 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Raphael Stolt's Blog: Scraping websites with Zend_Dom_Query]]></title>
      <guid>http://www.phpdeveloper.org/news/11239</guid>
      <link>http://www.phpdeveloper.org/news/11239</link>
      <description><![CDATA[<p>
<i>Raphael Stolt</i> has a <a href="http://raphaelstolt.blogspot.com/2008/10/scraping-websites-with-zenddomquery.html">new blog post</a> today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.
</p>
<blockquote>
Today I stumbled upon an interesting and reportable scenario were I had to extract information of the weekly published Drum and Bass <a href="http://www.bbc.co.uk/1xtra/drumbass/chart/">charts</a> provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example a RSS feed, I had to go that scraping route but didn't want to hustle with a regex approach. Since version 1.6.0 the <a href="http://framework.zend.com/manual/en/zend.dom.query.html">Zend_Dom_Query</a> component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!
</blockquote>
<p>
He includes the code for the Bbc_DnbCharts_Scraper class he created to show how the data is pulled in (via cURL) and pushed into an object to be parsed.
</p>]]></description>
      <pubDate>Fri, 17 Oct 2008 14:31:34 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Developer Tutorials Blog: Easy Screen Scraping in PHP with the Simple HTML DOM Library]]></title>
      <guid>http://www.phpdeveloper.org/news/10771</guid>
      <link>http://www.phpdeveloper.org/news/10771</link>
      <description><![CDATA[<p>
On the Developer Tutorials blog today <i>Akash Mehta</i> looks at screen scraping using a <a href="http://simplehtmldom.sourceforge.net/">simple DOM library</a> to help make it easy.
</p>
<blockquote>
In PHP, regular expressions tend to get rather messy, DOM calls can be confusing and verbose, and often the string functions just aren't enough. In this tutorial, I'll show you how to use the middle ground - the open source PHP Simple HTML DOM Parser library, which provides jQuery-grade awesomeness for easy screen scraping without messy regular expressions.
</blockquote>
<p>
In his <a href="http://www.developertutorials.com/blog/php/easy-screen-scraping-in-php-simple-html-dom-library-simplehtmldom-398/">examples</a> he shows how to grab the contents of a page (google.com), find and replace information in the HTML and perform a query on the data (like finding each of the news stories on the front page of slashdot.org).
</p>]]></description>
      <pubDate>Wed, 06 Aug 2008 08:40:15 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Matthew Turland's Blog: How-To (and How-Not-To) on Web Scraping]]></title>
      <guid>http://www.phpdeveloper.org/news/9798</guid>
      <link>http://www.phpdeveloper.org/news/9798</link>
      <description><![CDATA[<p>
<i>Matthew Turland</i> has a few things to say about web scraping (and <a href="http://php.dzone.com/news/writing-website-scrapers-php">recent articles</a> covering it) on <a href="http://ishouldbecoding.com/2008/03/12/scraping-html-with-dom">his blog today</a> as an author of a previous article published in <a href="http://www.phparch.com">php|architect</a> covering the same topic:
</p>
<blockquote>
A friend of mine who shall remain nameless pointed a <a href="http://php.dzone.com/news/writing-website-scrapers-php">post</a> out to me on the <a href="http://php.dzone.com/">PHP DZone</a> web site recently. Noting that the article's content was misinformed at best and downright ignorant at worst, even when examining it sheerly from the author's knowledge of PHP as a language, this friend asked that I set the author straight.
</blockquote>
<p>
He mentions his <a href="http://php.dzone.com/news/writing-website-scrapers-php#comment-1497">comments</a> on the post correcting the author on some points as well as a more "clued in" <a href="http://www.xml.lt/Blog/2008/03/11/Scraping+html+with+DOM">post</a> on the xml.lt website talking about using PHP's DOM functionality instead.
</p>]]></description>
      <pubDate>Fri, 14 Mar 2008 11:18:44 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[DeveloperTutorials.com: Scraping Links With PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/9414</guid>
      <link>http://www.phpdeveloper.org/news/9414</link>
      <description><![CDATA[<p>
The Developer Tutorials site has posted a new article <a href="http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05/page1.html">covering the creation</a> of a small application that can help you scrape content from a remote page and pull it into your script.
</p>
<blockquote>
In this tutorial you will learn how to build a PHP script that <a href="http://en.wikipedia.org/wiki/Web_scraping">scrapes</a> links from any web page.
</blockquote>
<p>
You'll learn to use cURL, the DOM functions, XPath and a bit of MySQL to get the job done. It's nice to see that they also include <a href="http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05/page9.html">a section</a> looking at one of the more touchy aspects of web page scraping - "is it legal?"
</p>]]></description>
      <pubDate>Mon, 14 Jan 2008 08:44:00 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Matthew Turland's Blog: Web Scraping Article Published]]></title>
      <guid>http://www.phpdeveloper.org/news/9296</guid>
      <link>http://www.phpdeveloper.org/news/9296</link>
      <description><![CDATA[<p>
<i>Matthew Turland</i> has <a href="http://ishouldbecoding.com/2007/12/20/web-scraping-article-published/">mentioned his article</a> in the latest issue of <a href="http://www.phparch.com">php|architect magazine</a> (the <a href="http://www.phparch.com/c/magazine/issue/63">December 2007 issue</a>) covering web scraping.
</p>
<blockquote>
Just a quick post to announce (albeit a little late) that the <a href="http://www.phparch.com/c/magazine/issue/63">December 2007 issue of php|architect</a>, which includes my article on web scraping. Please buy a copy, give it a read, and feel free to post comments on the <a href="http://forum.phparch.com/421">forum thread</a> for the article. I'd love to hear some reader feedback!
</blockquote>
<p>
Other articles in <a href="http://www.phparch.com/c/magazine/issue/63">the issue</a> include "Writing an Object/Relational Mapper in PHP", "Understanding CLI" and the cover story "Going native with mysqlnd".
</p>]]></description>
      <pubDate>Fri, 21 Dec 2007 10:22:00 -0600</pubDate>
    </item>
  </channel>
</rss>
