Robert Basic's Blog: Book review - Guide to Web Scraping with PHP
by Chris Cornutt June 01, 2011 @ 09:28:42
In this new post to his blog Robert Basic has a review of a book from php|architect (by Matthew Turland), "Guide to Web Scraping with PHP".
It took me a while to grab myself a copy of Matthew Turland's "Guide to Web Scraping with PHP", but a few weeks ago a copy finally arrived and I had the pleasure of reading it. [...] My overall impression of the book is that it was worth the time and I'm really glad that I bought it. Matthew did a great job explaining all the tools we have at our disposal for writing web scrapers and how to use them.
He talks about the content of a few specific chapters (the HTTP protocol, the client libraries you can use and how to prepare documents for parsing) and notes that he can find little to criticize about the book:
It is a guide, clear and straight-to-the-point, explaining what tools are there, which one to use and how for writing scrapers and that's exactly what I wanted to know.
web scraping book review matthewturland
Matthew Turland's Blog: "Web Scraping with PHP" Now Available in Print!
by Chris Cornutt September 20, 2010 @ 12:03:49
If you've been waiting for the print edition of Matthew Turland's "Web Scraping with PHP" book (from php|architect Press) your wait is over. According to a new post on his blog the print version is now available for order.
I know a number of my readers have been waiting for this announcement: my book, Web Scraping with PHP, is now available for sale in hard copy form! That's right, you can now finally order your very own print edition copy. [...] To those who felt forced into buying the PDF edition to get access to the content because a print edition was not available until now, you have my most sincere and profound apologies.
His web scraping book covers topics like understanding HTTP requests at a basic level, working with several HTTP clients (cURL, pecl_http and Zend_Http_Client) and analyzing the remote page's content with tools like SimpleXML, the DOM functions and the XMLReader extension. If the print version isn't your thing, you can still get the PDF from the php|architect store.
webscraping scraping book phparchitect matthewturland pdf
Community News: php|architect Releases "Guide to Web Scraping"
by Chris Cornutt April 22, 2010 @ 08:25:36
php|architect has officially released one of their latest guides - this time it's Matthew Turland's "Guide to Web Scraping".
Matthew talks a bit about it in his latest blog entry:
What I'm announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small of a niche or simply wasn't marketable. php|architect Press respectfully disagreed with them and decided to publish what is now a book written by me that you can purchase.
The book covers all things related to pulling content from remote pages including an understanding of HTTP codes, a look at tools you can use (including cURL, pecl_http and Zend_Http_Client) and how to use technologies like DOM, SimpleXML and regular expressions to match content.
guide web scraping book release matthewturland
Sameer Borate's Blog: Web scraping tutorial
by Chris Cornutt March 09, 2009 @ 07:52:43
In a new tutorial on his blog today, Sameer shows a library that you can use (simplehtmldom) to parse remote sites and pull out just the information you need (aka "web scraping").
There are three ways to access a website data. One is through a browser, the other is using a API (if the site provides one) and the last by parsing the web pages through code. The last one also known as Web Scraping is a technique of extracting information from websites using specially coded programs. In this post we will take a quick look at writing a simple scraper using the simplehtmldom library.
His three-step (really more) process guides you through installing the library, installing Firebug and writing some example code to create your first scraper - an example that pulls some of the "Featured Links" from the Google search results sidebar. The second example illustrates grabbing the table of contents from the most recent issue of Wired.
web scraping tutorial simplehtmldom google search results wired tableofcontents
Juozas Kaziukenas' Blog: Web scraping with PHP and XPath
by Chris Cornutt February 18, 2009 @ 10:28:08
In this new post to his blog Juozas Kaziukenas takes a look at one method for getting the information out of a remote page - parsing it with PHP and XPath (assuming the page is correctly formatted).
When I was writing about how I use web scraping, I was still hadn't tried using Xpath (shame on me). [...] It turned out, that using Xpath is extremely easy, really. When you master it, you can do everything in seconds. Yes, you need to know how XML works and how to write correct Xpath queries (brief explanation of Xpath syntax is available at W3Schools), but hey - these topics are in 1st year of university.
He includes both some sample code (to fetch titles and prices for cameras from bhphotovideo.com) and a link to an XPath checker you can use to ensure that your query is correctly formatted. It's good that he also includes a quick reminder about the ethical issue with web scraping - it could be considered stealing depending on where the information comes from and who is providing it.
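To give a rough idea of the technique described here, the following is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. It is not the code from the post: the markup, the class names (product, title, price) and the extractProducts() helper are all hypothetical, and an inline string stands in for a fetched page so the example runs without a network request.

```php
<?php
// Hypothetical markup standing in for a fetched product listing page.
// A nowdoc is used so the dollar signs in the prices are taken literally.
$html = <<<'HTML'
<html><body>
<div class="product"><span class="title">Camera A</span><span class="price">$199.00</span></div>
<div class="product"><span class="title">Camera B</span><span class="price">$349.50</span></div>
</body></html>
HTML;

/**
 * Hypothetical helper: returns a list of ['title' => ..., 'price' => ...]
 * arrays, one per element matching the (assumed) "product" class.
 */
function extractProducts(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well formed, so collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = [];

    // Select every product container, then evaluate relative XPath
    // expressions against each container to read its title and price.
    foreach ($xpath->query('//div[@class="product"]') as $node) {
        $products[] = [
            'title' => trim($xpath->evaluate('string(.//span[@class="title"])', $node)),
            'price' => trim($xpath->evaluate('string(.//span[@class="price"])', $node)),
        ];
    }

    return $products;
}

$products = extractProducts($html);

foreach ($products as $product) {
    echo $product['title'], ': ', $product['price'], PHP_EOL;
}
```

Against a real site the queries would need to match that site's actual markup, and the ethical caveat above still applies.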
web scraping xpath tutorial price title ethical steal information
Raphael Stolt's Blog: Scraping websites with Zend_Dom_Query
by Chris Cornutt October 17, 2008 @ 14:31:34
Raphael Stolt has a new blog post today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.
Today I stumbled upon an interesting and reportable scenario were I had to extract information of the weekly published Drum and Bass charts provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example a RSS feed, I had to go that scraping route but didn't want to hustle with a regex approach. Since version 1.6.0 the Zend_Dom_Query component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!
He includes the code for the Bbc_DnbCharts_Scraper class he created, showing how the data is pulled in (via cURL) and pushed into an object to be parsed.
scraping website zendframework zenddomquery component tutorial
Developer Tutorials Blog: Easy Screen Scraping in PHP with the Simple HTML DOM Library
by Chris Cornutt August 06, 2008 @ 08:40:15
On the Developer Tutorials blog today, Akash Mehta looks at how a simple DOM library can make screen scraping easier.
In PHP, regular expressions tend to get rather messy, DOM calls can be confusing and verbose, and often the string functions just aren't enough. In this tutorial, I'll show you how to use the middle ground - the open source PHP Simple HTML DOM Parser library, which provides jQuery-grade awesomeness for easy screen scraping without messy regular expressions.
In his examples he shows how to grab the contents of a page (google.com), find and replace information in the HTML and perform a query on the data (like finding each of the news stories out of the front page of slashdot.org).
screen scraping simple html dom library example tutorial