News Archive

Phil Sturgeon:
Benchmarking Codswallop NodeJS v PHP
November 12, 2013 @ 09:21:29

Phil Sturgeon has posted about some Node.js vs PHP benchmarks that someone linked him to concerning web scraping. The article suggests that Node.js "owns" PHP when it comes to this but, as Phil finds out, there's a bit more to the story than that.

Sometimes people link me to articles and ask for my opinions. This one was a real doozy. Oh goody, a framework versus language post. Let's try and chew through this probable linkbait [where] we're benchmarking NodeJS v PHP. Weird, but I'll go along with it. Well, now we're testing cheerio v PhpQuery which is a bit different, but fine, let's go along with it.

Through a little discovery, Phil noticed that phpQuery was using file_get_contents, a blocking method, to fetch the remote pages to scrape. Node.js instead uses a non-blocking method, meaning multiple pages can be fetched at the same time. To account for this blocking vs non-blocking difference, he decided to run benchmarks against a few cases - Node.js/Cheerio, PHP/phpQuery and his own, fairer comparison to the Node option - PHP/ReactPHP/phpQuery. He's shared his results, showing a major difference between the straight phpQuery and the React-based version.

It seems likely to me that people just assume PHP can't do this stuff, because by default most people arse around PHP with things like MAMP, or on their shitty web-host where it is hard to install things and as such get used to writing PHP without utilizing many extensions. It is probably exactly this which makes people think PHP just can't do something, when it easily can.
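
The blocking versus non-blocking difference at the heart of this post can be sketched in a few lines. This is a minimal illustration in Python, not Phil's benchmark and not the PHP or Node.js code he tested: the page names, the `fetch` helper and the 0.05 second latency are all assumptions, and the network is simulated with a sleep so the example runs without any remote site.

```python
import asyncio
import time

# Hypothetical page names and latency; a sleep stands in for a real HTTP request.
PAGES = ["page-1", "page-2", "page-3", "page-4"]
LATENCY = 0.05  # seconds per simulated request


async def fetch(page: str) -> str:
    """Pretend to download a page; the sleep stands in for network wait."""
    await asyncio.sleep(LATENCY)
    return f"<html>{page}</html>"


async def fetch_sequentially(pages):
    """Blocking style: each request finishes before the next one starts."""
    return [await fetch(page) for page in pages]


async def fetch_concurrently(pages):
    """Non-blocking style: all requests are in flight at the same time."""
    return await asyncio.gather(*(fetch(page) for page in pages))


def timed(strategy, pages):
    """Run one strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(strategy(pages))
    return results, time.perf_counter() - start


sequential_results, sequential_time = timed(fetch_sequentially, PAGES)
concurrent_results, concurrent_time = timed(fetch_concurrently, PAGES)
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both strategies return the same pages; only the waiting differs. The sequential version pays the latency once per page, while the concurrent version overlaps the waits, which is the kind of gap a scraper benchmark ends up measuring when one side blocks and the other does not.
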
nodejs reactphp webpage scraping benchmark compare

Link: http://philsturgeon.co.uk/blog/2013/11/benchmarking-codswallop-nodejs-v-php

Gary Sieling:
Scraping Google Maps Search Results with Javascript and PHP
July 29, 2013 @ 12:23:21

Gary Sieling has a new post to his site about scraping Google Maps data with a combination of PHP and some simple Javascript. It makes use of callbacks and timers to get the data already returned from their API.

Google Maps provides several useful APIs for accessing data: a geocoding API to convert addresses to latitude and longitude, a search API to provide locations matching a term, and a details API for retrieving location metadata. For many mapping tasks it is valuable to get a large list of locations (restaurants, churches, etc) - since this is valuable, Google places a rate limiter on the information, and encourages caching query results.

He includes the code (both front- and back-end) that you'll need to make the system work. It makes a request to the Google Maps API as usual but then adds a listener with a callback. This takes the latitude/longitude data and runs a "get details" method to get more information. The result is then POSTed to PHP and written out to a file.

googlemaps google search results scraping api javascript tutorial

Link: http://garysieling.com/blog/scraping-google-maps-search-results-with-javascript-and-php

Robert Basic's Blog:
Book review - Guide to Web Scraping with PHP
June 01, 2011 @ 09:28:42

In this new post to his blog Robert Basic has a review of a book from php|architect (by Matthew Turland), "Guide to Web Scraping with PHP".

It took me a while to grab myself a copy of Matthew Turland's "Guide to Web Scraping with PHP", but a few weeks ago a copy finally arrived and I had the pleasure of reading it. [...] My overall impression of the book is that it was worth the time and I'm really glad that I bought it. Matthew did a great job explaining all the tools we have at our disposal for writing web scrapers and how to use them.

He talks about the content of a few specific chapters (the HTTP protocol, client libraries you can use and how to prepare documents for parsing) and notes that there's not much bad he can think of about the book:

It is a guide, clear and straight-to-the-point, explaining what tools are there, which one to use and how for writing scrapers and that's exactly what I wanted to know.
web scraping book review matthewturland


Matthew Turland's Blog:
"Web Scraping with PHP" Now Available in Print!
September 20, 2010 @ 12:03:49

If you've been waiting for the print edition of Matthew Turland's "Web Scraping with PHP" book (from php|architect Press) your wait is over. According to a new post on his blog the print version is now available for order.

I know a number of my readers have been waiting for this announcement: my book, Web Scraping with PHP, is now available for sale in hard copy form! That's right, you can now finally order your very own print edition copy. [...] To those who felt forced into buying the PDF edition to get access to the content because a print edition was not available until now, you have my most sincere and profound apologies.

His web scraping book covers topics like understanding HTTP requests on a base level, working with several HTTP clients like cURL, pecl_http, Zend_Http_Client and how to analyze the remote page's information with things like SimpleXML, the DOM functions and the XMLReader extension. If the print version's not your thing, you can still get the PDF from the php|architect store too.

webscraping scraping book phparchitect matthewturland pdf


Community News:
php|architect Releases "Guide to Web Scraping"
April 22, 2010 @ 08:25:36

php|architect has officially released one of their latest guides - this time it's Matthew Turland's "Guide to Web Scraping".

Matthew talks a bit about it in his latest blog entry:

What I'm announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small of a niche or simply wasn't marketable. php|architect Press respectfully disagreed with them and decided to publish what is now a book written by me that you can purchase.

The book covers all things related to pulling content from remote pages including an understanding of HTTP codes, a look at tools you can use (including cURL, pecl_http and Zend_Http_Client) and how to use technologies like DOM, SimpleXML and regular expressions to match content.

guide web scraping book release matthewturland


Sameer Borate's Blog:
Web scraping tutorial
March 09, 2009 @ 07:52:43

In a new tutorial on his blog today, Sameer shows a library that you can use (simplehtmldom) to parse remote sites and pull out just the information you need (aka "web scraping").

There are three ways to access a website data. One is through a browser, the other is using a API (if the site provides one) and the last by parsing the web pages through code. The last one also known as Web Scraping is a technique of extracting information from websites using specially coded programs. In this post we will take a quick look at writing a simple scraper using the simplehtmldom library.

His three (really more) step process guides you through installing the library, installing Firebug and some example code to create your first scraper - an example that pulls some of the "Featured Links" from the Google search results sidebar. The second example illustrates grabbing the table of contents from the most recent issue of Wired.

web scraping tutorial simplehtmldom google search results wired tableofcontents


Juozas Kaziukenas' Blog:
Web scraping with PHP and XPath
February 18, 2009 @ 10:28:08

In this new post to his blog Juozas Kaziukenas takes a look at one method for getting the information out of a remote page - parsing it with PHP and XPath (assuming the page is correctly formatted).

When I was writing about how I use web scraping, I was still hadn't tried using Xpath (shame on me). [...] It turned out, that using Xpath is extremely easy, really. When you master it, you can do everything in seconds. Yes, you need to know how XML works and how to write correct Xpath queries (brief explanation of Xpath syntax is available at W3Schools), but hey - these topics are in 1st year of university.

He includes both some sample code (to fetch titles and prices for cameras from bhphotovideo.com) and a link to an XPath checker you can use to ensure that your query is correctly formatted. It's good that he also includes a quick reminder about the ethical issue with web scraping - it could be considered stealing depending on where the information comes from and who is providing it.
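
As a rough idea of what XPath-based extraction of titles and prices looks like, here is a minimal sketch in Python using only the standard library. It is not Juozas's code: the markup, the class names and the `scrape_products` helper are hypothetical, and the page is assumed to be well-formed, which is the same caveat the post itself raises.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing; not the markup of any real site.
PAGE = """
<html>
  <body>
    <div class="product">
      <span class="title">Camera A</span>
      <span class="price">499.00</span>
    </div>
    <div class="product">
      <span class="title">Camera B</span>
      <span class="price">1299.50</span>
    </div>
  </body>
</html>
"""


def scrape_products(markup: str):
    """Return (title, price) pairs found with XPath-style queries."""
    root = ET.fromstring(markup)
    products = []
    # ElementTree implements a limited XPath subset, enough for attribute matches.
    for node in root.findall(".//div[@class='product']"):
        title = node.findtext("span[@class='title']")
        price = float(node.findtext("span[@class='price']"))
        products.append((title, price))
    return products


products = scrape_products(PAGE)
for title, price in products:
    print(f"{title}: {price:.2f}")
```

Real pages are rarely valid XML, so a production scraper would normally run the HTML through a tolerant parser first and only then apply the queries.
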

web scraping xpath tutorial price title ethical steal information


Raphael Stolt's Blog:
Scraping websites with Zend_Dom_Query
October 17, 2008 @ 14:31:34

Raphael Stolt has a new blog post today with a tutorial showing how to take the Zend_Dom_Query component out of the Zend Framework and use it to scrape content from another web site.

Today I stumbled upon an interesting and reportable scenario were I had to extract information of the weekly published Drum and Bass charts provided by BBC 1Xtra. As this information currently isn't available in any consumer friendly format like for example a RSS feed, I had to go that scraping route but didn't want to hustle with a regex approach. Since version 1.6.0 the Zend_Dom_Query component has been added to the framework mainly to support functional testing of MVC applications, but it also can be used for rolling custom website scrapers in a snap. Woot, perfect match!

He includes the code for the Bbc_DnbCharts_Scraper class he created, showing how the data is pulled in (via curl) and pushed into an object to be parsed.

scraping website zendframework zenddomquery component tutorial


Developer Tutorials Blog:
Easy Screen Scraping in PHP with the Simple HTML DOM Library
August 06, 2008 @ 08:40:15

On the Developer Tutorials blog today Akash Mehta looks at screen scraping using a simple DOM library to help make it easy.

In PHP, regular expressions tend to get rather messy, DOM calls can be confusing and verbose, and often the string functions just aren't enough. In this tutorial, I'll show you how to use the middle ground - the open source PHP Simple HTML DOM Parser library, which provides jQuery-grade awesomeness for easy screen scraping without messy regular expressions.

In his examples he shows how to grab the contents of a page (google.com), find and replace information in the HTML and perform a query on the data (like finding each of the news stories out of the front page of slashdot.org).

screen scraping simple html dom library example tutorial


Matthew Turland's Blog:
How-To (and How-Not-To) on Web Scraping
March 14, 2008 @ 11:18:44

Matthew Turland has a few things to say about web scraping (and recent articles covering it) on his blog today as an author of a previous article published in php|architect covering the same topic:

A friend of mine who shall remain nameless pointed a post out to me on the PHP DZone web site recently. Noting that the article's content was misinformed at best and downright ignorant at worst, even when examining it sheerly from the author's knowledge of PHP as a language, this friend asked that I set the author straight.

He mentions his comments on the post correcting the author on some points as well as a more "clued in" post on the xml.lt website talking about using PHP's DOM functionality instead.

web scraping howto correction article misinformed




All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework