
SitePoint PHP Blog:
Image Scraping with Symfony's DomCrawler
Mar 31, 2014 @ 14:06:43

On the SitePoint PHP blog today there's a new post showing you how to use the Symfony DomCrawler component to scrape content, mostly images, from a remote website. DomCrawler is one of the components of the Symfony framework.

A photographer friend of mine implored me to find and download images of picture frames from the internet. I eventually landed on a web page that had a number of them available for free but there was a problem: a link to download all the images together wasn't present. I didn't want to go through the stress of downloading the images individually, so I wrote this PHP class to find, download and zip all images found on the website.

He briefly describes how the class works, then walks through its code in chunks, explaining what each part does over the lifecycle of the request. The end result is a Zip archive of all the images from the remote website, packaged up for easy transport.
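As a rough illustration of the find, download and zip flow described above, here is a minimal sketch. It is not the class from the SitePoint post: the function names findImageUrls and zipImages are hypothetical, and it assumes symfony/dom-crawler has been installed with Composer and that PHP's zip extension and allow_url_fopen are enabled.

```php
<?php
// Sketch only, not the article's class. Assumes `composer require symfony/dom-crawler`
// and the zip extension; findImageUrls() and zipImages() are made-up names.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/** Return the absolute URL of every <img> that has a src attribute. */
function findImageUrls(string $html, string $pageUrl): array
{
    // Passing the page URL lets DomCrawler resolve relative src values.
    $crawler = new Crawler($html, $pageUrl);

    $urls = [];
    foreach ($crawler->filterXPath('//img[@src]')->images() as $image) {
        $urls[] = $image->getUri();
    }

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($urls));
}

/** Download each URL and add it to a Zip archive; returns the number of files added. */
function zipImages(array $urls, string $zipPath): int
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }

    $added = 0;
    foreach ($urls as $i => $url) {
        $data = @file_get_contents($url); // requires allow_url_fopen
        if ($data === false) {
            continue; // skip images that fail to download
        }
        $name = basename((string) parse_url($url, PHP_URL_PATH)) ?: 'image';
        // Prefix with the index so two images with the same file name do not collide.
        $zip->addFromString($i . '-' . $name, $data);
        $added++;
    }
    $zip->close();

    return $added;
}
```

To use it, fetch a page's HTML (for example with file_get_contents), pass the HTML and the page URL to findImageUrls, and hand the resulting list to zipImages along with an output path such as images.zip.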

tagged: domcrawler symfony framework component tutorial image scrape

Link: http://www.sitepoint.com/image-scraping-symfonys-domcrawler/

php|architect:
Introducing four new PHP 5.3 components and Goutte, a simple web scraper
Apr 23, 2010 @ 19:50:18

On the php|architect blog today there's a new post from Ryan Weaver about four new components that have been added to the Symfony framework (BrowserKit, CssSelector, DomCrawler and Process) and Goutte, a project that uses them all.

To support symfony 2’s development, Fabien Potencier – the lead developer of the symfony framework – has released four new PHP 5.3 based components. [...] Though these components will be used by Symfony 2, they’re built to be standalone components that can be easily used in any PHP 5.3 project.

The post lists each of them, including features and some sample code, and takes a brief look (mostly code) at using Goutte to locate specific content in a page.

tagged: browserkit cssselector domcrawler process symfony component goutte

Link:

