Justin Laing over at Merchant OS has posted a new tutorial on building a simple link scraper with PHP and the cURL extension.
In this tutorial you will learn how to build a PHP script that scrapes links from any web page. You'll learn how to use cURL, call PHP DOM functions, use XPath and store the links in MySQL.
You'll need PHP5 with the cURL extension enabled on your web server to make it all work, but the code is all there, ready for you to cut and paste. The application fetches the page with cURL (optionally faking your user agent), parses the HTML with the DOM and XPath functionality to pull out the links, and uses the MySQL functions to store them in your database.
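To give a rough idea of how those three steps fit together, here is a minimal sketch of our own; it is not the tutorial's code. It assumes PHP 7.1 or later (for the scalar and `void` type declarations) with the cURL, DOM and mysqli extensions loaded. The function names `fetch_page`, `extract_links` and `store_links`, and the `links` table with its `url` column, are hypothetical.

```php
<?php
// Minimal sketch of the fetch -> parse -> store pipeline (not the tutorial's code).
// Function names and the `links` table/`url` column are hypothetical.

// Fetch a page with cURL, sending a custom ("faked") User-Agent header.
function fetch_page(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // the spoofed user agent
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10 seconds
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $html;
}

// Parse the HTML with DOM and collect the href of every <a> element via XPath.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $links = [];
    // Only anchors that actually carry an href attribute are selected.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Insert each link into the hypothetical `links` table using a prepared statement.
function store_links(mysqli $db, array $links): void
{
    $stmt = $db->prepare('INSERT INTO links (url) VALUES (?)');
    foreach ($links as $url) {
        $stmt->bind_param('s', $url); // bind the URL as a string parameter
        $stmt->execute();
    }
    $stmt->close();
}
```

Wiring it together might look like `store_links($db, extract_links(fetch_page('http://example.com/', 'Mozilla/5.0 (compatible; LinkScraper)')))`, where `$db` is an already open `mysqli` connection. The prepared statement keeps scraped URLs from breaking the `INSERT`; the tutorial's own storage code may well take a different approach.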