Scraping content from other sites (while slightly controversial) can be a helpful way to pull information into your application without the overhead of manual interaction. In this new post to his site Sameer Borate shows how to use a DOM parser to extract data from a remote site.
In this post I have explained some elements to scrap data from external websites. Simple HTML DOM parser is a PHP 5+ class which is useful to manipulate HTML elements. This class can work with both valid HTML and HTML pages that do not pass W3C validation. You can find elements by ids, classes, tags and many more. You can also add, delete or alter DOM elements. The only one thing you should care about is memory leaks – but you can avoid memory leaks as explained later.
He starts by walking through some of the basics of creating a new instance of the class and loading the content (either as a string or as a file) to be parsed. He then give several examples of how to query the contents of the document and locate multiple or single elements (including the use of CSS-type selectors for fuzzy attribute matching). He finishes out the post showing how to access element attributes and append content back to the original HTML.