Script-Tutorials.com: How to parse web pages using XPath
by Chris Cornutt April 21, 2011 @ 09:49:59
On the Script-Tutorials.com site there's a new article showing you how to use XPath to parse web pages, complete with screenshots of the page and the code to make it happen.
Today I will tell you how you can make parsers of remote HTML pages (in PHP). In this article I will show you how to perform xpath queries to Web pages. XPath - a query language to elements of xml or xhtml document. To obtain the necessary data, we just need to create the necessary query. For the work, we also need: browser Mozilla Firefox, firebug and firepath plugins. For our experiment, I suggest this webpage Google Sci/Tech News. Of course you can choose any other web page too.
They provide two demos and a downloadable package with everything you need. The script loads the page into a DOM document (which works as long as the markup is well-formed XML) and prints the matches from a few different XPath expressions. There are all sorts of sites out there that can help you with examples of other XPath expressions and syntax.
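As a rough illustration of the load-then-query flow described above, here is a minimal sketch. It is written in Python with the standard library rather than the PHP the article uses, and the markup, element names, and the `run_queries` helper are all made up for the example. Note that ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a fetched page.
PAGE = """<html><body>
  <div id="news">
    <h2 class="title"><a href="/story/1">First headline</a></h2>
    <h2 class="title"><a href="/story/2">Second headline</a></h2>
  </div>
</body></html>"""


def run_queries(markup):
    """Parse the markup and return the results of two XPath-style queries."""
    # Parsing fails with ParseError if the markup is not well-formed XML.
    root = ET.fromstring(markup)
    # Text of every link that sits directly inside an <h2 class="title">.
    headlines = [a.text for a in root.findall(".//h2[@class='title']/a")]
    # href of every link anywhere below the element with id="news".
    links = [a.get("href") for a in root.findall(".//div[@id='news']//a")]
    return headlines, links


if __name__ == "__main__":
    print(run_queries(PAGE))
```

Real-world pages are often not well-formed, in which case the parse step raises an error before any query runs.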
xpath tutorial parse html website
Qafoo.com: Practical PHPUnit Testing XML generation
by Chris Cornutt September 17, 2010 @ 13:51:02
On the Qafoo blog today there's a new post from Tobias Schlitt about a method you can use to unit test methods that generate XML without messing with a lot of extra overhead just to test the results.
Testing classes which generate XML can be a cumbersome work. At least, if you don't know the right tricks to make your life easier. In this article, I will throw some light upon different approaches and show you, how XML generation can be tested quite easily using XPath.
He includes a sample class, qaPersonVisitor, whose methods build a simple XML document from first and last name data as a DOM element. He sets up a basic test case that creates a simple person, including gender and date of birth, and offers a few different suggestions for handling the check (in PHPUnit tests):
- the naive way of rebuilding the DOM object and asserting that the two are equal
- testing the resulting XML from the DOM object against a pre-generated XML document
- matching the contents via CSS selectors
- using the tag matching assertions
- using XPath in a custom assertion (with short and long uses of it included)
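The last approach in the list, an XPath-based assertion, might look roughly like the sketch below. It is written in Python's standard library rather than PHPUnit, and `build_person`, `assert_xpath_text`, and the element names are hypothetical stand-ins, not the qaPersonVisitor class or the assertion from the Qafoo post.

```python
import xml.etree.ElementTree as ET


def build_person(first, last, gender):
    """Hypothetical generator under test: returns a <person> element."""
    person = ET.Element("person", {"gender": gender})
    ET.SubElement(person, "firstName").text = first
    ET.SubElement(person, "lastName").text = last
    return person


def assert_xpath_text(root, path, expected):
    """Custom assertion: the text of all nodes matching path must equal expected."""
    matches = [el.text for el in root.findall(path)]
    assert matches == expected, f"{path!r} matched {matches!r}, expected {expected!r}"


def check_person():
    """Example test body: query only the parts of the XML that matter."""
    wrapper = ET.Element("people")
    wrapper.append(build_person("My", "Self", "male"))
    assert_xpath_text(wrapper, "./person[@gender='male']/firstName", ["My"])
    assert_xpath_text(wrapper, "./person/lastName", ["Self"])
    return True
```

The appeal of this style is that each assertion names only the nodes it cares about, so unrelated changes to the generated document do not break the test.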
unittest phpunit xml generation xpath dom
Thomas Weinert's Blog: Using PHP DOM With XPath
by Chris Cornutt April 13, 2010 @ 13:18:32
Thomas Weinert has a recent post to his blog showing how to use one of the more powerful XML-handling features that PHP's DOM extension includes - XPath.
Often I hear people say "We use SimpleXML, because DOM is so noisy and complex". Well, I don't think so. This article explains how you can parse a XML (an Atom feed) using the PHP DOM extension. No other libraries are involved.
In his example he loads an external feed (his own) into a DOM object, suppresses any errors with a few handy functions and creates a DOMXPath object on the DOM object to get ready for his queries. He shows how to search for titles and subtitles and how to loop over attributes and over an element list returned from one of the first queries. A full code listing is also provided to show how it all fits together.
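For a sense of what querying an Atom feed involves, here is a minimal sketch in Python's standard library, not the PHP DOM code from the post. The feed content and the `summarize_feed` helper are invented for the example. The step that matters is registering a prefix for the Atom namespace, since Atom elements live in `http://www.w3.org/2005/Atom`.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used in the queries; the prefix name is arbitrary.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed standing in for a downloaded one.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <subtitle>Sample subtitle</subtitle>
  <entry>
    <title>First entry</title>
    <link rel="alternate" href="http://example.com/1"/>
  </entry>
  <entry>
    <title>Second entry</title>
    <link rel="alternate" href="http://example.com/2"/>
  </entry>
</feed>"""


def summarize_feed(xml_text):
    """Return the feed title, subtitle, and a (title, href) pair per entry."""
    feed = ET.fromstring(xml_text)
    title = feed.findtext("atom:title", namespaces=ATOM_NS)
    subtitle = feed.findtext("atom:subtitle", namespaces=ATOM_NS)
    entries = []
    # Loop over the element list returned by the entry query.
    for entry in feed.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        entries.append((entry.findtext("atom:title", namespaces=ATOM_NS),
                        link.get("href")))
    return title, subtitle, entries
```

Without the namespace mapping, a query for a bare `title` would match nothing, which is a common stumbling block with Atom feeds.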
dom xpath domxpath tutorial search atom
PHPBuilder.com: Build a PHP Link Scraper with cURL
by Chris Cornutt January 15, 2010 @ 10:02:45
On PHPBuilder.com today there's a new tutorial posted about building a link scraping script with the combination of PHP and cURL (the script pulls in a page, grabs all of the links off of it and follows them, etc).
I actually built this a few years ago because I had grandiose visions of becoming the next Google. Clearly, that did not happen, mostly because my localhost, database, and bandwidth are not infinite. Yet this little robot has quite interesting applications and uses if you really have the time to play with and fine-tune it.
You'll need to have cURL support built into your PHP installation to get the scripts working, but the actual code itself is pretty simple. cURL and XPath do most of the heavy lifting of finding and following the links, and it's easy enough to drop them into a MySQL table from there. You can download the source here.
link scraper curl xpath mysql tutorial
Juozas Kaziukenas' Blog: Web scraping with PHP and XPath
by Chris Cornutt February 18, 2009 @ 10:28:08
In this new post to his blog Juozas Kaziukenas takes a look at one method for getting the information out of a remote page - parsing it with PHP and XPath (assuming the page is correctly formatted).
When I was writing about how I use web scraping, I was still hadn't tried using Xpath (shame on me). [...] It turned out, that using Xpath is extremely easy, really. When you master it, you can do everything in seconds. Yes, you need to know how XML works and how to write correct Xpath queries (brief explanation of Xpath syntax is available at W3Schools), but hey - these topics are in 1st year of university.
He includes both some sample code (to fetch titles and prices for cameras from bhphotovideo.com) and a link to an XPath checker you can use to ensure that your query is correctly formatted. It's good that he also includes a quick reminder about the ethical issue with web scraping - it could be considered stealing depending on where the information comes from and who is providing it.
web scraping xpath tutorial price title ethical steal information
PHP in Action: Get links with XPath
by Chris Cornutt October 07, 2008 @ 08:43:33
In response to this tutorial over on the PHPro.org website, a new post has been made to the PHP in Action blog with an "even cooler" way to do the same sort of DOM fetching - XPath.
I'm a little bit surprised at the claim [the tutorial makes] that it's the "correct" (only) way, since there's at least one more that I find even cooler: XPath. Admittedly, it's slower, yet it's a more powerful language.
A quick example is included, building up from a simple search for anchor tags up to a custom query looking for just the anchor tags with a class of "bookmark".
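The progression from "all anchors" to "anchors with a class of bookmark" can be sketched as follows. This is Python's standard library rather than the PHP from the post, and the snippet and the `find_links` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed markup containing three links.
SNIPPET = """<div>
  <a href="/home">Home</a>
  <a class="bookmark" href="http://example.com/a">A</a>
  <p><a class="bookmark" href="http://example.com/b">B</a></p>
</div>"""


def find_links(markup, css_class=None):
    """Return hrefs of all anchors, or only those whose class equals css_class."""
    root = ET.fromstring(markup)
    # The predicate compares the whole attribute value, so class="bookmark big"
    # would NOT match; token-wise matching needs a fuller XPath engine.
    path = ".//a" if css_class is None else f".//a[@class='{css_class}']"
    return [a.get("href") for a in root.findall(path)]


if __name__ == "__main__":
    print(find_links(SNIPPET))
    print(find_links(SNIPPET, "bookmark"))
```

The only difference between the two queries is the attribute predicate in square brackets, which is where XPath's extra expressive power shows up.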
xpath tutorial dom link find search anchor attribute
Debuggable Blog: XPath on PHP Arrays (Set::extract)
by Chris Cornutt September 26, 2008 @ 10:25:23
On the Debuggable blog there's an interesting post where Felix talks a bit about something I've seen requested quite a bit - a method for locating information in an array. His answer is an XPath-style query system to root out your custom information.
One of the requirements [of the original Set::extract method] was that the new method would need to be faster or at least as fast as the old implementation. My first attempts were big failures. Not only did the solutions I came up with contain tons of bugs. No, they were are also a lot slower the old extract function. A few benchmarks later and I discovered the biggest bottleneck in my implementation: Recursiveness.
He notes that not doing things recursively (not just in this situation, but ever) can help with a speed boost. In his example, a small change made all the difference, and the XPath implementation in the CakePHP core makes grabbing information from any array simple.
While the implementation does not support full XPath (and probably won't in future), feel free to make suggestions on additional selectors or the idea in general.
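To make the idea of a path query over nested arrays without recursion concrete, here is a deliberately tiny sketch in Python. It is not CakePHP's Set::extract, it supports only plain `/key/key` paths, and the `extract` function and sample data are assumptions made for illustration.

```python
def extract(data, path):
    """Return every value reachable by following the slash-separated path.

    Works iteratively: each path segment narrows a list of current nodes,
    so there is no recursive descent into the structure.
    """
    current = [data]
    for key in path.strip("/").split("/"):
        matched = []
        for node in current:
            # A list is treated as a set of candidate rows to look inside.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    matched.append(item[key])
        current = matched
    return current


if __name__ == "__main__":
    users = {"User": [{"id": 1, "name": "felix"}, {"id": 2, "name": "tim"}]}
    print(extract(users, "/User/name"))
```

Each pass over a path segment is a flat loop, which is the kind of restructuring the post credits for the speed-up.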
xpath array cakephp framework search recursion