Codeception Blog:
The Locator Class
Sep 27, 2012 @ 14:50:34

The blog for Codeception (a BDD testing tool) has a new post on using its Locator class to build more complex test locators from XPath or CSS selectors.

In the latest Codeception 1.1.4 new Locator class was introduced. Basically it should simplify your life in writing complex XPath or CSS locators. Right now it has minimal, yet useful functionality.

The post shows how to combine several locators into one, mixing tag, CSS-based and XPath-based selectors. It also shows how to use the "tabindex" method to simulate moving through a page with the "Tab" key. Finally, there's a quick piece of code showing how to locate an item based on its "href" value. For more information on using Codeception, check out their documentation or the project's main site.

tagged: locator class css xpath combine tab href

James Morris' Blog:
Parsing HTML with DOMDocument and DOMXPath::Query
Jun 27, 2012 @ 15:19:35

In the latest post to his blog James Morris looks at using DOMXPath's query() method to locate pieces of data in an HTML document.

The other day I needed to do some html scraping to trim out some repeated data stuck inside nested divs and produce a simplified array of said data. My first port of call was SimpleXML which I have used many times. However this time, the son of a bitch just wouldn’t work with me and kept on throwing up parsing errors. I lost my patience with it and decided to give DomDocument and DOMXpath a go which I’d heard of but never used.

He includes example code (and a sample document) showing how to extract content from an HTML structure - grabbing each of the images from inside a div and associating them with their description content.

tagged: dom domdocument domxpath xpath tutorial html

PHPMaster.com:
PHP DOM: Using XPath
Jun 26, 2012 @ 13:16:08

On PHPMaster.com today there's a new tutorial showing you how to use the XPath support that's built into PHP's DOM extension to query your XML.

In a recent article I discussed PHP’s implementation of the DOM and introduced various functions to pull data from and manipulate an XML structure. I also briefly mentioned XPath, but didn’t have much space to discuss it. In this article, we’ll look closer at XPath, how it functions, and how it is implemented in PHP. You’ll find that XPath can greatly reduce the amount of code you have to write to query and filter XML data, and will often yield better performance as well.

They start with some basic XPath queries that follow a simple path and locate the record for a specific book. There's also an example comparing XPath with the "find" functions in the DOM extension (like getElementsByTagName). Near the end there's a section on XPath functions and on how you can call native PHP functions from inside your XPath queries.
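
To make the XPath-versus-traversal comparison concrete, here is a minimal sketch. It is written in Python with the standard library's ElementTree (which supports only a subset of XPath), not the PHP from the tutorial, and the catalog XML and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the tutorial's actual XML is not reproduced here.
LIBRARY_XML = """
<library>
  <book isbn="111"><title>PHP Basics</title><author>A. Writer</author></book>
  <book isbn="222"><title>XPath in Depth</title><author>B. Author</author></book>
</library>
"""

root = ET.fromstring(LIBRARY_XML)


def title_by_loop(root, isbn):
    """Tag-name traversal: visit every <book> and filter by hand."""
    for book in root.iter("book"):
        if book.get("isbn") == isbn:
            return book.findtext("title")
    return None


def title_by_xpath(root, isbn):
    """Path expression: the attribute predicate does the filtering.

    Assumes `isbn` contains no quote characters.
    """
    return root.findtext(f"./book[@isbn='{isbn}']/title")


print(title_by_loop(root, "222"))
print(title_by_xpath(root, "222"))
```

Both functions return the same title; the second simply moves the filtering into the query string, which is the reduction in code the tutorial describes.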

tagged: xpath tutorial dom introduction

Working Software Blog:
Escaping single and double quotes for use with XPath queries in PHP
Aug 19, 2011 @ 18:50:14

On the Working Software blog there's a new post showing a solution to the problem of escaping quotes in XPath queries, a problem that isn't limited to PHP.

I've been working with the Basecamp API to plugin our IRC bot that we use for time tracking and I'm astounded to learn that escaping single and/or double quotes for XPath queries in PHP does not have a well documented, best practices solution. In fact, it seems as though this is not peculiar to PHP. I took a look around and found this excellent article by "Kushal": http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes.

He's put together his own (PHP) solution to the problem - running the entire XPath query through a filtering method that splits it up, replaces the quote characters and combines it back down to a single string.
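
For context, XPath 1.0 string literals have no escape character, so a value containing both quote types cannot be written as a single literal. One commonly used workaround, which is not necessarily the approach taken in the post, is to split the value and rejoin the pieces with XPath's concat() function. A minimal sketch in Python, with a hypothetical helper name:

```python
def xpath_literal(value: str) -> str:
    """Return `value` written as an XPath 1.0 string expression.

    Hypothetical helper for illustration; not the code from the post.
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: wrap each run between single quotes in
    # single quotes, and emit each single quote as the literal "'".
    pieces = []
    for index, part in enumerate(value.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_literal("it's"))
print(xpath_literal('He said "it\'s"'))
```

The result is meant to be spliced into a larger expression, for example "//project[name=" + xpath_literal(name) + "]", where the element names are again only illustrative.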

tagged: escape quote double single xpath query tutorial

Script-Tutorials.com:
How to parse web pages using XPath
Apr 21, 2011 @ 14:49:59

On the Script-Tutorials.com site there's a new article showing you how to use XPath to parse web pages, complete with screenshots of the page and the code to make it happen.

Today I will tell you how you can make parsers of remote HTML pages (in PHP). In this article I will show you how to perform xpath queries to Web pages. XPath – a query language to elements of xml or xhtml document. To obtain the necessary data, we just need to create the necessary query. For the work, we also need: browser Mozilla Firefox, firebug and firepath plugins. For our experiment, I suggest this webpage Google Sci/Tech News. Of course you can choose any other web page too.

They provide two demos and a downloadable package with everything you need. The script pulls in the page as a DOM document (which works as long as it's correctly formatted XML) and spits back out the matches from a few different XPath expressions. There are all sorts of sites out there that can help you with examples of other XPath expressions and syntax.

tagged: xpath tutorial parse html website

Qafoo.com:
Practical PHPUnit: Testing XML generation
Sep 17, 2010 @ 18:51:02

On the Qafoo blog today there's a new post from Tobias Schlitt about a method you can use to unit test methods that generate XML without messing with a lot of extra overhead just to test the results.

Testing classes which generate XML can be a cumbersome work. At least, if you don't know the right tricks to make your life easier. In this article, I will throw some light upon different approaches and show you, how XML generation can be tested quite easily using XPath.

He includes a sample class, qaPersonVisitor, with methods that turn first and last name data into a simple XML document built as a DOM element. He sets up a basic test case that creates a simple person - including gender and date of birth - and offers a few different suggestions on handling the check (in PHPUnit tests):

  • the naive way of rebuilding the DOM object and asserting that the two are equal
  • testing the resulting XML from the DOM object against a pre-generated XML document
  • matching the contents via CSS selectors
  • using the tag matching assertions
  • using XPath in a custom assertion (with short and long uses of it included)
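
The last option in the list, a custom XPath assertion, can be sketched roughly as follows. This is a Python illustration using the standard library's ElementTree, not the PHPUnit code from the article; the builder function, the element names and the assertion helper are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_person(first_name, last_name):
    """Hypothetical stand-in for the XML-generating code under test."""
    person = ET.Element("person")
    ET.SubElement(person, "firstName").text = first_name
    ET.SubElement(person, "lastName").text = last_name
    return person


def assert_xpath_match(root, path, expected):
    """Custom assertion: the text of the nodes matched by `path` must equal `expected`."""
    actual = [element.text for element in root.findall(path)]
    assert actual == expected, f"{path!r}: expected {expected}, got {actual}"


person = build_person("Jane", "Doe")
assert_xpath_match(person, "./firstName", ["Jane"])
assert_xpath_match(person, "./lastName", ["Doe"])
```

Each check names only the node it cares about, so the test does not need to rebuild or serialize the whole document.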
tagged: unittest phpunit xml generation xpath dom

Thomas Weinert's Blog:
Using PHP DOM With XPath
Apr 13, 2010 @ 18:18:32

Thomas Weinert has a recent post to his blog showing how to use one of the more powerful XML-handling features that PHP's DOM extension includes - XPath.

Often I hear people say "We use SimpleXML, because DOM is so noisy and complex". Well, I don't think so. This article explains how you can parse a XML (an Atom feed) using the PHP DOM extension. No other libraries are involved.

In his example he loads an external feed (his own) into a DOM object, blocks any errors with a few handy functions and creates a DOMXPath object on the DOM object to get ready for his queries. He shows how to search for titles and subtitles, and how to loop over attributes and over an element list returned from one of the first queries. A full code listing is also provided to show how it all fits together.
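
Atom elements live in a namespace, so queries against a feed have to map a prefix to the Atom namespace URI before they match anything. The sketch below shows that idea in Python with the standard library's ElementTree rather than the PHP DOM extension used in the post; the inline feed and the "atom" prefix are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Small inline feed for illustration; the post loads a live feed instead.
ATOM_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <subtitle>Notes on XML</subtitle>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.org/first"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="alternate" href="http://example.org/second"/>
  </entry>
</feed>
"""

# The prefix is arbitrary; only the namespace URI has to match the document.
NS = {"atom": "http://www.w3.org/2005/Atom"}

feed = ET.fromstring(ATOM_FEED)

feed_title = feed.findtext("atom:title", namespaces=NS)
feed_subtitle = feed.findtext("atom:subtitle", namespaces=NS)

# Loop over the element list returned by one query and query each entry again.
entry_titles = [
    entry.findtext("atom:title", namespaces=NS)
    for entry in feed.findall("atom:entry", NS)
]

# Read an attribute from every matching link element.
entry_links = [
    link.get("href")
    for link in feed.findall("atom:entry/atom:link[@rel='alternate']", NS)
]

print(feed_title, feed_subtitle, entry_titles, entry_links)
```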

tagged: dom xpath domxpath tutorial search atom

PHPBuilder.com:
Build a PHP Link Scraper with cURL
Jan 15, 2010 @ 16:02:45

On PHPBuilder.com today there's a new tutorial posted about building a link scraping script with the combination of PHP and cURL (the script pulls in a page, grabs all of the links off of it and follows them, etc).

I actually built this a few years ago because I had grandiose visions of becoming the next Google. Clearly, that did not happen, mostly because my localhost, database, and bandwidth are not infinite. Yet this little robot has quite interesting applications and uses if you really have the time to play with and fine-tune it.

You'll need to have cURL support built into your PHP installation to get the scripts working, but the actual code itself is pretty simple. cURL and XPath do most of the heavy lifting of finding and following the links, and it's easy enough to drop them into a MySQL table from there. The source is available for download from the tutorial.

tagged: link scraper curl xpath mysql tutorial

Juozas Kaziukenas' Blog:
Web scraping with PHP and XPath
Feb 18, 2009 @ 16:28:08

In this new post to his blog Juozas Kaziukenas takes a look at one method for getting the information out of a remote page - parsing it with PHP and XPath (assuming the page is correctly formatted).

When I was writing about how I use web scraping, I was still hadn’t tried using Xpath (shame on me). [...] It turned out, that using Xpath is extremely easy, really. When you master it, you can do everything in seconds. Yes, you need to know how XML works and how to write correct Xpath queries (brief explanation of Xpath syntax is available at W3Schools), but hey - these topics are in 1st year of university.

He includes both some sample code (to fetch titles and prices for cameras from bhphotovideo.com) and a link to an XPath checker you can use to ensure that your query is correctly formatted. It's good that he also includes a quick reminder about the ethical issue with web scraping - it could be considered stealing depending on where the information comes from and who is providing it.

tagged: web scraping xpath tutorial price title ethical steal information

Community News:
XPath Explained (by Tobias Schlitt and Jacob Westhoff)
Feb 12, 2009 @ 16:28:04

Tobias Schlitt and Jacob Westhoff have written up an article (and posted it over on Tobias' blog) that aims to help you understand XPath better - whether you're a novice or have been using it for a while.

This paper will give an overview on XPath an addressing language for XML documents. XPath is a W3C recommendation currently in version 1.0. XPath was created in relation to the XSL recommendation and is intended to be used with XSLT and XPointer. Beside that, XPath can be used in a variety of programming languages, commonly in combination with a DOM API.

The article starts with an introduction to the concept of XPath, moves on to addressing, talks about XPath axes, mentions functions/operators/conditions and looks at styling XML with XPath and XSLT.

tagged: xpath xslt address axes function operator condition
