<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Sun, 19 May 2013 13:18:55 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Matthew Turland's Blog: Gotcha on Scraping .NET Applications with PHP and cURL]]></title>
      <guid>http://www.phpdeveloper.org/news/14730</guid>
      <link>http://www.phpdeveloper.org/news/14730</link>
      <description><![CDATA[<p>
New on his blog today <i>Matthew Turland</i> has <a href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/">posted about a "gotcha"</a> he came across when working with cURL to pull down information (scrape content) from a remote .NET application.
</p>
<blockquote>
I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I'd share. In this case, I was using the cURL extension, but the tip isn't necessarily specific to that. One thing my script did was submit a POST request to simulate a form submission. [...] The issue I ran into had to do with a behavior of the CURLOPT_POSTFIELDS setting that's easy to overlook.
</blockquote>
<p>
The problem is something cURL does automatically: when CURLOPT_POSTFIELDS is given an array, the request goes out as multipart/form-data instead of application/x-www-form-urlencoded. Thankfully, encoding the array into a string first with a call to <a href="http://php.net/http_build_query">http_build_query</a> makes the request use the right content type header.
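</p>
<p>
As a rough illustration of the tip, here is a minimal sketch, assuming the cURL extension is loaded. The field names, the helper name build_form_body() and the URL are placeholders made up for this example and do not come from the original post.
</p>

```php
<?php
// Hypothetical helper: turn an array of form fields into a URL-encoded string.
function build_form_body(array $fields): string
{
    return http_build_query($fields);
}

// Hypothetical form fields; a real form will have its own names and values.
$fields = [
    'username' => 'alice',
    'comment'  => 'hello world',
];

$body = build_form_body($fields);

$ch = curl_init('http://example.com/form.aspx'); // placeholder URL
curl_setopt($ch, CURLOPT_POST, true);
// A string here is sent as application/x-www-form-urlencoded.
// Passing $fields (the array) instead would be sent as multipart/form-data.
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $response = curl_exec($ch); // commented out so the sketch makes no network call
```

<p>
The only difference between the two behaviors is the type of the value handed to CURLOPT_POSTFIELDS, which is why it's so easy to overlook.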
</p>]]></description>
      <pubDate>Thu, 01 Jul 2010 08:51:36 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Juozas Kaziukenas' Blog: Scraping login requiring websites with cURL]]></title>
      <guid>http://www.phpdeveloper.org/news/12007</guid>
      <link>http://www.phpdeveloper.org/news/12007</link>
      <description><![CDATA[<p>
Many sites keep some of their content behind a login, which makes it difficult to pull into a script. <i>Juozas Kaziukenas</i> has <a href="http://dev.juokaz.com/php/scraping-login-requiring-websites-with-curl">created an option</a> to help you get past this hurdle - a PHP class (that uses cURL) that can POST the login data to the login script and pull back the session ID.
</p>
<blockquote>
But how you are going to do all this work with cookies and session id? Luckily, PHP has <a href="http://uk.php.net/curl">cURL extension</a> which simplifies connecting to remote addresses, using cookies, staying in one session, POSTing data, etc. It's really powerful library, which basically allows you to use all HTTP headers functionality. For secure pages crawling, I've created very simple <a href="http://dev.juokaz.com/examples/crawler/crawler.phps">Secure_Crawler</a> class.
</blockquote>
<p>
The class uses the built-in cURL functionality to send the POST information (in this case the username and password, but it can be easily changed for whatever the form requires) and provides a get() method to use for fetching other pages once you're connected.
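</p>
<p>
The general idea can be sketched roughly as follows. This is not the Secure_Crawler class from the post, just a minimal example assuming the cURL extension is loaded; the helper name login_options(), the URL, the form field names and the cookie file path are all made up for illustration.
</p>

```php
<?php
// Hypothetical helper, not the Secure_Crawler class from the linked post.
// Returns the cURL options for a login POST that keeps its cookies.
function login_options(string $loginUrl, array $credentials, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($credentials),
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // turns on cookie handling and loads saved cookies
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

$ch = curl_init();
curl_setopt_array($ch, login_options(
    'http://example.com/login',                      // placeholder URL
    ['username' => 'user', 'password' => 'secret'],  // hypothetical field names
    sys_get_temp_dir() . '/cookies.txt'
));
// $page = curl_exec($ch); // commented out so the sketch makes no network call
```

<p>
Reusing the same handle (or the same cookie file) for later requests is what keeps the session alive when fetching the protected pages.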
</p>]]></description>
      <pubDate>Tue, 24 Feb 2009 08:44:43 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Hasin Hayder's Blog: Making a jobsite using PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/9489</guid>
      <link>http://www.phpdeveloper.org/news/9489</link>
      <description><![CDATA[<p>
<i>Hasin Hayder</i> has <a href="http://hasin.wordpress.com/2008/01/24/making-a-jobsite-using-php/">started up a new project</a> that he's documented in a blog entry - the creation of a jobs website in PHP.
</p>
<blockquote>
I was involved in making a job site few days ago. During the development, I have studied how easily anyone can develop a job site using PHP (language independent in true sense) . So I decide to write a blog post about my experience and here it goes. But note that this article is not about scaling or balancing the load on your site during heavy traffic, heh heh.
</blockquote>
<p>
He comments on the startup process for this type of site and <a href="http://hasin.wordpress.com/2008/01/24/making-a-jobsite-using-php/">makes suggestions</a> for your own careers site, including two ways to pull in job content from other sites: screen scraping and the job search APIs already out there.
</p>]]></description>
      <pubDate>Thu, 24 Jan 2008 14:41:38 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Jonathan Street's Blog: When scraping content from the web don't make it obvious]]></title>
      <guid>http://www.phpdeveloper.org/news/8992</guid>
      <link>http://www.phpdeveloper.org/news/8992</link>
      <description><![CDATA[<p>
<i>Jonathan Street</i> <a href="http://torrentialwebdev.com/blog/archives/125-When-scraping-content-from-the-web-dont-make-it-obvious.html">has a tip</a> for those developers out there who have no choice but to scrape content from a remote site - don't make it obvious. He also includes a suggestion on how to make it a little less obvious.
</p>
<blockquote>
A couple of hours ago I was playing around scraping some content from a website. All was going well until suddenly I couldn't get my script to fetch meaningful content. [...] The first thing I did was stop visiting the site for 15 minutes or so and then increase the time between requests. It briefly worked again but quickly stopped.
</blockquote>
<p>
One simple change to the user agent string in his php.ini made the problem evaporate, which points to user agent filtering on the remote side. His helpful hint covers two methods - one in plain PHP and the other in cURL - for changing the user agent that your scripts send. An even better solution might be a rotating array that alternates between four or five strings to make things even more random.
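</p>
<p>
A minimal sketch of that last idea, assuming the cURL extension is loaded, might look like the following. The user agent strings, the helper name pick_user_agent() and the URL are placeholders invented for this example, and it picks a string at random rather than cycling through them in order.
</p>

```php
<?php
// Hypothetical pool of user agent strings; substitute real browser strings.
$userAgents = [
    'ExampleBrowser/1.0',
    'ExampleBrowser/2.0',
    'AnotherExample/3.1',
    'YetAnotherExample/4.2',
];

// Hypothetical helper: return one random entry from the pool.
function pick_user_agent(array $pool): string
{
    return $pool[array_rand($pool)];
}

$agent = pick_user_agent($userAgents);

// Plain PHP: applies to file_get_contents() and other HTTP stream calls.
ini_set('user_agent', $agent);

// cURL: set on each handle.
$ch = curl_init('http://example.com/'); // placeholder URL
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
// $html = curl_exec($ch); // commented out so the sketch makes no network call
```

<p>
Calling pick_user_agent() before each request, instead of once per script run, would vary the string from request to request.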
</p>]]></description>
      <pubDate>Wed, 07 Nov 2007 11:26:00 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[MakeBeta Blog: Scraping Links With PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/8460</guid>
      <link>http://www.phpdeveloper.org/news/8460</link>
      <description><![CDATA[<p>
From <i>Justin Laing</i> over at Merchant OS there's a <a href="http://www.merchantos.com/makebeta/php/scraping-links-with-php/">new tutorial</a> on creating a simple link scraper with the help of PHP and the cURL extension.
</p>
<blockquote>
In this tutorial you will learn how to build a PHP script that <a href="http://en.wikipedia.org/wiki/Web_scraping">scrapes</a> links from any web page. You learn how to use cURL, call PHP DOM functions, use XPath and store the links in MySQL.
</blockquote>
<p>
You'll need PHP5 and the cURL extension enabled on your web server to make it all work, but the code is <a href="http://www.merchantos.com/makebeta/php/scraping-links-with-php/">all there</a> ready for you to cut and paste. The application grabs the page with cURL (including the option to fake your user agent), parses the HTML with the DOM and XPath functionality to grab the links and uses the MySQL methods to store them in your database.
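</p>
<p>
The parsing step could be sketched along these lines. This is not the tutorial's code, just a minimal example assuming the DOM extension is available; the function name extract_links() is made up here, and the cURL fetch and MySQL storage steps are left out.
</p>

```php
<?php
// Hypothetical helper: return the href value of every link in an HTML string.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // Select only anchor elements that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}
```

<p>
Passing in the page body returned by cURL gives back a plain array of URLs in document order, ready to be cleaned up and stored.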
</p>]]></description>
      <pubDate>Wed, 15 Aug 2007 12:08:00 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[WaxJelly Blog: The easiest way to scrape details from a MySpace profile page with PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/7462</guid>
      <link>http://www.phpdeveloper.org/news/7462</link>
      <description><![CDATA[<p>
From the WaxJelly blog today comes <a href="http://waxjelly.wordpress.com/2007/03/17/scrape-details-from-a-myspace-profile/">a handy bit of code</a> for anyone looking to scrape details from just about any MySpace profile page (quick and easy).
</p>
<blockquote>
It's amazing how just a little optimization on the part of myspace makes crawling their site so much easier. We're going to scrape the user detail (name, age, sex, etc..) from a profile, using the header info...
</blockquote>
<p>
The <a href="http://waxjelly.wordpress.com/2007/03/17/scrape-details-from-a-myspace-profile/">script</a> grabs the contents of the given URL, loops through, pulls out the meta tag information and uses that as a key to grab the rest of the user's information (including name, age, city, state, etc).
</p>]]></description>
      <pubDate>Tue, 20 Mar 2007 10:41:00 -0500</pubDate>
    </item>
  </channel>
</rss>
