<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the-Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Wed, 19 Jun 2013 18:01:09 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Juozas Kaziukenas' Blog: Scraping login requiring websites with cURL]]></title>
      <guid>http://www.phpdeveloper.org/news/12007</guid>
      <link>http://www.phpdeveloper.org/news/12007</link>
      <description><![CDATA[<p>
Many sites keep some of their content behind a login, which makes it difficult to pull into a script. <i>Juozas Kaziukenas</i> has <a href="http://dev.juokaz.com/php/scraping-login-requiring-websites-with-curl">created an option</a> to help you past this hurdle: a PHP class, built on cURL, that can POST the login data to the site's login script and pull back the session ID.
</p>
<blockquote>
But how you are going to do all this work with cookies and session id? Luckily, PHP has <a href="http://uk.php.net/curl">cURL extension</a> which simplifies connecting to remote addresses, using cookies, staying in one session, POSTing data, etc. It's really powerful library, which basically allows you to use all HTTP headers functionality. For secure pages crawling, I've created very simple <a href="http://dev.juokaz.com/examples/crawler/crawler.phps">Secure_Crawler</a> class.
</blockquote>
<p>
The class uses PHP's cURL functionality to send the POST data (in this case a username and password, though it can easily be changed to whatever the form requires) and provides a get() method for fetching other pages once you're logged in.
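</p>
<p>
As a rough illustration of the approach described above (this is not the Secure_Crawler class itself), a minimal sketch might look like the following. The class name LoginSession, the example.com URLs and the form field names are all hypothetical; only the curl_* functions and CURLOPT_* constants come from PHP's cURL extension.
</p>
<pre><code>&lt;?php
// Hypothetical sketch, NOT the Secure_Crawler class from the linked post.
final class LoginSession
{
    /** cURL handle reused for every request so cookies persist. */
    private $ch;

    public function __construct()
    {
        $this->ch = curl_init();
        curl_setopt_array($this->ch, [
            CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true, // follow the redirect many login forms issue
            CURLOPT_COOKIEFILE     => '',   // empty string: enable in-memory cookie handling
        ]);
    }

    /** POST the form fields to the login URL; the session cookie stays on the handle. */
    public function login(string $url, array $fields): string
    {
        curl_setopt_array($this->ch, [
            CURLOPT_URL        => $url,
            CURLOPT_POST       => true,
            CURLOPT_POSTFIELDS => http_build_query($fields),
        ]);
        return $this->send();
    }

    /** Fetch another page with a plain GET, sending the stored cookies. */
    public function get(string $url): string
    {
        curl_setopt_array($this->ch, [
            CURLOPT_URL     => $url,
            CURLOPT_HTTPGET => true, // switch the handle back from POST to GET
        ]);
        return $this->send();
    }

    private function send(): string
    {
        $body = curl_exec($this->ch);
        if ($body === false) {
            throw new RuntimeException(curl_error($this->ch));
        }
        return $body;
    }
}

// Usage with placeholder URL and field names; adjust to the target form.
$session = new LoginSession();
$session->login('https://example.com/login', [
    'username' => 'me',
    'password' => 'secret',
]);
$html = $session->get('https://example.com/members');
</code></pre>
<p>
Reusing a single cURL handle is what keeps the session alive in this sketch: the cookie set by the login response is sent automatically on each later get() call. To keep the session across separate script runs, CURLOPT_COOKIEJAR could additionally be pointed at a writable file.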
</p>]]></description>
      <pubDate>Tue, 24 Feb 2009 08:44:43 -0600</pubDate>
    </item>
  </channel>
</rss>
