
SitePoint PHP Blog:
Building Microsoft’s What-Dog AI in under 100 Lines of Code
Feb 16, 2016 @ 18:38:28

On the SitePoint PHP blog, editor Bruno Skvorc has posted a tutorial in which he attempts to replicate Microsoft's recently released "What-Dog" application in under 100 lines of code. It uses the Diffbot service to do the actual detection and evaluation.

Rather recently, Microsoft released an app using AI to detect a dog’s breed. [...] In my non-SitePoint time, I also work for Diffbot – the startup you may have heard of over the past few weeks – who also dabble in AI. To test how they compare, in this tutorial we’ll recreate Microsoft’s application using Diffbot’s technology to see if it does a better job at recognizing the adorable beasts we throw at it!

He walks you through installing and configuring the software you'll need (and the account you'll need to create). From there he shares the code that takes in the user's upload, sends it as a POST request to the Diffbot service and returns the relevant results. He finishes the article by comparing the two services, submitting various images to see which comes closer.

tagged: whatdog ai tutorial diffbot api dog compare microsoft

Link: http://www.sitepoint.com/building-microsofts-what-dog-ai-in-under-100-lines-of-code/

SitePoint PHP Blog:
Suggesting Carbon with Composer – Date and Time the Right Way
Nov 16, 2015 @ 15:16:58

On the SitePoint PHP blog there's a new tutorial spotlighting a PHP library that can make working with dates and times simpler: Carbon. The tutorial walks you through what the library has to offer, with plenty of examples of it in use.

Carbon is a small library for date and time manipulation in PHP. It relies on and extends the core DateTime class, adding helpful methods for a significantly saner experience.

In this article, we’ll take a look at some basic usage examples, and then use it in a real project.

The tutorial starts with installation (via Composer) and a few examples of the library in use, such as determining whether a given date string falls on a weekend or in a leap year. It also covers localization support and working with time intervals. It then moves on to the more real-world part of the example: updating the Diffbot client to optionally support Carbon for its date/time handling. This begins with tests that define how the handling should work and how to use it to parse the date returned from the Diffbot API.
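The weekend and leap-year checks mentioned above can be sketched with PHP's core date classes, which Carbon extends. This is a minimal illustration, not the tutorial's code: the helper functions below are hypothetical and use only DateTimeInterface::format(). Carbon offers similar checks as methods (for example isWeekend() and isLeapYear()), so the helpers are only there to show what happens underneath.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers (not from the tutorial), built on the core
// DateTime family that Carbon extends.

function isWeekend(DateTimeInterface $date): bool
{
    // 'N' is the ISO-8601 day of the week: 1 (Monday) through 7 (Sunday).
    return (int) $date->format('N') >= 6;
}

function isLeapYear(DateTimeInterface $date): bool
{
    // 'L' formats as "1" in a leap year and "0" otherwise.
    return $date->format('L') === '1';
}

$saturday = new DateTimeImmutable('2016-02-20');
$monday   = new DateTimeImmutable('2015-11-16');

var_dump(isWeekend($saturday));  // bool(true)
var_dump(isLeapYear($saturday)); // bool(true)
var_dump(isWeekend($monday));    // bool(false)
var_dump(isLeapYear($monday));   // bool(false)
```

Because the helpers accept any DateTimeInterface, a Carbon instance could be passed to them unchanged.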

tagged: carbon datetime library tutorial date time diffbot client api

Link: http://www.sitepoint.com/suggesting-carbon-with-composer-date-and-time-the-right-way/

SitePoint PHP Blog:
Powerful Custom Entities with the Diffbot PHP Client
Nov 02, 2015 @ 16:55:18

On the SitePoint PHP blog, editor Bruno Skvorc continues his look at the Diffbot service and shows how to use its API to create entities representing objects from the API data.

A while back, we looked at Diffbot, the machine learning AI for processing web pages, as a means to extract SitePoint author portfolios. [...] Since then, the design of the pages we processed has changed, and thus the API no longer reliably works. In this tutorial, apart from rebuilding the API so that it works again, we’ll use the official Diffbot client to build custom entities that correspond to the data we seek (author portfolios).

He starts by setting up the environment for development (a Homestead Improved instance) and pulling in a few libraries to bootstrap the script. He shows the setup and configuration of the Diffbot client, creating a new API application via their UI and using the browser tools to help the service locate the information it needs. He then shows how to extend the client and define an "entity factory" to generate the objects requested. In this case he creates an AuthorFolio class to contain the author's number of posts.

tagged: diffbot client custom entities tutorial author portfolio api

Link: http://www.sitepoint.com/powerful-custom-entities-with-the-diffbot-php-client/

SitePoint PHP Blog:
Turning a Crawled Website into a Search Engine with PHP
Jul 06, 2015 @ 15:19:43

The SitePoint PHP blog has posted the second part of their "Powerful Custom Search Engines with Diffbot" series, showing how to take the Diffbot results and make them searchable.

In the previous part of this tutorial, we used Diffbot to set up a crawljob which would eventually harvest SitePoint’s content into a data collection, fully searchable by Diffbot’s Search API. We also demonstrated those searching capabilities by applying some common filters and listing the results. [...] In this part, we’ll build a GUI simple enough for the average Joe to use it, in order to have a relatively pretty, functional, and lightweight but detailed SitePoint search engine. What’s more, we won’t be using a framework, but a mere total of three libraries to build the entire application.

For those interested in the end result, you can skip to the demo. Otherwise, they'll walk you through the full process:

  • Bootstrapping the environment and needed libraries
  • Creating a simple "home" page with a Diffbot client
  • Creating the frontend interface (a form allowing for various search terms)
  • Writing the JavaScript to catch the form submission
  • Adding CSS to style the page
  • Building out the PHP backend to perform the different search types (author and keywords)

Finally the author ties it all together and creates the output of the search results, providing links to each of the matching pages along with the posting date, author information and a brief summary. The post ends with a look at paginating the results via a "PaginationHelper" class that drops a navigation item at the bottom of the results and handles moving from page to page, interfacing with the Diffbot client.
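To give a feel for what a pagination helper of this kind has to compute, here is a minimal sketch. It is not the tutorial's "PaginationHelper": the class name SimplePaginator, its methods and the "page" query parameter are all assumptions made for illustration, and it does not talk to the Diffbot client.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a pagination helper; every name here is an assumption.
final class SimplePaginator
{
    private int $totalResults;
    private int $perPage;

    public function __construct(int $totalResults, int $perPage)
    {
        if ($totalResults < 0 || $perPage < 1) {
            throw new InvalidArgumentException('Need totalResults >= 0 and perPage >= 1.');
        }
        $this->totalResults = $totalResults;
        $this->perPage = $perPage;
    }

    // Number of pages needed; an empty result set still counts as one page.
    public function pageCount(): int
    {
        return max(1, (int) ceil($this->totalResults / $this->perPage));
    }

    // Clamp a user-supplied page number into the valid range.
    public function clamp(int $page): int
    {
        return min(max(1, $page), $this->pageCount());
    }

    // Zero-based offset of the first result shown on the given page.
    public function offset(int $page): int
    {
        return ($this->clamp($page) - 1) * $this->perPage;
    }

    // Query strings for the navigation links, keyed by page number.
    public function links(array $query): array
    {
        $links = [];
        for ($page = 1; $page <= $this->pageCount(); $page++) {
            // array_merge lets the page number override any existing "page" key.
            $links[$page] = '?' . http_build_query(array_merge($query, ['page' => $page]));
        }
        return $links;
    }
}

$paginator = new SimplePaginator(45, 20);
echo $paginator->pageCount(), PHP_EOL;               // 3
echo $paginator->offset(2), PHP_EOL;                 // 20
echo $paginator->links(['q' => 'php'])[2], PHP_EOL;  // ?q=php&page=2
```

The offset is the value a search backend would typically use as its starting position for the requested page, while the links array is what the navigation item at the bottom of the results would render.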

tagged: search engine diffbot tutorial series part2 results crawled website

Link: http://www.sitepoint.com/turning-crawled-website-search-engine-php/

SitePoint PHP Blog:
Crawling and Searching Entire Domains with Diffbot
Jul 02, 2015 @ 14:41:39

The SitePoint PHP blog has a new tutorial posted, the first part in a new series, showing you how to create a "powerful custom search engine" with the help of the Diffbot service. In this first part they help you get everything you need set up (including a VM to run it from).

In this tutorial, I’ll show you how to build a custom SitePoint search engine that far outdoes anything WordPress could ever put out. We’ll be using Diffbot as a service to extract structured data from SitePoint automatically, and this matching API client to do both the searching and crawling. I’ll also be using my trusty Homestead Improved environment for a clean project, so I can experiment in a VM that’s dedicated to this project and this project alone.

He walks you through each step of the process, first creating the "crawljob" script and then executing it to gather the results. He also shows how to display this information via a simple GUI when searches are performed. A Diffbot PHP client library makes creating the crawljob simpler and lets you configure things like the maximum number of items to crawl, patterns to match and which URLs to follow on the pages. Running the script creates the job, which is then executed immediately. The same library makes searching the data simpler too, using a "search" method along with some special tagging and returning a JSON result with the matching records.

tagged: crawl domain diffbot search engine part1 series tutorial

Link: http://www.sitepoint.com/crawling-searching-entire-domains-diffbot/

SitePoint PHP Blog:
Diffbot: Crawling with Visual Machine Learning
Aug 01, 2014 @ 16:37:12

On the SitePoint PHP blog, Bruno Skvorc has posted a tutorial showing you how to use the Diffbot service to extract data from any page. He introduces the service itself and walks you through a simple request via Guzzle.

Have you ever wondered how social networks do URL previews so well when you share links? How do they know which images to grab, whom to cite as an author, or which tags to attach to the preview? Is it all crawling with complex regexes over source code? Actually, more often than not, it isn’t. [...] If you want to build a URL preview snippet or a news aggregator, there are many automatic crawlers available online, both proprietary and open source, but you seldom find something as niche as visual machine learning. This is exactly what Diffbot is – a “visual learning robot” which renders a URL you request in full and then visually extracts data, helping itself with some metadata from the page source as needed.

He uses a combination of a Laravel installation (via a Homestead instance) and a Guzzle request using a fetched token. The service offers a 10k call limit on a 7-day free trial, so you can sign up and grab your token that way. He includes code for an example request fetching a SitePoint page and parsing out the tags. He also briefly looks at the custom handling Diffbot allows based on CSS-type rules.
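As a rough sketch of what such a token-based request looks like, the snippet below only builds the request URL and leaves the HTTP call to a client such as Guzzle. The endpoint and the "token" and "url" parameter names are assumptions based on Diffbot's v3 Article API and may differ from what the tutorial used; buildDiffbotUrl() is a hypothetical helper and YOUR_TOKEN is a placeholder.

```php
<?php
declare(strict_types=1);

// Assumed endpoint and parameter names; check Diffbot's API documentation before relying on them.
const DIFFBOT_ARTICLE_ENDPOINT = 'https://api.diffbot.com/v3/article';

// Hypothetical helper: combines the token and the page to analyze into one request URL.
function buildDiffbotUrl(string $token, string $pageUrl): string
{
    // http_build_query URL-encodes the values, so the target URL can be passed as-is.
    return DIFFBOT_ARTICLE_ENDPOINT . '?' . http_build_query([
        'token' => $token,
        'url'   => $pageUrl,
    ]);
}

$url = buildDiffbotUrl(
    'YOUR_TOKEN', // placeholder, replace with the token from your own account
    'http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/'
);

// Hand this URL to an HTTP client (Guzzle in the tutorial) as a GET request.
echo $url, PHP_EOL;
```

Keeping URL construction in its own function makes it easy to check the encoding without spending any of the trial's API calls.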

tagged: diffbot parse data service api guzzle homestead tutorial introduction

Link: http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/

