A while back, we looked at Diffbot, the machine learning AI for processing web pages, as a means to extract SitePoint author portfolios. [...] Since then, the design of the pages we processed has changed, and thus the API no longer reliably works. In this tutorial, apart from rebuilding the API so that it works again, we’ll use the official Diffbot client to build custom entities that correspond to the data we seek (author portfolios).
He starts by setting up the environment for development (a Homestead Improved instance) and pulling in a few libraries to bootstrap the script. He shows the setup and configuration of the Diffbot client, creating a new API application via their UI and using the browser tools to help the service locate the information it needs. He then shows how to extend the client and define an "entity factory" to generate the objects requested. In this case he creates an
AuthorFolio class to contain the author's number of posts.