Optical Character Recognition (OCR) is the process of converting printed text into a digital representation. It has all sorts of practical applications — from digitizing printed books, creating electronic records of receipts, to number-plate recognition and even circumventing image-based CAPTCHAs. [...] Tesseract is an open source program for performing OCR. You can run it on *Nix systems, Mac OSX and Windows, but using a library we can utilize it in PHP applications. This tutorial is designed to show you how.
The tutorial walks you through installing Tesseract locally (well, inside of a VM) and testing the install by running it against a sample image. With that up and working, it shows how to use the library to call Tesseract from PHP, wiring it into a simple Silex application whose endpoint accepts the image as a POSTed file upload. Full code for the sample application is included, along with the results from another sample image. The tutorial also covers some additional functionality you could use to detect phone numbers in the recognized text.
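To give a rough idea of what phone number detection on OCR output can look like, here is a minimal sketch. It is not the tutorial's code (which is written in PHP); it assumes the recognized text is already available as a plain string and that a regular expression is good enough for the job. The pattern, the `find_phone_numbers` name and the sample text are all made up for illustration, and the pattern only covers loosely formatted North American-style numbers.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not taken from the tutorial):
# an optional "+1"/"1" prefix, a 3-digit area code with or without
# parentheses, then 3 and 4 digits, separated by optional spaces, dots or dashes.
PHONE_PATTERN = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def find_phone_numbers(text: str) -> List[str]:
    """Return candidate phone numbers found in OCR text, reduced to digits only."""
    # findall returns whole matches here because the pattern has no capturing groups.
    matches = PHONE_PATTERN.findall(text)
    # Strip every non-digit so differently formatted numbers compare equal.
    return [re.sub(r"\D", "", match) for match in matches]


if __name__ == "__main__":
    # Made-up stand-in for text returned by an OCR run.
    ocr_text = "Call (555) 123-4567 or 555.987.6543 today. Order #12345."
    print(find_phone_numbers(ocr_text))  # ['5551234567', '5559876543']
```

Real OCR output is noisy (an `O` read in place of `0`, stray spaces inside digit groups), so a production version would likely need some cleanup before matching, or a dedicated phone number parsing library.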