Google Vision API

Google Cloud Platform provides pre-trained machine learning models for tasks such as text and object detection through APIs.

The Vision API can detect text, localize objects, and label images, among other features.

To give it a try:

Sign up with Google Cloud Platform, and fill in address and credit card info (which creates a billing account)

In the Google Cloud Platform console, create an empty project

Link the billing account to the project through Billing in the left navigation menu

Go to APIs & Services, and enable the Vision API for the project

Host your images in a bucket on Google Cloud Storage, and make the images public

Call the API through the web plug-in at https://cloud.google.com/vision/docs/quickstart

Modify the following REST request. The imageUri points to your image on Google Cloud Storage or at any other public URL.

    {
      "requests": [
        {
          "features": [
            {
              "type": "LABEL_DETECTION"
            }
          ],
          "image": {
            "source": {
              "imageUri": "gs://bucket-name-123/demo-image.jpg"
            }
          }
        }
      ]
    }

The type can be LABEL_DETECTION, DOCUMENT_TEXT_DETECTION, OBJECT_LOCALIZATION, etc., for labeling images, detecting text, and detecting objects with their locations, respectively.
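As a minimal sketch, the request body can also be assembled in Python, which makes it easy to swap the feature type or the image location. The helper name build_annotate_request is an illustrative choice, not part of the API; it also shows that one request may list several feature types at once.

```python
import json


def build_annotate_request(image_uri, feature_types):
    """Return an images:annotate request body for a single image.

    image_uri may be a gs:// path or a publicly reachable http(s) URL.
    feature_types is a list such as ["LABEL_DETECTION"].
    """
    return {
        "requests": [
            {
                "features": [{"type": t} for t in feature_types],
                "image": {"source": {"imageUri": image_uri}},
            }
        ]
    }


if __name__ == "__main__":
    body = build_annotate_request(
        "gs://bucket-name-123/demo-image.jpg",
        ["LABEL_DETECTION", "OBJECT_LOCALIZATION"],
    )
    # Print the JSON that would be pasted into the web plug-in or a REST client.
    print(json.dumps(body, indent=2))
```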

If using a REST client to call the API, first generate an API key for authentication: https://cloud.google.com/docs/authentication/api-keys

Then post the request body to

  POST https://vision.googleapis.com/v1/images:annotate?key=API_KEY

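images:annotate is the method that runs the requested features on the images in the request; the feature type in the body decides whether labels, text, or objects are detected. Change the method name in the URL when calling other methods, such as files:annotate for PDF and TIFF files.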
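The call can be sketched with only the Python standard library. This is an illustration under a few assumptions: the Vision endpoint host is vision.googleapis.com, the key is passed as the key query parameter, and the function names are hypothetical. Building the request is kept separate from sending it, so the URL and body can be inspected without network access.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Vision API is served from vision.googleapis.com.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def make_annotate_request(api_key, body):
    """Build (but do not send) the POST request for images:annotate."""
    url = ANNOTATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_labels(response_json):
    """Return (description, score) pairs from a LABEL_DETECTION response."""
    labels = []
    for result in response_json.get("responses", []):
        for annotation in result.get("labelAnnotations", []):
            labels.append((annotation["description"], annotation["score"]))
    return labels


def annotate(api_key, body):
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(make_annotate_request(api_key, body)) as resp:
        return json.load(resp)
```

With a real key, extract_labels(annotate(API_KEY, body)) would return the labels for a LABEL_DETECTION request; other feature types put their results under different keys of each response entry.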

The image doesn't have to be a URL. The request can instead carry the base64-encoded image bytes in the content field of image.
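A small sketch of the inline variant, assuming the image bytes have already been read from disk. The helper name build_content_request is hypothetical; the only change from the URL form is that image carries a base64 string under content instead of a source.

```python
import base64


def build_content_request(image_bytes, feature_type):
    """Return a request body that embeds the image as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "features": [{"type": feature_type}],
                # "content" replaces "source" when the bytes are sent inline.
                "image": {"content": encoded},
            }
        ]
    }
```

For a local file, the bytes would come from something like open("demo-image.jpg", "rb").read(), where the file name is a placeholder.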

There are client libraries available in Python, C#, Java, etc.

Install the library locally (for Python, pip install google-cloud-vision) and call the API programmatically. Snippet in Python:

    

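    from google.cloud import vision

    # Placeholder path to a local image file
    image_path = 'demo-image.jpg'

    # The client authenticates with Application Default Credentials
    client = vision.ImageAnnotatorClient()

    with open(image_path, 'rb') as image_file:
        content = image_file.read()

    # vision.Image is the google-cloud-vision 2.x form; older releases used types.Image
    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    document = response.full_text_annotation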
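The returned full_text_annotation is nested as pages, blocks, paragraphs, words, and symbols. The sketch below walks that hierarchy to collect the words. The function name words_in_document is illustrative, and the stand-in object built from SimpleNamespace only mimics the attribute layout so the example runs without calling the API.

```python
from types import SimpleNamespace


def words_in_document(document):
    """Collect word strings from a full_text_annotation-like object."""
    words = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # Each word is a sequence of symbols (single characters).
                    words.append("".join(symbol.text for symbol in word.symbols))
    return words


def fake_word(text):
    """Build a stand-in word whose symbols are the characters of text."""
    return SimpleNamespace(symbols=[SimpleNamespace(text=ch) for ch in text])


# Stand-in for response.full_text_annotation, for illustration only.
fake_document = SimpleNamespace(
    pages=[
        SimpleNamespace(
            blocks=[
                SimpleNamespace(
                    paragraphs=[
                        SimpleNamespace(words=[fake_word("Hi"), fake_word("there")])
                    ]
                )
            ]
        )
    ]
)

print(words_in_document(fake_document))  # ['Hi', 'there']
```

When only the plain text is needed, document.text holds the whole recognized text as a single string.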