We compared the 3 best image analysis APIs: here’s what we learned

By Scott Domes

One of the flagship applications of modern machine learning has been working with images — training computers to analyze, classify, and alter different types of pictures. Last year, Google’s DeepDream software made waves by creating a series of terrifying, nightmare-inducing images:

image1

Thanks, Google.

While image and video classification (that is, telling us what is depicted in the chosen media) is not so cutting edge anymore, this is actually good news. A number of cheap services have sprung up to make image classification quite accessible.

Here at MuseFind we’ve been looking for ways to help both brands and social media influencers produce better content — and part of that effort is figuring out the correlations between the most successful posts and their content.

Here’s our example picture, from one of MuseFind’s influencer marketing campaigns with John Varvatos. Courtesy of @rusty.blade:

image2

Below are the top three image/video analysis services we considered: Clarifai, Google Cloud Vision, and Amazon Rekognition. Let’s see what they can tell us about this image.

Clarifai

Clarifai, unlike the other APIs on this list, has the added bonus of video analysis.

For videos, Clarifai offers scene recognition. For images, it can also do sentiment analysis, text recognition, logo detection, and face detection, as well as a more robust version of Resemble’s image attribute detection: brightness, colour, dominant colour.

Here’s what it shows about our image:

Resemble’s image attribute detection

It was able to recognize some key concepts (the first five are fairly accurate), but also added some irrelevant tags. No scarf, no child, no woman.

Still, if we took the top five tags and used them to automatically categorize our image, we’d be in a good place.
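To make the "top five tags" idea concrete, here is a minimal sketch of how we might pick them out of a list of predictions. It assumes each prediction is a (tag, confidence) pair; the `top_tags` helper and the sample scores are made up for illustration and are not real Clarifai output.

```python
from typing import List, Tuple


def top_tags(
    predictions: List[Tuple[str, float]],
    n: int = 5,
    min_confidence: float = 0.0,
) -> List[str]:
    """Return the names of the n highest-confidence tags, best first."""
    # Drop anything below the confidence floor before ranking.
    kept = [(name, score) for name, score in predictions if score >= min_confidence]
    # Sort by confidence, highest first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept[:n]]


# Invented tags and scores, for illustration only.
sample = [
    ("dog", 0.99),
    ("street", 0.95),
    ("man", 0.97),
    ("scarf", 0.62),
    ("pet", 0.96),
    ("child", 0.55),
    ("outdoors", 0.93),
    ("woman", 0.51),
]

print(top_tags(sample))
```

A confidence floor (`min_confidence`) is one simple way to keep low-scoring tags like the irrelevant ones above out of the result, though the right cutoff would need to be tuned against real data.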

Price: 5,000 free operations and 10,000 free inputs per month; $1.20/$0.80 per 1,000 after that

Cloud Vision API

Cloud Vision by Google includes many of the key features of Clarifai (sentiment analysis, text recognition, logo detection, facial analysis) and adds a couple of bonuses: landmark detection and a simple REST API.

https://medium.com/google-cloud/google-cloud-vision-landmark-detection-vacation-photos-41392d4b5765

Unlike Clarifai, you can’t create your own models to test against — but you also have access to an API backed by Google, one they seem quite dedicated to constantly improving.
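To give a feel for the REST API, here is a rough sketch of a label detection request using only the Python standard library. It reflects our understanding of the v1 `images:annotate` endpoint; the helper names `build_label_request` and `detect_labels` are our own, and you would need to supply your own API key.

```python
import base64
import json
import urllib.request

# Endpoint as we understand the v1 REST interface.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build the JSON body for a single-image label detection request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def detect_labels(image_bytes: bytes, api_key: str) -> list:
    """POST the image and return (description, score) pairs."""
    body = json.dumps(build_label_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # One response per request; labels may be absent if nothing was found.
    annotations = payload["responses"][0].get("labelAnnotations", [])
    return [(a["description"], a["score"]) for a in annotations]
```

Usage would look something like `detect_labels(open("photo.jpg", "rb").read(), api_key="YOUR_KEY")`, where the file name and key are placeholders.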

For our image, it gave us the following labels:

Cloud Vision was accurate

Cloud Vision was accurate in all but one tag (Spring) and managed to identify not just the dog but the activity of dog walking.

However, it did fail to detect the face in the image. Here’s a preview of its sentiment analysis with a different image:

sentiment analysis with a different image

Price: Free up to 1000/month, $1.50 per thousand beyond that

Amazon Rekognition

By now, the features will sound familiar: scene detection, sentiment analysis. Amazon boasts a more robust suite of facial analysis tools, including facial recognition across images (not offered by Google or Clarifai), and detailed information like image quality of the face, beard recognition (yes), and facial comparison (how likely is it that two faces are the same person?).

It also promises integration with AWS services like S3 and Lambda.

https://aws.amazon.com/rekognition/

Let’s start with object and scene detection. Here are the results:

object and scene detection

It nailed the dog aspect (bonus points for ‘Adorable’) but failed to recognize any other aspect of the scene.

However, it did pick up the face:

face

That killer beard detection. A bit of a mixed bag (no sunglasses, no mustache), but also reasonable accuracy.
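For anyone who wants to pull those face attributes programmatically, here is a small sketch. It assumes the `boto3` AWS SDK is installed and credentials are configured; `summarize_face` is a hypothetical helper of ours, and the sample values are invented to show the shape of a face detail, not copied from our actual result.

```python
def detect_faces(image_bytes: bytes) -> list:
    """Call Rekognition's DetectFaces and return the list of face details."""
    import boto3  # third-party AWS SDK, imported lazily so the helper below works without it

    client = boto3.client("rekognition")
    response = client.detect_faces(Image={"Bytes": image_bytes}, Attributes=["ALL"])
    return response["FaceDetails"]


def summarize_face(
    face_detail: dict,
    attributes=("Beard", "Mustache", "Sunglasses"),
    min_confidence: float = 80.0,
) -> dict:
    """Map each attribute to True/False, or None if missing or too uncertain."""
    summary = {}
    for name in attributes:
        attr = face_detail.get(name)
        if attr is None or attr["Confidence"] < min_confidence:
            summary[name] = None
        else:
            summary[name] = attr["Value"]
    return summary


# Invented values, for illustration only.
sample_face = {
    "Beard": {"Value": True, "Confidence": 99.1},
    "Mustache": {"Value": False, "Confidence": 85.0},
    "Sunglasses": {"Value": False, "Confidence": 62.3},
}

print(summarize_face(sample_face))
```

Treating low-confidence answers as "unknown" rather than "no" is a design choice on our part; it avoids reading too much into a shaky prediction.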

Price: $1 per 1000 images processed (up to a million a month)
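To compare the prices quoted above on equal footing, here is a quick back-of-the-envelope sketch. It assumes billing is pro-rated per image beyond any free quota, that Rekognition has no free quota (none is listed above), and a hypothetical volume of 50,000 images a month. Clarifai is left out because its cost depends on how usage splits between operations and inputs.

```python
def monthly_cost(images: int, free_quota: int, price_per_thousand: float) -> float:
    """Estimated monthly cost in dollars, assuming pro-rated per-image billing."""
    # Only images beyond the free quota are billed.
    billable = max(0, images - free_quota)
    return billable / 1000 * price_per_thousand


volume = 50_000  # hypothetical monthly volume

# Cloud Vision: free up to 1,000 per month, then $1.50 per thousand.
cloud_vision = monthly_cost(volume, free_quota=1000, price_per_thousand=1.50)

# Rekognition: $1 per thousand (the quoted rate applies up to a million a month).
rekognition = monthly_cost(volume, free_quota=0, price_per_thousand=1.00)

print(f"Cloud Vision: ${cloud_vision:.2f}")
print(f"Rekognition:  ${rekognition:.2f}")
```

At this volume the free tier barely moves the needle, so the per-thousand rate is what matters; at a few thousand images a month the picture could look quite different.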

Conclusion

We haven’t settled on a service yet — we’re going to continue experimenting and clarifying our goals over the coming weeks.

But based on these results, we already have an idea of the strengths and weaknesses of the different APIs. Clarifai has the strongest concept modeling, Google the best scene detection and sentiment analysis, and Amazon the best facial analysis.

Let us know your thoughts on these APIs in the comments. And please like and share if this article was useful to you!

