
January 26, 2008

“Computer vision not as good as thought”, who thought?!

Filed under: computer vision/machine vision/AI, rants, news — Peter @ 7:05 pm

A study came out of MIT a couple of days ago. According to the press release, the study “cautions that this apparent success may be misleading because the tests being used are inadvertently stacked in favor of computers”. Image test collections such as Caltech101 make image recognition too easy by, for example, placing the object in the middle of the image.

The titles of the press releases were “Computer vision may not be as good as thought” or similar. I must ask: who thought that computer vision was good? Who thought that testing image recognition on such a collection proves anything?

A quick look at Caltech101 reveals how extremely limited it is. In the sample images the objects are indeed centered, which means the photographer gives the computer a hand. There is also virtually no background – it’s either all white or very simple, like grass behind the elephant. As for the size of the collection: 101 categories, most with about 50 images. So far this looks too easy.

Now let’s look at the pictures themselves. It turns out there is also a problem at the opposite extreme: the task is in fact too hard! The computer is supposed to see that the side view of a crocodile represents the same object as the front view. How? By training. Suggested numbers of training images: 1, 3, 5, 10, 15, 20, 30.

The idea of “training” (or machine learning) is that you collect as much information about the image as possible and then let the computer sort it out by some kind of clustering. One approach is appropriately called “bag of words”: patches in images are treated the way Google treats words in text, with no understanding of the content. You can only hope that you have captured the relevant information that makes image recognition possible. Since there is no understanding of what that relevant information is, there is no guarantee.
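To make the “bag of words” idea concrete, here is a minimal sketch (the toy images, patch size, and vocabulary size are all invented for illustration, not taken from any real system): patch vectors are clustered into a small “visual vocabulary”, and each image is then reduced to a histogram of patch counts – every bit of spatial arrangement is thrown away, exactly as word order is thrown away in text.

```python
import numpy as np

rng = np.random.default_rng(0)

def patches(img, size=4, step=4):
    """Cut a grayscale image into flat patch vectors (the visual 'words')."""
    h, w = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(0, h - size + 1, step)
                     for c in range(0, w - size + 1, step)])

def kmeans(X, k, iters=20):
    """Tiny k-means: cluster patch vectors into a 'visual vocabulary'."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bag_of_words(img, vocab):
    """Describe an image as a histogram of nearest-vocabulary patch counts.
    All spatial layout is discarded -- only the counts remain."""
    X = patches(img)
    labels = np.argmin(((X[:, None] - vocab) ** 2).sum(-1), axis=1)
    return np.bincount(labels, minlength=len(vocab)) / len(X)

# Toy 'collection': smooth gray images vs. noisy ones.
smooth = [np.full((16, 16), g) for g in (0.2, 0.5, 0.8)]
noisy = [rng.random((16, 16)) for _ in range(3)]

vocab = kmeans(patches(np.vstack(smooth + noisy)), k=8)
hists = [bag_of_words(im, vocab) for im in smooth + noisy]
```

Whether such histograms capture what actually distinguishes a crocodile from an elephant is exactly the open question: the representation offers no guarantee, only hope.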

Then how come some researchers claim that their methods work? Good question. My guess is that by tweaking your algorithm long enough you can make it work on a small collection of images. Also, merely looking at the color distribution can give you enough information to “categorize” some images – in a very small collection with very few categories.
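To illustrate the color-distribution “cheat”, here is a toy sketch (the images and category names are fabricated for the example): a plain global intensity histogram plus nearest-prototype matching “recognizes” every image in a tiny two-category collection, while understanding nothing whatsoever about content.

```python
import numpy as np

rng = np.random.default_rng(1)

def gray_histogram(img, bins=8):
    """Global intensity histogram -- no shapes, no locations, no 'seeing'."""
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return h / img.size

# Toy 'categories': dark scenes vs. bright scenes (stand-ins for, say,
# night photos vs. snow photos in a tiny, biased collection).
dark = [np.clip(rng.normal(0.2, 0.05, (32, 32)), 0, 1) for _ in range(5)]
bright = [np.clip(rng.normal(0.8, 0.05, (32, 32)), 0, 1) for _ in range(5)]

def classify(img, protos):
    """Nearest-prototype 'recognition' based on the histogram alone."""
    dists = {name: np.abs(gray_histogram(img) - p).sum()
             for name, p in protos.items()}
    return min(dists, key=dists.get)

# One prototype per category, then 'recognize' the remaining images.
protos = {"dark": gray_histogram(dark[0]), "bright": gray_histogram(bright[0])}
preds = [classify(im, protos) for im in dark[1:] + bright[1:]]
print(preds)  # ['dark', 'dark', 'dark', 'dark', 'bright', 'bright', 'bright', 'bright']
```

A perfect score – and not a trace of vision in it, which is why small, biased collections prove so little.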

My suggestion: try black-and-white images first!

4 Responses to ““Computer vision not as good as thought”, who thought?!”

  1. Bob Mottram Says:

    Using many learning algorithms (genetic, neural, etc) it is very easy to categorise images on some very trivial basis. In theory the larger the data set the harder the system has to work and the less likely it is to find a quick “cheat”, but it all depends upon how features are being represented in the system.

    To achieve invariance across many images what the system needs to do is hypothesise the local surface normal of each observed feature and build those into a 3D geometric hash.

  2. Peter Says:

    Thanks for the feedback, Bob. I think a reply may warrant a separate post. I will use this opportunity to clarify my thinking on machine vision a bit.

  3. Peter Says:

    I meant “machine learning”.

  4. Computer Vision for Dummies » Why machine learning never works Says:

    […] In response to my previous post Bob Mottram wrote “Using many learning algorithms (genetic, neural, etc) it is very easy to categorise images on some very trivial basis. In theory the larger the data set the harder the system has to work and the less likely it is to find a quick “cheat”, but it all depends upon how features are being represented in the system.” […]



Copyright © Intelligent Perception, Inperc.com