August 15, 2009
TechCrunch deadpools Riya. Certainly, no surprise to me. Their technology has never been impressive (posts are here). Like.com remains but does not seem to be going anywhere…
At least TechCrunch announced this death after promoting Riya for 3 years. Others have died or will die more quietly.
April 29, 2009
I was about to review the newly released Google Similar Image Search when I ran across this one. The verdict: not so good.
The guy does not seem to realize though that Microsoft released its own similarity search a few months before. I am not judging because I missed it myself when it came out. It would be interesting to test and see which one is better (or not as bad). One point in favor of Microsoft is that Google didn’t index all images.
UPDATE: Another good review at Rich Marr’s Tech Blog.
March 13, 2009
In the previous post:
Recall = Number of retrieved images that are also relevant / Total number of relevant images.
Precision = Number of retrieved images that are also relevant / Total number of retrieved images.
In Pixcavator Search, the matches are simply ordered based on their distance from the query (just like Google). Then we need to choose a cut-off. In the example considered last time, the cut-off was implicitly “all that fit in one page”. This is a reasonable standard for user-oriented applications. For experimentation and testing, however, we may want to use the distance instead. For example, below I may choose the cut-off distance of 80: all images within 80 from the query are declared matches and retrieved, the rest are not. Then recall = 7/8, precision = 7/14.

The choice was made based on the examination of the search results for this particular image in an attempt to include as many as possible of “good” matches and, at the same time, to exclude as many as possible of the “bad” matches. More experimentation showed that 80 works OK for other queries as well. In general, however, this is not to be expected.
My conclusion is that the main drawback of precision and recall as a measure of the quality of the algorithm is that they require a cut-off to separate the retrieved images from the rest. The evaluation results then depend on this choice. In fact, this measure ends up being a measure of the quality of the query image, not of the algorithm.
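To make the dependence on the cut-off concrete, here is a minimal sketch. The distances are made up (they are not Pixcavator Search output) and were chosen only so that a cut-off of 80 reproduces recall = 7/8 and precision = 7/14; the function name and the convention for an empty result list are my own assumptions.

```python
def precision_recall_at(cutoff, distances, relevant):
    """Precision and recall when everything within `cutoff` is retrieved.

    distances: dict mapping image name -> distance from the query
    relevant:  set of names that are correct matches
    """
    retrieved = {name for name, d in distances.items() if d <= cutoff}
    hits = len(retrieved & relevant)
    # Assumed convention: an empty result list counts as precision 1.
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant)
    return precision, recall


# Hypothetical distances: 8 correct matches and 10 unrelated images.
relevant_distances = [0, 12, 25, 33, 41, 56, 70, 95]
other_distances = [45, 52, 60, 66, 72, 75, 79, 88, 90, 120]

distances = {f"version_{i}": d for i, d in enumerate(relevant_distances)}
distances.update({f"other_{i}": d for i, d in enumerate(other_distances)})
relevant = {f"version_{i}" for i in range(len(relevant_distances))}

for cutoff in (40, 80, 100):
    precision, recall = precision_recall_at(cutoff, distances, relevant)
    print(cutoff, precision, recall)
```

With these numbers a tight cut-off gives perfect precision but poor recall, and a loose one gives the opposite, so the score says as much about the chosen cut-off as about the matching itself.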
March 6, 2009

Technology News asks: “Will Microsoft’s Kumo Bring New Visual Dimension to Search?” and answers: “Microsoft seems to be amping up visual search capabilities in its upcoming Kumo search engine, if leaked screenshots are any indication.”
Well, they aren’t (see for yourself here). And neither is the leaked email.
The reporter seems to have been swayed by the CEO of Imprezzeo, a company offering their own image-to-image search engine.
Microsoft is certainly capable of doing that, e.g., Lincoln. Incidentally, Imprezzeo’s site has only a video demo.
March 4, 2009
In a recent post I discussed Pixcavator Search, prototype image-to-image software. As it searches for images based on similarity, a potential application of this software is a search for copyrighted images. With numerous applications of this type out there, the discussion of performance evaluation is usually absent.
Let’s consider the performance evaluation commonly used in information retrieval. Below, we just replace “document” with “image”.
For a given query image, we assume the following:
- There are M relevant images in the collection = correct matches.
- When the query is executed, N images are retrieved.
- Out of those only R are relevant.
Then the following two measurements quantify the quality of the search:
Recall = R / M = Number of retrieved images that are also relevant / Total number of relevant images.
Precision = R / N = Number of retrieved images that are also relevant / Total number of retrieved images.
The recall is the answer to the question: How close am I to getting all good matches? The precision is the answer to the question: How close am I to getting only good matches?
In the example below, lenaC.jpg is the query and the program searched for its modified versions (7 modified versions of the original: stretched, rotated, noised, etc). Then N = 27, M = 8, R = 7, so recall = 7/8 and precision = 7/27.
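The two definitions can be written down directly. This is only a sketch: the file names are hypothetical and are arranged to reproduce the counts N = 27, M = 8, R = 7 from the example.

```python
from fractions import Fraction


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of image names.

    Both collections are assumed to be non-empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)             # R: retrieved and relevant
    precision = Fraction(hits, len(retrieved))   # R / N
    recall = Fraction(hits, len(relevant))       # R / M
    return precision, recall


# Hypothetical names: 8 relevant images, 7 of them among the 27 retrieved.
relevant = {f"lena_version_{i}.jpg" for i in range(8)}
retrieved = {f"lena_version_{i}.jpg" for i in range(7)}
retrieved |= {f"unrelated_{i}.jpg" for i in range(20)}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 7/27 7/8
```

Fractions are used instead of floats so that the output matches the ratios in the text exactly.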

Ideally, recall and precision should both be equal to 1. In reality, however, they pull in opposite directions. When the query is broad, the recall is high but the precision is low. When the query is restrictive, the precision is high and the recall is low. We have all experienced this effect searching Google, Yahoo, etc.
But what does “restrictive” mean in the image search context? The query is an image, so what is a “restrictive image”?
Turns out that, for this particular algorithm, images with strong, distinctive features are best. This is certainly very vague. More specifically, these are images with large, high contrast objects. They should also have enough of these objects. For example, one large object, or several medium ones, or the whole image filled with dots. The last one isn’t something you’d call an image of good quality but it is very stable under noise, blur, and other transformations.
I’ll write some more on this. Meanwhile, this is the Wikipedia article about precision and recall. The article provides a probabilistic interpretation of precision and recall.
February 16, 2009
The link to this demo was sent to me by Ricardo Niederberger Cabral (thanks!). The demo program is called Vision4 and was created by Numenta. This is its main point:
This program demonstrates some capabilities of Numenta’s Hierarchical Temporal Memory (HTM) technology applied to visual object recognition. .. The HTM network contained in this demo has been trained to recognize four types of objects: cell phones, sailboats, cows, and rubber ducks.
Every image is given four ratings. Each represents how much the image resembles one of the four types.
As you can see, the goal is modest and there are no unsubstantiated claims of how this is ready to be applied in real life (and don’t get me started on academic publications!). This is refreshing. The program is also fun to play with. You can load your own images, you can add noise, blur etc to the images and see the effect on the recognition. The recognition results are often good and when they aren’t, it’s still interesting.
For serious purposes, it is unclear where this is going though.
It’s fine with me that there are only four categories – just one would be enough to test the concept. It does not bother me when a face is rated high in the cow category and another face high in the duck category. My main complaint is the instability of recognition under image transformations. For example, after turning “sailboat” a few degrees it became “cell phone”. A few degrees more and it became mixed – half “cow” (first image below). Adding noise, occlusion, etc. has a similar effect (second image).
 
Certainly, one does not expect rotations to affect image recognition. Meanwhile, a mixed recognition is a failed recognition and should be presented as such.
I am certainly biased here. I don’t believe in “build[ing] machines that work on principles used by the brain”. I don’t believe in trying to imitate the brain and I’ve written a few times about that. Traditionally, a scientist tries to understand nature by observing it, analyzing it, etc. Instead, it is suggested to try to understand nature by first understanding how the brain understands it? Seems like a roundabout way to me, bordering on a vicious circle. I also have serious reservations about the use of machine learning in computer vision.
Annoying bug: every time I start it, the program would turn on my webcam and it would keep it on even after I shut it down.
February 10, 2009
TechCrunch is happy to do PR for another visual search company: Milabra.
Milabra claims that it can categorize images, “from puppies to porn”:
…when searching through a library of images for dogs, Milabra doesn’t need to constantly compare each image with its database of known ‘dog’ images – instead, it can look for traits that it has learned to associate with “doggyness”…
The two examples in the demo are “beach” and “dog”. You upload an image with people on the beach, click “Search” and you get a page of beach photos… Wait, you don’t get to upload anything – this is just a video! So, there is no way to test their claims. Unfortunately, this is not unusual in this area and in computer vision in general.
If your software can recognize a puppy in an image (95% of the time as you claim), it should be easy for you to demonstrate this ability. Create a little web application (or desktop, I don’t care) that allows me to upload my own image which is then identified as “puppy” (or “tree”, or “street”, I don’t care). There is no such program. Why not? The answer is obvious.
In response to some skepticism, this is what one of the founders wrote:
…if you think that this cannot be done, then you are completely clueless: object classifiers have been made for more than 10 years now at leading CS labs around the world.
That reminds me of the episode of Seinfeld when Kramer decides to build levels in his apartment:
KRAMER: It’s a simple job. Why, you don’t think I can?
JERRY: Oh, no. It’s not that I don’t think you can. I know that you can’t, and I’m positive that you won’t.
This is Milabra’s team:
- MBA
- MS in Biological Engineering and PhD in neuroscience
- MS in Computer Science and Ph.D. in Biophysics
- Professional Project Manager
- Expert in computer networking, user interface design
JERRY: I don’t see it happening.
And what about TechCrunch? Same story again and again since I started to keep track a couple of years ago: they publish an enthusiastic report about a company doing image analysis/search/recognition, and then silence. The company slips into obscurity and there is no follow-up, nothing. These people never learn…
The people who do seem to learn, slowly, are the investors: Riya (like.com) $20 million or more, Polar Rose $5 million, Milabra $1.4 million. Or maybe this is just the effect of the economic downturn?
January 25, 2009
This has been an on-and-off project for almost two years (version 1.0 described here). The purpose is simple: find images similar to a given image. Since it is not even well understood what images are similar, the progress in this area of “image-to-image” search (aka “visual image search”) is very slow. So, instead, we focus on the goal of finding modified versions of the original. This release is a way to report a limited success we have achieved.
The executable PxSearch.exe is accompanied by a small collection of images (download here, 7.2 MB). The system consists of the following modules:
- the collection of images that can be extended;
- the database containing “signatures” of images, images’ origins, and other data;
- the image analysis unit (produces the signatures);
- the matching unit (matches the signatures);
- user interface (uploads an image, searches for similar images in the collection, displays the matches as a list).
Every image to be added is first converted to grayscale and shrunk so that its larger dimension is 150 pixels. Then several secondary versions are created, analyzed, and added to the collection, and their data is added to the database, for a total of 8 versions:
- original
- rotation, 5 degrees
- rotation, 45 degrees
- Gaussian blur
- salt and pepper noise
- stretch, 5%
- shrink, 5%
- crop from all sides, 5%
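A few of these steps are easy to sketch in plain Python on an image stored as a list of rows of gray values. This is not the PxSearch code: the function names, the 0–255 gray scale, the amount of noise, and the reading of “5%” as 5% of the rows and columns removed from each side are all my assumptions. Rotation, blur, and stretching are left out because they need an interpolation scheme that the list above does not specify.

```python
import random


def scaled_size(width, height, target=150):
    """Size of the image after scaling so the larger dimension is `target`."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def crop_all_sides(pixels, percent=5):
    """Drop `percent` % of the rows and columns from each side (assumed meaning)."""
    rows, cols = len(pixels), len(pixels[0])
    dr, dc = rows * percent // 100, cols * percent // 100
    return [row[dc:cols - dc] for row in pixels[dr:rows - dr]]


def salt_and_pepper(pixels, amount=0.05, seed=0):
    """Replace roughly `amount` of the pixels by pure black (0) or white (255)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [
        [rng.choice((0, 255)) if rng.random() < amount else p for p in row]
        for row in pixels
    ]


# A hypothetical 600x400 original becomes 150x100 after shrinking.
width, height = scaled_size(600, 400)
gray = [[128] * width for _ in range(height)]  # flat gray test image

cropped = crop_all_sides(gray)
noisy = salt_and_pepper(gray)
print(width, height, len(cropped[0]), len(cropped))
```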
The entry in the database for each image contains the information about its origin:
- date and time,
- the filename of the original image,
- the way the image was produced from the original (shrinking, rotation, etc),
- the signature of the image.
A signature is a sequence of 126 integers which is the output of image analysis: it is essentially the distribution of sizes of objects found in the image (the data comes from the same source as for Pixcavator).
Suppose the signatures of the two images are {An} and {Bn}. Move along these sequences and compute the absolute value of the difference of the n-th entries. The distance between the images is then the weighted sum of these differences, with weights Cn, i.e., a “weighted 1-norm metric”:
D = Σ Cn |An – Bn|.
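In code the formula is a one-liner. In this sketch the weights default to 1 because the actual values of Cn are not given here, and the short signatures in the example are made up (real ones have 126 entries). The ranking helper is a hypothetical illustration of ordering matches by distance from the query.

```python
def weighted_l1(a, b, weights=None):
    """D = sum over n of C_n * |A_n - B_n| for two signatures of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    if weights is None:
        weights = [1] * len(a)  # assumption: uniform weights
    if len(weights) != len(a):
        raise ValueError("need exactly one weight per entry")
    return sum(c * abs(x - y) for c, x, y in zip(weights, a, b))


def rank_matches(query, signatures, weights=None):
    """Names from `signatures` (name -> signature), closest to the query first."""
    return sorted(
        signatures,
        key=lambda name: weighted_l1(query, signatures[name], weights),
    )


# Made-up 4-entry signatures, just to show the arithmetic.
query = [5, 3, 0, 1]
collection = {"near": [4, 3, 2, 1], "far": [0, 9, 7, 1]}

print(weighted_l1(query, collection["near"]))                # 1 + 0 + 2 + 0 = 3
print(weighted_l1(query, collection["near"], [1, 1, 2, 1]))  # 1 + 0 + 4 + 0 = 5
print(rank_matches(query, collection))                       # ['near', 'far']
```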

A search is deemed successful if most of the versions of the query image are at the top of the list. This is the case for images that are “good” in the sense that they have a clear pattern (based on shapes not color). However, this standard is hard to quantify as it is dependent on the collection. Since the collection I used for testing was small (4500 images), I had to find a way to evaluate the quality of searches that is independent of the size of the collection, as much as possible. So, the quality score for a given image was
(average distance to its 7 versions) / (average distance to all images) * 100.
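As a sketch, with made-up distances (the function name is mine). I read a lower score as better, since it means the versions sit much closer to the query than a typical image in the collection does.

```python
from statistics import mean


def quality_score(dist_to_versions, dist_to_all):
    """(average distance to the versions) / (average distance to all images) * 100."""
    return 100 * mean(dist_to_versions) / mean(dist_to_all)


# Hypothetical distances from one query image.
dist_to_versions = [10, 20, 30]                  # its modified versions
dist_to_all = [10, 20, 30, 240, 300, 600]        # every image in the collection

print(quality_score(dist_to_versions, dist_to_all))  # 100 * 20 / 200 = 10.0
```

Dividing by the average over the whole collection is what makes the score roughly comparable between collections of different size.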
There are many interesting questions to study based on this data and I will report further.
January 12, 2009
Gazopa is a new visual search engine that is “a venture project inside Hitachi”.
I tried its Facebook application. I uploaded a few standard images and a few test images of my own and ran Gazopa. Some of the matches were awful while others were sort of meaningful. See for yourselves. The first match is displayed under the target image.




Gazopa also found a cropped copy of the “cameraman”, but not a rotated copy. The inability to handle rotations is a common problem with almost all visual search engines. Pixcavator Image Search can handle rotations with ease (read about it here or wait for the last version – to be released soon).
As for the underlying technology, the site says that “GazoPa enables users to search for a similar image from characteristics such as a color or a shape extracted from an image itself” and nothing more. So, even what they consider similar is unknown.
November 28, 2008
The screenshot tells the whole story. The image of a table in the upper left corner is the query image. The rest are supposed to be “similar”. What is the image filled with numbers doing here you ask? Hmm… Oh yes, it’s a table of numbers!
Previous posts on the topic are here.

September 15, 2008
Last time I noticed that image-to-image search engines launch in batches was in May. Of course, “launch” usually means private beta. I also found it interesting that there are so many of them and yet they never mention or discuss each other.
Now, another batch – within a few days from each other.
First, Gazopa (what an awful name!) from Hitachi. Private beta.
Second, Imprezzeo. “Coming soon”.
Third, Picasa launched a face recognition feature. By most accounts it does not work well.
Fourth, VideoSurf “Unveils First Computer Vision Search for Video”. Private beta.
Finally, Idee updated its TinEye. Apparently, now it can match an image and its rotated version. That was my main problem with the application.
June 2, 2008
In part 1 and part 2 I discussed a paper on face recognition and the methods it relies on. Recall, each 100×100 gray scale image is a table of 100×100 = 10,000 numbers that can be rearranged into a 10,000-vector or a point in the 10,000-dimensional Euclidean space. As we discovered in part 2, using the closeness of these points as a measurement of similarity between images ignores the way the pixels are attached to each other. A deeper problem is that unless the two images are aligned first, there is no way to use this representation to discover that they depict the same or similar thing. The proper term for this alignment is image registration.
The similarity between images represented this way will be entirely based on their overlap. As a result, the distance can be large even between images that we would consider similar. In part 2 we had examples of one-pixel images. More realistic examples are these:
- image with an object in one corner and one with the same object in another corner;
- image of a cross and the same cross turned 45 degrees;
- etc.
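The first of these examples takes a few lines to check. The sketch below uses a tiny 6×6 image with a 2×2 “object” (sizes chosen arbitrarily for illustration): the image with the object moved to the opposite corner ends up farther from the original than a blank image does.

```python
import math


def flatten(image):
    """Rearrange the rows of a table of numbers into one long vector."""
    return [p for row in image for p in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def block_image(size, top, left, block=2):
    """size x size image of 0s with a block x block square of 1s at (top, left)."""
    return [
        [1 if top <= r < top + block and left <= c < left + block else 0
         for c in range(size)]
        for r in range(size)
    ]


upper_left = flatten(block_image(6, 0, 0))
lower_right = flatten(block_image(6, 4, 4))
blank = flatten(block_image(6, 0, 0, block=0))

d_shifted = euclidean(upper_left, lower_right)  # no overlap: sqrt(4 + 4)
d_blank = euclidean(upper_left, blank)          # sqrt(4) = 2
print(d_shifted, d_blank)
```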
Back to face identification. As the faces are points in the 10,000-dimensional space, these points should be grouped somehow. The point is that all images of the same individual should belong to one group and not any other. It is common to consider “clusters” of points, i.e., groups formed of points close to each other. This was discussed above.
Now, in this paper the approach is different: a new point (the face to be identified) is represented as a linear combination of all other points (all faces in the collection).
As we know from linear algebra, this implies the following. (1) the entire collection has to be linearly dependent, (2) you can find a subcollection that adds up to 0! In other words, everything cancels out and you end up with a blank photo. Is it possible? If the dimension is low or the collection is large (the images are small relative to the number of images), maybe. What if the collection is small? (It is small – see below.) It seems unlikely. Why do I think so? Consider this very extreme case: you may need the negative for each face to cancel it: same shape with dark vs. light hair, skin, eyes, teeth (!).…
Second, the new image in the collection has to be a linear combination of training images of the same person. In other words, any image of person A is represented as a linear combination of other images of A in the collection, ideally. (More likely this image is supposed to be closer to the linear space spanned by these images.) The approach could only work under the assumption that people are linearly independent:
No face in the collection can be represented as a linear combination of the rest of the faces.
It’s a bold assumption.
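To see what “closer to the linear space spanned by these images” amounts to, here is a toy sketch. It is my own nearest-subspace illustration, not the algorithm from the paper: each “face” is a made-up 4-number vector, and a new image is assigned to the person whose images span the subspace nearest to it. The same Gram-Schmidt routine also detects linear dependence in a collection.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthonormal_basis(vectors, tol=1e-9):
    """Gram-Schmidt; vectors that depend on the earlier ones are dropped."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            coeff = dot(w, b)
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([x / norm for x in w])
    return basis


def distance_to_span(x, vectors):
    """Length of what is left of x after projecting onto span(vectors)."""
    residual = [float(t) for t in x]
    for b in orthonormal_basis(vectors):
        coeff = dot(residual, b)
        residual = [r - coeff * y for r, y in zip(residual, b)]
    return math.sqrt(dot(residual, residual))


def identify(x, gallery):
    """gallery: person -> list of image vectors. Returns the nearest person."""
    return min(gallery, key=lambda person: distance_to_span(x, gallery[person]))


# Made-up 4-pixel "faces" of two people.
gallery = {
    "A": [[1, 0, 0, 0], [1, 1, 0, 0]],
    "B": [[0, 0, 1, 0], [0, 0, 1, 1]],
}
new_face = [2, 3, 0, 0.1]  # almost a combination of A's images

print(identify(new_face, gallery))
print(distance_to_span(new_face, gallery["A"]))
print(distance_to_span(new_face, gallery["B"]))
```

A collection is linearly dependent exactly when `orthonormal_basis` returns fewer vectors than it was given, which is one way to test the independence assumption on a real gallery.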
If it is true, then the challenge is to make the algorithm efficient enough. The idea is that you don’t need all of those pixels/features and they in fact could be random. That must be the point of the paper.
The testing was done on two collections with several thousand images each. That sounds OK, but the number of individuals in these collections was 38 and 114!
To summarize, there is nothing wrong with the theory but its assumptions are unproven and the results are untested.
P.S. It’s strange but after so many years computer vision still looks like an academic discipline and not an industry.
May 27, 2008
TinEye is an image-to-image search engine from Idée. It is in closed testing but I got to try it a couple of days ago. After a very positive review at TechCrunch, I decided to write up my impressions (a review of an earlier version is here).
They don’t make wild claims about being able to do face identification or similar (unsolved) problems. The goal seems very simple: find copies of images. With this task TinEye does a fairly good job. It finds even ones that have been modified – noise, color, stretch, crop, some photoshopping. It does not do well with rotation. That’s a major drawback (compare to Lincoln from MS Research).
These are the images that I tried.
  
Barbara: found both color and bw copies and a slightly cropped version.
Marilyn: found cropped and stretched versions, and even an edited (defaced) version.
Lenna: found both color and bw, but not partial or rotated versions (even though a rotated version is in the index).
May 12, 2008
Let’s review part 1 first. If you have a 100×100 gray scale image, it is simply a table of 100×100 = 10,000 numbers. You rearrange the rows of this table into a 10,000-vector and represent the image as a point in the 10,000-dimensional Euclidean space. This enables you to measure distances between images, discover patterns, match images, etc. Now, what is wrong with this approach?
Suppose A, B, and C are images with a single black pixel in the left upper corner, next to it, and the right bottom corner respectively. Then, the distances will be equal: d(A,B) = d(B,C) = d(C,A), no matter how you define the distance d(,) between points in this space. The conclusion: if A and B are in the same cluster, then so is C. So adjacency of pixels and distance between them is lost in this representation!
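This is easy to verify for the usual distances. The sketch below checks the taxicab, Euclidean, and max distances; the encoding (1 for black, 0 for white) and the helper names are my choices.

```python
def one_pixel_image(size, row, col):
    """Flattened size x size white (0) image with one black (1) pixel."""
    vector = [0] * (size * size)
    vector[row * size + col] = 1
    return vector


def minkowski(a, b, p):
    """p-norm distance; p = float('inf') gives the max distance."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


A = one_pixel_image(100, 0, 0)    # left upper corner
B = one_pixel_image(100, 0, 1)    # next to it
C = one_pixel_image(100, 99, 99)  # right bottom corner

for p in (1, 2, float("inf")):
    print(p, minkowski(A, B, p), minkowski(B, C, p), minkowski(C, A, p))
```

For each p the three numbers printed on a line coincide: the vectors differ in exactly two coordinates in every pair, and none of these distances knows which coordinates are neighbors in the image.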
Of course this can be explained, as follows. The three images are essentially blank so it’s not surprising that they are close to the blank image and to each other. So as long as pixels are “small” the difference between these four images is justifiably negligible.
Of course, “small” pixels means “small” with respect to the size of the image. This means high resolution. High resolution means larger image (for the same “physical” object), which means higher dimension of the Euclidean space, which means higher computational costs. Not a good sign.
To take this line of thought all the way to the end, we have to ask the question: what if we keep increasing resolution?
The image will simply turn into an exact copy of the “physical” object. Initially, the image is a table of numbers. Now, you can think of the table as a rectangle subdivided into small squares, then the image is a function to the reals constant on each of these squares. As the resolution grows, the rectangle remains the same but the squares become smaller. In the end we have a – possibly continuous – function (as the limit of this sequence of functions). This is the “real” image and the rest are its approximations.
It’s not as clear what happens to the representations of images in the Euclidean space. The dimension of this space grows and in the end becomes infinite! It also seems that this new space should be made of infinite strings of numbers. That does not work out.
Indeed, consider this (“real”) image: a white square with a black upper left quarter. Let’s represent it first as a 2×2 image. Then in the 4-dimensional Euclidean space this image is (1,0,0,0). Now let’s increase the resolution. If this is a 4×4 image, it is (1,1,0,0,1,1,0,0,0,…,0) in the 16-dimensional space. For an 8×8 image, in the 64-dimensional space, it’s (1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,…,0). You can see the pattern. But what is the end result (as the limit of this sequence of points)? It can’t be (1,1,1,…), can it? It definitely isn’t the original image. That image can’t even be represented as a string of numbers, not in any obvious way…
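The vectors are easy to generate for any resolution. A small sketch (the function name is mine), reading the image row by row with 1 for black and 0 for white:

```python
def quarter_black(n):
    """Row-by-row vector of an n x n image whose upper left quarter is black."""
    half = n // 2
    return [1 if r < half and c < half else 0
            for r in range(n) for c in range(n)]


for n in (2, 4, 8):
    vector = quarter_black(n)
    # Show the dimension of the space and the first two rows of the image.
    print(len(vector), vector[:2 * n])
```

The dimension is n×n, so it goes 4, 16, 64, and so on, while the runs of 1s and 0s keep getting longer and never settle into a single limiting string.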
OK, these are just signs that there may be something wrong with this approach. A more tangible problem is that unless the two images are aligned first, there is no way to use this representation to discover that they depict the same or similar thing. About that in the next post.
May 7, 2008
After Google “launched” its ImageRank - by presenting a paper about it, now there are two more.
First, Idée “publicly launched” its image search engine (report here). If you want to try it, they’ll put you on a waiting list. How is it different from what we saw before?
Second, “Pixsta launches image search engine” (report here). Testing is also closed. What is the difference from what we saw before?
The only good thing here is that I discovered a better term for visual image search, CBIR, etc. It’s “image-to-image search”, as opposed to text-to-text and text-to-image we are familiar with.