In a recent post I discussed Pixcavator Search, a prototype image-to-image search program. Since it searches for images based on similarity, one potential application of this software is a search for copyrighted images. There are numerous applications of this type out there, yet a discussion of performance evaluation is usually absent.
Let’s consider the performance evaluation commonly used in information retrieval. Below, we just replace “document” with “image”.
For a given query image, we assume the following:
- There are M relevant images in the collection = correct matches.
- When the query is executed, N images are retrieved.
- Out of those only R are relevant.
Then the following two measurements quantify the quality of the search:
Recall = R / M = Number of retrieved images that are also relevant / Total number of relevant images.
Precision = R / N = Number of retrieved images that are also relevant / Total number of retrieved images.
The recall is the answer to the question: How close am I to getting all good matches? The precision is the answer to the question: How close am I to getting only good matches?
In the example below, lenaC.jpg is the query and the program searched for its modified versions (7 modified versions of the original: stretched, rotated, with added noise, etc.). Here N = 27, M = 8, and R = 7, so recall = 7/8 and precision = 7/27.
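To make the arithmetic concrete, here is a minimal Python sketch of the two formulas. The function name `precision_recall` and the file names are made up for illustration; they only reproduce the counts N = 27, M = 8, R = 7 from the example and are not the actual files or code used by Pixcavator Search.

```python
from fractions import Fraction

def precision_recall(retrieved, relevant):
    """Return (precision, recall) as exact fractions.

    retrieved: the images returned by the query (N of them)
    relevant:  the correct matches in the collection (M of them)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # the R retrieved images that are also relevant
    # Either measure is undefined when its denominator is zero.
    precision = Fraction(len(hits), len(retrieved)) if retrieved else None
    recall = Fraction(len(hits), len(relevant)) if relevant else None
    return precision, recall

# Hypothetical file names, chosen only to match the counts in the example:
# M = 8 relevant images, N = 27 retrieved, R = 7 of the retrieved are relevant.
relevant = {f"match_{i}.jpg" for i in range(8)}
retrieved = {f"match_{i}.jpg" for i in range(7)} | {f"other_{i}.jpg" for i in range(20)}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 7/27 7/8
```

Working with sets rather than bare counts keeps R honest: it is computed as the overlap of the two collections instead of being entered by hand.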
Ideally, the values of recall and precision should both be equal to 1. In reality, however, they pull in opposite directions. When the query is broad, the recall is high but the precision is low. When the query is restrictive, the precision is high but the recall is low. We have all experienced this effect searching Google, Yahoo, etc.
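One way to see this trade-off is to imagine that the search assigns every image a similarity score and retrieves those above a cutoff. The sketch below assumes exactly that; the scores, image names, and the function `precision_recall_at` are invented for illustration and do not come from any real search run.

```python
def precision_recall_at(scores, relevant, threshold):
    """Precision and recall when every image scoring >= threshold is retrieved."""
    retrieved = {name for name, score in scores.items() if score >= threshold}
    hits = retrieved & relevant
    # Convention assumed here: an empty result set counts as precision 1.0.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant)
    return precision, recall

# Made-up similarity scores for eight images; four of them are true matches.
scores = {"a": 0.95, "b": 0.90, "c": 0.80, "d": 0.70,
          "e": 0.60, "f": 0.40, "g": 0.30, "h": 0.20}
relevant = {"a", "b", "d", "f"}

for t in (0.85, 0.5, 0.1):  # from restrictive to broad
    p, r = precision_recall_at(scores, relevant, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.85: precision=1.00, recall=0.50
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.1: precision=0.50, recall=1.00
```

Lowering the cutoff makes the query broader: recall climbs to 1 while precision drops, which is the pattern described above.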
But what does “restrictive” mean in the image search context? The query is an image, so what is a “restrictive image”?
It turns out that, for this particular algorithm, images with strong, distinctive features work best. That is certainly vague. More specifically, these are images with large, high-contrast objects, and they should have enough of them: for example, one large object, or several medium ones, or the whole image filled with dots. The last one isn’t something you’d call an image of good quality, but it is very stable under noise, blur, and other transformations.
I’ll write some more on this. Meanwhile, see the Wikipedia article about precision and recall, which also provides a probabilistic interpretation of both.