What is image segmentation?
Let’s go to Wikipedia. The first sentence is:
“Image segmentation is partitioning a digital image into multiple regions”.
This description isn’t what I would call a definition, as it suffers from a few very serious flaws.
First, what does “partitioning” mean? A partition is a representation of something as a union of non-overlapping pieces, and partitioning is a way of obtaining a partition. The requirement that the regions not overlap each other is also missing elsewhere in the article: “The result of image segmentation is a set of regions that collectively cover the entire image” (second paragraph).
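For reference, here is the standard mathematical meaning (my formalization, not the article’s): a partition of a set D, such as the set of points or pixels of the image, is a collection of subsets

$$P = \{R_1, R_2, \dots, R_n\}, \qquad R_i \cap R_j = \varnothing \ \text{for } i \neq j, \qquad R_1 \cup R_2 \cup \dots \cup R_n = D.$$

The second condition is the non-overlap requirement; the third is the “collectively cover the entire image” requirement.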
Then, is image segmentation a process (partitioning) or the output of that process? The description clearly suggests the former. That’s a problem because it emphasizes “how” over “what”, and it suggests human involvement in a process that is supposed to be objective and reproducible.
Next, a segmentation is a result of partitioning, but not every partitioning results in a segmentation. A segmentation is supposed to have something to do with the content of the image.
More nitpicking. Do the regions have to be “multiple”? The image may be blank or contain a single object. Does the image have to be “digital”? Segmentation of analogue images makes perfect sense.
A slightly better “definition” I could suggest is this:
A segmentation of an image is a partition of the image that reveals some of its content.
This is far from perfect. First, strictly speaking, what we partition isn’t the image but what’s often called its “carrier” – the rectangle itself. Also, the background is a very special element of the partition. It shouldn’t count as an object…
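In the same notation (again mine, not standard terminology), a segmentation of an image with carrier R could be written as a partition

$$R = B \cup O_1 \cup O_2 \cup \dots \cup O_n,$$

with the pieces pairwise disjoint, where B is the background and each O_i corresponds to an object in the image. Here B is a distinguished element of the partition rather than just another object.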
Another issue is with the output of the analysis. The third sentence is: “Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.” It is clear that “boundaries” should be read as “their boundaries” here, that is, the boundaries of the objects. The image does not contain boundaries – it contains objects, and objects have boundaries. (A boundary without an object is like the Cheshire Cat’s grin.)
Once the object is found, finding its boundary is an easy exercise. This does not work the other way around. The article says: “The result of image segmentation [may be] a set of contours extracted from the image.” But contours are simply level curves of some function. They don’t have to be closed (like a circle), and if a curve isn’t closed, it does not enclose anything – it’s a boundary without an object! More generally, searching for boundaries instead of objects is called “edge detection”. In the presence of noise, one ends up with just a bunch of pixels – not even curves… And, by the way, the language of “contours”, “edges”, etc. limits you to 2D images. Is segmentation of 3D images out the window?
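To make the contrast concrete, here is a minimal sketch (assuming NumPy and SciPy; the toy image and the thresholds are made up for illustration) of finding an object first and then reading off its boundary, versus running an edge detector directly on a noisy image:

import numpy as np
from scipy import ndimage

# Toy grayscale "image": a bright square object on a dark, noisy background.
rng = np.random.default_rng(0)
image = rng.normal(0.1, 0.05, size=(64, 64))
image[20:40, 20:40] += 0.8

# Object first: threshold, then label connected components (a partition of the
# image into background and objects).
mask = image > 0.5
labels, num_objects = ndimage.label(mask)

# Once the object is found, its boundary is an easy exercise: it is the set of
# object pixels removed by a one-pixel erosion.
object_mask = (labels == 1)
boundary = object_mask & ~ndimage.binary_erosion(object_mask)

# Boundaries first ("edge detection"): threshold the gradient magnitude. With
# noise, this typically gives a scattered set of pixels, not closed curves.
gradient = ndimage.sobel(image, axis=0) ** 2 + ndimage.sobel(image, axis=1) ** 2
edges = gradient > np.percentile(gradient, 95)

print(num_objects, int(boundary.sum()), int(edges.sum()))

Note that nothing in the object-first half is tied to two dimensions: label and binary_erosion work on 3D arrays in exactly the same way, which is harder to say for the language of “contours” and “edges”.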
I plan to write a few posts about specific image segmentation methods in the coming weeks.
July 14th, 2008 at 1:47 am
Your criticisms of the Wikipedia article are well argued; why don’t you modify it?
July 14th, 2008 at 1:57 pm
That’s a fair question. The main reason is that all I’ve got right now is a critique. I don’t have something to offer that wouldn’t be just as flawed. I do plan to write something up in the coming weeks, but only as a very first draft. It will take a while for it to develop into something serious. But even if I had something perfect ready, I’d hesitate to offer it to Wikipedia. I find the idea that what I wrote can be sliced and diced by other people very unappealing. Not that I have anything against Wikipedia…