November 23, 2008
This is a new version of a paper I wrote a few months ago. It describes the algorithm behind Pixcavator.
There are a couple of issues here. The first is reflected in the new title. It was unclear to a lot of people that the topology graph isn’t a tree. To deal with that I first added 5-6 mentions of the fact that “this graph isn’t a tree”. Then I added a section called The topology graph is not a tree. Finally I renamed the paper as well. At that point I just kept rewriting. The new version is almost unrecognizable. The image below illustrates the structure of the topology graph.

The second issue is the complexity of the algorithm as a function of the number of pixels N in the image. I show that my algorithm is O(N2) in the worst case scenario - the comb image on the right. In Fast computation of a contrast invariant image representation by P. Monasse and F. Guichard, they build a graph too (that one is a tree) and claim that the complexity is O(NlogN). Their proof looked unsatisfactory so I decided to apply their algorithm to my image. Odd…
Suppose the image is not quantized so there may be N levels of gray in the image. For each level, the level curve has to be traversed (step 4a). These level curves expand from a very short to the whole comb. The perimeter of the comb is N, roughly. Then the total number of pixels to be visited is 1+2+3+…+N = N*(N+1)/2. Therefore the complexity is O(N2), not O(NlogN).
Seems clear to me but maybe there is someone who can see a flaw in this argument? What am I missing? I’d appreciate any feedback.
October 27, 2008
Recall the detection is based on the mnemonic ABCDE:
- Asymmetry of the spot.
- Border: irregular.
- Color: varies.
- Diameter: large.
- Evolution of the spot.
The test for asymmetry was discussed in the last post. Suppose it is passed: the spot is symmetric more or less. Now, what about the border? To fail B, it is supposed to be irregular. Is it possible to be irregular and symmetrical at the same time? Yes, if the curve has large, but not repetitive, oscillations on one side of the axis of symmetry and then has the same oscillations on the other side. The larger are these oscillations, the less likely this is to happen however. A more likely possibility is a lot of small oscillations. In other words, we’d have to zoom in on a piece of the border and measure the smoothness of the curve. How isn’t obvious. The difficulty is that no digital curve is smooth. So, we’d have to look for oscillations that are small enough to be feasible and large enough not to be confused with edges of pixels…
Next is the color. Most normal moles are uniform in color. But varied shades of brown, tan, or black may be a sign of melanoma. Since all of the colors are close to each other in the spectrum, it’s possible that analysis of the gray scale would suffice. In that case the variability of color is easy to capture by computing the standard deviation.
The test for the diameter reads: a mole smaller than a pencil eraser is probably not cancerous. It is unclear whether the diameter (it fits in the mnemonic so well) is to consider here or the size/area is just as good. Either way it’s simple and reliable.
Finally, the evolution of the spot: “the change of a spot may indicate that the lesion is becoming malignant”. Not indication what kind of change that would be. The best guess is: the change of any of the other 4 characteristics.
In spite of their vagueness, these tests can be developed into image analysis procedures - with a help of medical experts. Next one would need to get from this string of numbers to a diagnosis or at least a score that reflects the likelihood of cancer. This step would also require input from a medical expert, though machine learning would be tempting too, for some…
This will be filed under Measuring objects in the wiki.
October 19, 2008
Melanoma is a kind of skin cancer so common and obvious at the same time that doctors encourage self-detection, even on TV, which is unusual. The detection is based on the mnemonic ABCDE:
- Asymmetry of the spot.
- Border: irregular.
- Color: varies.
- Diameter: large.
- Evolution of the spot.
There are many images out there that illustrate these features but the one above is one of very few I could find that compares benign lesions to cancerous ones. E is missing which is not uncommon.
The tests seem so simple, it’s tempting to try to design an image analysis system that would detect this cancer. Even though it’s been probably done before, let’s see how far we can go within the limits of a single blog post.
There is clearly an overlap between A, B and D. I will try to separate them as much as possible.
A for Asymmetry: How do we detect that by image analysis? On the face of it, this is about the measure of symmetry of the spot. If this is the case, the simplest approach is the following. Find the major axes of the spot, then carry out reflections about these axes and compute the overlap of the resulting spots (3 total) with the original. The overlap should be computed in relative terms in order to separate A from D.
Of course to detect even more symmetry one need to look at all rotations but that may be unnecessary.
Another, even simpler, approach is to compute the roundness of the spot as “[m]ost moles - the kind you usually don’t have to worry about - are more or less round.” That may have to be preceded by smoothing the border (Gaussian blur or similar) in order to separate A from B.
Either approaches has its problems: the major axis are badly affected by noise while computing the roundness produces errors even without noise.
To be continued…
October 6, 2008
September 29, 2008
The accuracy of measurements is reduced by noise and other environmental factors. In the digital domain, we have the complete knowledge of the values of the pixels. That may lead to the feeling that the accuracy, if not absolute, is always sufficiently good. The argument in support of this attitude is very simple: “The resolution is just so high!”
We know that the area behaves well in this respect. As the resolution increases, the digital area converges to the “real” are of the “real” object. However, the accuracy of measuring of the length of a digital curve is limited by the degree of its approximation by regular curves – independent of the resolution!
Now we have to deal with noise as well. Turns out, the length, and the length related characteristics, once again behaves poorly in comparison to the area.
Let’s consider a very simple example. Suppose we have an image containing a 1×1 black square on white background. Suppose also that the resolution is 1/N, so that the square contains N*N pixels. Add noise. Let’s suppose the noise is just a single black pixel. Now, how are the area and the perimeter of the square affected by this event?
If the new pixel ends up inside the square, neither area nor perimeter is affected. Same, if it is entirely outside the rectangle. Now, suppose the pixel is adjacent to the border of the square, as in the picture.
Then the area changes from 1 to 1+1/N2, while the perimeter changes from 4 to 4+2/N. Proportionally, the changes are 1/N2 and 1/(2N) respectively. As the resolution increases (and N goes to infinity), both go to 1. However, the “noisy” area approaches the “real” area much faster than the perimeter!
Another characteristic is the centroid. The centroid of the square is (½, ½). Under our one-pixel noise, the x-coordinate of the centroid is now ½*1+(1+1/(2N))*1/N2 = ½+1/N2 +1/(2N3). It converges at the rate of 1/N2.
On the other hand, the box dimensions change by a single pixel, 1/n! Not as good - they are length related.
Roundness is a tricky one. It is 4π*area/perimeter2, a mixture of areas and lengths. For the square, the roundness is 4π/16. For the new, “noisy”, square we have 4π(1+1/N2)/(4+2/N)2. After some algebra (long division OMG!) we reduce this to 4π/16+1/(4N) + higher power terms. Once again, this is, roughly, 1/N.
This post will be filed in the wiki under Robustness of geometry.
September 21, 2008
Recall that we have red blood cells, both fixed and living. They average 7.7 microns in size and were photographed unstained with differential interference contrast lighting. The fixed preparation was fairly easy as the cells were isolated (left). The living cells tend to adhere and form rolls (right).

The image is too messy to analyze, even manually. Best we can do is to evaluate the area covered by these cells.
I had to crop the image to ensure reasonable processing times. To try to estimate the area covered by these cells, I did 7 rounds of erosion. Then I analyzed the image with Pixcavator, settings: area = 67,523. The area of the only dark objects (given by the red contour) was 101,250. Considering 7 rounds of erosion, estimate is a bit off. Then the area covered by the cells is 709×619 (the area of the image) - 101,250 = 337,621 pixels.

Based on this data one may try to estimate the number of cells by dividing the found area by the perceived density of the cells. This number would have to be found manually. Considering the fact that the density varies a lot, the resulting estimate would be quite crude.
This is a clear example of limitations of digital image analysis.
For other examples, see our wiki.
September 10, 2008
These are fixed red blood cells. The task is to count them with Pixcavator. They average 7.7 microns in size and were photographed unstained with differential interference contrast lighting. I had to crop the image to ensure reasonable processing times.

The quality of the image is good, but there is still a problem with the image. Each cell is captured by two light semicircles. These two semicircles aren’t connected to each other however (because the light comes from one direction?), so there are no full circles. The result is that cells can’t be treated as objects and they aren’t captured by the software. In the left image, there should be red contours for each of the cells like in the image on the right.

One way to get around this is to count the semicircles themselves (2 per cell). I ran Pixcavator with the following settings: 1000 for area and 100 for contrast.
The problem with counting semicircles is that many of some of them touch each other so that they form clusters. These clusters are what’s captured by Pixcavator. To deal with this problem I needed some extra computation that followed the analysis. In the last column of the saved spreadsheet (table below) I divided the areas of the clusters of semicircles by the area of one semicircle (“1030”). The total number of semicircles found this way was 35. Then the estimate was 17.5 versus 17 of manual count.

Another way to handle the problem is to start with some preprocessing. Erosion makes the light semicircles grow, they merge and form circular regions. Inside of those lie dark objects captured by Pixcavator. They correspond to cells.
I did 15 rounds of erosion (I had to use Pixcavator’s feature because ImageJ does erosions for binary images only). 15 is a lot as you can see.

Then I analyzed the image with the following settings: contrast 27, saliency 6768. The erosions, however, created several artifacts that had to be unmarked.
This method is more straightforward. With it, however, it is harder to get good results without manual intervention.
Live cells in the next post.
For other examples, see our wiki.
August 20, 2008
As a suggestion from one of our users, we used Pixcavator to analyze floorplans. The task is very simple – measure the rooms.
Measuring irregular (or even regular) isn’t easy for a person because unless all rooms are rectangular one needs know some geometry. If the corners aren’t 90 degrees, you may have to measure them and then (OMG!) use trigonometry. The walls can also be curved. If the curves are known, all you need is calculus (OMG!!). It is unlikely that the formulas for the curves come with the floorplan, so digital image analysis seems inevitable.
The results are below. Of course, I had to “close” the doors first.
 

Calbration wasn’t addressed though.
August 3, 2008
Image analysis and computer vision is the extraction of meaningful information from digital images. One of the most prominent application of computer vision is in medical image processing - extraction of information for the purpose of making a medical diagnosis. It can be detection and measurement of tumors, arteriosclerosis or other malign changes or it can be identifying and counting cells, etc. Other main areas are industrial machine vision (automatic quality inspection, robotics, etc) and the military (missile guidance, battlefield awareness, etc).
The science of computer vision consists of an abundance of image analysis methods. These methods have been developed over the years for solving various but often narrow image analysis tasks. The result is that these methods are very task specific and seldom can be applied to a broad range of applications.
Our conclusion is then that as a discipline computer vision lacks a solid mathematical foundation.
Our long term goal is to design a comprehensive computer vision system “from first principles”. These principles come initially from one of the most fundamental fields of mathematics, topology. The idea is that just as mathematics rests on topology (and algebra), computer vision should be built on a firm topological foundation.
Algebraic topology is a well established discipline within mathematics. Its main computational tools have been implemented as software (CHomP, Computational Homology Project, and others). However, this theory and these tools are only applicable to binary images.
A framework for analysis of gray scale images has been under development. It is called Pixcavator. It includes both an image analysis software and an SDK. Pixcavator was into a product that also includes image management and database capabilities.
Some further issues remain. Future projects include the development of:
- protocols for applying the framework for specific tasks (e.g., tumor measurement),
- new methods that resolve the ambiguity of the boundaries of objects in gray scale images,
- integration of the existing image analysis methods into the framework,
- a framework for video (first binary, then gray scale, etc),
- a framework for color images (and other multichannel images),
- a framework for 3D images (first binary, then gray scale, etc).
July 29, 2008
During a retina inspection one of the most common pathology is Drusen deposits. Some computer assisted methods have been created to solve this problem and especially avoid the subjectivity of the doctors (”MD3RI a Tool for Computer-Aided Drusens Contour Drawing”) [1].
An image from this paper is below:

Pixcavator easily produces similar results:

Another example is ice cracking (thanks to Nikolay Makarenko for the idea). The image is analyzed with Pixcavator with settings 596-63.
An iceberg is born!
These kind of examples will appear in the wiki under Case studies.
July 27, 2008
The paper (PDF, 10 pages, 360K) describes the algorithm behind Pixcavator. The algorithm is presented in detail in the wiki but this is a new and improved exposition. I reconsidered some of the terminology, re-wrote the pseudocode, and improved illustrations. There is also a gap in the wiki - when an edge is added to the image, case 4 is missing. I’ll have to re-write a few articles. The presentation in the paper is less detailed (in terms of examples, images etc) but it is a bit more thorough.
Abstract: The paper provides a method of image segmentation of binary and gray scale images. For binary images, the method captures not only connected components but also the holes. For gray scale images, there are two kinds of “connected components” – dark regions surrounded by lighter areas or light regions surrounded by darker areas.
The long term goal is to design a computer vision system “from first principles”. The last sentence in the abstract is one such principle. Keep in mind (of course) that if every dark region surrounded by a lighter area is an object, it does not mean that every object is a dark region surrounded by a lighter area (or vice versa). In a way, these are “potential” objects and you still have to filter and/or group them to find the “real” ones. So there must be more first principles.
The paper does not go far beyond this stage. The main step is – all potential objects are recorded in the “topology graph” (“frame graph” in the wiki). Then only one method of filtering is presented (the one based on size).
All feedback is welcome.
June 22, 2008
The next version of Pixcavator is to be relased in a few weeks. A new feature that I want to preview is Coloring Objects. Once objects are found, you can do anything with them. So, it was easy to implement (the objects are colored randomly). And it’s definitely an amusing feature. It can also be helpful.
This tool can help you confirm that your image segmentation is correct:

A more intricate segmentation:
 
Coloring combined with background removal:

Something more amusing: recoloring objects and discovering a broken bone:

Just for fun:

For more examples, see our Image Gallery.
June 8, 2008
Microarrays (microplates etc) are plastic rectangles with a grid of “wells” containing biological materials. When another biological or chemical substance is added to these cells, the reaction is captured in digital images. For example, various concentrations of a chemical or a drug are added to the wells containing biological cells. The cells then start to divide faster, or slower, or simply die. The result affects the color of the substance in each cell. The image analysis automatically captures this data and draws conclusions. For example, you can pinpoint exactly at what concentration the drug becomes toxic. It’s like hundreds experiments in one! Appropriately, this is also called high throughput screening.
I’ve been working on a related project for one of our clients and I would like to present a modified version of Pixcavator. First it captures all the wells in the form of a list with all the data about them – in the usual way. Then it displays the gray level (intensity) for each well – according to its position in the microarray. Of course, instead of intensity you can display other characteristics of these objects: the average intensity, or the standard deviation, or the average color (for color images), etc.
The point of the post is this: the hard part of collecting the data about the objects is taken care of by Pixcavator - the rest is a easy exercise with the Pixcavator SDK.
May 6, 2008
I’ve tried a variety of images and, as the diagram indicates, the processing time appears to depend linearly on the number N of pixels in the image. Roughly, 40 seconds for each million pixels in the image. The testing was done on HP Pavilion laptop with Intel Core 2 Dual CPU T7500 2.2GHz.
I can’t improve my estimate though. It’s O(N^2) (link to the article in Wikipedia). That’s how we get it. The analysis algorithm works as follows:
- Each pixel is processed separately.
- For each of the N pixels an object is created and you may have to run around it to mark its edges.
- If this object is very thin and fills the image (like this spiral), its perimeter is proportional to N.
My feeling is that the images of this kind are unusual. Maps may be close, as well as microchips, or anything fractal-like. Cells are OK. 
Update: The estimate O(N^2) refers to the time of image analysis - creation of the graph. After that, you still have to run up and down this graph to come up with the output data. BTW, the size of the graph and, therefore, the memory depends linearly on N, O(N).
April 29, 2008
The next release of cellAnalyst due in May will have one especially nice feature. The hardest part new users find about getting the best from cellAnalyst (and Pixcavator) is discovering good analysis settings - size, contrast etc. Trial and error takes too long, expecially for larger images. Now the user will have the ability to analyze a small rectangle to find the measurements of objects - cells - he is interested in and then apply these settings for the analysis of the whole image. In more detail, this is described in the wiki under Analysis Strategy.
— Next Page » |
|
|