<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Computer Vision For Dummies &#187; rants</title>
	<atom:link href="http://inperc.com/blog2/index.php/category/rants/feed/" rel="self" type="application/rss+xml" />
	<link>http://inperc.com/blog2</link>
	<description>Computer vision and image analysis for newcomers</description>
	<lastBuildDate>Fri, 18 Jun 2010 00:00:31 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A common view of digital imaging</title>
		<link>http://inperc.com/blog2/2009/10/22/a-common-view-of-digital-imaging/</link>
		<comments>http://inperc.com/blog2/2009/10/22/a-common-view-of-digital-imaging/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 14:00:55 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image processing/image analysis software]]></category>
		<category><![CDATA[rants]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/?p=223</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><a title="ImageShack - Image And Video Hosting" href="http://img34.imageshack.us/i/wyoolk.jpg/" target="_blank"><img src="http://img34.imageshack.us/img34/3828/wyoolk.jpg" border="0" alt="" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2009/10/22/a-common-view-of-digital-imaging/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Object recognition demo from Numenta</title>
		<link>http://inperc.com/blog2/2009/02/16/object-recognition-demo-from-numenta/</link>
		<comments>http://inperc.com/blog2/2009/02/16/object-recognition-demo-from-numenta/#comments</comments>
		<pubDate>Mon, 16 Feb 2009 18:01:19 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image search]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2009/02/16/object-recognition-demo-from-numenta/</guid>
		<description><![CDATA[The link to this demo was sent to me by Ricardo Niederberger Cabral (thanks!). The demo program is called Vision4 and was created by Numenta. This is its main point:
This program demonstrates some capabilities of Numenta&#8217;s Hierarchical Temporal Memory (HTM) technology applied to visual object recognition. .. The HTM network contained in this demo has [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.numenta.com/about-numenta/technology/vision4-demo.php">link</a> to this demo was sent to me by Ricardo Niederberger Cabral (thanks!). The demo program is called Vision4 and was created by Numenta. This is its main point:</p>
<blockquote><p>This program demonstrates some capabilities of Numenta&#8217;s Hierarchical Temporal Memory (HTM) technology applied to visual object recognition. .. The HTM network contained in this demo has been trained to recognize four types of objects: cell phones, sailboats, cows, and rubber ducks.</p></blockquote>
<p>Every image is given four ratings. Each represents how much the image resembles one of the four types.</p>
<p>As you can see, the goal is modest and there are no <a href="http://inperc.com/blog2/2009/02/10/image-search-engines-keep-launching-milabra/">unsubstantiated claims</a> of how this is ready to be applied in real life (and don’t get me started on academic publications!). This is refreshing. The program is also fun to play with. You can load your own images, add noise, blur, etc. to them and see the effect on the recognition. The recognition results are often good and when they aren’t, it’s still interesting.</p>
<p>For serious purposes, it is unclear where this is going though.</p>
<p>It’s fine with me that there are only four categories – just one would be enough to test the concept. It does not bother me when a face is rated high in the cow category and another face high in the duck category. My main complaint is the instability of recognition under image transformations. For example, after turning “sailboat” a few degrees it became “cell phone”. A few degrees more and it became mixed &#8211; half “cow” (first image below). Adding noise, occlusion, etc. has a similar effect (second image).</p>
<p><img style="width: 287px; height: 199px" height="199" src="http://inperc.com/wiki/images/0/0c/Numenta_screenshot_1.jpg" width="287" /><img style="width: 282px; height: 197px" height="197" src="http://inperc.com/wiki/images/5/5f/Numenta_screenshot_2.jpg" width="282" /></p>
<p>Certainly, one does not expect rotations to affect image recognition. Meanwhile, a mixed recognition is a failed recognition and should be presented as such.</p>
<p>I am certainly biased here. I don’t believe in “build[ing] machines that work on principles used by the brain”. I don’t believe in trying to imitate the brain and <a href="http://inperc.com/blog2/index.php?s=brain+inspired">I’ve written a few times about that</a>. Traditionally, a scientist tries to understand nature by observing it, analyzing it, etc. Instead, it is suggested to try to understand nature by first understanding how the brain understands it? Seems like a roundabout to me, bordering on a vicious circle. I also have serious reservations about the use of <a href="http://inperc.com/wiki/index.php?title=Machine_learning_in_computer_vision">machine learning in computer vision</a>.</p>
<p>Annoying bug: every time I start it, the program would turn on my webcam and it would keep it on even after I shut it down.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2009/02/16/object-recognition-demo-from-numenta/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image search engines keep launching: Milabra</title>
		<link>http://inperc.com/blog2/2009/02/10/image-search-engines-keep-launching-milabra/</link>
		<comments>http://inperc.com/blog2/2009/02/10/image-search-engines-keep-launching-milabra/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 02:26:43 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image search]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2009/02/10/image-search-engines-keep-launching-milabra/</guid>
		<description><![CDATA[TechCrunch is happy to do PR for another visual search company: Milabra.
Milabra claims that it can categorize images, “from puppies to porn”:
…when searching through a library of images for dogs, Milabra doesn’t need to constantly compare each image with its database of known ‘dog’ images &#8211; instead, it can look for traits that it has learned to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.techcrunch.com/2009/02/02/milabra-b2b-image-recognition-service-learns-to-find-anything-from-puppies-to-porn/">TechCrunch</a> is happy to do PR for another visual search company: Milabra.</p>
<p>Milabra claims that it can categorize images, “from puppies to porn”:</p>
<blockquote><p><em>…when searching through a library of images for dogs, Milabra doesn’t need to constantly compare each image with its database of known ‘dog’ images &#8211; instead, it can look for traits that it has learned to associate with “doggyness”…</em></p></blockquote>
<p>The two examples in the demo are “beach” and “dog”. You upload an image with people on the beach, click “Search” and you get a page of beach photos&#8230; Wait, you don’t get to upload anything – this is just a video! So, there is no way to test their claims. Unfortunately, this is <a href="http://inperc.com/wiki/index.php?title=Visual_image_search_engines">not unusual</a> in this area and in computer vision in general.</p>
<p>If your software can recognize a puppy in an image (95% of the time as you claim), it should be easy for you to demonstrate this ability. Create a little web application (or desktop, I don’t care) that allows me to upload my own image which is then identified as “puppy” (or “tree”, or “street”, I don&#8217;t care). There is no such program. Why not? The answer is obvious.</p>
<p>In response to some skepticism, this is what one of the founders wrote:</p>
<blockquote><p><em>&#8230;if you think that this cannot be done, then you are completely clueless: object classifiers have been made for more than 10 years now at leading CS labs around the world.</em></p></blockquote>
<p>That reminds me of the episode of <em>Seinfeld</em> when Kramer decides to build <strong>levels</strong> in his apartment:</p>
<blockquote><p><em>KRAMER: It&#8217;s a simple job. Why, you don&#8217;t think I can?</em></p>
<p><em>JERRY: Oh, no. It&#8217;s not that I don&#8217;t think you can. I know that you can&#8217;t, and I&#8217;m positive that you won&#8217;t.</em></p></blockquote>
<p>This is Milabra’s team:</p>
<ul>
<li>MBA</li>
<li>MS in Biological Engineering and PhD in neuroscience</li>
<li>MS in Computer Science and Ph.D. in Biophysics</li>
<li>Professional Project Manager</li>
<li>Expert in computer networking, user interface design</li>
</ul>
<blockquote><p><em>JERRY: I don&#8217;t see it happening.</em></p></blockquote>
<p>And what about TechCrunch? Same story again and again since I started to keep track a couple of years ago: they publish an enthusiastic report about a company doing image analysis/search/recognition, and then silence. The company slips into obscurity and there is no follow-up, nothing. These people never learn…</p>
<p>The people who do seem to learn, slowly, are the investors: Riya (like.com) $20 million or more, Polar Rose $5 million, Milabra $1.4 million. Or maybe this is just the effect of the economic downturn?</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2009/02/10/image-search-engines-keep-launching-milabra/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Algebraic topology and digital image analysis</title>
		<link>http://inperc.com/blog2/2008/12/22/algebraic-topology-and-digital-image-analysis/</link>
		<comments>http://inperc.com/blog2/2008/12/22/algebraic-topology-and-digital-image-analysis/#comments</comments>
		<pubDate>Mon, 22 Dec 2008 22:23:29 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[rants]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/12/22/algebraic-topology-and-digital-image-analysis/</guid>
		<description><![CDATA[In my last paper, I made a comment about topology of binary images: “These issues have been studied over the last 100 years or so and they are well understood”. It was pointed out to me that digital image analysis didn’t start until the 1960s, so how come?
Let me set the record straight.
The history is [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://inperc.com/blog2/2008/11/23/a-graph-non-tree-representation-of-the-topology-of-a-gray-scale-image-paper/">last paper</a>, I made a comment about topology of binary images: “These issues have been studied over the last 100 years or so and they are well understood”. It was pointed out to me that digital image analysis didn’t start until the 1960s, so how come?</p>
<p>Let me set the record straight.</p>
<p><img src="http://inperc.com/wiki/images/2/21/Tiles3.JPG" align="right" />The history is this. Algebraic topology was founded by Poincaré around 1900 (the title of his book “Analysis Situs” converted from Latin to Greek turns into “topology”). There was no talk about binary images, obviously. What they studied was cell complexes, collections of cells attached to each other in an <a href="http://en.wikipedia.org/wiki/Cell_complex">appropriate way</a>. The cells were initially only triangular but later of any shape. It was also informally assumed that all topological theorems are independent of the <a href="http://inperc.com/wiki/index.php?title=Cell_decomposition_of_images">cell decomposition</a> or representation. This fact was formally proven by the 1950s, roughly. By then all the issues had been settled and algebraic topology had become one of the central disciplines in mathematics. The first monographs were written in the 1930s (Alexandroff&#038;Hopf) and the first (graduate) textbooks were written in the 1960s (Hilton&#038;Wylie, Mac Lane, Spanier, and many more).</p>
<p>Undergraduate books are rare (one that I like the most and use is <em>Topology of Surfaces</em> by Kinsey). Courses are even rarer. As a result, computer scientists (and even mathematicians) are often unfamiliar with the well established ways of dealing with even the most elementary topological issues (and I mean <em>really</em> elementary: how many objects, which ones have holes or tunnels and how many, etc.)</p>
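<p>To make “how many objects, which ones have holes” concrete, here is a minimal sketch for a small binary image. It is a plain flood fill, not the homology algorithms of the books mentioned in this post, and it rests on assumptions of my own: 1 means object, 0 means background, objects are 8-connected, the background is 4-connected, and a hole is a background component that does not reach the border. The names <code>count_components</code> and <code>objects_and_holes</code> are made up for this illustration.</p>

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Count connected components of the cells equal to `value`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            # A new component: flood-fill it so it is counted only once.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def objects_and_holes(image):
    """Return (number of objects, number of holes) of a binary image."""
    cols = len(image[0])
    # Pad with background so that the outer background is one component.
    padded = [[0] * (cols + 2)]
    padded += [[0] + list(row) + [0] for row in image]
    padded += [[0] * (cols + 2)]
    objects = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # drop the outer background
    return objects, holes

# A ring (one object with one hole) and a separate single pixel.
image = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
print(objects_and_holes(image))
```

<p>For this image the sketch reports two objects and one hole. In the language of algebraic topology these are the first two Betti numbers of the picture.</p>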
<p>Even though relevant papers pop up once in a while, the connection of image analysis to algebraic topology is not common knowledge among practitioners of computer vision and image analysis. I know this from personal experience…</p>
<p>The main reference on the subject is <em>Computational Homology</em> by Kaczynski, Mischaikow, and Mrozek. This is still very much a graduate text. Hopefully, <a href="http://inperc.com/wiki/index.php?title=Main_Page">our wiki</a> is more accessible.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/12/22/algebraic-topology-and-digital-image-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image search engines still keep launching: Incogna</title>
		<link>http://inperc.com/blog2/2008/11/28/image-search-engines-still-keep-launching-incogna/</link>
		<comments>http://inperc.com/blog2/2008/11/28/image-search-engines-still-keep-launching-incogna/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 01:19:23 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[image search]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/11/28/image-search-engines-still-keep-launching-incogna/</guid>
		<description><![CDATA[The screenshot tells the whole story. The image of a table in the upper left corner is the query image. The rest are supposed to be “similar”. What is the image filled with numbers doing here you ask? Hmm… Oh yes, it’s a table of numbers!
Previous posts on the topic are here.

]]></description>
			<content:encoded><![CDATA[<p>The screenshot tells the whole story. The image of a table in the upper left corner is the query image. The rest are supposed to be “similar”. What is the image filled with numbers doing here you ask? Hmm… Oh yes, it’s a <em>table</em> of numbers!</p>
<p>Previous posts on the topic are <a href="http://inperc.com/blog2/category/image-search/">here</a>.</p>
<p><img style="width: 605px; height: 393px" height="393" src="http://inperc.com/wiki/images/b/b4/Incogna-screenshot.jpg" width="605" /></p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/11/28/image-search-engines-still-keep-launching-incogna/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image search engines still keep launching</title>
		<link>http://inperc.com/blog2/2008/09/15/image-search-engines-still-keep-launching/</link>
		<comments>http://inperc.com/blog2/2008/09/15/image-search-engines-still-keep-launching/#comments</comments>
		<pubDate>Mon, 15 Sep 2008 00:08:47 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image search]]></category>
		<category><![CDATA[rants]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/09/15/image-search-engines-still-keep-launching/</guid>
		<description><![CDATA[Last time I noticed that image-to-image search engines launch in batches was in May. Of course, &#8220;launch&#8221; usually means private beta. I also found it interesting that there are so many of them and yet they never mention or discuss each other.
Now, another batch &#8211; within a few days from each other.
First, Gazopa (what an [...]]]></description>
			<content:encoded><![CDATA[<p>Last time I noticed that image-to-image search engines launch in batches was in <a href="http://inperc.com/blog2/2008/05/07/image-search-engines-keep-launching/">May</a>. Of course, &#8220;launch&#8221; usually means private beta. I also found it interesting that there are <a href="http://inperc.com/wiki/index.php?title=Visual_image_search_engines">so many of them</a> and yet they never mention or discuss each other.</p>
<p>Now, another batch &#8211; within a few days from each other.</p>
<p>First, <a href="http://www.businesswire.com/portal/site/google/?ndmViewId=news_view&#038;newsId=20080910006523&#038;newsLang=en">Gazopa</a> (what an awful name!) from Hitachi. Private beta.</p>
<p>Second, <a href="http://www.netimperative.com/news/2008/september/02/image-based-search-engine-imprezzeo-launches">Imprezzeo</a>. “Coming soon”.</p>
<p>Third, <a href="http://picasaweb.google.com">Picasa</a> launched a face recognition feature. By most accounts it does not work well.</p>
<p>Fourth, <a href="http://www.itnewsonline.com/showprnstory.php?storyid=10877">VideoSurf</a> “Unveils First Computer Vision Search for Video”. Private beta.</p>
<p>Finally, <a href="http://inperc.com/wiki/index.php?title=Visual_image_search_engines#Idee">Idee</a> updated its TinEye. Apparently, now it can match an image and its rotated version. That was my main problem with the application.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/09/15/image-search-engines-still-keep-launching/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What is image segmentation?</title>
		<link>http://inperc.com/blog2/2008/06/15/what-is-image-segmentation/</link>
		<comments>http://inperc.com/blog2/2008/06/15/what-is-image-segmentation/#comments</comments>
		<pubDate>Sun, 15 Jun 2008 14:11:46 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/06/15/what-is-image-segmentation/</guid>
		<description><![CDATA[Let’s go to Wikipedia. The first sentence is:
“Image segmentation is partitioning a digital image into multiple regions”.
This description isn’t what I would call a definition as it suffers from a few very serious flaws.
First, what does “partitioning” mean? A partition is a representation of something as the union of non-overlapping pieces. Then partitioning is a [...]]]></description>
			<content:encoded><![CDATA[<p>Let’s go to <a href="http://en.wikipedia.org/wiki/Image_segmentation" target="_blank">Wikipedia</a>. The first sentence is:</p>
<p>“Image segmentation is partitioning a digital image into multiple regions”.</p>
<p><img style="width: 266px; height: 247px" height="247" hspace="5" src="http://inperc.com/wiki/images/5/5a/Drosophila.JPG" width="266" align="right" />This description isn’t what I would call a definition as it suffers from a few very serious flaws.</p>
<p>First, what does “partitioning” mean? A <a href="http://en.wikipedia.org/wiki/Partition" target="_blank">partition</a> is a representation of something as the union of non-overlapping pieces. Then partitioning is a way of obtaining a partition. The part about the regions not overlapping each other is missing elsewhere in the article: “The result of image segmentation is a set of regions that collectively cover the entire image” (second paragraph).</p>
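<p>The difference between “cover the entire image” and “partition” is easy to state in code. Below is a minimal sketch, assuming that the image is a <code>width</code> by <code>height</code> grid of pixels and that each region is a set of (x, y) pairs. The function name <code>is_partition</code> and the requirement that regions be non-empty are my own choices.</p>

```python
def is_partition(regions, width, height):
    """True if the regions are non-empty, pairwise disjoint, and cover the grid."""
    carrier = {(x, y) for x in range(width) for y in range(height)}
    covered = set()
    for region in regions:
        region = set(region)
        # Reject empty regions, pixels outside the image, and overlaps.
        if not region or not region <= carrier or covered & region:
            return False
        covered |= region
    return covered == carrier

left = {(0, 0), (0, 1)}
right = {(1, 0), (1, 1)}
overlapping = {(0, 0), (0, 1), (1, 0)}

print(is_partition([left, right], 2, 2))         # disjoint and covering
print(is_partition([overlapping, right], 2, 2))  # covers, but the regions overlap
print(is_partition([left], 2, 2))                # disjoint, but does not cover
```

<p>The second case is the one that the “collectively cover” wording lets through: the regions cover the image but they do not form a partition.</p>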
<p>Then, is image segmentation a process (partitioning) or the output of that process? The description clearly suggests the former. That’s a problem because it emphasizes “how” over “what”. That suggests human involvement in the process that is supposed to be objective and reproducible.</p>
<p>Next, a segmentation is a result of partitioning but not every partitioning results in a segmentation. A segmentation is supposed to have something to do with the content of the image.</p>
<p><img style="width: 267px; height: 252px" height="252" hspace="5" src="http://inperc.com/wiki/images/0/0f/Drosophila_cellprofiler.JPG" width="267" align="right" />More nitpicking. Do the regions have to be “multiple”? The image may be blank or contain a single object. Does the image have to be “digital”? Segmentation of analogue images makes perfect sense.</p>
<p>A slightly better “definition” I could suggest is this:</p>
<p><em>A segmentation of an image is a partition of the image that reveals some of its content.</em></p>
<p>This is far from perfect. First, strictly speaking, what we partition isn’t the image but what’s often called its “carrier” – the rectangle itself. Also, the background is a very special element of the partition. It shouldn’t count as an object…</p>
<p>Another issue is with the output of the analysis. The third sentence is “Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.” It is clear that “boundaries” should be read “their boundaries” here &#8211; boundaries of the objects. The image does not contain boundaries – it contains objects and objects have boundaries. (A boundary without an object is like Cheshire Cat’s grin.)</p>
<p>Once the object is found, finding its <a href="http://en.wikipedia.org/wiki/Boundary_%28topology%29" target="_blank">boundary</a> is an easy exercise. This does not work the other way around. The article says: “The result of image segmentation [may be] a set of contours extracted from the image.” But <a href="http://en.wikipedia.org/wiki/Contour" target="_blank">contours</a> are simply level curves of some function. They don’t have to be closed (like a circle). If a curve isn’t closed, it does not enclose anything – it’s a boundary without an object! More generally, searching for boundaries instead of objects is called “edge detection”. In the presence of noise, one ends up with just a bunch of pixels – not even curves… And by the way, the language of “contours”, “edges”, etc. limits you to 2D images. Is segmentation of 3D images out of the window?</p>
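<p>Here is how easy the “easy exercise” can be. This is a minimal sketch under my own assumptions: the object is a set of (x, y) pixels and its boundary is taken to be the pixels of the object that have at least one of their four neighbors outside the object. This is only one of several conventions for the boundary of a digital object, and the name <code>boundary</code> is made up for this illustration.</p>

```python
def boundary(obj):
    """Pixels of the object with at least one 4-neighbor outside the object."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return {(x, y) for (x, y) in obj
            if any((x + dx, y + dy) not in obj for dx, dy in steps)}

# A solid 3x3 square: every pixel except the center lies on the boundary.
square = {(x, y) for x in range(3) for y in range(3)}
print(sorted(boundary(square)))
```

<p>Nothing here is specific to 2D: with triples instead of pairs and six neighbors instead of four, the same idea applies to 3D images.</p>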
<p>I plan to write a few posts about specific image segmentation methods in the coming weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/06/15/what-is-image-segmentation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Pattern recognition in computer vision, part 3</title>
		<link>http://inperc.com/blog2/2008/06/02/pattern-recognition-in-computer-vision-part-3/</link>
		<comments>http://inperc.com/blog2/2008/06/02/pattern-recognition-in-computer-vision-part-3/#comments</comments>
		<pubDate>Mon, 02 Jun 2008 18:48:39 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image search]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/06/02/pattern-recognition-in-computer-vision-part-3/</guid>
		<description><![CDATA[In part 1 and part 2 I discussed a paper on face recognition and the methods it relies on. Recall, each 100&#215;100 gray scale image is a table of 100&#215;100 = 10,000 numbers that can be rearranged into a 10,000-vector or a point in the 10,000-dimensional Euclidean space. As we discovered in part 2, using the [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://inperc.com/blog2/2008/05/02/pattern-recognition-in-computer-vision-part-1/">part 1</a> and <a href="http://inperc.com/blog2/2008/05/12/pattern-recognition-in-computer-vision-part-2/">part 2</a> I discussed a <a href="http://www.wired.com/science/discoveries/news/2008/03/new_face_recognition" target="_blank">paper</a> on face recognition and the methods it relies on. Recall, each 100&#215;100 gray scale image is a table of 100&#215;100 = 10,000 numbers that can be rearranged into a 10,000-vector or a point in the 10,000-dimensional Euclidean space. As we discovered in part 2, using the closedness of these points as a measurement of similarity between images ignores the way the pixels are attached to each other. A deeper problem is that unless the two images are aligned first, there is no way to use this representation to discover that they depict the same or similar thing. The proper term for this alignment is <em>image registration</em>.</p>
<p>The similarity between images represented this way will be entirely based on their overlap. As result, the distance can be large even between images that we would consider similar. In part 2 we had examples of one-pixel images. More realistic examples are these:</p>
<ul>
<li>an image with an object in one corner and one with the same object in another corner;</li>
<li>image of a cross and the same cross turned 45 degrees;</li>
<li>etc.</li>
</ul>
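<p>The first example in the list can be checked directly. This is a minimal sketch, assuming that the “object” is a 10&#215;10 block of ones on a 100&#215;100 background of zeros and that the distance is Euclidean. The helper names <code>with_square</code>, <code>flatten</code> and <code>euclidean</code> are hypothetical.</p>

```python
import math

def with_square(size, top, left, side):
    """A size x size image of zeros with a side x side block of ones."""
    return [[1 if top <= r < top + side and left <= c < left + side else 0
             for c in range(size)] for r in range(size)]

def flatten(image):
    """Rearrange the rows of the table into one long vector."""
    return [v for row in image for v in row]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

blank = with_square(100, 0, 0, 0)
upper_left = with_square(100, 0, 0, 10)
lower_right = with_square(100, 90, 90, 10)

# Same object in two different corners: the blocks do not overlap at all.
d_shift = euclidean(flatten(upper_left), flatten(lower_right))
# The object versus no object at all.
d_blank = euclidean(flatten(upper_left), flatten(blank))

print(d_shift, d_blank)
```

<p>Since the two blocks do not overlap, the image with the same object in the other corner ends up farther away (the square root of 200) than the blank image (10).</p>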
<p>Back to face identification. As the faces are points in the 10,000-dimensional space, these points should be grouped somehow. The point is that all images of the same individual should belong to one group and not any other. It is common to consider “clusters” of points, i.e., groups formed of points close to each other. This was discussed above.</p>
<p>Now, in this paper the approach is different: a new point (the face to be identified) is represented as a <a href="http://inperc.com/wiki/index.php?title=Linear_combination">linear combination</a> of all other points (all faces in the collection).</p>
<p>As we know from linear algebra, this implies the following: (1) the entire collection has to be <a href="http://inperc.com/wiki/index.php?title=Linear_independence">linearly dependent</a>, (2) you can find a subcollection that, with suitable non-zero coefficients, adds up to 0! In other words, everything cancels out and you end up with a blank photo. Is it possible? If the dimension is low or the collection is large (the images are small relative to the number of images), maybe. What if the collection is small? (It is small – see below.) It seems unlikely. Why do I think so? Consider this very extreme case: you may need the negative for each face to cancel it: same shape with dark vs. light hair, skin, eyes, teeth (!)…</p>
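<p>To spell out the first step in symbols (the notation is mine, not the paper’s): let x be the new face, let v<sub>1</sub>, …, v<sub>n</sub> be the faces in the collection, and suppose x is their linear combination with coefficients c<sub>1</sub>, …, c<sub>n</sub>. Then, assuming x is counted as part of the collection:</p>

```latex
x = \sum_{i=1}^{n} c_i v_i
\quad\Longrightarrow\quad
(-1)\,x + \sum_{i=1}^{n} c_i v_i = 0 .
```

<p>The coefficient of x is &#8722;1, which is not zero, so this is a nontrivial combination equal to 0. That is exactly what linear dependence of x, v<sub>1</sub>, …, v<sub>n</sub> means.</p>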
<p>Second, the new image in the collection has to be a linear combination of training images of the same person. In other words, any image of person A is represented as a linear combination of other images of A in the collection, ideally. (More likely this image is supposed to be closer to the linear space spanned by these images.) The approach could only work under the assumption that <strong>people are linearly independent</strong>:</p>
<p><em>No face in the collection can be represented as a linear combination of the rest of the faces. </em></p>
<p>It’s a bold assumption.</p>
<p>If it is true, then the challenge is to make the algorithm efficient enough. The idea is that you don’t need all of those pixels/features and they in fact could be random. That must be the point of the paper.</p>
<p>The testing was done on two collections with several thousand images each. That sounds OK, but the number of individuals in these collections was 38 and 114!</p>
<p>To summarize, there is nothing wrong with the theory but its assumptions are unproven and the results are untested.</p>
<p>P.S. It’s strange but after so many years computer vision still looks like an academic discipline and not an industry.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/06/02/pattern-recognition-in-computer-vision-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pattern recognition in computer vision, part 2</title>
		<link>http://inperc.com/blog2/2008/05/12/pattern-recognition-in-computer-vision-part-2/</link>
		<comments>http://inperc.com/blog2/2008/05/12/pattern-recognition-in-computer-vision-part-2/#comments</comments>
		<pubDate>Mon, 12 May 2008 18:31:42 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[computer vision/machine vision/AI]]></category>
		<category><![CDATA[image search]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[rants]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/05/12/pattern-recognition-in-computer-vision-part-2/</guid>
		<description><![CDATA[Let’s review part 1 first. If you have a 100&#215;100 gray scale image, it is simply a table of 100&#215;100 = 10,000 numbers. You rearrange the rows of this table into a 10,000-vector and represent the image as a point in the 10,000-dimensional Euclidean space. This enables you to measure distances between images, discover patterns, [...]]]></description>
			<content:encoded><![CDATA[<p>Let’s review <a href="http://inperc.com/blog2/2008/05/02/pattern-recognition-in-computer-vision-part-1/">part 1</a> first. If you have a 100&#215;100 gray scale image, it is simply a table of 100&#215;100 = 10,000 numbers. You rearrange the rows of this table into a 10,000-vector and represent the image as a point in the 10,000-dimensional Euclidean space. This enables you to measure distances between images, discover patterns, match images, etc. Now, what is wrong with this approach?</p>
<p>Suppose A, B, and C are images with a single black pixel in the left upper corner, next to it, and the right bottom corner respectively. Then the distances will be equal: d(A,B) = d(B,C) = d(C,A), for any of the usual distances d(,) between points in this space (Euclidean, taxicab, etc.). The conclusion: if A and B are in the same cluster, then so is C. So adjacency of pixels and distance between them is lost in this representation!</p>
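<p>This is easy to verify. Below is a minimal sketch, assuming 100&#215;100 images in which the marked pixel is 1 and everything else is 0, and trying three standard distances. The helper names <code>single_pixel</code>, <code>minkowski</code> and <code>chebyshev</code> are my own.</p>

```python
def single_pixel(size, row, col):
    """Vector of a size x size image with one marked pixel."""
    v = [0] * (size * size)
    v[row * size + col] = 1
    return v

def minkowski(a, b, p):
    """Taxicab distance for p = 1, Euclidean distance for p = 2."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def chebyshev(a, b):
    """The largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

size = 100
A = single_pixel(size, 0, 0)                # left upper corner
B = single_pixel(size, 0, 1)                # next to it
C = single_pixel(size, size - 1, size - 1)  # right bottom corner

pairs = [(A, B), (B, C), (C, A)]
taxicab = [minkowski(a, b, 1) for a, b in pairs]
euclid = [minkowski(a, b, 2) for a, b in pairs]
sup = [chebyshev(a, b) for a, b in pairs]

print(taxicab, euclid, sup)
```

<p>Within each of the three lists the numbers are identical: any two of these vectors differ in exactly two coordinates, and none of these distances knows where those coordinates sit in the image.</p>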
<p>Of course this can be explained, as follows. The three images are essentially blank so it’s not surprising that they are close to the blank image and to each other. So as long as pixels are “small” the difference between these four images is justifiably negligible.</p>
<p>Of course, “small” pixels means “small” with respect to the size of the image. This means high resolution. High resolution means larger image (for the same “physical” object), which means higher dimension of the Euclidean space, which means higher computational costs. Not a good sign.</p>
<p>To take this line of thought all the way to the end, we have to ask the question: what if we keep increasing resolution?</p>
<p>The image will simply turn into an exact copy of the “physical” object. Initially, the image is a table of numbers. Now, you can think of the table as a rectangle subdivided into small squares, then the image is a function to the reals constant on each of these squares. As the resolution grows, the rectangle remains the same but the squares become smaller. In the end we have a &#8211; possibly continuous – function (as the <a href="http://en.wikipedia.org/wiki/Pointwise_convergence" target="_blank">limit of this sequence of functions</a>). This is the “real” image and the rest are its approximations.</p>
<p>It’s not as clear what happens to the representations of images in the Euclidean space. The dimension of this space grows and in the end becomes infinite! It also seems that this new space should be made of infinite strings of numbers. That does not work out.</p>
<p>Indeed, consider this (“real”) image: a white square with a black upper left quarter. Let’s represent it first as a 2&#215;2 image. Then in the 4-dimensional Euclidean space this image is (1,0,0,0). Now let’s increase the resolution. If this is a 4&#215;4 image, it is (1,1,0,0,1,1,0,0,0,…,0) in the 16-dimensional space. If this is an 8&#215;8 image, then in the 64-dimensional space it’s (1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,…,0). You can see the pattern. But what is the end result (as the <a href="http://en.wikipedia.org/wiki/Limit_of_a_sequence" target="_blank">limit of this sequence of points</a>)? It can’t be (1,1,1,…), can it? It definitely isn’t the original image. That image can’t even be represented as a string of numbers, not in any obvious way…</p>
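<p>The vectors can be generated mechanically. This is a minimal sketch, assuming that black is 1, white is 0, and the rows of the table are written out one after another. The function name <code>quarter_image</code> is hypothetical.</p>

```python
def quarter_image(n):
    """Row-by-row vector of an n x n image whose upper left quarter is 1."""
    half = n // 2
    return [1 if r < half and c < half else 0
            for r in range(n) for c in range(n)]

for n in (2, 4, 8):
    v = quarter_image(n)
    # Resolution, dimension of the space, and the first two rows of the image.
    print(n, len(v), v[:2 * n])
```

<p>An n&#215;n image lands in the space of dimension n squared (4, 16, 64, …), and a quarter of the entries are ones at every resolution. The vector itself just keeps getting longer, with ever longer runs of ones and zeros, so it is not clear what it should converge to.</p>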
<p>OK, these are just <em>signs</em> that there may be something wrong with this approach. A more tangible problem is that unless the two images are aligned first, there is no way to use this representation to discover that they depict the same or similar thing. About that in the next post.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/05/12/pattern-recognition-in-computer-vision-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image search engines keep launching</title>
		<link>http://inperc.com/blog2/2008/05/07/image-search-engines-keep-launching/</link>
		<comments>http://inperc.com/blog2/2008/05/07/image-search-engines-keep-launching/#comments</comments>
		<pubDate>Wed, 07 May 2008 22:25:01 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[image search]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[rants]]></category>

		<guid isPermaLink="false">http://inperc.com/blog2/2008/05/07/image-search-engines-keep-launching/</guid>
		<description><![CDATA[After Google “launched” its ImageRank - by presenting a paper about it, now there are two more.
First, Idée “publicly launched” its image search engine (report here). If you want to try it, they’ll put you on a waiting list. How is it different from what we saw before?
Second, “Pixsta launches image search engine” (report here). Testing [...]]]></description>
			<content:encoded><![CDATA[<p>After Google “launched” its <a href="http://inperc.com/blog2/2008/04/29/googles-new-image-search/">ImageRank</a> - by presenting a paper about it, now there are two more.</p>
<p>First, Idée “publicly launched” its image search engine (report <a href="http://www.financialpost.com/small_business/story.html?id=494118">here</a>). If you want to try it, they’ll put you on a waiting list. How is it different from <a href="http://www.inperc.com/wiki/index.php?title=Image_search#Idee">what we saw before</a>?</p>
<p>Second, “Pixsta launches image search engine” (report <a href="http://www.netimperative.com/news/2008/april/28/pixsta-launches-image-search-engine">here</a>). Testing is also closed. What is the difference from <a href="http://www.inperc.com/wiki/index.php?title=Image_search#Pixsta">what we saw before</a>?</p>
<p>The only good thing here is that I discovered a better term for visual image search, CBIR, etc. It’s &#8220;<strong>image-to-image search</strong>&#8221;, as opposed to text-to-text and text-to-image we are familiar with.</p>
]]></content:encoded>
			<wfw:commentRss>http://inperc.com/blog2/2008/05/07/image-search-engines-keep-launching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
