From Intelligent Perception
Stereo vision is one way of extracting 3D information from 2D images. The second image serves as an extra parameter, which you can think of as the third dimension.
Taking two images of the same scene from two (slightly) different locations and then finding the same item in both of them lets you compute the distance to the item.
The image matching part is crucial and is the more challenging of the two. For example, in the image on the right the corner of the cube may be a good pixel to choose, but how would the computer know? The rest of the image is mostly featureless. The geometry, by comparison, is simple; it is worked out below.
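To make the matching difficulty concrete, here is a minimal sketch of one common approach: compare a small window around the chosen pixel with every window on the corresponding row of the other image and keep the one with the smallest sum of squared differences. It assumes that the match lies on the same row in both images and that pixels are plain grey-level numbers; the function names and the toy data are made up for illustration and are not taken from this article.

```python
def ssd(window_a, window_b):
    """Sum of squared differences between two equally sized windows."""
    return sum((p - q) ** 2 for p, q in zip(window_a, window_b))


def match_on_row(row_i, row_j, col, half_width=2):
    """Find the column of row_j whose neighbourhood best matches row_i around col.

    row_i, row_j: lists of grey-level values from the same row of images I and J.
    Returns (best_column, best_cost). On ties the first column found is kept,
    so a low cost alone does not prove that the match is the right one.
    """
    if col - half_width < 0 or col + half_width >= len(row_i):
        raise ValueError("window around col does not fit inside row_i")
    template = row_i[col - half_width: col + half_width + 1]
    best_col, best_cost = None, None
    for c in range(half_width, len(row_j) - half_width):
        candidate = row_j[c - half_width: c + half_width + 1]
        cost = ssd(template, candidate)
        if best_cost is None or cost < best_cost:
            best_col, best_cost = c, cost
    return best_col, best_cost


# Toy rows: one bright "corner" on a flat background, shifted by 3 pixels.
row_i = [10, 10, 10, 10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
row_j = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10, 10, 10, 10]

print(match_on_row(row_i, row_j, col=6))   # distinctive pixel: a single best match
print(match_on_row(row_i, row_j, col=11))  # flat region: several columns tie at cost 0
```

The second call shows the problem described above: in a featureless region many candidate windows look identical, so the returned column is essentially arbitrary.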
Suppose we established a match between a pixel P in image I and pixel Q in image J. Let's find the distance to whatever these pixels depict.
Two images with a red pixel in each image representing the same thing:
We need to consider only the horizontal line through P and Q to find the distance to the object with the red dot:
View from above; the eyes are the foci of the cameras. Black lines are the images:
"Triangulation" (the word means something entirely different in topology): the object lies on the line through the focus of the camera and the object's mark on the image. Here the big red dot is the actual location of the object:
D is what we are looking for.
The two pink triangles are similar, and so are the two blue ones. So,
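$\frac{f}{x} = \frac{D}{a}, \qquad \frac{f}{y} = \frac{D}{b}.$
Hence
$a = \frac{xD}{f}, \qquad b = \frac{yD}{f}.$
Then
$L = x + a + b + y = x + \frac{xD}{f} + \frac{yD}{f} + y = x + \frac{D}{f}(x + y) + y = (x + y)\left(\frac{D}{f} + 1\right),$
and solving for $D$:
$D = f\left(\frac{L}{x + y} - 1\right).$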
Now, d = x + y is simply the distance the pixel moves as we switch from one image to the other. It is called the disparity. Then
$D = f\left(\frac{L}{d} - 1\right).$
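As a numeric check of the derivation, the sketch below places an object at a known distance, projects it using the similar-triangle relations, and recovers the distance from the disparity via D = f(L / (x + y) - 1) with d = x + y. All numbers and the function name are hypothetical. It also assumes f, L and d are expressed in the same length unit; with a real camera the disparity is usually measured in pixels and would first have to be converted.

```python
def distance_from_disparity(f, L, d):
    """Return D = f * (L / d - 1), where d = x + y is the disparity.

    f, L and d must share one length unit; D is returned in that unit.
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * (L / d - 1)


# Made-up scene: choose the answer first, then check that it is recovered.
f = 2.0             # distance from the focus to the image
D_true = 100.0      # distance to the object
a, b = 30.0, 20.0   # the lengths a and b of the derivation

x = a * f / D_true  # from f / x = D / a
y = b * f / D_true  # from f / y = D / b
L = x + a + b + y
d = x + y           # disparity

# Should agree with D_true up to floating-point rounding.
print(distance_from_disparity(f, L, d))
```

With f and L fixed, a larger disparity gives a smaller distance: nearby objects shift more between the two images than distant ones.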
The lack of structure in the image often makes finding good reference points very hard. One trick that may (partially) overcome this problem is projecting a structure (such as the grid lines below) onto the scene.
For a video, see Stereo vision with hacked Kinect.