Why machine learning never works
I am exaggerating, of course – it works sometimes. In my view, however, it can only work under very narrow circumstances. I explain this below.
In response to my previous post Bob Mottram wrote: “Using many learning algorithms (genetic, neural, etc) it is very easy to categorise images on some very trivial basis. In theory the larger the data set the harder the system has to work and the less likely it is to find a quick ‘cheat’, but it all depends upon how features are being represented in the system.”
As I have expressed this view many times, I want to use this opportunity to clarify my thinking a bit. In my post I described machine vision as follows: “collect as much information about the image as possible and then let the computer sort it out by some kind of clustering”. That seems like a good plan. The test I suggested previously was to teach the computer to add without revealing that it’s dealing with numbers, i.e., presenting everything to it purely symbolically. This time, let’s test it instead on a very simple computer vision problem.
Given an image, find out whether it contains one object or more.
Let’s assume that the image is binary so that the notion of “object” is unambiguous. Essentially, you have one object if any two 1’s can be connected by a sequence of adjacent 1’s. Anyone with a minimal familiarity with computer vision (some even without) can write a program that solves this problem. But that’s irrelevant here because the computer is supposed to learn on its own, as follows.
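For reference, here is roughly what that hand-written program looks like – a minimal sketch in Python, assuming 4-adjacency (the function name and details are mine):

```python
# Count the connected components of 1's in a binary image by flood fill.
# Assumes 4-adjacency; "one object" means the count is exactly 1.
from collections import deque

def count_objects(image):
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                count += 1                       # a new object starts here
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:                     # flood-fill the whole object
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and image[ni][nj] == 1 and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
    return count

print(count_objects([[1, 1, 0],
                     [0, 0, 0],
                     [0, 1, 1]]) == 1)   # False: this image has two objects
```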
- You have a computer with some general-purpose machine learning program (meaning that no person provides insight into the nature of the problem).
- Then you show the computer images one by one and tell it which ones contain one object.
- After it has seen enough images, the computer will gradually start to classify new images on its own.
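To make the setup concrete, here is a minimal sketch of what such a general-purpose learner might look like – raw pixel vectors fed to an off-the-shelf classifier. The classifier choice (nearest neighbours from scikit-learn), the image size, and the stand-in data are all illustrative assumptions of mine, not a description of any particular system:

```python
# A general-purpose learner that knows nothing about the problem: each binary
# image is flattened into a vector of raw pixels and handed to a generic
# classifier. Random stand-in data marks where real labelled images would go.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

train_images = np.random.randint(0, 2, size=(500, 100, 100))  # stand-in images
train_labels = np.random.randint(0, 2, size=500)   # 1 = "one object" (told by us)

X = train_images.reshape(len(train_images), -1)    # 500 points in R^10000
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, train_labels)                           # "show it images, tell it which"

new_image = np.random.randint(0, 2, size=(100, 100))
print(clf.predict(new_image.reshape(1, -1)))       # classify on its own
```

Note that nothing in this setup encodes adjacency: the learner sees 10,000 independent coordinates.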
There is no reason to say that this can never work. But I have a couple of questions.
First, why gradually? Why not give the computer all of the information at once and have it become good at the task instantly? One drawback of the all-at-once approach is that you can’t keep tweaking the algorithm. But I think the main reason machine learning is popular is that everyone likes to teach. It’s fun to see your child/student/computer learn something new and become better and better at it. This is very human – and also totally misplaced.
My second question is this: the scheme can work, but what would guarantee that it will? More narrowly, what information about the image should be passed to the computer to ensure that, sooner or later, it will succeed more than 50% of the time?
Option 1: we pass all the information. Then this could work. For example, the computer represents every 100×100 image as a point in the 10,000-dimensional space and then runs clustering. First, this may be impractical and, second, does it really work? Will the one-object images form a cluster? Or maybe lie on a hyperplane? One thing is clear: these images will be very close to the rest, because the difference between one object and two may be just a single pixel.
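Here is a sketch of that last point, with images constructed by me for illustration – a pair that differs in a single “bridge” pixel, and hence by a distance of 1 in the 10,000-dimensional space, even though one image contains one object and the other two:

```python
# Two images a distance of 1 apart in R^10000 but with different object counts,
# versus two one-object images that are much farther apart.
import numpy as np

two = np.zeros((100, 100), dtype=int)
two[50, 10:45] = 1         # left bar
two[50, 46:90] = 1         # right bar: two objects, gap at (50, 45)

one = two.copy()
one[50, 45] = 1            # add the bridge pixel: now a single object

other_one = np.zeros((100, 100), dtype=int)
other_one[20, 10:90] = 1   # also a single object, drawn elsewhere

print(np.linalg.norm((one - two).ravel()))        # 1.0: different labels, adjacent points
print(np.linalg.norm((one - other_one).ravel()))  # ~12.6: same label, far apart
```

Euclidean distance simply does not see the property we care about: points with different labels can be far closer to each other than points with the same label, so there is no reason to expect the one-object images to cluster.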
Option 2: we pass some of the information. What if we pass information that cannot possibly help to classify the images the way we want? For example, we may pass just the value of the (1,1) pixel, or the number of 1’s, or of 0’s, or their proportion. Who will make sure that the relevant information (adjacency) isn’t left out? The computer doesn’t know what is relevant – it hasn’t learnt it “yet”. If it’s the human, then he would have to solve the problem first, if not algorithmically then at least mathematically. Then the point of machine learning as a way to solve problems is lost.
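To illustrate, here is a sketch (the construction is mine) of two images that agree on every one of those features – the (1,1) pixel, the number of 1’s, and their proportion – yet have different object counts. No classifier fed only these features can ever tell them apart:

```python
# Two images with identical "trivial" features but different object counts:
# any learner that sees only these features must misclassify one of them.
import numpy as np

one_object = np.zeros((100, 100), dtype=int)
one_object[50, 10:20] = 1        # a single bar of ten pixels

two_objects = np.zeros((100, 100), dtype=int)
two_objects[50, 10:15] = 1       # two bars of five pixels each,
two_objects[50, 17:22] = 1       # separated by a gap

for img in (one_object, two_objects):
    print(img[0, 0],             # the (1,1) pixel: 0 in both
          img.sum(),             # the number of 1's: 10 in both
          img.mean())            # the proportion of 1's: 0.001 in both
```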
BTW, this “simple” challenge of counting the number of objects may also be posed for texts instead of images. That one is for Google and the people who dream of the “semantic web”!
My conclusion: don’t apply machine learning to image search, image recognition, etc. – anywhere software is expected to replace a human and solve a problem.
So, when can machine learning be useful? In cases where the problem can’t be, or hasn’t been, solved by a human. For example, a researcher is trying to find the cause of a certain phenomenon and there is a lot of unexplained data. Then – with some luck – machine learning may suggest to the researcher a way to proceed. “Pattern recognition” is a better name for it then.