Recognition systems have to be trained. An expert is needed to act as a teacher. He has to know what is what. But …
- Does he really know, or does he just believe that he knows?
- Or, does he know that he just believes?
- And, does he know how good his belief is?
Nils Nilsson, one of the early researchers in pattern recognition, addresses this point. In his 1965 monograph on learning machines he studied networks of simple threshold functions, TLUs (Threshold Logic Units). Now, after a long career in AI, he has written a book again, this time discussing the belief that we have knowledge.
Overconfident networks
Learning Machines was one of the first texts I read about pattern recognition as a master's student. It was clearly written, in an accessible style. For some people the book was boring: no magic, no prospect of simulating the brain. Nilsson discussed the possibilities and limitations of training learning machines built from single and multiple TLUs, in well-defined mathematical terms. What struck me most was the fact, as he explained clearly, that a sufficiently large network can always perfectly separate the classes of a finite training set. Consequently, there is no reason why such a classifier should generalize. It is another version of Cover's study on complexity.
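To make this concrete, here is a minimal sketch (mine, not Nilsson's) of such a "sufficiently large" network: one hidden unit per training example is enough to memorize even completely random labels, giving perfect separation on the training set and chance-level performance everywhere else. For brevity the hidden units threshold a distance rather than a linear function; the same memorization can be built from genuine linear TLUs at the cost of more units.

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite training set with completely random labels: there is nothing to learn.
X = rng.normal(size=(20, 2))
y = rng.integers(0, 2, size=20)

def memorizing_net(x, X, y, eps=1e-6):
    """One hidden unit per training example. Hidden unit i fires iff x lies
    within eps of training point i; the output unit ORs the units whose
    stored label is 1. (Distance thresholds are used here for brevity;
    linear TLUs can realize the same memorization with more units.)"""
    fires = np.array([np.linalg.norm(x - xi) < eps for xi in X])
    return int(np.any(fires & (y == 1)))

train_acc = np.mean([memorizing_net(xi, X, y) == yi for xi, yi in zip(X, y)])
print(train_acc)  # 1.0 -- the training set is perfectly separated

X_test = rng.normal(size=(1000, 2))      # fresh data from the same source
y_test = rng.integers(0, 2, size=1000)   # labels are random, as before
preds = np.array([memorizing_net(xi, X, y) for xi in X_test])
print(np.mean(preds == y_test))  # ~0.5 -- chance level: nothing generalizes
```

Zero error on the training set, chance-level error on everything else: perfect separation by itself says nothing about generalization.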
For a long time I have used Nilsson's observation against the idea that sufficiently large artificial neural networks can solve any problem. The standard artificial neuron used in such networks is a generalization of the TLU. So, well-trained neural networks may end up in a perfect and, at the same time, bad, overtrained solution. For that reason I was very glad to learn that Nilsson's book was republished in 1990, at the moment it was most needed.
The main reason why it is almost impossible to train a multilayer network of TLUs, while it can be done for its generalization, the multilayer network of artificial neurons, is that the neuron has a continuous output function where the TLU has a hard threshold. The threshold yields a zero gradient almost everywhere, so there is no error signal to propagate back through the layers; the continuous neuron is differentiable and can be trained by gradient descent. Consequently the neuron does not produce a sharp yes/no decision but a soft output in the (0,1) interval. The neural network yields hesitant answers where a multilayer TLU gives strict, but sometimes wrong, answers.
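A minimal sketch of the two units side by side; the weights and the input point are arbitrary assumptions for illustration:

```python
import numpy as np

def tlu(x, w, b):
    """Threshold Logic Unit: a hard 0/1 decision on the linear function w.x + b."""
    return 1 if np.dot(w, x) + b > 0 else 0

def neuron(x, w, b):
    """Its generalization: a sigmoid squashes w.x + b to a soft value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

w, b = np.array([1.0, -2.0]), 0.5
x = np.array([0.4, 0.3])        # a point close to the decision boundary

print(tlu(x, w, b))     # 1       -- strict yes, even this close to the boundary
print(neuron(x, w, b))  # ~0.574  -- a hesitant answer, barely above 0.5
```

The sigmoid's derivative, sigma(1 - sigma), is nonzero everywhere, which is what makes gradient-based training of the multilayer version possible; the step function offers no such handle.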
But here we walk on very thin ice. Neural networks remain hesitant as long as they do not perfectly separate the training set. The moment they find the perfect solution they tend to become overconfident and may show crisp outputs. Nilsson's networks of TLUs could only reach this state if they were deliberately constructed that way by the designer; there was no procedure to reach it automatically. The new, soft neurons, however, enabled advanced training by which this dangerous state could be reached by accident. Such a network behaves like somebody who is confident about his statements but who, in reality, is often wrong.
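How the crisp outputs arise can be shown in a few lines. Once the training set is separated, further training mainly grows the weights, and scaling the weights of a sigmoid neuron leaves the decision boundary untouched while driving the output toward 0 or 1. The weights and scale factors below are illustrative assumptions, continuing the example above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([1.0, -2.0]), 0.5   # the decision boundary w.x + b = 0 never moves
x = np.array([0.4, 0.3])            # the borderline point from above

for scale in (1, 5, 25, 125):       # growing weights, as in continued training
    print(scale, sigmoid(scale * (np.dot(w, x) + b)))
# 1    0.5744  -- hesitant, as it should be for a borderline point
# 5    0.8176
# 25   0.9994
# 125  1.0000  -- crisp and overconfident: the soft neuron now mimics a TLU
```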
Knowledge or belief
In his new book Nilsson emphasizes that most, if not all, knowledge is just belief. This seems exaggerated. I know that I was born and that my life will end. I know it is the middle of the night and that twelve hours ago it was midday. This knowledge may be called a belief, but then the concept of knowledge disappears. In relation to pattern recognition the distinction between knowledge and belief is very relevant.
Traditional pattern recognition procedures are based on the availability of a knowledgeable teacher. He knows the true classes of the training examples. But how does he know them? Even in very strictly controlled environments human decision makers make mistakes. Letters are delivered to the wrong address. The wrong amount of money is sent to the wrong account. The identities of people are mixed up. The wrong surgery is performed on the wrong person or on the wrong organ. People, even specialists, make recognition errors. So, the set of training examples will contain errors.
Even when a teacher is aware (knows!) that he may be mistaken and merely believes that he has made the right decision, this is hardly of use. Most procedures cannot handle this, as teachers are usually unable to quantify a state of belief. That we believe rather than know may be detected by a lack of arguments.
If I see a person in a crowd and recognize him as John, I may have no argument other than a resemblance that is hard to specify. I know directly, but this is in fact a belief. If, on the other hand, I am discussing with John the concert we are attending together and another acquaintance comes along, I can introduce John with certainty. I know that the guy next to me is John and I know that I am right. I have very good arguments: we have already been together for hours and we are having a discussion at that very moment.
If teachers can distinguish between believing and knowing, we might want to design pattern recognition systems such that they can make that distinction as well. Or is this overambitious? Can an artificial system really know? Can it even believe? Or do we just believe that its response is correct? We will have to read what Nilsson has to tell us about this topic. Anyway, it is intriguing that after a long career of working on learning machines this point has been reached.
Filed under: Classification • Overtraining • PR System