In 1972 the Russian scientist A. Lerner published a paper under the title “A crisis in the theory of Pattern Recognition”. This is definitely a title that attracts the attention of researchers interested in the history of the field. What did he see as the crisis? The answer is surprising. In short: automatic pattern recognition is solved, but it does not help us to understand the source of human intuition.
The book in which the paper was published (Frontiers of Pattern Recognition, Watanabe, Academic Press, 1972) was the result of a workshop. Unfortunately, Lerner was unable to present his insights there. It was very difficult in those days for a Russian scientist to get permission to travel. Nevertheless, the organizers found the short paper (6 pages) so interesting that they published it anyway.
Why did Lerner think (in 1972!) that pattern recognition was solved? And why, if he was correct, was this not a happy event, but something to be labelled a crisis? Both questions point to significant cultural differences between the eastern and the western scientific research communities.
In his paper Lerner argues that a procedure developed by Vapnik, and published by Vapnik and himself, on empirical risk minimization (Vapnik & Lerner, 1963, Pattern recognition using generalized portrait method, Automation and Remote Control, 24, 774–780) is very good, perhaps the best possible. He thereby suggests that more research will not bring any significant improvement. This procedure already contains the basic ingredients of what later became known as the maximum margin classifier. This classifier has developed, also thanks to Vapnik, into the support vector machine (SVM). It is remarkable that at the moment the pattern recognition community in the West was still puzzled by the peaking phenomenon, somebody from an entirely different part of the world was claiming that pattern recognition was solved, without any reference to peaking.
The collision of paradigms
In retrospect Lerner was quite right to point to Vapnik’s results as a significant step in pattern recognition. There are two aspects that made it revolutionary and at the same time difficult to grasp for a community that lives in another paradigm. The maximum margin principle is about distances between objects. For researchers who focus on densities in feature spaces it is hard to accept that this would make sense.
In my own study of pattern recognition it was emphasized by my teachers that class distributions always overlap. The overlap is necessarily there due to the limited measurement accuracy of sensors. If there is no overlap, the problem is easy and not of interest. When the feature size is increased the overlap shrinks, but the peaking phenomenon will soon stop us. From the perspective of overlapping density functions the emphasis on distances does not make much sense. The distances of the nearest objects to the classifier are heavily noise sensitive and it is not clear how the overlap should be handled. As a result, the papers by Vapnik and Lerner did not draw any attention in the textbooks written in the 70’s and 80’s. Only after the publication of the SVM in 1995 did interest arise, and it then grew fast.
In the original papers on the maximum margin classifier and the SVM, distances are measured in an input space. This is the vector space that coincides with what is called the feature space in traditional pattern recognition. In the dissimilarity representation, studied by us, the step is made that distances can be any reasonable distance measure between entire objects, thereby implicitly covering all possible features. If distances are zero if and only if the objects are identical, classes do not overlap. This is the appropriate condition for the maximum margin principle: the classifier should be as far away from the nearest objects as possible.
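The last sentence can be made concrete with a small sketch. It assumes a two-dimensional Euclidean input space and a linear classifier w·x + b = 0. The function name `margin` and the toy objects are illustrative only and do not come from the papers by Vapnik and Lerner.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance of any object to the classifier w.x + b = 0.

    The result is positive only if every object lies on its own side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy data (assumed for illustration): two non-overlapping classes.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, +1, +1]

# Two candidate classifiers: the vertical lines x = 1.5 and x = 2.0.
m_a = margin((1.0, 0.0), -1.5, points, labels)
m_b = margin((1.0, 0.0), -2.0, points, labels)

print(m_a, m_b)
```

Both lines separate the classes without error, but the second one lies further from its nearest objects and is therefore the one the maximum margin principle prefers.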
As shown by Vapnik, maximizing the margin coincides with minimizing the number of objects that support the classifier, i.e. the objects that have the minimum distance to it. The distance paradigm is directly related to the minimum description length generalization principle and to Occam's razor.
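As a hedged illustration of what "objects that support the classifier" means, the sketch below lists the objects that lie at the minimum distance to a given linear classifier. It again assumes a Euclidean input space. The names `signed_distances` and `support_objects`, the tolerance `tol` and the data are hypothetical. It only identifies the supporting objects for a fixed classifier and does not perform the optimization itself.

```python
import math

def signed_distances(w, b, points, labels):
    """Signed distance of every object to the classifier w.x + b = 0."""
    norm = math.hypot(*w)
    return [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]

def support_objects(w, b, points, labels, tol=1e-9):
    """Objects whose distance to the classifier equals the margin."""
    d = signed_distances(w, b, points, labels)
    m = min(d)
    return [x for x, di in zip(points, d) if di - m <= tol]

# Toy data (assumed for illustration).
points = [(0.0, 1.0), (1.0, 0.0), (3.0, 0.0), (4.0, 2.0)]
labels = [-1, -1, +1, +1]

# Classifier: the vertical line x = 2.0.
support = support_objects((1.0, 0.0), -2.0, points, labels)
print(support)
```

Only two of the four objects determine the position of this classifier. The others could be removed without changing it, which is the link to a short description of the solution.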
The density paradigm can be traced back to the generalization principle as formulated in Bayes' rule. It has been discussed earlier that these two seemingly opposing principles underpin almost any practical recognition procedure.
Solving the crisis
The above may make clear why Lerner’s observation about the crisis in pattern recognition was entirely out of context. It was a remark from another paradigm. But why did he think that solving pattern recognition in the way Vapnik did throws no light on how pattern recognition arises in the human mind? I assume that he was not prepared to take the step of relating the distances used in the maximum margin classifier to human observation. That might be justified for the Euclidean distances in an input space. But if we follow Edelman’s observation that in human recognition (dis)similarities are primal and come before features, then it becomes clear how to overcome the ‘crisis’: make the human dissimilarity measures explicit. This should be the main research effort in pattern recognition.