Pattern recognition studies the tools for learning from examples what is not yet known. Distances (or dissimilarities) are a primary notion for learning in pattern recognition. What should we do if no proper distance measure is known? Can it be learnt? On what basis? This seems to be a consciousness problem.
There are three sources of scientific knowledge. The first is the authority of the scientists of the past. For an incoming student it is his teacher. The modern researcher has his textbooks and the established, well-reviewed journals. The medieval scientist almost always referred to Aristotle. Conclusions are considered correct because they are based on statements by these authorities. Using old knowledge will, however, not lead to any new knowledge: it was already there, known by the authority. The second source is new scientific observation of the surrounding world. The third is reasoning: observations may be logically combined with each other and with what is already known.
These three sources of knowledge can be recognized in most scientific papers: a bibliography, experimental observations, and logical and/or mathematical analysis. In some areas, e.g. mathematics and theoretical machine learning, articles sometimes lack new experimental observations. Here, the growth of knowledge is based entirely on a new or more advanced analysis of the properties of the available tools and procedures.
Pattern recognition aims to learn from examples, from new observations. In this way it adds to the growth of knowledge. This act relies on the steps of representation and generalization. Generalization is based on domains or on a probabilistic analysis in a representation space, or in short, on distances and densities. Densities (probability density functions), however, are defined on top of a given distance measure (or an inner product, which is a similarity). A distance measure thereby seems to be an essential prerequisite for pattern recognition. How do we find a good distance measure, a useful metric, if it is not available in the knowledge we have?
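To make the dependence of densities on the distance measure concrete, here is a minimal sketch. It assumes a Parzen-style estimate with a Gaussian window, left unnormalized for simplicity; the function names, the sample points and the window width `h` are all hypothetical illustrations, not something prescribed above. The same objects give a different density score as soon as the distance measure changes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """City-block (L1) distance between two points of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

def density_score(x, samples, distance, h=1.0):
    """Unnormalized Parzen-style score: a Gaussian window of width h
    applied to the chosen distance between x and every sample."""
    windows = [math.exp(-distance(x, s) ** 2 / (2 * h * h)) for s in samples]
    return sum(windows) / len(windows)

# Hypothetical toy samples and query point.
samples = [(0.0, 0.0), (1.0, 1.0)]
query = (0.5, 0.5)

score_euclidean = density_score(query, samples, euclidean)
score_city_block = density_score(query, samples, city_block)
print(score_euclidean, score_city_block)
```

The two scores differ although the samples and the query point are identical: the estimate is only defined once a distance has been chosen.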
Pattern recognition learns from observations and is based on a given distance measure. Can this measure be learnt? Not by pattern recognition procedures, obviously, as they need a distance measure. Metric learning is useful for pattern recognition, but it is not pattern recognition: by itself it is not a procedure for recognizing patterns.
So, we have a meta-problem: if it is solved well, the recognition procedures that follow will perform well. There is no paradox here. The problem is somewhere else.
Metric learning needs a set of labeled objects, objects whose patterns are known. They are used to optimize the distance measure on the basis of a chosen generalization procedure. Somebody has to label these objects. An expert has to recognize their classes. He is able to perceive the class differences. Apparently, this expert is unable to make his perception explicit and to define a distance measure that does the same. He has a consciousness problem.
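The optimization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy objects and their labels are invented, the distance family is a weighted Euclidean distance with one free weight, the chosen generalization procedure is the nearest neighbor rule, and its leave-one-out error is the criterion. None of these choices is prescribed by the argument; real metric learning methods search much richer families.

```python
import math

# Hypothetical toy data. Feature 0 carries the class difference,
# feature 1 is large-scale noise. The labels play the role of the expert.
objects = [
    (0.0, 5.0), (0.2, -4.0), (0.1, 9.0), (0.3, -8.0),
    (1.0, 4.5), (1.2, -4.5), (1.1, 8.5), (1.3, -7.5),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; the weights define the metric."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def loo_nn_error(weights):
    """Leave-one-out error of the nearest neighbor rule under this metric."""
    errors = 0
    for i, obj in enumerate(objects):
        others = [j for j in range(len(objects)) if j != i]
        nearest = min(others,
                      key=lambda j: weighted_distance(obj, objects[j], weights))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(objects)

# Candidate metrics: keep feature 0 and try several weights for feature 1.
candidates = [(1.0, w) for w in (1.0, 0.1, 0.01, 0.0)]
best_weights = min(candidates, key=loo_nn_error)

for weights in candidates:
    print(weights, loo_nn_error(weights))
print("selected weights:", best_weights)
```

The labels are the only place where the expert's perception enters. The procedure recovers a usable metric from them without the expert ever stating which feature matters.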
This problem is inherent to pattern recognition. An expert has the ability to see class differences but is not aware of how he does it. In the previous post the difference between the kernel-based approach and the dissimilarity representation was discussed. The advantage of the dissimilarity representation is that it works well when a proper dissimilarity measure is known: no optimization is needed. The kernel approach, by contrast, has the advantage of flexibility in the choice of the kernel. The kernel can be optimized when the expert is not conscious of how he sees class differences, which makes the kernel approach suitable for metric learning.