We have already discussed several times the significance of understanding the Platonic and Aristotelian ways of gaining knowledge. This understanding can be of great help to researchers in pattern recognition: in appreciating the contributions of others, in discussions with colleagues and in supervising students. This may hold for science in general, but it becomes particularly crucial when one studies the design of machines that try to integrate knowledge and observations.

The Platonic thinker may have a great, intuitive vision of his area of research. Taken to the extreme, he may think that he knows it all, yet he is unable to put it into words or mathematics. Colleagues may accuse him of building castles in the air. The true Platonic, however, has a keen interest in reality. He wonders where he can find or realize his ideas in the world he lives in.

The Aristotelian researcher starts from his observations and tries to combine them logically. In the extreme he is mainly a collector of observations and arrives at most at empty abstractions. The true Aristotelian, however, is not a nominalist, but wants to infer meaningful concepts from his observations. Some talks with a Platonic colleague may be inspiring for both.

The Aristotelian – Platonic cooperation

When the designer of a pattern recognition system has nobody around with a larger picture of the problem, and is also unable to gain a wider view himself, he might be in great trouble. We will illustrate this with the problem of feature extraction.

Assume that in some recognition problem the objects are given by a small set of features. We may realize that these features might not be appropriate, as they were not specifically selected for the problem at hand. For that reason we consider extending the original representation with various non-linear combinations of the given features. This set can be very large. The feature space may become high-dimensional and we realize that our classifiers will overtrain. Consequently we keep the original feature set, or at most extend it with some polynomial combinations. Effectively we are biased towards the given feature set. So what is needed is somebody with a good idea about the features to choose.
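To get a feeling for how quickly such an extension grows, the sketch below counts the polynomial combinations of a given feature set. It is only an illustration: the function name `n_polynomial_features` and the choice of 10 original features are assumptions made here, and polynomial terms are just one of many possible non-linear combinations.

```python
from math import comb

def n_polynomial_features(n_features, max_degree):
    """Count the distinct monomials of degree 1..max_degree in n_features variables.

    comb(n + d, d) counts all monomials of degree 0..d; subtracting 1
    removes the constant term, which carries no information about an object.
    """
    return comb(n_features + max_degree, max_degree) - 1

# Hypothetical example: 10 original features, increasing polynomial degree.
for degree in (1, 2, 3, 5):
    print(degree, n_polynomial_features(10, degree))
```

With 10 original features, allowing terms up to degree 3 already gives 285 features and degree 5 gives 3002, which makes the risk of overtraining on a modest training set plausible.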

The ugly duckling

This well-known reasoning has been taken to an extreme by Satosi Watanabe, one of the fundamental thinkers on pattern recognition. He used the “ugly duckling theorem” to illustrate what happens in the case of a given set of binary features. Suppose that k features are given. There are 2^k different feature vectors possible. Consider all possible logic functions of the k features. Each logic function assigns a value of 0 or 1 to every one of the 2^k feature vectors, and two different logic functions map at least one of these vectors differently. Consequently there are 2^(2^k) different logic functions. They are now used instead of the original features.
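The two counts in this argument can be tabulated with a few lines of Python. This is a minimal sketch; the helper names are introduced here for illustration only.

```python
def n_feature_vectors(k):
    """Number of distinct binary feature vectors of length k."""
    return 2 ** k

def n_logic_functions(k):
    """Number of distinct logic functions of k binary features.

    A logic function is fixed by choosing an output bit for each of the
    2^k possible feature vectors, hence 2^(2^k) possibilities.
    """
    return 2 ** n_feature_vectors(k)

for k in range(1, 6):
    print(k, n_feature_vectors(k), n_logic_functions(k))
```

For k = 3 there are 8 feature vectors and 256 logic functions. For k = 5 there are already 4294967296 functions, so the full representation is a thought experiment, not a practical feature set.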

Let us now consider two objects that differ in at least one of the k original features. We compute all 2^(2^k) logic functions for both of these feature vectors. It can be shown, e.g. see the lecture notes by Costa and Backofen, that independent of the size of the difference in the original features, exactly half of the logic functions will give equal values for the two objects and half will give different values (unless the original feature vectors are identical, of course). In conclusion, all objects are equally different if we represent them by all possible combinations of the original features.
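For small k this claim can be checked by brute force. The sketch below is an illustration written for this post, not taken from the cited lecture notes: it represents every logic function as a truth table, with one output bit per feature vector, and counts for each pair of distinct feature vectors how many functions give both the same value. The name `agreement_counts` is an assumption.

```python
from itertools import product

def agreement_counts(k):
    """Count, for every pair of distinct binary feature vectors of length k,
    the logic functions that assign both vectors the same value."""
    vectors = list(product([0, 1], repeat=k))  # all 2^k feature vectors
    # A logic function is a truth table: one output bit per feature vector.
    functions = list(product([0, 1], repeat=len(vectors)))  # all 2^(2^k)
    counts = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            same = sum(1 for f in functions if f[i] == f[j])
            counts[(vectors[i], vectors[j])] = same
    return len(functions), counts

total, counts = agreement_counts(3)
print(total)                 # number of logic functions for k = 3
print(set(counts.values()))  # distinct agreement counts over all pairs
```

For k = 3 this prints 256 and {128}: every pair of distinct objects agrees on exactly half of the functions, whether the two objects differ in one original feature or in all three.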


All differences are equal, unless we have some prior knowledge

In terms of the ugly duckling story: in the generalized function representation, the ugly duckling is exactly as different from each of the other ducklings as those ducklings are from one another, no matter how small their mutual differences are in terms of the original features. Consequently, if we have no preference for how to represent the objects by features, then, whether we make an arbitrary choice (in expectation) or use everything and all possible combinations, all objects are equally different.

The observation that some objects are more similar than others is subjective. It depends on the choice of the features. How does the Aristotelian researcher know which features to use? He needs a Platonic insight, either his own or that of a colleague, to know what to observe in order to see the difference between the ugly duckling and the other ducklings.

