Is pattern recognition about statistics? Well, it depends. If you see its target as understanding how new knowledge can be gained by learning from examples, the role of statistics may be disputable. Knowledge lives in the human mind. It is born in the marriage between observations and reasoning. If we follow this process consciously…
Evaluation Archives
The role of densities in pattern classification
Discovered by accident
Some discoveries are made by accident. The wrong road brought a beautiful view. An arbitrary book from the library gave a great, new insight. A procedure was suddenly understood in a discussion with colleagues during a poster session. In a physical experiment a failure in controlling the circumstances showed a surprising phenomenon. Children playing with…
Cross-validation
A recurring question from students and colleagues is how to define a proper cross-validation: How many folds? Do we need repeats? How to determine the significance? Here are some considerations. Why cross-validation? Cross-validation is a procedure for obtaining an error estimate of a trainable system such as a classifier. The resulting estimate is specific for the training…
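As an illustration of the procedure described in this excerpt, here is a minimal k-fold sketch in plain Python. The helper names (`k_fold_indices`, `cross_val_error`), the toy nearest-mean classifier and the synthetic one-dimensional data are assumptions made for this example only; they are not taken from the post.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_error(X, y, train, predict, k=10, seed=0):
    """Fraction of objects misclassified when every fold is tested once
    by a classifier trained on the remaining k-1 folds."""
    errors = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)


def train_nearest_mean(X, y):
    """Toy classifier (hypothetical): store the mean of each class."""
    means = {}
    for label in set(y):
        values = [x for x, t in zip(X, y) if t == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Assign x to the class with the closest mean."""
    return min(means, key=lambda label: abs(x - means[label]))


# Synthetic two-class data: two unit-variance Gaussians, means 0 and 2.
rng = random.Random(1)
X = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(2, 1) for _ in range(100)]
y = [0] * 100 + [1] * 100

err = cross_val_error(X, y, train_nearest_mean, predict_nearest_mean, k=10)
print(f"10-fold cross-validation error: {err:.3f}")
```

Every object is used for testing exactly once and never appears in the training set of its own fold. Repeating the call with different `seed` values gives an impression of how much the estimate depends on the particular split.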
Using the test set for training
Never use the test set for training. It is meant for independent validation of the training result. If it has been included anywhere in the training stage, it is no longer independent and the evaluation result will be optimistically biased. This has been my guideline for a long time. Some students were shocked when I…
My classifier scores 50% error. How bad is that?
What error rates can we expect for a trained classifier? How good or bad is a 50% error? Well, if classes are separable, a zero-error classifier is possible. But a very bad classifier may assign every object to the wrong class. Generally, all errors between zero and one are possible. Much more can be…
Are football results random?
The recent results in the round of 16 of the football world championship in Brazil showed a remarkable statistic. The eight group winners all had to play against a runner-up of another group. All group winners won. Is that significant? Does this show that the result of a match is not random? Watching them strongly…
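A rough way to put a number on the question in this excerpt is sketched below. It assumes, purely for illustration, that under the "random" hypothesis every match is an independent fair coin flip; the post's own analysis is cut off here and may use a different model. The function name `prob_at_least` is made up for this example.

```python
from math import comb


def prob_at_least(wins, matches, p=0.5):
    """Binomial probability of at least `wins` successes in `matches`
    independent trials, each with success probability p."""
    return sum(
        comb(matches, k) * p**k * (1 - p) ** (matches - k)
        for k in range(wins, matches + 1)
    )


# Chance that all eight group winners win if each match were a coin flip.
p_all_eight = prob_at_least(8, 8)
print(p_all_eight)  # 0.00390625, i.e. 1/256
```

Under this simple model the observed outcome has a probability of about 0.4%, which would usually be considered significant evidence against pure randomness.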
The error in the error
How large is the classification error? What is the performance of the recognition system? In the end, this is the main question in applications, in proposing novelties, and in comparative studies. But how trustworthy is the number that is measured? How accurate is the error estimate? The most common way to estimate the error of a…
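To give a feeling for how accurate a measured error can be, the sketch below uses the standard binomial result: if the true error is e and it is estimated by counting mistakes on N independently drawn test objects, the standard deviation of the estimate is sqrt(e(1-e)/N). This is a textbook formula assumed here for illustration, not necessarily the derivation the post follows; `error_std` is a hypothetical helper name.

```python
from math import sqrt


def error_std(error, n_test):
    """Standard deviation of an error estimate obtained by counting
    mistakes on n_test independent test objects (binomial model)."""
    return sqrt(error * (1 - error) / n_test)


# How the uncertainty of a 10% error estimate shrinks with the test set size.
for n in (100, 1000, 10000):
    print(f"N = {n:5d}: 0.10 +/- {error_std(0.10, n):.4f}")
```

With 100 test objects the standard deviation is about 0.03, so a measured error of 10% is compatible with true errors several percentage points away. Reducing the uncertainty by a factor of ten takes a hundred times more test objects.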
Peaking summarized
Pattern recognition learns from examples. Thereby, generalization is needed. This can only be done if the objects, or at least the differences between pattern classes, have a finite complexity. That is what peaking teaches us. We will go once more through the steps. (See also our previous discussions on peaking, dimensionality problems and Hughes’ phenomenon)….
Trunk’s example of the peaking phenomenon
In 1979 G.V. Trunk published a very clear and simple example of the peaking phenomenon. It has been cited many times to explain the existence of peaking. Here, we will summarize and discuss it for those who want to have a better idea about the peaking problem. The paper presents an extreme example. Its value…
The curse of dimensionality
Imagine a two-class problem represented by 100 training objects in a 100-dimensional feature (vector) space. If the objects are in general position (not by accident in a low-dimensional subspace) then they still fit perfectly in a 99-dimensional subspace. This is a ‘plane’, formally a hyperplane, in the 100-dimensional feature space. We will argue that this…
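The claim that 100 objects in a 100-dimensional space fit in a 99-dimensional (affine) subspace can be checked numerically. The sketch below assumes NumPy is available and uses random Gaussian data as a stand-in for objects "in general position"; both are choices made for this example.

```python
import numpy as np

# 100 objects with 100 features each; Gaussian noise puts them in general position.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 100))

# Subtracting the mean moves the origin into the cloud of objects.
# The centered rows sum to zero, so they span at most 99 dimensions.
centered = X - X.mean(axis=0)
rank = np.linalg.matrix_rank(centered)

print("number of objects:       ", X.shape[0])
print("feature space dimension: ", X.shape[1])
print("dimension actually used: ", rank)
```

Centering removes exactly one degree of freedom, which is why n objects never need more than n-1 dimensions, however large the feature space is.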