Is pattern recognition about statistics? Well, it depends. If you see its target as understanding how new knowledge can be gained by learning from examples, the role of statistics may be disputable. Knowledge lives in the human mind. It is born in the marriage between observations and reasoning. If we follow this process consciously…
Classification Archives
The role of densities in pattern classification
Discovered by accident
Some discoveries are made by accident. A wrong road brought a beautiful view. An arbitrary book from the library gave a great, new insight. A procedure was suddenly understood in a discussion with colleagues during a poster session. In a physical experiment, a failure to control the circumstances revealed a surprising phenomenon. Children playing with…
Cross-validation
A recurring question from students and colleagues is how to define a proper cross-validation: How many folds? Do we need repeats? How to determine the significance? Here are some considerations. Why cross-validation? Cross-validation is a procedure for obtaining an error estimate of a trainable system such as a classifier. The resulting estimate is specific for the training…
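The basic procedure can be sketched in a few lines. This is a minimal illustration of my own, not code from the post: each fold is held out once as a test set while the classifier is trained on the rest, and the fold errors are averaged. The nearest-mean classifier plugged in at the end, and all function names, are my own choices for the example.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(X, y, train, predict, k=10):
    """Estimate the classification error: every fold serves once as the
    test set for a classifier trained on the remaining objects."""
    fold_errors = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        Xtr = [x for i, x in enumerate(X) if i not in held_out]
        ytr = [t for i, t in enumerate(y) if i not in held_out]
        model = train(Xtr, ytr)
        wrong = sum(predict(model, X[i]) != y[i] for i in fold)
        fold_errors.append(wrong / len(fold))
    return sum(fold_errors) / len(fold_errors)

# A deliberately simple classifier to plug in: nearest class mean.
def train_nearest_mean(X, y):
    """Store the mean of the training examples of each class."""
    means = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        means[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return means

def predict_nearest_mean(means, x):
    """Assign x to the class with the closest mean (squared Euclidean)."""
    return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))
```

With k folds every object is tested exactly once, by a classifier trained without it; repeating the whole procedure with different shuffles reduces the variance of the estimate.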
Adaboost and the Random Fisher Combiner
As in most areas, pattern classification and machine learning have their hypes. In the early '90s neural networks awoke and enlarged the community significantly. This was followed by the support vector machine, reviving the applicability of kernels. Then, from the turn of the century, the combining of classifiers became popular, with significant fruits like adaboost…
Using the test set for training
Never use the test set for training. It is meant for independent validation of the training result. If it has somehow been included in the training stage, it is no longer independent and the evaluation result will be positively biased. This has been my guideline for a long time. Some students were shocked when I…
My classifier scores 50% error. How bad is that?
What error rates can we expect for a trained classifier? How good or bad is a 50% error? Well, if classes are separable, a zero-error classifier is possible. But a very bad classifier may assign every object to the wrong class. Generally, all errors between zero and one are possible: 0 ≤ ε ≤ 1. Much more can be…
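One way to see why 50% is the uninformative point for two classes (a sketch of my own, not from the post): a classifier with error e can always be turned into one with error 1 − e by inverting its decisions, so an error far above 0.5 is just as informative as one far below it.

```python
def error_rate(predict, X, y):
    """Fraction of objects in (X, y) that predict labels incorrectly."""
    return sum(predict(x) != t for x, t in zip(X, y)) / len(X)

def invert(predict):
    """Flip a two-class (0/1) classifier's decisions: error e becomes 1 - e."""
    return lambda x: 1 - predict(x)
```

Only at e = 0.5 does inverting gain nothing; a two-class classifier scoring 50% error carries no usable information at all.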
Why is the nearest neighbor rule so good?
Just compare the new observations with the ones stored in memory. Take the most similar one and use its label. What is wrong with that? It is simple and intuitive, its implementation is straightforward (everybody will get the same result), there is no training involved, and it has asymptotically a very nice guaranteed performance, the Cover…
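The rule itself really does fit in a few lines, which is part of its charm. A minimal sketch of my own (function and variable names are illustrative):

```python
import math

def nearest_neighbor(stored, labels, x):
    """1-NN rule: return the label of the stored observation closest to x."""
    distances = [math.dist(s, x) for s in stored]
    return labels[distances.index(min(distances))]

# Two stored observations, one per class; a new point inherits the
# label of whichever is nearer.
stored = [(0.0, 0.0), (5.0, 5.0)]
labels = ["apple", "pear"]
```

There is nothing to tune and nothing to train: the memory of examples is the classifier.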
There is no best classifier
Every problem has its own best classifier. Every classifier has at least one dataset for which it is the best. So there is no end to pattern recognition research as long as there are problems that are at least slightly different from all other ones that have been studied so far. The reason for this…
Surprisingly good results in flow-cytometry classification
Are the evaluation results of the new procedure you worked on for months worse than, or at most marginally better than, the baseline procedure? Don’t worry, it happens all the time. Are they surprisingly good? Congratulations! You may write an interesting paper. But can you really understand why they are so good? Check, check, and double-check…
Are football results random?
The recent results in the round of 16 of the football world championship in Brazil showed a remarkable statistic. The eight group winners each had to play against a runner-up of another group, and all group winners won. Is that significant? Does it show that the result of a match is not random? Watching them strongly…
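A back-of-the-envelope check (my own calculation, under the simplifying null hypothesis that each knockout match is a fair coin flip, ignoring draws, which are resolved anyway in a knockout round):

```python
# Under the coin-flip null hypothesis, the probability that all eight
# group winners win their round-of-16 matches is 0.5 ** 8.
p_all_win = 0.5 ** 8
print(f"P(all eight group winners win) = {p_all_win}")  # 1/256, about 0.0039
```

Under this null hypothesis the observed pattern would occur less than 0.4% of the time, so it is evidence that group winners are genuinely stronger opponents; still, with a single tournament the sample is tiny.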