PRTools examples: Cross validation

Cross validation is a standard technique for the evaluation of pattern recognition systems. It is assumed that the reader is familiar with the introductory sections of the user guide.

There is a specific set of evaluation routines in PRTools. Here the routine prcrossval is discussed and illustrated. See also the introductory examples on classifiers and evaluation.

In cross validation a single labeled dataset is randomly split into $$n$$ subsets of about the same size, called folds. All but one of the folds are used for training and the remaining one is used for testing. This is rotated over all $$n$$ folds, so that every sample in the dataset is used exactly once for testing and $$n-1$$ times for training. For sufficiently large $$n$$ the training sets are nearly identical to each other and almost as large as the total set. Consequently, cross validation approximates the use of all samples for training as well as for testing, while each test fold remains independent of the classifier trained on the other folds.
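The rotation over folds can be sketched on sample indices alone. The following is a minimal illustration in base MATLAB, not the PRTools implementation; the sample count m and fold count n are hypothetical values chosen for the example, and no classifier is trained.

```matlab
% Sketch of n-fold rotation on sample indices only (not PRTools code).
m = 20;                              % hypothetical number of samples
n = 4;                               % hypothetical number of folds

perm = randperm(m);                  % random order of the sample indices
fold = zeros(1, m);
fold(perm) = mod(0:m-1, n) + 1;      % fold label 1..n, fold sizes differ by at most one

testCount  = zeros(1, m);            % how often each sample is used for testing
trainCount = zeros(1, m);            % how often each sample is used for training
for k = 1:n
    testIdx  = (fold == k);          % fold k is held out for testing
    trainIdx = ~testIdx;             % the other n-1 folds are used for training
    % A classifier would be trained on trainIdx and tested on testIdx here.
    testCount  = testCount  + testIdx;
    trainCount = trainCount + trainIdx;
end

disp([min(testCount) max(testCount) min(trainCount) max(trainCount)])
```

After the loop every sample has been used once for testing and n-1 times for training, which is the property described above.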

The entire process is often repeated a number of times and the results are averaged in order to reduce the effect of the random split into folds. PRTools uses so-called stratified splitting to reduce the variability caused by splitting as far as possible. A stratified split is not fully random: the relative class sizes in each fold are kept as close as possible to those of the original dataset. Another, systematic (non-random) strategy is ‘density preserving sampling’ (DPS), for which the number of folds should be a power of 2. As it is non-stochastic there is no need to iterate it, and consequently it is faster.
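To make the idea of stratified splitting concrete, here is a small sketch in base MATLAB. It is only one simple way to obtain stratified folds and is not claimed to be how PRTools does it internally; the label vector and the number of folds are hypothetical.

```matlab
% Sketch of stratified fold assignment (an illustration, not PRTools code).
labels = [ones(1,30), 2*ones(1,10)];   % hypothetical labels: 30 and 10 samples
n = 4;                                 % hypothetical number of folds

fold    = zeros(size(labels));
classes = unique(labels);
for c = classes
    idx = find(labels == c);                  % samples of this class
    idx = idx(randperm(numel(idx)));          % shuffle within the class
    fold(idx) = mod(0:numel(idx)-1, n) + 1;   % deal the class evenly over the folds
end

% Count the samples of each class (rows) in each fold (columns).
counts = zeros(numel(classes), n);
for k = 1:n
    for j = 1:numel(classes)
        counts(j,k) = sum(fold == k & labels == classes(j));
    end
end
disp(counts)
```

Within each class the fold counts differ by at most one, so every fold keeps roughly the 3:1 class ratio of the full label vector. Only the assignment within a class is random.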

In the following experiment we take a single dataset of 200 objects per class and try to predict the performance of the Parzen classifier parzenc by 8-fold cross validation. This is done using 1, 5 and 25 iterations and once using density preserving sampling. The results are compared with the ‘true’ performance of the Parzen classifier trained on the full dataset and tested on a larger, independent test set of 2*1000 objects.

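A = gendatb([200 200]);               % dataset, 200 objects per class
e1 = prcrossval(A,parzenc,8,1);       % 8-fold cross validation, 1 iteration
e5 = prcrossval(A,parzenc,8,5);       % averaged over 5 iterations
e25 = prcrossval(A,parzenc,8,25);     % averaged over 25 iterations
edps = prcrossval(A,parzenc,8,'DPS'); % density preserving sampling
S = gendatb([1000,1000]);             % large independent test set
etrue = S*(A*parzenc)*testc;          % 'true' error: train on all of A, test on S
disp([etrue,e1,e5,e25,edps])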

Repeat this a few times. The results, and the conclusions they suggest, differ from run to run. To obtain a more definite conclusion, run the following experiment.

Exercise

  1. Take one of the mfeat datasets, e.g. mfeat_kar.
  2. Split it 50-50 into a training set and a test set using gendat.
  3. Estimate the performance of the ldc classifier on the training set only by 8-fold cross validation. Do this for 1, 5 (and 25, if feasible) iterations as well as by density preserving sampling (NREP = 'DPS'), using prcrossval.
  4. Compare the results with those of a single classifier trained on the entire training set and tested on the test set (the ‘true’ error). For each of the 4 cross validation errors, compute the absolute value of its deviation from the true error.
  5. Repeat steps 2-4 ten times and report the averages and standard deviations of the deviations of the cross validation errors from the true error.
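One possible skeleton for the steps above is sketched here. It assumes that PRTools and the dataset collection providing mfeat_kar are on the MATLAB path, and that gendat(A,0.5) selects half of the objects of each class for the first output. The variable names nrun, ecv and dev are illustrative only. Treat it as a starting point to adapt, not as the reference solution.

```matlab
% Skeleton for the exercise (assumes PRTools and mfeat_kar are available).
A    = mfeat_kar;                  % step 1: load the dataset
nrun = 10;                         % step 5: number of repetitions
dev  = zeros(nrun, 4);             % columns: 1, 5, 25 iterations and DPS

for r = 1:nrun
    [T, S] = gendat(A, 0.5);       % step 2: 50-50 split into training and test set
    ecv = [prcrossval(T, ldc, 8, 1), ...      % step 3: cross validation errors
           prcrossval(T, ldc, 8, 5), ...
           prcrossval(T, ldc, 8, 25), ...
           prcrossval(T, ldc, 8, 'DPS')];
    etrue = S*(T*ldc)*testc;       % step 4: 'true' error on the test set
    dev(r,:) = abs(ecv - etrue);   % absolute deviations from the true error
end

disp([mean(dev); std(dev)])        % row 1: averages, row 2: standard deviations
```

If the 25-iteration run is too slow, drop the third prcrossval call and shrink dev to three columns.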
