PRTools examples: Learning Curves

Learning curves are an important tool for studying the behavior of classifiers. It is assumed that the reader is familiar with the introductory sections of the user guide (elements, operations, user commands and the introductory examples).

A learning curve shows the performance of a classifier, usually its classification error, as a function of the size of the training set. The main tool we will use is the classifier evaluation routine cleval. Its main purpose is to compare the performance of different classifiers for different training set sizes. A simple example:

% Generate a Gaussian distribution
A = gendatgauss(1000,[0 0],eye(2));

% Another, differently shaped Gaussian distribution
B = gendatgauss(1000,[3 2],[9 0;0 1]);

% Construct a two-class dataset with equal priors
C = prdataset([A;B],genlab([1000 1000]));
C = setprior(C,0);

% Compute and plot the learning curves
e = cleval(C,{nmc,ldc,qdc},[3 4 5 6 8 10 12 15],50);
plote(e)

The learning curves are based on averaging 50 repetitions. They show a scissors phenomenon: for small sample sizes simple classifiers are better than more advanced ones.
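Besides plotting, it can help to look at the averaged errors as numbers, for instance to see where the curves cross. The following is a minimal sketch. It assumes that the structure returned by cleval has the fields error, std, xvalues and names as described in the help of plote; check "help plote" in your PRTools version before relying on these names.

```matlab
% Same experiment as above, rebuilt so that the snippet runs on its own
A = gendatgauss(1000,[0 0],eye(2));
B = gendatgauss(1000,[3 2],[9 0;0 1]);
C = setprior(prdataset([A;B],genlab([1000 1000])),0);
e = cleval(C,{nmc,ldc,qdc},[3 4 5 6 8 10 12 15],50);

% ASSUMPTION: e.error holds one row per classifier and one column per
% training set size, e.xvalues holds the sizes, e.names the classifier names
names = cellstr(e.names);

% Header row with the training set sizes
fprintf('%-12s','size');
fprintf('%7d',e.xvalues);
fprintf('\n');

% One row of averaged test errors per classifier
for i = 1:size(e.error,1)
    fprintf('%-12s',names{i});
    fprintf('%7.3f',e.error(i,:));
    fprintf('\n');
end
```

Because the training sets are drawn at random, the printed values differ from run to run.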

The training sets are randomly drawn from the total given design set. In every experiment the remaining objects are used for testing, so the size of the test set differs along the learning curve. Note that the training set sizes passed to cleval are interpreted per class if class priors are set. If no class priors are set, the given sizes are the total size of the training set.
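The procedure just described can be written out by hand for a single classifier. This is a rough sketch of the idea, not the actual implementation of cleval. It assumes that gendat, called with a vector of sizes per class, returns the randomly drawn objects as its first output and all remaining objects as its second. The variable names err and ntest are chosen here for illustration only.

```matlab
% Rebuild the two-class dataset from the example above
A = gendatgauss(1000,[0 0],eye(2));
B = gendatgauss(1000,[3 2],[9 0;0 1]);
C = prdataset([A;B],genlab([1000 1000]));
C = setprior(C,0);

trainsizes = [3 4 5 6 8 10 12 15];      % training objects per class
nreps = 50;                             % repetitions to average over
err = zeros(nreps,numel(trainsizes));   % test error per repetition and size
ntest = zeros(1,numel(trainsizes));     % test set size per training size

for j = 1:numel(trainsizes)
    n = trainsizes(j);
    for r = 1:nreps
        % T: n randomly drawn objects per class, S: all remaining objects
        [T,S] = gendat(C,[n n]);
        W = ldc(T);                     % train on the small set
        err(r,j) = testc(S*W);          % error on the remaining objects
    end
    ntest(j) = size(S,1);               % 2000 - 2*n for this dataset
end

% Rows: training size per class, test set size, averaged test error
disp([trainsizes; ntest; mean(err,1)]);
```

With 1000 objects per class, a training size of 3 per class leaves 1994 objects for testing and a training size of 15 per class leaves 1970, which is why the test set shrinks along the curve.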

Exercise

Take a large dataset, e.g. satellite. Compute the learning curve for the k-nearest neighbor classifier knnc. Try some other classifiers. Why is it to be expected that no classifier can beat knnc for large training sets?
