PRTools examples: Learning Curves
Learning curves are an important tool for studying the behavior of classifiers. It is assumed that the reader is familiar with the introductory sections of the user guide.
A learning curve shows the performance of a classifier (typically its classification error) as a function of the size of the training set. The main tool used here is the classifier evaluation routine cleval. Its main function is to compare the performance of different classifiers for different training set sizes. A simple example:
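% Generate a 2D Gaussian distribution: 1000 objects, mean [0 0], identity covariance
A = gendatgauss(1000,[0 0],eye(2));
% Another, differently shaped Gaussian distribution: mean [3 2], variances 9 and 1
B = gendatgauss(1000,[3 2],[9 0;0 1]);
% Construct two-class dataset with equal priors
C = prdataset([A;B],genlab([1000 1000]));
C = setprior(C,0);
% Compute the learning curves of the nearest mean (nmc), linear (ldc) and
% quadratic (qdc) classifiers for the given training set sizes, 50 repetitions each
e = cleval(C,{nmc,ldc,qdc},[3 4 5 6 8 10 12 15],50);
% Plot the averaged learning curves
plote(e)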
Each learning curve is the average over 50 repetitions. The curves show a scissors phenomenon: for small training sets the simple classifiers perform better than the more advanced ones, while the more advanced classifiers catch up as the training set grows, so the curves tend to cross.
The training sets are randomly drawn from the total given design set. In every experiment the remaining objects are used for testing, so the size of the test set varies along the learning curve. Note that cleval interprets the training set sizes per class when the class priors are set. When no class priors are set, the given sizes are the total size of the training set.
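To make the procedure described above concrete, here is a hypothetical sketch in base MATLAB (it uses no PRTools calls and is not the actual cleval implementation). It computes a learning curve for a nearest mean classifier on data shaped like A and B: for each training size n, n objects per class are drawn at random, all remaining objects form the test set, and the test error is averaged over R repetitions. The variable names (sizes, R, err, ecurve, istrain, istest) are illustrative assumptions, and the sizes are taken per class, as cleval does when priors are set.

```matlab
% Hypothetical sketch of the learning-curve procedure (not PRTools code).
% Two Gaussian classes shaped like A and B in the example above.
N = 1000;
X = [randn(N,2); [3*randn(N,1)+3, randn(N,1)+2]];  % class 2: mean [3 2], variances 9 and 1
y = [ones(N,1); 2*ones(N,1)];                       % class labels 1 and 2

sizes = [3 4 5 6 8 10 12 15];   % training objects per class (assumption: priors set)
R = 50;                         % repetitions per training size
err = zeros(R, numel(sizes));

for j = 1:numel(sizes)
  n = sizes(j);
  for r = 1:R
    % Randomly draw n training objects from each class
    istrain = false(2*N,1);
    for c = 1:2
      idx = find(y == c);
      p = randperm(numel(idx));
      istrain(idx(p(1:n))) = true;
    end
    istest = ~istrain;            % all remaining objects are used for testing
    % Nearest mean classifier: assign each test object to the closest class mean
    m1 = mean(X(istrain & y == 1,:), 1);
    m2 = mean(X(istrain & y == 2,:), 1);
    d1 = sum(bsxfun(@minus, X(istest,:), m1).^2, 2);
    d2 = sum(bsxfun(@minus, X(istest,:), m2).^2, 2);
    pred = 1 + (d2 < d1);
    err(r,j) = mean(pred ~= y(istest));
  end
end
ecurve = mean(err, 1);          % averaged learning curve, one error per training size
```

Plotting plot(sizes, ecurve) should give a curve comparable to the nmc curve produced by plote(e), although the details of the error estimation in cleval may differ from this sketch.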
Exercise
Take a large dataset, e.g. satellite. Compute the learning curve for the k-nearest neighbor classifier knnc. Try some other classifiers. Why should one expect that no classifier can beat knnc for large training sets?