PRTools examples: Feature Curves

Feature curves evaluate classifiers as a function of the number of features, i.e. the dimensionality. It is assumed that the reader is familiar with the introductory sections of the user guide.

The main tool we will use is clevalf, a modification of cleval, the routine for studying learning curves. It computes, by cross-validation, the classification error for a single size of the training set, but for multiple sizes of the feature set, following the given ranking of the features. As with cleval, the result may be averaged over a number of runs, and several classifiers can be studied simultaneously. Here is a simple example:

delfigs
A = sonar;   % 60 dimensional dataset
% compute feature curve for the original feature ranking
E = clevalf(A,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Original'); axis([1 60 0 0.5])
% Compute feature curve for a randomized feature ranking
R = randperm(60);
E = clevalf(A(:,R),{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Random'); axis([1 60 0 0.5])
% Compute feature curve for an optimized feature ranking
W = A*featself('maha-m',60);
E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Optimized (Maha)'); axis([1 60 0 0.5])
showfigs

In this experiment the entire dataset A is used for computing the feature ranking by featself. Strictly speaking, only the training set should be used for this. However, a call like:

U = featself('',60);
E = clevalf(A,U*{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);

will first take the desired number of features from A and only then use this reduced set for training the feature selection as well as the classifier, which is not what is intended. A solution is offered by clevalfs, an extension of clevalf:

U = featself('',60);
E = clevalfs(A,U,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);

which first splits A into sets for training and testing, then trains U on the training set, and finally computes the feature curves for the specified classifiers.
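The protocol followed by clevalfs can be sketched by hand for a single repetition and a single classifier. The fragment below is only an illustration of the idea, not the actual clevalfs implementation; the variable names and feature sizes are chosen for this example:

```matlab
% One repetition of the protocol: rank on the training set only (sketch)
sizes = [1:5 7 10 15 20 30 45 60];
[T,S] = gendat(A,0.7);            % random split: 70% training, 30% testing
W = featself(T,'maha-m',60);      % feature ranking trained on T only
Tr = T*W; Te = S*W;               % map both sets to the ranked feature order
e = zeros(1,numel(sizes));
for i = 1:numel(sizes)
  k = sizes(i);
  V = Tr(:,1:k)*ldc;              % train on the first k ranked features
  e(i) = Te(:,1:k)*V*testc;       % test error on the held-out set
end
```

clevalfs repeats this for a number of splits and classifiers and averages the resulting errors.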

Exercise

  1. Why do the feature curves show rather noisy behavior, in spite of averaging over 25 repetitions?
  2. Why isn't it always true that more features yield a better result?
  3. Why are the results for using all 60 features not exactly the same across the three experiments?
  4. Extend the set of experiments with a fourth one in which the forward feature selection is based on the nearest neighbor criterion ‘NN’, see feateval.

There is a post about the well-known example by Trunk showing the peaking phenomenon in feature curves.
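Trunk's example can be reproduced in a few lines. In this sketch (the names prdataset and genlab are assumed to be available in the installed PRTools version) the two classes are Gaussian with identity covariance and means at +m and -m, where every next feature contributes less:

```matlab
% A sketch of Trunk's example: informative but ever weaker features
d = 60; n = 50;                     % dimensionality, objects per class
m = 1./sqrt(1:d);                   % class means at +m and -m
X = [randn(n,d)+repmat(m,n,1); ...
     randn(n,d)-repmat(m,n,1)];
A = prdataset(X,genlab([n n]));     % labeled PRTools dataset
E = clevalf(A,{nmc},1:d,0.7,25);    % feature curve for the nearest mean classifier
figure; plote(E); title('Trunk')
```

For a finite training set the error of nmc typically first decreases and then rises again with growing dimensionality: the peaking phenomenon.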
