PRTools examples: Feature Curves
Feature curves evaluate classifiers as a function of the number of features used, i.e. of the dimensionality. It is assumed that the reader is familiar with the introductory sections of the user guide.
The main tool we will use is clevalf, a variant of cleval, the routine used for studying learning curves. For a single size of the training set it computes, by cross-validation, the classification error for multiple sizes of the feature set, following the given ranking of the features. As with cleval, the result may be averaged over a number of repetitions, and several classifiers can be studied simultaneously. Here is a simple example:
delfigs
A = sonar; % 60-dimensional dataset
% Compute feature curve for the original feature ranking
E = clevalf(A,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Original'); axis([1 60 0 0.5])
% Compute feature curve for a randomized feature ranking
R = randperm(60);
E = clevalf(A(:,R),{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Random'); axis([1 60 0 0.5])
% Compute feature curve for an optimized feature ranking
W = A*featself('maha-m',60);
E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Optimized (Maha)'); axis([1 60 0 0.5])
showfigs
In this experiment the entire dataset A is used for computing the feature ranking found by featself. More correctly, one should use only the training set for this. However, a call like:
U = featself('',60);
E = clevalf(A,U*{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
will first take the desired number of features from A and then use this reduced set for training both the feature selection and the classifier. A solution is offered by clevalfs, which is an extension of clevalf:
U = featself('',60);
E = clevalfs(A,U,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
which first splits A into sets for training and testing, then trains U on the training part only, and finally computes the feature curves for the specified classifiers.
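To make the order of operations concrete, here is a minimal sketch of roughly what a single repetition amounts to, shown for ldc only; the internal bookkeeping of clevalfs may differ in detail:
[T,S] = gendat(A,0.7);                 % random 70/30 split of A
W = T*featself('',60);                 % feature ranking trained on T only
TR = T*W; TE = S*W;                    % both sets in the ranked feature order
featsizes = [1:5 7 10 15 20 30 45 60];
e = zeros(1,numel(featsizes));
for j = 1:numel(featsizes)
  k = featsizes(j);
  V = TR(:,1:k)*ldc;                   % train on the k top-ranked features
  e(j) = TE(:,1:k)*V*testc;            % test error on the same k features
end
The essential point is that featself never sees the test objects, so the ranking cannot be optimistically biased by them.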
Exercises
- Why do the feature curves show rather noisy behavior, in spite of averaging over 25 repetitions?
- Why isn't it always true that more features yield a better result?
- Why are the results for using all 60 features not exactly the same across the three experiments?
- Extend the set of experiments with a fourth one in which the forward feature selection is based on the nearest-neighbor criterion 'NN', see feateval. A starting sketch is given below.
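As a starting point for the last exercise, here is a minimal sketch following the pattern of the third experiment above ('NN' refers to the nearest-neighbor criterion listed in feateval; note that, as in that experiment, the ranking is trained on all of A):
% Forward feature selection based on the 'NN' criterion
W = A*featself('NN',60);
E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Optimized (NN)'); axis([1 60 0 0.5])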
There is a post about the well-known example by Trunk showing the peaking phenomenon in feature curves.
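For readers who want to try to reproduce that peaking behavior, here is a minimal sketch of Trunk's construction (the sample sizes and feature-size grid below are our own choices): two Gaussian classes with identity covariances and means +mu and -mu, where the i-th mean component equals 1/sqrt(i), so the features are already ranked by usefulness:
% Trunk's example: class means +/-mu with mu_i = 1/sqrt(i)
n = 100; k = 60;
mu = 1./sqrt(1:k);
X = [randn(n,k)+repmat(mu,n,1); randn(n,k)-repmat(mu,n,1)];
A = prdataset(X,genlab([n n]));        % labeled PRTools dataset
E = clevalf(A,nmc,[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Trunk'); axis([1 60 0 0.5])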
