PRTools examples: Feature Curves
Feature curves evaluate classifiers as a function of the number of features used, i.e. the dimensionality of the data. It is assumed that the reader is familiar with the introductory sections of the user guide.
The main tool we will use is clevalf, which is a modification of cleval, the routine used for studying learning curves. It computes by cross-validation the classification error for a single size of the training set, but for multiple sizes of the feature set, using the given ranking of the features. As with cleval, the result may be averaged over a set of runs and several classifiers can be studied simultaneously. Here is a simple example:
delfigs
A = sonar; % 60-dimensional dataset
% Compute feature curve for the original feature ranking
E = clevalf(A,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Original'); axis([1 60 0 0.5])
% Compute feature curve for a randomized feature ranking
R = randperm(60);
E = clevalf(A(:,R),{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Random'); axis([1 60 0 0.5])
% Compute feature curve for an optimized feature ranking
W = A*featself('maha-m',60);
E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Optimized (Maha)'); axis([1 60 0 0.5])
showfigs
In this experiment the entire dataset A is used for computing the feature ranking by featself. Strictly speaking, only the training set should be used for this. However, a call like:
U = featself('',60);
E = clevalf(A,U*{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
will first take the desired number of features from A and then use this reduced set for training both the feature selection and the classifier. A solution is offered by clevalfs, which is an extension of clevalf:
U = featself('',60);
E = clevalfs(A,U,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
which first splits A into sets for training and testing, then trains U, and finally computes the feature curves for the specified classifiers.
Exercises
- Why do the feature curves show rather noisy behavior, in spite of averaging over 25 repetitions?
- Why isn't it always true that more features yield a better result?
- Why are the results for using all 60 features not exactly the same across the three experiments?
- Extend the set of experiments with a fourth one in which the forward feature selection is based on the nearest neighbor criterion 'NN', see feateval. A possible setup is sketched below.
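For the fourth exercise, here is a minimal sketch following the pattern of the experiments above (assuming the same dataset A; 'NN' is the nearest neighbor criterion known to feateval):
% Compute feature curve for a ranking optimized by the NN criterion
W = A*featself('NN',60);
E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);
figure; plote(E); title('Optimized (NN)'); axis([1 60 0 0.5])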
There is a post about the well-known example by Trunk showing the peaking phenomenon in feature curves.
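For readers who want to reproduce the effect themselves, here is a minimal sketch of Trunk's setting, our own construction rather than code from that post. It assumes PRTools 5, where labeled datasets are built with prdataset and genlab. The two Gaussian classes have identity covariance and means at +m and -m with m(i) = 1/sqrt(i), so every added feature carries information, yet less than the previous one; for a fixed training set size the error curve first drops and then rises again.
% Sketch of Trunk's example: decreasingly informative features
k = 100; n = 50;                 % dimensionality, objects per class
m = 1./sqrt(1:k);                % mean offset per feature
X = [randn(n,k)+repmat(m,n,1); randn(n,k)-repmat(m,n,1)];
A = prdataset(X,genlab([n n])); % labeled two-class dataset
E = clevalf(A,ldc,[1 2 5 10 20 50 100],0.5,10);
figure; plote(E); title('Trunk''s example')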