PRTools examples: Dimension Reduction
Dimension reduction is useful, and sometimes necessary, to decrease the computational complexity and to increase the classification performance. Dimension reduction itself, however, may be time consuming. We will present some examples. It is assumed that the reader is familiar with the introductory sections of the user guide.
Dimension reduction can be realized by feature selection or by feature extraction. Extraction is traditionally based on finding linear subspaces; non-linear procedures may be based on neural networks, kernels or dissimilarities. There is usually a criterion and a strategy to optimize this criterion.
In the case of feature selection one distinguishes filters and wrappers. A filter criterion differs from the classifier to be used in the final space, whereas the wrapper approach simply minimizes the error of that final classifier over the family of considered subspaces. To avoid overtraining by the wrapper, cross-validation may be applied. The filter approach is usually faster and, as filter criteria are simpler than classifiers, there is a smaller danger of overtraining.
Finally, criteria may be supervised, i.e. depending on class labels, or unsupervised.
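As a minimal illustration of this difference (our own sketch, not part of the original example set), the PRTools routine feateval can evaluate both types of criteria for the full feature set; here gendatb merely supplies some labeled data and an independent test set:
A = gendatb([50 50]);      % labeled training data
T = gendatb([100 100]);    % independent test set
Jf = feateval(A,'maha-s')  % filter criterion: Mahalanobis-based class separability
Jw = feateval(A,ldc,T)     % wrapper criterion: performance of ldc trained on A, tested on T
In both cases a larger value is better; a selection strategy searches for the feature subset that maximizes such a criterion.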
In PRTools terms, procedures for dimension reduction are trainable mappings. Here are some examples for selection and extraction.
Feature Selection
There are routines for different strategies, like individual, forward, backward and floating selection. They are based on criteria computed by feateval. Some criteria are supervised, others are unsupervised. The criteria can be computed on the training set, by cross-validation, or on a fixed test set. This is important if the classification error of a wrapper approach is used as the criterion. Such experiments are usually very time consuming, so we will restrict ourselves to very simple examples based on selecting 4 features for the Malaysia dataset, which has in total 8 features and 291 objects in 20 classes.
A = malaysia;
A = setprior(A,getprior(A)); % avoid heaps of warnings
[T,S] = gendat(A,0.5); % generate train / test set
W1 = A*featseli([],'NN',4);     W1 = setname(W1,'ISel NN');
W2 = A*featseli([],'maha-s',4); W2 = setname(W2,'ISel maha-s');
W3 = A*featseli([],'maha-m',4); W3 = setname(W3,'ISel maha-m');
W4 = A*featseli([],'in-in',4);  W4 = setname(W4,'ISel in-in');
W5 = A*featseli([],ldc,4,5);    W5 = setname(W5,'ISel wrapper');
disp([+W1;+W2;+W3;+W4;+W5])     % show selected features
V = T*({W1,W2,W3,W4,W5}*ldc);   % train all selectors and classifiers
S*V*testc                       % show test result
Note that the criteria maha-s and in-in behave very similarly. Moreover, they match well with the classifier ldc, as all are based on maximizing the between-class scatter while minimizing the within-class scatter. We will use the maha-s criterion to compare several selection strategies.
U1 = featseli([],'maha-s',4); U1 = setname(U1,'ISel maha-s');
U2 = featself([],'maha-s',4); U2 = setname(U2,'FSel maha-s');
U3 = featselb([],'maha-s',4); U3 = setname(U3,'BSel maha-s');
U4 = featselo([],'maha-s',4); U4 = setname(U4,'OSel maha-s');
W = T*{U1,U2,U3,U4};            % use train set for feature selection
disp([+W{1};+W{2};+W{3};+W{4}]) % show selected features
V = T*(W*ldc);                  % use train set also for classifier training
S*V*testc                       % show test results
These results are based on a single split of the data. A full 8-fold cross-validation can be executed by:
prcrossval(A,{U1,U2,U3,U4}*ldc,8,'DPS')
Exercises
- Compare, in the last example, one of the filter feature selectors with an equivalent wrapper.
- Compare the learning curves for this wrapper and for the filter (both selecting features for the same classifier, ldc). A possible starting point is sketched below.
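The following minimal sketch (our own code) is one possible way to start the learning-curve exercise; it assumes that cleval and plote are used as in the evaluation examples of the user guide, and the wrapper settings and training sizes are arbitrary choices:
A = malaysia;
A = setprior(A,getprior(A));                          % avoid heaps of warnings
Uf = setname(featself([],'maha-s',4),'FSel filter');  % filter criterion
Uw = setname(featself([],ldc,4,5),'FSel wrapper');    % wrapper: ldc error, 5-fold cross-validation
E = cleval(A,{Uf*ldc,Uw*ldc},[2 3 4],3);              % learning curves, 3 repetitions per training size
plote(E)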
Feature extraction
In feature extraction, (non-)linear functions of an existing vector representation are found and used as the basis for a new representation. Three simple linear feature extractors offered by PRTools are:
- Principal component analysis (pcam), based on the eigenvectors computed for the entire dataset. This approach is unsupervised. However, if it is applied to a labeled dataset, the class priors are used to weight the object contributions to the overall covariance matrix; if the class priors equal the class frequencies, all objects have equal weights (see the sketch after this list).
- Fisher mapping (fisherm), also known as LDA (Linear Discriminant Analysis). It finds the linear subspace that maximizes the between-class scatter relative to the within-class scatter.
- Karhunen-Loève mapping (klm), which computes the eigenvectors of the averaged class covariance matrices.
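The prior weighting used by pcam for labeled data can be inspected with a small sketch like the one below (our own code): the class priors of the Malaysia dataset are set equal, so that they differ from the class frequencies, and the result is compared with plain PCA on the same data without labels:
A = malaysia;                  % 291 objects, 8 features, 20 classes
[m,k,c] = getsize(A);
A = setprior(A,ones(1,c)/c);   % equal class priors, different from the class frequencies
w_lab = pcam(A,2);             % PCA with objects weighted by the class priors
w_unl = pcam(prdataset(+A),2); % PCA on the unlabeled data: all objects weighted equally
figure; scatterd(A*w_lab); title('PCA, prior-weighted objects');
figure; scatterd(A*w_unl); title('PCA, equally weighted objects');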
In the case of images, the axes spanning such a linear subspace are directions in the original, pixel-based feature space. These directions can themselves be displayed as images, the so-called eigen-images.
delfigs
prdatafiles; % make sure that prdatafiles is in the path
a = faces; % the ORL database of faces.
a = prdataset(a) % convert datafile into a dataset
w1 = pcam(a,10);
figure; show(w1); % show first 10 eigenfaces
w2 = fisherm(a,10);
figure; show(w2); % show first 10 fisherfaces
w3 = klm(a,10);
figure; show(w3); % show first 10 KL-faces
showfigs
Exercises
- Show the three scatterplots based on the first two features found by each of the three feature extractors discussed above. What is remarkable? Can you find an explanation? A possible starting point is sketched after this list.
- Compute the learning curves for the ldc as well as the qdc classifier in 5-dimensional spaces of the three feature representations.
- Compute feature curves for the three feature representations using 5 training objects per class for ldc.
- Compute feature curves for the PCA eigenspaces using 1-9 training objects per class for ldc.
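A minimal sketch for the first exercise (our own code), assuming the ORL dataset a computed in the example above is still available:
figure; scatterd(a*pcam(a,2));    title('PCA');
figure; scatterd(a*fisherm(a,2)); title('Fisher');
figure; scatterd(a*klm(a,2));     title('Karhunen-Loeve');
showfigs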