### PRTools examples: Dimension Reduction

Dimension reduction is used and sometimes necessary to decrease the computational complexity and to increase the classification performance. Dimension reduction itself, however, may be time consuming. We will present some examples. It is assumed that the reader is familiar with the introductory sections of the user guide:

Dimension reduction can be realized by feature **selection** or by **extraction**. Extraction is traditionally based on finding **linear** subspaces. **Non-linear** procedures may be based on neural networks, kernels or dissimilarities.

There is usual a **criterion** and a **strategy** to optimize this criterion.

In case of feature selection one distinguishes **filters** and **wrappers**. The criteria for filters are different from the classifier to be used in the final space. The wrapper approach simply minimizes the final classification error over the family of considered spaces. To avoid that the wrapper approach overtrains, cross-validation may be applied. The filter approach may be faster. As they are more simple than classifiers there is a smaller danger of overtraining.

Finally criteria may be **supervised**, i.e. depending on class labels, or **unsupervised**.

In PRTools terms, procedures for dimension reduction are trainable mappings. Here are some examples for **selection** and **extraction**.

### Feature Selection

There are routines for different strategies like individual, forward, backward and floating selection. They are based on criteria computed by feateval. Some are supervised, others are unsupervised. The criteria can be computed on the training set, by cross-validation or on a fixed test set. This is important is the classification error in a wrapper approach is used as a criterion. Experiments are usually very time consuming, so we will restrict to very simple examples based on finding 3 features for the Malaysia dataset, which has in total 8 features and 291 objects in 20 classes.

`A = malaysia;`

A = setprior(A,getprior(A)); % avoid heaps of warnings

`[T,S] = gendat(A,0.5); % generate train / test set`

`W1 = A*`

`featseli`

([],'NN',4); W1 =`setname`

(W1,'ISel NN');

`W2 = A*`

`featseli`

([],'maha-s',4); W2 =`setname`

(W2,'`ISel`

maha-s');

`W3 = A*`

`featseli`

([],'maha-m',4); W3 =`setname`

(W3,'`ISel`

maha-m');

`W4 = A*`

`featseli`

([],'in-in',4); W4 =`setname`

(W4,'`ISel in-in`

');

`W5 = A*`

`featseli`

([],ldc,4,5); W5 =`setname`

(W5,'`ISel`

wrapper');

disp([+W1;+W2;+W3;+W4;+W5]) % show selected features

`V = T*({W1,W2,W3,W4,W5}*ldc); % train all selectors and classifiers`

`S*V*`

`testc`

% show test result

Note that the criteria `maha-s`

and `in-in`

behave very similar. Moreover, they match good with the classifier `ldc`

as all are based on maximizing the between scatter while minimizing the within scatter. We will use this criterion to compare several strategies.

`U1 = featseli([],'maha-s',4);`

`U1`

= setname(`U1`

,'ISel maha-s');

`U2 = featself([],'maha-s',4);`

`U2`

=`setname`

(`U2`

,'FSel maha-s');

`U3 = featselb([],'maha-s',4);`

`U3`

=`setname`

(`U3`

,'BSel maha-s');

`U4 = featselo([],'maha-s',4);`

`U4`

=`setname`

(`U4`

,'OSel maha-s');

W = T*{U1,U2,U3,U4}; % use train set for feature selection

disp([+W{1};+W{2};+W{3};+W{4}]) % show selected featurs

`V = T*(W*`

`ldc`

); % use train set also for classifier training

`S*V*testc % show test results`

These results are based on a single split of the data. A full 8-fold cross-validation can be executed by:

`prcrossval(A,{U1,U2,U3,U4}*`

`ldc`

,8,'DPS')

#### Exercises

- Compare in the last example one of the filter feature selectors with a equivalent wrapper.
- Compare the learning curves for this wrapper and the filter (selecting features for the same classifier
`ldc`

).

### Feature extraction

By feature extraction from existing vector representations (non)linear functions are found as a basis for a new representation. Three simple linear feature extractors offered by PRTools are:

- Principal component analysis (
`pcam`

) based on the eigenvectors found for the entire dataset. This approach is unsupervised. However, if applied to a labeled dataset class priors are used to weight the object contributions to the overall covariance matrix. If class priors are equal to class frequencies objects have equal weights. - Fisher mapping (
`fisherm`

), also known as LDA (Linear Discriminant Analysis). It finds the linear subspace that maximizes the class between scatter w.r.t. the class within scatter. - Karhunen Loéve Mapping (
`klm`

), it computes the eigenvectors of the averaged class covariance matrices.

In case of images the axes representing a linear subspace point into a direction of the original feature space based on the image pixels. These directions consitute also images. They are the so-called eigen-images.

`delfigs`

`prdatafiles; % make sure that prdatafiles in in the path`

`a = faces; % the ORL database of faces.`

`a = prdataset(a) % convert datafile into a dataset`

`w1 = pcam(a,10);`

`figure; show(w1); % show first 10 eigenfaces`

`w2 = fisherm(a,10);`

`figure; show(w2); % show first 10 fisherfaces`

`w3 = klm(a,10);`

`figure; show(w3); % show first 10 KL-faces`

`showfigs`

#### Exercises

- Show the three scatterplots based on the first two features of the three feature extractors discussed above. What is remarkable? Can you find an explanation?
- Compute the learning curves for the
`ldc`

as well as the`qdc`

classifier in 5-dimensional spaces of the three feature representations. - Compute feature curves for the three feature representations using 5 training objects per class for
`ldc`

. - Compute feature curves for the PCA eigenspaces using 1-9 training objects per class for
`ldc`

.

**elements:**
datasets
datafiles
cells and doubles
mappings
classifiers
mapping types.

**operations:**
datasets
datafiles
cells and doubles
mappings
classifiers
stacked
parallel
sequential
dyadic.

**user commands:**
datasets
representation
classifiers
evaluation
clustering
examples
support routines.

**introductory examples:**
Introduction
Scatterplots
Datasets
Datafiles
Mappings
Classifiers
Evaluation
Learning curves
Feature curves
Dimension reduction
Combining classifiers
Dissimilarities.

**advanced examples**.