DisTools examples: Dimension reduction of the dissimilarity space
The dissimilarity space may have a very large dimensionality, which may cause severe computational problems. Here some techniques for dimension reduction are shown. It is assumed that readers are familiar with PRTools and will consult the following pages where needed:
- PRTools User Guide (see the bottom of that page for a TOC)
- Introduction to DisTools
- Dissimilarity Representation Course
- The following packages should be in the Matlab path: PRTools, DisTools, PRDisData
Intuitively one expects that more objects available for training a classifier will result in a better classification performance. This does not hold for a number of pattern recognition algorithms. The dissimilarity approach suffers in at least two ways from large training sets. If all training objects are used for representation as well, the dissimilarity matrix grows in two directions when more objects are added: a larger set of examples is represented in a vector space with a growing dimensionality. Several procedures may thereby suffer from the curse of dimensionality (overtraining or peaking). In addition there is a computational problem: the number of dissimilarities to be computed increases with the square of the number of objects. The memory demands grow in the same way.
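The quadratic growth can be made concrete with a small back-of-the-envelope computation in plain Matlab; the numbers of objects below are arbitrary illustrations, not tied to any particular dataset.
n = [100 1000 10000];    % numbers of training (= representation) objects
ndis = n.^2;             % dissimilarities to compute and store
mb = 8*ndis/2^20;        % memory in MByte, assuming doubles of 8 bytes
disp([n' ndis' mb'])     % columns: objects, dissimilarities, MByte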
It is doubtful whether the dimensionality should really grow as fast as the number of objects. Problems have a finite intrinsic dimensionality. The clouds of objects represented in the dissimilarity space will not necessarily lie in linear subspaces, so some growth of the dimensionality may always be profitable, but as long as this growth is slower than the growth of the number of objects, the accuracy of density estimates and classifiers will still improve.
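As a rough illustration, the number of principal directions of the dissimilarity space needed to preserve most of the variance can be inspected with pcam. This is just a sketch: the 95% threshold is an arbitrary choice and the chickenpieces dissimilarities used further on merely serve as an example.
D = chickenpieces(15,60);  % example dissimilarity dataset, also used below
w = pcam(D,0.95);          % PCA mapping retaining 95% of the variance
size(w,2)                  % number of retained components, usually much smaller than size(D,2)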
A very nice property of the dissimilarity representation is that, unlike features, dissimilarities to objects behave very similarly if a representation object is exchanged for another, similar one: scaling and densities will hardly be affected. For this reason a random selection of representation objects makes sense, in particular if larger sets are considered.
D = chickenpieces(15,60);
D = setprior(D,getprior(D)); % set class priors to class frequencies
e = clevald(D,{knndc,knnc,fisherc},[1,2,3,5,7,10,15,20,30,50],[],10);
plote(e,'noapperror')
In this example some learning curves are computed. Training sets are selected at random. Representation sets are by default equal to the training sets and are thereby also random.
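Such a random selection can also be made explicit by just picking columns of the dissimilarity matrix. The following minimal sketch reuses the dataset D loaded above; the size of 20 prototypes is an arbitrary choice.
r = randperm(size(D,2));   % random order of the candidate representation objects
Dr = D(:,r(1:20));         % dissimilarities to 20 randomly selected prototypes
w = Dr*fisherc;            % train a classifier in the reduced dissimilarity space
Dr*w*testc                 % apparent error, as training and test objects coincide here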
Several systematic selections are possible and have been studied in the literature: the use of feature selection procedures, cluster analysis, a careful uniform distribution of objects, selecting objects in the tails of the distributions, far away from the decision boundary, close to the decision boundary, etcetera. So any routine for reducing a set of objects to a smaller set, as well as any routine for reducing a set of features, may be studied. Here are some examples.
U1 = protselfd('random'); U1 = setname(U1,'random');
U2 = protselfd('LOO'); U2 = setname(U2,'LOO NN');
U3 = protselfd('maxdist'); U3 = setname(U3,'KCentres');
U4 = pcam;
U5 = featseli('NN'); U5 = setname(U5,'IndFeatSel');
U = {U1,U2,U3,U4,U5};
repeats = 5;
gendata = genddat([],0.5);
e = clevalfs(D,U,fisherc,[1,2,3,5,7,10,15,20,30],gendata,repeats);
plote(e,'noapperror')
Exercise
- Use a feasible subset of the polydism57 dataset to run the above experiment many times to find a stable set of curves.
- Compare the final results with the LOO NN error on the given dissimilarities (use nne; see the sketch below).
- Is the PCA result useful?
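As a hint for the second item: the leave-one-out nearest neighbor error of a given dissimilarity matrix is directly available from the DisTools routine nne, sketched here for the D used above; for the exercise, substitute the polydism57 data.
e_nn = nne(D)              % LOO nearest neighbor error based on the given dissimilarities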