DisTools examples: Chickenpieces
Some exercises are defined on the basis of the Chickenpieces dataset. It is assumed that readers are familiar with PRTools and will consult the following pages where needed:
- PRTools User Guide (see the bottom of the page for a table of contents)
- Introduction to DisTools
- Dissimilarity Representation Course
The following packages should be in the Matlab path: PRTools, DisTools, PRDisData.
Some papers on the Chickenpieces dataset:
H. Bunke, U. Bühler, Applications of approximate string matching to 2D shape recognition, Pattern Recognition 26 (1993) 1797-1812.
B. Spillmann, Description of the Distance Matrices, Internal report, Computer Vision and Artificial Intelligence (FKI), Institute of Computer Science and Applied Mathematics, University of Bern, 2004.
E. Pekalska, A. Harol, R.P.W. Duin, B. Spillmann, and H. Bunke, Non-Euclidean or non-metric measures can be informative, Proc. SSSPR 2006, LNCS 4109, Springer, 2006, 871-880.
R.P.W. Duin and E. Pekalska, Non-Euclidean Dissimilarities: Causes and Informativeness, Proc. SSSPR 2010, LNCS 6218, Springer, 2010, 324-333.
First we load all 44 dissimilarity matrices (11 norms times 4 costs) and compute for each of them the leave-one-out (LOO) 1NN classification error and the negative eigenfraction (NEF) as a measure of non-Euclideanness.
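D = chickenpieces('all');                  % cell array of dissimilarity datasets
norm = [5 7 10 15 20 25 29 30 31 35 40];   % norm values (rows of D)
cost = [45 60 90 120];                     % cost values (columns of D)
E = zeros(size(D));                        % LOO 1NN errors
F = zeros(size(D));                        % negative eigenfractions
for i = 1:size(D,1)
  for j = 1:size(D,2)
    E(i,j) = nne(D{i,j});                  % LOO 1NN error
    F(i,j) = nef(D{i,j}*makesym*pe_em);    % symmetrize, embed in PE space, NEF
  end
end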
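To make the two quantities concrete, here is a minimal sketch in plain MATLAB, without PRTools or DisTools, that computes them for a small toy dissimilarity matrix. The toy matrix and labels are hypothetical and are not taken from the Chickenpieces data. The sketch assumes the usual definition of the NEF (the sum of the magnitudes of the negative eigenvalues of the centred Gram matrix divided by the sum of the magnitudes of all its eigenvalues); it is not claimed to reproduce the internals of nne or nef.

```matlab
% Toy dissimilarity matrix (hypothetical): shortest-path distances in a
% star graph with one centre (object 1) and three leaves (objects 2-4).
D = [0 1 1 1;
     1 0 2 2;
     1 2 0 2;
     1 2 2 0];
labels = [1; 1; 2; 2];               % hypothetical class labels
n = size(D,1);

% Leave-one-out 1NN error: exclude every object from its own neighbour
% search by setting the diagonal to Inf, then compare labels.
Dloo = D;
Dloo(logical(eye(n))) = Inf;
[~, nearest] = min(Dloo, [], 2);     % nearest other object per row
loo_error = mean(labels(nearest) ~= labels);

% Negative eigenfraction: eigenvalues of the centred Gram matrix
% derived from the squared dissimilarities.
J = eye(n) - ones(n)/n;              % centring matrix
B = -0.5 * J * (D.^2) * J;           % Gram matrix of the embedding
B = (B + B')/2;                      % remove rounding asymmetry
lambda = eig(B);
nef_value = sum(abs(lambda(lambda < 0))) / sum(abs(lambda));

fprintf('LOO 1NN error: %.3f, NEF: %.4f\n', loo_error, nef_value);
```

The star graph cannot be embedded in a Euclidean space, so one eigenvalue is negative and the NEF is non-zero. For a matrix of true Euclidean distances the NEF is zero up to rounding.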
Next the classification errors are plotted as a function of the norm.
figure;
h = plot(norm,E);
set(h,'linewidth',2)
set(gca,'fontsize',12)
ylabel('Error')
xlabel('Norm')
title('1NN Error for chickenpieces dissimilarities')
legend('cost 45','cost 60','cost 90','cost 120')
Finally the NEF is plotted as a function of the norm.
figure;
h = plot(norm,F);
set(h,'linewidth',2)
set(gca,'fontsize',12)
ylabel('NEF')
xlabel('Norm')
title('Negative eigenfraction for chickenpieces dissimilarities')
legend('cost 45','cost 60','cost 90','cost 120')
showfigs
Note that the best results (the lowest 1NN errors) are obtained for dissimilarity measures with a rather high NEF.
Exercises
- Compute the learning curve for the 1NN classifier of the given dissimilarities for norm = 29 and cost = 45 using nnerror2. Select a training set size for which you want to beat the 1NN performance by a dissimilarity-based classifier.
- Find a classifier in dissimilarity space that beats the above selected performance. Is the result significant in the statistical sense?
- Find a classifier in PE space that beats the above selected performance. Is the result significant in the statistical sense?
- Is it useful to use a transductive approach, i.e. to include the test set in the construction of the representation?
- Try to find a classifier based on several or all of the dissimilarity matrices that beats, for the same training set size, your result based on a single dissimilarity matrix.
PRTools User Guide, table of contents:
- elements: datasets, datafiles, cells and doubles, mappings, classifiers, mapping types
- operations: datasets, datafiles, cells and doubles, mappings, classifiers, stacked, parallel, sequential, dyadic
- user commands: datasets, representation, classifiers, evaluation, clustering, examples, support routines
- introductory examples: Introduction, Scatterplots, Datasets, Datafiles, Mappings, Classifiers, Evaluation, Learning curves, Feature curves, Dimension reduction, Combining classifiers, Dissimilarities
- advanced examples