Here the main commands of the DisTools toolbox for dissimilarity-based pattern recognition are described. DisTools is a Matlab toolbox built on top of PRTools, which should be in the Matlab path.

DisTools is primarily meant for the analysis of a given set of dissimilarities (possibly non-Euclidean, non-metric or asymmetric) between labeled objects. There are a few commands for computing such dissimilarities. Its main use, however, is to help researchers who are studying dissimilarity measures between objects like images, time signals and spectra for classification purposes. It offers more advanced tools than those based on nearest neighbor relations.

There are three sets of tools, based on three different interpretations of the dissimilarity matrix. In addition, there are some tools for visualization, evaluation and the computation of dissimilarities.

In our terminology we distinguish

  • training set: the set of labeled objects to be used for training a classifier
  • test set: the set of labeled objects to be used for evaluating a classifier
  • representation set: the set of objects used for representing other objects in a dissimilarity space or by embedding

In some documents, help files and software comments the following abbreviations may be used: trainset for training set, testset for test set, repset for representation set, dismat for dissimilarity matrix, disspace for dissimilarity space and pe_space for the space obtained by pseudo-Euclidean embedding.

In the documentation below D, D1, D2, etc. are used for dissimilarity matrices. U stands for an untrained classifier, W for a trained classifier and V for a mapping between spaces or representations. Readers not familiar with PRTools should realize that the * operator is overloaded to feed the left operand into the right one (like piping in Unix). Some very simple examples:

U = svc([],1);  % defines the untrained support vector classifier
%               % with regularization parameter C=1.
W = D*U;        % trains U by D in a dissimilarity space
W = D*svc([],1);% Alternative notation 1
W = svc(D,1);   % Alternative notation 2
e = D*W*testc;  % evaluates W on the training set using testc

See PRTools documentation for more details.
For consistency, alternative notation 1 as illustrated above is used throughout this documentation. Which notation is used in practice depends on the surrounding software and the coding style of the programmer.

Manipulation of dissimilarity matrices

makesym     Make a dissimilarity matrix symmetric
  D2 = D*makesym                 Default: average D and D'
  D2 = D*makesym([],'min')       Use min(D,D')

genddat     Generate random training and test sets for dissimilarity data
  [DT,DS] = genddat(D,0.5)       Use 50% of the data for the trainset DT, the
                                 remainder for the testset DS; repset is trainset
  [DT,DS] = genddat(D,[10 20],5)
                                 Use 10 objects of the first class and 20 of the
                                 second for training; repset is the first 5
                                 objects of the trainset

seldclass   Select a class subset from a square dissimilarity dataset
  D2 = seldclass(D,[3 4])        Reduce the square dissimilarity matrix such that
                                 only rows and columns of classes 3 and 4 are kept

Classification of given dissimilarity matrix

In the analysis of such matrices it is assumed that the training set and the representation set coincide. The rows usually refer to the test objects and the columns to the training objects. The dissimilarity matrix between the training objects themselves is square. Occasionally a dissimilarity measure may yield non-zero values on the diagonal; in some routines these are replaced by zeros. Whenever a square matrix is mentioned in the documentation and in the text below, it refers to a dissimilarity matrix between a set of objects and itself. D(i,j) is the distance from object i to object j, and D(j,i) is the distance from object j to object i.

A dissimilarity matrix constructed as a PRTools dataset should have object labels for the rows and feature labels for the columns. These should be taken from the labels of the two datasets used for constructing the dissimilarity matrix. PRTools commands that compute dissimilarity matrices take care of this. If a dissimilarity matrix stored in a PRTools dataset is transposed (E = D'), the two labellings are switched automatically by PRTools.
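
As an illustration, a minimal sketch (using only the standard PRTools routines gendatb, gendat and proxm; the class sizes are arbitrary) of how such a labeled dissimilarity dataset arises:

a = gendatb([20 20]);     % labeled 2-class dataset of 40 objects
[t,s] = gendat(a,0.5);    % trainset t and testset s
D = s*proxm(t,'d',1);     % dissimilarities of the test objects (rows) to the
                          % training objects (columns); row labels are taken
                          % from s, column (feature) labels from t
E = D';                   % transposing switches the two labellings automatically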

nne        Leave-one-out nearest neighbor error estimated from a square labeled
           dissimilarity matrix
  e = D*nne

nnerror1   Exact expected NN error from a dissimilarity matrix, trainset size
           given per class; the exact expected error is obtained by a full
           integration
  e = D*nnerror1; plote(e)   Compute and plot the learning curve of the 1NN rule
  e = D*nnerror1([],20)      Compute and show the exact expected 1NN error for
                             20 objects per class

nnerror2   Exact expected NN error from a dissimilarity matrix, trainset size
           given overall; the exact expected error is obtained by a full
           integration
  e = D*nnerror2; plote(e)   Compute and plot the learning curve of the 1NN rule
  e = D*nnerror2([],20)      Compute and show the exact expected 1NN error for
                             20 objects

testkd     Test k-NN classifier for dissimilarity data
  e = D*testkd(3,'loo')      Compute for a square dismat the classification
                             error of the kNN rule with k=3 using leave-one-out
                             crossvalidation
  e = D*testkd               Compute for a dissimilarity dataset of test data
                             (dissimilarities with a trainset) the
                             classification error using the 1NN rule

knndc      k-nearest neighbor classifier for dissimilarity data
  [DT,DS] = genddat(D,0.5);
  W = DT*knndc;
  e = DS*W*testc
           Generate a trainset and a testset from the given dismat D; the
           repset equals the training set. Train the k-nearest neighbor
           classifier (optimize k) by DT and test it on DS. DT and DS should be
           dissimilarity datasets based on the same representation set.

testpd     Test Parzen classifier for dissimilarity data
  e = D*testpd(h,'loo')      Compute for a square dismat the classification
                             error of the Parzen classifier for a given value
                             of the smoothing parameter (kernel width) h, using
                             leave-one-out crossvalidation
  e = D*testpd               Compute for a dissimilarity dataset of test data
                             (dissimilarities with a trainset) the
                             classification error using the Parzen classifier;
                             the smoothing parameter is optimized by parzenddc

parzenddc  Parzen classifier for dissimilarity data
  [DT,DS] = genddat(D,0.5);
  W = DT*parzenddc;
  e = DS*W*testc
           Generate a trainset and a testset from the given dismat D; the
           repset equals the training set. Train the Parzen classifier
           (optimize h) by DT and test it on DS. DT and DS should be
           dissimilarity datasets based on the same representation set.

Dissimilarity space

The dissimilarity space is postulated as a Euclidean vector space. Objects are represented in this space as vectors of given dissimilarities to a representation set of other (or the same) objects. It can be treated in the same way as a traditional feature space; the features are now defined as the dissimilarities to the representation set.

Users have to decide whether they want to use a separate representation set, different from the training and test sets. Alternatively they can use (a part of) the training set for representation and may even include test objects, as there is no need for the representation objects to be labeled. The main routine to organize this is genddat. Here are a few examples based on a square dissimilarity matrix D of size 100*100 with two classes of 40 and 60 objects respectively and class priors 0.4 and 0.6. Dissimilarity trainsets DT and testsets DS with the same repset are generated.

% 60% for training, repset is trainset
[DT,DS] = genddat(D,0.6)

% DT is a 60 by 60 dataset with 2 classes: [24  36]
% DS is a 40 by 60 dataset with 2 classes: [16  24]

% 60% for training, 10% of the trainset for the repset
[DT,DS] = genddat(D,0.6,0.1)

% DT is a 60 by 7 dataset with 2 classes: [24  36]
% DS is a 40 by 7 dataset with 2 classes: [16  24]
% Note that fractions in numbers of objects are rounded up.

% 60% for training, 0% of the trainset, 10% of the testset for the repset
[DT,DS] = genddat(D,0.6,0,0.1)
% DT is a 60 by 5 dataset with 2 classes: [24  36]
% DS is a 40 by 5 dataset with 2 classes: [16  24]

% similar, but exclude repset from testset
[DT,DS] = genddat(D,0.6,0,0.1,'ex')
% DT is a 60 by 5 dataset with 2 classes: [24  36]
% DS is a 35 by 5 dataset with 2 classes: [14  21]

Once a training set is available as a dissimilarity matrix stored in a dataset DT, it is interpreted in the dissimilarity space approach as a set of training vectors. In principle any classifier suitable for a traditional feature space can be used here as well. An independent test set represented in the same space by DS can be used for evaluation.
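
For example, a minimal sketch of training and testing in the dissimilarity space; fisherc is just an arbitrary choice of a standard PRTools feature-space classifier:

[DT,DS] = genddat(D,0.6);  % trainset DT and testset DS, repset is the trainset
W = DT*fisherc;            % train any feature-space classifier on DT
e = DS*W*testc             % classification error on the independent test set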

The strength of this approach is that it can be used for any dissimilarity measure: Euclidean, non-Euclidean, metric, non-metric, symmetric or non-symmetric. A drawback of this generality is that the characteristic of the dissimilarity values themselves (positive numbers, where smaller values indicate more similar objects) is not used. A second drawback is that the dimensionality can be large, e.g. as large as the training set or even larger (possibly causing highly correlated features as well). This should be taken into account in the training of classifiers.

A solution for both drawbacks is to reduce the representation set significantly. This can be done by a random selection as shown above by the genddat function. There are in addition various approaches to do this systematically, e.g. the feature selection routines of PRTools. These don’t use the characteristics of dissimilarities but are based on the class separability in the dissimilarity space.

A faster solution is to use the object properties as offered by the individual dissimilarities. The DisTools routine protselfd offers some supervised and some unsupervised possibilities.

Here are some examples. They all compute a selection mapping W that may be used as [D1,D2] = D*W, in which D1 contains the selected prototypes for the representation set and D2 contains the remaining ones.

  w = d*featseli;                Individual selection using the 1NN LOO criterion
  w = d*featself([],'maha-s',k);
                                 Forward selection of k prototypes based on the
                                 Mahalanobis distance
  help feateval                  Lists the possible criteria for feature selection
  w = d*protselfd([],crit);      Rank the prototypes (see help protselfd for criteria)
  d2 = d*w(:,1:10);              Use the first 10
  [dt,ds] = genddat(d,0.5);      Split the dataset in a training set and a test set
  w = dt*protselfd([],crit,10);  Compute a selection of 10 prototypes on the training set
  [dt1,dt2] = dt*w;              Apply it to the trainset; compute the complementary set as well
  [ds1,ds2] = ds*w;              Apply it to the testset; compute the complementary set as well

Once a proper repset has been selected and applied to the testset as well as to the trainset, any PRTools classifier can be trained and tested using the test routines testc and testd.
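
A hedged end-to-end sketch, following the protselfd call from the table above (crit stands for one of the criteria listed by help protselfd; ldc is just an arbitrary PRTools classifier):

[dt,ds] = genddat(d,0.5);      % split; repset is the trainset
w = dt*protselfd([],crit,10);  % select 10 prototypes on the trainset
dt1 = dt*w;                    % trainset in the reduced dissimilarity space
ds1 = ds*w;                    % testset in the same reduced space
v = dt1*ldc;                   % train a classifier on the reduced trainset
e = ds1*v*testc                % classification error on the testset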

Pseudo-Euclidean embedding

By embedding a dissimilarity matrix $$D$$, a vector space $$X$$ with a given norm $$||.||$$ is constructed such that $$D_{ij} = ||x_i-x_j||$$. Obviously, by definition, the dissimilarity matrix has to be Euclidean for a perfect embedding into a Euclidean vector space. A more general space is the pseudo-Euclidean space, in which any symmetric dissimilarity matrix with a zero diagonal can be perfectly embedded. Even off-diagonal zeros and negative numbers are allowed.

The pseudo-Euclidean space (PE space) consists of two orthogonal Euclidean subspaces, called the positive space $$P$$ and the negative space $$Q$$. By embedding, vectors $$x$$ are constructed in which the first $$p$$ components $$x_p$$ refer to $$P$$ and the following $$q$$ components $$x_q$$ to $$Q$$, so $$x = (x_p, x_q)$$. Usually, for non-degenerate (full rank) dissimilarity matrices it holds that $$p+q = m-1$$ if $$D$$ describes the dissimilarities between $$m$$ objects. The pseudo-Euclidean norm $$||.||_{pe}$$ is now defined as $$||x||_{pe} = \sqrt{||x_p||^2 - ||x_q||^2}$$.

The PE space can be interpreted as a Euclidean space $$P$$ offering an approximate solution for the embedding and a space $$Q$$ in which all corrections are collected. Distances (or lengths of vectors) are now computed, in the squared sense, by subtracting the corrections from the Euclidean norm (distances). Consequently, squared distances may be zero or even negative, resulting in a complex norm.
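
As a minimal illustration (not the DisTools implementation), the squared PE distance between two embedded vectors x and y with signature [p q] could be computed as:

xp = x(1:p);      xq = x(p+1:p+q);       % positive and negative parts of x
yp = y(1:p);      yq = y(p+1:p+q);       % idem for y
d2 = sum((xp-yp).^2) - sum((xq-yq).^2);  % may be zero or negative for non-Euclidean data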

The main DisTools routine for PE embedding is called pe_em. It contains various options for returning reduced spaces of lower dimensionality. The mapping produced by the routine may be used to project the original training data onto the PE space, or, alternatively, any other dissimilarity matrix based on the same representation set. It should be realized that this can be very inaccurate for non-Euclidean dissimilarities and high-dimensional spaces, resulting in imaginary distances in the PE space even in cases for which the given distances are all positive.

In addition to the embedding by pe_em, there are various other quantities describing the non-Euclidean characteristics of a dissimilarity matrix:

  • The signature is the duplet (p,q) describing the dimensionalities of the positive and the negative space.
  • The spectrum is the set of ranked eigenvalues of a Gram matrix computed from the dissimilarity matrix. Some eigenvalues are negative in case of non-Euclidean dissimilarities. The eigenvectors related to the positive eigenvalues correspond to the axes of the positive space, and those related to the negative eigenvalues to the axes of the negative space.
  • The Non-Euclidean Fraction, NEF, considers the absolute values of the eigenvalues in the spectrum: the sum of the absolute values of the eigenvalues of the negative space divided by the sum of the absolute values of all eigenvalues.
  • The Non-Metricity Fraction, NMF, is the fraction of triplets in the dissimilarity matrix (corresponding to the dissimilarities between three objects) that violates the triangle inequality: the largest of the three dissimilarities is larger than the sum of the other two. A small computational sketch of NEF and NMF follows this list.
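
The following is a minimal computational sketch (not the DisTools code itself) of how NEF and NMF may be obtained for a square symmetric dissimilarity matrix D with a zero diagonal. Here D is a plain matrix; use +D to extract the numbers from a PRTools dataset.

m = size(D,1);
J = eye(m) - ones(m)/m;               % centering matrix
G = -0.5*J*(D.^2)*J;                  % Gram matrix derived from D
L = eig((G+G')/2);                    % its eigenvalues: the spectrum
nef = sum(abs(L(L<0)))/sum(abs(L));   % non-Euclidean fraction

viol = 0; tot = 0;                    % non-metricity fraction: check all triplets
for i = 1:m-2
  for j = i+1:m-1
    for k = j+1:m
      t = sort([D(i,j) D(j,k) D(i,k)]);
      viol = viol + (t(3) > t(1)+t(2));
      tot = tot + 1;
    end
  end
end
nmf = viol/tot;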

In the table below a number of DisTools commands and small examples are listed. They may be applied to a simple non-Euclidean dissimilarity matrix, e.g. as generated in the first rows:

 

a = gendatb;                  % generate a 2-dimensional dataset, 2 classes, 100 objects
d = a*proxm(a,'m',1);         % compute the 100 x 100 dismat based on the Minkowski-1 metric;
                              % this dissimilarity is metric but non-Euclidean

a = gendatb;                  % generate a 2-dimensional dataset, 2 classes, 100 objects
d = a*proxm(a,'m',1);         % compute the 100 x 100 dismat based on the Minkowski-1 metric
d = d.^5;                     % take the 5th power, creating a dissimilarity which is non-metric

w = d*pe_em;                  % compute the PE mapping
x = d*w;                      % apply it to the dissimilarity matrix

[dt,ds] = genddat(d,0.5);     % split the given dataset in a trainset and a testset
w = dt*pe_em;                 % compute the PE mapping using the trainset
xt = dt*w;                    % map the trainset
xs = ds*w;                    % map the testset

w = d*pe_em;                  % compute the PE mapping
x = d*w;                      % apply the mapping to the dissimilarity matrix
w*signature                   % retrieve the signature of w
x*signature                   % retrieve the signature from the PE space
xa = x*euspace;               % convert x to the associated space
xp = x*euspace([],'pos');     % convert x to the positive space
xn = x*euspace([],'neg');     % convert x to the negative space
x*nef                         % compute the negative eigenfraction from x
xa*nef                        % compute the negative eigenfraction from xa
w*nef                         % compute the negative eigenfraction from w

d*nmf                         % compute the non-metric fraction of d
d.^3*nmf                      % compute the non-metric fraction of d.^3

w = d*pe_em;                  % compute a PE mapping
s = getspectrum(w);           % get its spectrum
plotspectrum(w)               % plot its spectrum

In the next table a number of routines and classifiers are listed that make use of the specific norm defined for a PE space. This norm follows from the signature, which is stored in a dataset x after it has been projected into a PE space by a mapping computed by pe_em: x = d*(d*pe_em); or x = d*pe_em(d);

d = pe_distm(x,y) Square PE dismat between two datasets
z = pe_mtimes(x,y) Matrix multiplication (inner product) in PE space
k = x*pe_kernelm Compute linear kernel in PE space
w = x*pe_nmc Nearest mean classifier in PE space
w = x*pe_knnc KNN classifier in PE space
w = x*pe_parzenc Parzen classifier in PE space
w = x*pe_libsvc SVM in PE space

Here is a small example that splits a dataset in a trainset and a testset and computes the performances of four classifiers:

a = gendatb;
d = a*proxm(a,'m',1);
w = d*pe_em;
x = d*w;
[xt,xs] = gendat(x,0.5);
w = xt*{pe_nmc,pe_knnc,pe_parzenc,pe_libsvc};
testc(xs,w)

Visualization

Some ways to visualize a dissimilarity matrix d:

imagesc(+d)                   Show the data as an image
[lab,den] = hclust(d,'s');    Perform some clustering that ranks the data
imagesc(+d(lab,lab))          Show it as an image in the ranked order
scatterd(d*pca(d,2))          Show a PCA space of the dissimilarity space
scatterd(d*pe_em(d,[2 0]))    Show the first two dimensions of the positive space after PE embedding
scatterd(d*pe_em(d,[1 1]))    Show the first positive and the first negative dimension after PE embedding
scatterd(d*mds(d,2))          Multi-dimensional scaling

Evaluation

Here some examples are given illustrating how classifiers can be computed and tested for each of the three ways in which a dissimilarity representation can be handled.

[dt,ds] = genddat(d,0.5);     % split the dissimilarity matrix; ds contains dissimilarities to the trainset
w = knndc(dt);                % compute the kNN classifier (optimize k)
e_dm = ds*w*testc             % classification error on the testset

[dt,ds] = gendat(d,0.5);      % split the dataset in disspace (based on all data!) in a trainset and a testset
w = knnc(dt);                 % compute a classifier on the trainset
e_ds = ds*w*testc             % classification error on the testset

[dt,ds] = genddat(d,0.5);     % split the dataset in disspace (based on the trainset) in a trainset and a testset
w = knnc(dt);                 % compute a classifier on the trainset
e_ds = ds*w*testc             % classification error on the testset

x = d*pe_em(d);               % compute the PE space from all data
[xt,xs] = gendat(x,0.5);      % split in a trainset and a testset
w = xt*pe_knnc;               % compute a classifier on the trainset
e_pe = xs*w*testc             % classification error on the testset

[dt,ds] = genddat(d,0.5);     % split the dataset in disspace (based on the trainset) in a trainset and a testset
v = pe_em(dt);                % find the PE space
xt = dt*v;                    % map the trainset on the PE space
xs = ds*v;                    % map the testset on the PE space
w = xt*pe_knnc;               % compute a classifier on the trainset
e_pe = xs*w*testc             % classification error on the testset

In the second and the fourth example the spaces are defined on all data, as the gendat function just selects objects in a given space. Since the labels are not used in the construction of that space, the test results are still fair, but they estimate the performance obtained when test sets of a similar size are used and the classifiers are recomputed. This is also called transductive learning: the classifiers are adapted to the test data.

In the third and the fifth example just the training sets are used for building the spaces, as the genddat function by default uses a representation set (columns) that is equal to the trainset. Thereby classifiers can be computed that are valid for all future data. For some small, severely non-Euclidean datasets the projection of the test data on a PE space that has been computed on just the training data yields bad, e.g. complex, results. Therefore many studies are based on transductive learning.

PRTools contains special routines for crossvalidation, learning curves and feature curves: crossval, cleval and clevalf. Sometimes special versions, crossvald and clevald, are needed for dissimilarity data in order to use genddat instead of gendat for splitting datasets. In the transductive approaches this is not needed. Here are some examples.

% Transductive learning:
% 10-fold crossvalidation of nmc (5 times) in disspace.
crossval(d,nmc,10,5)
% Embed in the PE space and run a PE classifier.
crossval(d*pe_em(d),pe_nmc,10,5)

% Use of crossvald builds the spaces from the trainset only.
% Disspaces with 20 training objects, selected at random, for representation.
crossvald(d,nmc,10,20,5)
% PE spaces recomputed for every new trainset.
crossvald(d,pe_em*pe_nmc,10,[],5)

% nmc learning curve in disspace, repset is all data, 5 repeats.
% Plot the learning curves of the test and apparent error.
e = cleval(d,nmc,[],5);
plote(e)

% Compare learning curves of 3 classifiers in disspace with a repset of
% 1 training object per class.
e = clevald(d,{nmc,ldc,qdc},[],1,5);
plote(e,'noapperror')

% Learning curves of knndc (on the dismat) and knnc (in disspace),
% repset is trainset, plot without the apparent error curve.
e = clevald(d,{knndc,knnc},[],[],5);
plote(e,'noapperror')

% Compare learning curves of kNN in the PE space, in disspace and on the
% dismat. The representation set equals the training set.
e = clevald(d,{pe_em*pe_knnc,knnc,knndc},[],[],25);
plote(e,'noapperror')

% Feature curve in disspace using all objects for representation and
% 50% of the objects for training.
e = clevalf(d,ldc,[1:50:216],0.5,10);
plote(e,'noapperror')

% Feature curve in the PE space using all objects for representation and
% 50% of the objects for training.
e = clevalf(d*pe_em(d),ldc,[1:50:215],0.5,10);
plote(e,'noapperror')

Computing dissimilarities

DisTools focuses on the analysis of dissimilarities. The computation of dissimilarities should arise from the application, e.g. dissimilarities between images, shapes, spectra or time signals. However, special distances between objects in vector spaces can be useful as well. Next to the PRTools routine proxm, DisTools offers a similar routine with an extended set of distances: proxxm. The table below lists all possibilities for its type parameter; a small usage example follows the table.

D = B*proxxm(A,type,p)
'POLYNOMIAL'   | 'P':  SIGN(A*B').*(A*B').^P
'HOMOGENEOUS'  | 'H':  SIGN(A*B').*(A*B').^P
'EXP'          | 'E':  EXP(-(||A-B||)/P)
'EXP-SUM'      | 'ES': SUM_Z SIGN(P(Z)) * EXP(-(||A-B||)/P(Z))
'RBF'          | 'R':  EXP(-(||A-B||.^2)/(P*P))
'RBF-SUM'      | 'RS': SUM_Z SIGN(P(Z)) * EXP(-(||A-B||.^2)/(P(Z)^2))
'SIGMOID'      | 'S':  SIGM(A*B'/P)
'DSIGMOID'     | 'DS': SIGM(||A-B||^(2P(1))/P(2))
'DISTANCE'     | 'D':  ||A-B||.^P
'MULTIQUADRIC' | 'MQ': SQRT(||A-B||.^2/P(1) + P(2))
'THIN-PLATE'   | 'TP': ||A-B||.^(2*P(1))/P(2) * LOG(1+||A-B||.^(2*P(1))/P(2))
'N-THIN-PLATE' | 'NTP': -||A-B||.^(2*P(1))/P(2) * LOG(1+||A-B||.^(2*P(1))/P(2))
'MINKOWSKI'    | 'M':  SUM(|A-B|^P).^(1/P)
'CITY-BLOCK'   | 'C':  SUM(|A-B|)
'COSINE'       | 'O':  1 - (A*B')/||A||*||B||
'FOURIER'      | 'F'
'TANH'         | 'T':  TANH(P*(A*B')/LENGTH(A) + P);
'ANOVA'        | 'A':  ANOVA MODEL
'BSPLINE'      | 'B':  BSPLINE MODEL, ORDER P
'ANOVABSPLINE' | 'AB': ANOVA-BSPLINE MODEL, ORDER P
'ANOVASPLINE1' | 'AS1':ANOVA-SPLINE MODEL, ORDER 1
'ANOVASPLINE2' | 'AS2':ANOVA-SPLINE MODEL, ORDER 2
'ANOVASPLINE3' | 'AS3':ANOVA-SPLINE MODEL, ORDER 3
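
A small usage sketch, assuming proxxm follows the same calling convention as proxm (as in the header of the table above):

a = gendatb;              % 2-class dataset in a 2-dimensional feature space
w = proxxm(a,'m',1);      % mapping computing Minkowski-1 distances to the objects of a
d = a*w;                  % square dissimilarity matrix of a with itself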