### PRTools examples: Dissimilarities

Dissimilarities offer an alternative way in comparison with features to represent objects in a vector space. They are similar to kernels, but more general. They can be used on top of a given feature representation, but more importantly, they are often computed for raw data: images, spectra, time signals. Here some ways are given to compute dissimilarities in a given feature space. There is a separate set of examples on handling dissimilarities.  It is assumed that the reader is familiar with the introductory sections of the user guide:

The main routine for computing similarities and dissimilarities is `proxm`. Formally it is a trainable mapping. During training the training set is just stored. During execution the proximities between a ‘test’ set and the ‘training’ set are computed. Let us take the example of the L1 distance, which is identical to the Minkowksy-1 metric.

`A = gendatb([10 10]) % given set of 20 objects in a 2D space`
`B = gendatb([50 50]) % new set of 100 objects in the same space`
`W = A*proxm('m',1)   % define proximity mapping, 2 to 20`
`D = B*W              % resulting dissimilarity matrix 100 x 20`

The ‘trained’ mapping `W` (it is just trained in PRTools terms, in fact there is nothing optimized, just data is stored) maps data from a 2D space (the number of features of the input space) to 20D (the number of objects in A. After applying this mapping to a dataset of 100 objects a 100*20 dissimilarity matrix is `D` obtained, which can also be considered as a set of 100 objects in a 20D space.

A special case is the computation of Euclidean distances. This is the same as a L2 distance or a Minkowsky-2 distance (`proxm('m',2)`) or the computation of an Euclidean distance to the first power (`proxm('d',1)`). These yield the same results, which, moreover, can be obtained by the routine `distm`, which is especially designed for Euclidean distances:

`D1 = B*(A*proxm('m',2))`
`D2 = B*(A*proxm('d',1))`
`D3 = distm(B,A).^(0.5)`

The results are identical, except for very small differences due to digital noise. This may be checked by:

`max(max(abs(+D1 - +D2)))`
`max(max(abs(+D1 - +D3)))`
`max(max(abs(+D2 - +D3)))`

The last statement shows that D2 and D3 are fully identical, due to the fact that `proxm` internally uses `distm` for the `'d'` option. Note, however, the huge difference in computation times:

`X = rand(1000,100); % generate 1000 objects in 100D`
`tic; D1 = X*(X*proxm('m',2)); toc`
`tic; D3 = distm(X).^(0.5); toc`

The time difference may be as large as a factor 100. The routine `distm` is programmed for speed. It is just a fixed mapping and not trainable as `proxm`.

Proximity matrices obtained by `proxm` and `distm` store the class labels of the ‘train’ set as feature labels. They have thereby the same format as a classification matrix. However, values a classification matrix have the orientation that higher values are closer more similar) to the class corresponding to the column of the considered matrix element. Dissimilarities have thereby the wrong orientation for a direct classification. This can be easily corrected.

`A = gendatb([20 20]);  % training set`
`B = gendatb([50 50]);  % test set`
`D = B*(A*proxm('d',1));% Euclidean dissimilarity matrix`
`(1-D)*testc            % classification`
`B*(A*knnc([],1))*testc % 1NN classification`

The two classification results are identical. The first road, by using `proxm`, offers, however, the possibility to use other distance measures, e.g.

`D = B*(A*proxm('m',1)); % Euclidean dissimilarity matrix`
`(1-D)*testc             % classification`

computes the NN classification result based on the L1 distance.