Distools User Guide,
2 Hour Course,
4 Day Course
Computing Dissimilarities,
Manipulation ,
Visualization
Dissimilarity Matrix Classification,
Dissimilarity Space,
PEEmbedding,
Evaluation
PseudoEuclidean embedding
This page belongs to the User Guide of the DisTools Matlab package. It describes some of its commands. Links to other pages are listed above. More information can be found in the pages of the PRTools User Guide. Links are given at the bottom of this page.
By embedding a dissimilarity matrix a vector space with a given norm is constructed such that . Obviously, by definition, the dissimilarity matrix has to be Euclidean for a perfect embedding into an Euclidean vector space. A more general space is the pseudoEuclidean space in which any symmetric dissimilarity matrix with a zero diagonal can be perfectly embedded. Even offdiagonal zeros and negative numbers are allowed.
The PseudoEuclidean space (PEspace) consists of two orthogonal Euclidean subspaces, called the positive space and the negative space . By embedding, vectors are constructed in which the first components refer to and the following components to , . Usually, for nondegenerate (full rank) dissimilarity matrices holds that if describes the dissimilarities between objects. The pseudoEuclidean norm is now defined as .
The PEspace can be interpreted as a Euclidean space offering an approximate solution for the embedding and a space in which all corrections are collected. Distances (or the length of vectors) are now computed (in the squared sense) by subtracting the corrections from the Euclidean norm (distances). Consequently, squared distances may be zero or even negative, resulting in a complex norm.
The main DisTools routine for PE embedding is called pe_em
. It contains various options for returning reduced spaces of a lower dimensionality. The mapping produced by the routine may be used to project the original training data on the pe_space, or, alternatively, any other dissimilarity matrix based on the same representation set. It should be realized this can be very inaccurate for nonEuclidean dissimilarities and highdimensional spaces, resulting in imaginary distances in the PEspace in cases for which the given distances are all positive.
In addition to pe_em
there are various other quantities describing the nonEuclidean characteristics of a dissimilarity matrix.
 The signature is the duplet (p,q) describing the dimensionalities of the positive and the negative space.
 The spectrum is the set of ranked eigenvalues of a Gramm matrix computed from the dissimilarity matrix. Some eigenvalues are negative in case of nonEuclidean dissimilarities. The eigenvectors related to the positive eigenvectors correspond with the axes of the positive space and the negative ones with the negative space.
 The NonEuclidean Fraction, NEF, considers the absolute values in the spectrum. The sum over the eigenvalues found for the negative space divided by the total sum over all eigenvalues constructs the NEF.
 The NonMetricity Fraction, NMF, is the fraction of triplets in the dissimilarity matrix (corresponding to the dissimilarities between three objects) that violates the the triangle inequality: the largest of the three is larger than the sum of the two other ones.
In the below table a number of DisTools commands and small examples is listed. They may be applied to a simple nonEuclidean dissimilarity matrix, e.g. as generated in the first rows:
A = gendatb; 
Generate a 2dimensional dataset, 2 classes, 100 objects. Compute the 100 x 100 dismat based on Minkowsky1 metric. This dissimilarity is metric but nonEuclidean. 

Generate a 2dimensional dataset, 2 classes, 100 objects. Compute the 100 x 100 dismat based on Minkowsky1 metric. Take the 5th power, creating a dissimilarity which is nonmetric. 
W = D* 
Compute PE mapping Apply to dissimilarity matrix 
[DT,DS] = genddat(D,0.5); W = DT* XT = DT*W; XS = DS*W; 
Split given dataset in trainset and testset Compute PE mapping using the trainset Map the trainset Map the testset 
W = D* X = D*W W*signature X* signature XA= X*euspace; XP= X* XN= X* 
Compute PE mapping Apply mapping to dissimilarity matrix Retrieve the signature of W Retrieve the signature from the PE space Convert x to associated space Convert x to positive space Convert x to negative space Compute negative eigenfraction from X Compute negative eigenfraction from XA Compute negative eigenfraction from W 
D*nmf D.^3*nmf 
Compute nonmetric fraction of D Compute nonmetric fraction of D.^3 
W = D* (W) 
Compute a PE mapping Get its spectrum PLot its spectrum 
In the next table a number of routines and classifiers are listed that make use of the specific norm that is defined for a PE space. This norm follows from the signature and is stored in a dataset x that is projected in a PE space by mappings that are computed by
: X = D*(D*pe_em
); or X = D*pe_em
(D);pe_em
D = pe_distm(X,Y) 
Square PE dismat between two datasets 
Z = pe_mtimes(X,Y) 
Matrix multiplication (inner product) in PE space 
K = X*pe_kernelm 
Compute linear kernel in PE space 
W = X*pe_nmc 
Nearest mean classifier in PE space 
W = X*pe_knnc 
KNN classifier in PE space 
W = X*pe_parzenc 
Parzen classifier in PE space 
W = X*pe_libsvc 
SVM in PE space 
Here is a small example that splits a dataset in a trainset and a testset and computes the performances for three classifiers
A = gendatb;
D = A*proxm(A,'m',1);
W = D*
;
pe_em
X = D*W;
[XT,XS] = gendat(X,0.5);
W = XT*{pe_nmc,pe_knnc,pe_parzenc,pe_libsvc};
(XS,W)
testc
PRTools User Guide
elements: datasets, datafiles. cells and doubles, mappings, classifiers, mapping types
operations: datasets, datafiles, cells and doubles, mappings, classifiers, stacked, parallel, sequential, dyadic
commands: datasets, representation, classifiers, evaluation, clustering and regression, examples, support