### DisTools examples: Classifiers in pseudo-Euclidean space

Not all classifiers can be computed in a pseudo-Euclidean space. Some examples will be discussed. It is assumed that readers are familiar with PRTools and will consult the following pages where needed:

- PRTools User Guide (see the bottom of that page for a table of contents)
- Introduction to DisTools
- Dissimilarity Representation Course
- The following packages should be in the Matlab path: PRTools, DisTools, PRDisData

Classifiers based on distances can be defined in a PE space, as distances in this space are well defined. Some routines are available to facilitate this.

```matlab
D = flowcytodis(1);
X = D*(pe_em*mapex);
prcrossval(X,{pe_knnc,pe_parzenc,pe_nmc},2,5)
```

Density-based classifiers are not yet well defined for a PE space. However, procedures based on means and covariance matrices can still be used, as the signature cancels in the computation of the covariances. The interpretation as normal distributions is not valid, since such distributions are not defined for a PE space.

```matlab
D = flowcytodis(1);
X = D*(pe_em*mapex);
prcrossval(X,{ldc,udc,qdc},2,5)
```

Explain why `udc` and `qdc` yield the same result.

The support vector classifier is based on positive semi-definite kernels. Kernels in a PE space, however, are indefinite. Nevertheless, some implementations may still yield good results for some problems. DisTools offers two PE support vector classifiers: `pe_svc` and `pe_libsvc`. The latter is based on the PRTools `libsvc` and needs the `libsvm` package in the path.

```matlab
D = flowcytodis(1);
X = D*(pe_em*mapex);
prcrossval(X,{pe_svc,pe_libsvc,svc,libsvc},2,5)
```

`svc` and `libsvc` do not compute PE kernels, but compute positive semi-definite kernels in the associated space. They may perform equally well as, or even better than, the PE versions. However, none of them is optimal in the sense that the margin in PE space is maximized.

#### Exercise

In the above examples, training and test sets are generated in PE space. Their representation is based on the total given dissimilarity matrix.

- Perform for some PE classifiers a cross-validation experiment that first splits the data and computes the PE space from the training set alone.
- Compute for one of the `chickenpieces` datasets learning curves that compare the two approaches: PE spaces computed from the total dataset and PE spaces determined by the training set alone.
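A minimal sketch of the first item, assuming PRTools and DisTools are in the path; the use of `gendat` for the split, the 50/50 fraction, and `pe_knnc` as the classifier are illustrative choices, not prescribed by the exercise:

```matlab
% Sketch: compute the PE space from the training set alone
D = flowcytodis(1);
[DT,DS,IT,IS] = gendat(D,0.5); % random 50/50 split of the objects
DT = DT(:,IT);                 % train x train dissimilarities
DS = DS(:,IT);                 % test x train dissimilarities
w = pe_em(DT);                 % PE embedding based on the training set only
XT = DT*w;                     % embedded training set
XS = DS*w;                     % test set projected into the same PE space
v = pe_knnc(XT);               % train some PE classifier
e = XS*v*testc                 % estimated test error
```

Repeating this for several random splits and averaging the errors gives a cross-validation estimate that can be compared with the `prcrossval` results above, in which the embedding is based on the total dissimilarity matrix.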
