### PRTools examples: Datasets

Datasets are the key element of PRTools. It is assumed that the reader is familiar with the introductory sections of the user guide.

Here we will present a small example about the creation of a dataset. We will use the kimia dataset, a set of images:

`A = kimia`

`show(A,12);`

`classnames(A)`

It is an 18-class dataset with 12 black-and-white images per class. Every image has 64 × 64 pixels. We select just two classes for a simple experiment.

`B = selclass(A,{'elephant','turtle'})`

`show(B,6);`

`% Compute the total number of pixels in the blob`

`% (they have value 1) as an estimate of the area`

`feat1 = sum(B,2)`

`% Compute the number of 0-1 and 1-0 changes in the`

`% unfolded image as an estimate of the perimeter`

`feat2 = sum(abs(B(:,1:end-1)-B(:,2:end)),2)`

Try to understand how `feat2` estimates the perimeter of a blob in `B`. Now the two features `feat1` and `feat2` are used to construct a new dataset `X`, using the same labels as `B`.
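To see what the two features measure, it can help to compute them by hand for a tiny image. The following is a minimal sketch in plain MATLAB on a made-up 5-by-5 image with a 3-by-3 blob (not taken from the kimia data). It assumes the image is unfolded column by column, as `im(:)'` does.

```matlab
% Toy 5x5 binary image with a 3x3 blob (hypothetical example data)
im = zeros(5,5);
im(2:4,2:4) = 1;

% Unfold the image into a single row vector, column by column
row = im(:)';

% Area estimate: number of pixels with value 1
area = sum(row,2);

% Number of 0-1 and 1-0 changes between consecutive pixels
% of the unfolded image
transitions = sum(abs(row(:,1:end-1) - row(:,2:end)),2);
```

Here `area` is 9 and `transitions` is 6. Each of the three columns that cross the blob contributes one 0-1 and one 1-0 change, while the left and right edges of the blob produce no change in the unfolded vector.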

`% Construct a dataset, use the original labels of B`

`X = prdataset([feat1 feat2],getlabels(B));`

`scatterd(X,'legend');`

`testk(X,1) % the leave-one-out error of the 1-NN classifier`

The nearest neighbors are found in the given feature space. The scatter plot is misleading, as its two axes are not drawn to the same scale. Look at a properly scaled version:

`axis equal`

Feature 2 (vertical) is hardly significant. Let us scale the features by their variance (make the variances 1):

`Y = X*mapex(scalem,'variance');`

`figure; scatterd(Y,'legend'); axis equal`

`showfigs`

`testk(Y,1)`

The routine `mapex` takes care of training (here `scalem`) and executing on the same data. The scatter plot and the result of `testk` show an improvement.
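The effect of such a scaling can be illustrated without PRTools. The sketch below uses made-up two-feature data in plain MATLAB. It centers each feature and divides by its standard deviation. The centering step is an assumption about what the `'variance'` option does in addition to normalizing the variances.

```matlab
% Hypothetical data: feature 1 has a much larger range than feature 2
X = [100 1; 300 2; 500 3; 700 4];

% Per-feature mean and standard deviation
mu = mean(X,1);
sd = std(X,0,1);

% Shift every feature to zero mean and scale it to unit variance
Y = bsxfun(@rdivide, bsxfun(@minus, X, mu), sd);

% Check: both feature variances are now 1 (up to rounding)
v = var(Y,0,1);
```

After scaling, both features contribute comparably to Euclidean distances, so a nearest neighbor rule is no longer dominated by the feature with the largest range.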

In order to find a better estimate for the perimeter, the objects should be considered as images. This is possible because the information on the image size is preserved in the dataset. This can be seen by converting the dataset to a structure:

`struct(B)`

The feature size stores the image size:

`getfeatsize(B)`

A routine like `show` makes use of this fact. PRTools offers a general routine for computing arbitrary image operations on all objects stored in a dataset:

`C = B*filtim('bwperim');`

`show(C)`

Counting the number of pixels in the contour image gives a better estimate of the contour length, as it takes the vertical contributions into account.
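The idea of counting contour pixels can be tried on a toy image in plain MATLAB. The sketch below is a hand-written stand-in that marks a blob pixel as contour when at least one of its four neighbours is 0. It is not claimed to match `bwperim` exactly, and the 5-by-5 image is made up for illustration.

```matlab
% Toy 5x5 binary image with a 3x3 blob (hypothetical example data)
im = zeros(5,5);
im(2:4,2:4) = 1;

% Pad with zeros so that border pixels also have four neighbours
p = zeros(size(im)+2);
p(2:end-1,2:end-1) = im;

% A pixel is interior if it and its four neighbours are all 1
interior = p(2:end-1,2:end-1) & p(1:end-2,2:end-1) & p(3:end,2:end-1) ...
         & p(2:end-1,1:end-2) & p(2:end-1,3:end);

% Contour pixels are blob pixels that are not interior
contour = im & ~interior;

% Contour length estimate: number of contour pixels
contourLength = sum(contour(:));
```

For this blob `contourLength` is 8: every blob pixel except the one in the centre lies on the contour, including the pixels on the left and right edges.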

#### Exercises:

- Compute `feat3`, the feature based on the improved contour length estimate.
- Construct a dataset based on `feat1` and `feat3` and test its performance.
- Create a scatter plot of the two contour length estimators.
- Test how good `feat3` alone is. Explain the result on the basis of the object images.
