
Mappings

This page introduces the mapping, one of the key elements of PRTools.

Mapping background

Mappings pave the way from object observations to the recognition of the pattern class they belong to. They define the normalization of the raw measurements, the extraction of the initial features, the reduction to a small, relevant feature set, the estimation of densities of classes or models, the transformation to the output space of a classifier, which can be posteriors, confidences or distances, and finally the selection of the most likely class. In every step a mapping procedure defines how input data is transformed into output data. In programming terms:

output_data = mapping_procedure(input_data,parameters)

In PRTools, input_data and output_data are usually a dataset or a datafile. Occasionally they may be a cell array or an array of doubles (in which rows represent the objects), or even a set of scalars or a string. Many mapping procedures are available, e.g. procedures for handling mappings, image handling, image operations, image features, feature selection, fixed mappings, trainable mappings and density estimation.
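The pattern output_data = mapping_procedure(input_data,parameters) can be illustrated outside PRTools with a minimal Python sketch. This is not PRTools code. The data is a list of rows, one row per object, as with an array of doubles. The procedure names scale_features and select_features are hypothetical. Each procedure takes input data plus parameters and returns output data, and chaining them mirrors the successive steps described above. Note that Python indexes features from 0, whereas Matlab indexes from 1.

```python
# Minimal sketch (not PRTools): a mapping procedure takes input data plus
# parameters and returns output data. Rows are objects, columns are features.

def scale_features(input_data, factors):
    """Hypothetical normalization step: multiply feature j by factors[j]."""
    return [[x * f for x, f in zip(row, factors)] for row in input_data]


def select_features(input_data, indices):
    """Hypothetical reduction step: keep the features at the given 0-based indices."""
    return [[row[j] for j in indices] for row in input_data]


input_data = [[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]]          # 2 objects, 3 features

# Each step follows output_data = mapping_procedure(input_data, parameters)
normalized = scale_features(input_data, [10.0, 0.5, 1.0])
output_data = select_features(normalized, [0, 1])
print(output_data)  # [[10.0, 1.0], [40.0, 2.5]]
```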

Mappings can be of the following types:

  • Fixed mappings: they are fully defined by their user-supplied parameters.
  • Trainable mappings: the parameters defining the mapping are optimized for a given dataset. Consequently, they can be in an untrained as well as in a trained mode.
  • Combiners: mappings that accept other mappings as input and combine them.
  • Generators: mappings that generate data by sampling or interpolating a given dataset.
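The four roles can be illustrated with a small Python sketch. The names FixedShift, UntrainedCentering, Sequential and interpolate are hypothetical and do not correspond to PRTools routines. The sketch only shows the division of roles:

- a fixed mapping is completely determined by the parameters passed in;
- a trainable mapping estimates its parameters from data and then behaves as a trained mapping;
- a combiner wraps other mappings;
- a generator produces new objects from existing ones.

```python
class FixedShift:
    """Fixed mapping: completely defined by the user-supplied shift vector."""

    def __init__(self, shift):
        self.shift = list(shift)

    def apply(self, data):
        return [[x - s for x, s in zip(row, self.shift)] for row in data]


class UntrainedCentering:
    """Trainable mapping in untrained mode: no parameters yet."""

    def train(self, data):
        # Optimize the parameters (here simply the feature means) on a dataset
        n = len(data)
        means = [sum(column) / n for column in zip(*data)]
        return FixedShift(means)  # trained mode: the parameters are now fixed


class Sequential:
    """Combiner: accepts other mappings and applies them one after the other."""

    def __init__(self, *mappings):
        self.mappings = mappings

    def apply(self, data):
        for w in self.mappings:
            data = w.apply(data)
        return data


def interpolate(data):
    """Generator: creates new objects halfway between consecutive objects."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(data, data[1:])]


data = [[1.0, 2.0],
        [3.0, 6.0]]
trained = UntrainedCentering().train(data)
print(trained.shift)                                            # [2.0, 4.0]
print(Sequential(trained, FixedShift([1.0, 1.0])).apply(data))  # [[-2.0, -3.0], [0.0, 1.0]]
print(interpolate(data))                                        # [[2.0, 4.0]]
```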

Classifiers are a special kind of trainable mapping, as they map feature data (input_data) onto class labels or class confidences (output_data). A large set of classifiers is available: linear and quadratic classifiers, SVMs, neural networks, various others (e.g. decision trees and nearest neighbor rules) and combining classifiers.

Mapping definition

Mappings are typically defined inside functions that compute or apply a mapping, e.g. inside classifiers or routines for feature reduction. This section is therefore given only as background information for the interested reader. Users who work with PRTools from the command line, or who write simple experimental scripts based on the available routines, do not need to study it.

The mapping constructor looks like

W = prmapping(command,type,data,labels,size_in,size_out)

  • command is a string with the name of the mapping (and thereby also the name of the m-file) that is called to execute commands like B = A*W, in which A is usually a dataset and occasionally another mapping.
  • type is the mapping type: 'fixed', 'untrained', 'trained', 'combiner' or 'generator'. See the description of mapping types.
  • data is a structure or a cell array with all information that the specific procedure (command) expects and needs to perform the mapping.
  • labels is a set of strings or numbers used to annotate the resulting dataset B with feature labels. For a classifier, labels typically stores the class names of A (its lablist field).
  • size_in is the dimension of the input space (number of features of A).
  • size_out is the dimension of the output space (number of features of B).
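The following Python sketch illustrates what the constructor arguments hold. It stores the same six pieces of information in a record and checks the type against the list above. It is not the PRTools implementation. The Mapping class, the command name 'nearest_mean' and the placeholder data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Mapping types as listed for the constructor's type argument
VALID_TYPES = ("fixed", "untrained", "trained", "combiner", "generator")


@dataclass
class Mapping:
    command: str                 # name of the routine that executes the mapping
    mapping_type: str            # one of VALID_TYPES
    data: Any = None             # everything the routine needs to perform the mapping
    labels: List[Any] = field(default_factory=list)  # feature labels for the output
    size_in: int = 0             # input dimensionality (0 = unknown)
    size_out: int = 0            # output dimensionality (0 = unknown)

    def __post_init__(self):
        if self.mapping_type not in VALID_TYPES:
            raise ValueError(f"unknown mapping type: {self.mapping_type!r}")

    @property
    def size(self) -> Tuple[int, int]:
        return (self.size_in, self.size_out)


# Hypothetical trained classifier for 3 classes in a 4-dimensional feature space
w = Mapping("nearest_mean", "trained",
            data={"means": [[0.0] * 4, [1.0] * 4, [2.0] * 4]},
            labels=["a", "b", "c"], size_in=4, size_out=3)
print(w.size)  # (4, 3)
```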

The defined mapping W is usually used to map a dataset A into another space, resulting in a dataset B:

B = A*W

Consequently, if A has size [m,size_in] and W has size [size_in,size_out], then the mapped dataset B has size [m,size_out]. Thus B = A*W is consistent with a Matlab expression in which B, A and W are matrices of the corresponding sizes.

PRTools uses the values of size_in and size_out mainly for error detection and for returning appropriate error messages. In many cases the values may be replaced by 0, e.g. when the sizes are unknown at the moment the mapping is defined. This does not affect correct execution, but may result in hard-to-understand error messages when a dataset A of the wrong size is used as input.
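The size rules above can be sketched in Python. The function mapped_size is hypothetical and is not a PRTools routine. It applies the matrix rule [m,k]*[k,n] -> [m,n] and treats a size_in of 0 as "unknown". In that case the early check is skipped, so a wrong-sized dataset would only fail later, inside the mapping procedure itself. This is where the less helpful error messages come from.

```python
def mapped_size(dataset_size, mapping_size):
    """Size of B = A*W for A of size [m, k] and W of size [size_in, size_out].

    Mirrors the matrix rule [m, k] * [k, n] -> [m, n]. A size_in of 0 is
    treated as 'unknown', so the dimension check is skipped; a size_out of 0
    is passed through unchanged as 'unknown'.
    """
    m, k = dataset_size
    size_in, size_out = mapping_size
    if size_in != 0 and size_in != k:
        raise ValueError(
            f"dataset has {k} features, but the mapping expects {size_in}")
    return (m, size_out)


print(mapped_size((150, 4), (4, 2)))   # (150, 2)
print(mapped_size((150, 4), (0, 2)))   # (150, 2): no check possible
try:
    mapped_size((150, 3), (4, 2))
except ValueError as err:
    print("error:", err)               # error: dataset has 3 features, but the mapping expects 4
```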

If W defines an untrained mapping, a dataset A can be used to train W, resulting in a trained mapping V:

V = A*W

In this case V usually has size [size_in,k], in which k is determined by the training procedure. For a trainable classifier, V has size [size_in,c], with c the number of classes.
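The size rule for trained classifiers can be seen in a minimal nearest-mean classifier written in Python. This is a hypothetical stand-in, not the PRTools implementation of any classifier. Training on data with size_in features and c classes yields one mean per class. Applying the trained mapping therefore produces c outputs per object. Here the outputs are negative squared distances, so a larger value means closer to that class mean.

```python
def train_nearest_mean(objects, labels):
    """Train a nearest-mean classifier; returns the trained 'mapping' as a dict."""
    classes = sorted(set(labels))
    means = []
    for c in classes:
        rows = [x for x, y in zip(objects, labels) if y == c]
        means.append([sum(col) / len(rows) for col in zip(*rows)])
    size_in = len(objects[0])
    return {"classes": classes, "means": means, "size": (size_in, len(classes))}


def apply_nearest_mean(trained, objects):
    """One output per class: minus the squared distance to that class mean."""
    return [[-sum((a - b) ** 2 for a, b in zip(x, m)) for m in trained["means"]]
            for x in objects]


objects = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
labels = ["a", "a", "b", "b", "c"]
V = train_nearest_mean(objects, labels)
print(V["size"])                            # (2, 3): [size_in, c]
out = apply_nearest_mean(V, [[0, 0], [9, 1]])
predicted = [V["classes"][row.index(max(row))] for row in out]
print(predicted)                            # ['a', 'c']
```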

Mapping overload

Mappings define transformations between vector spaces of possibly different dimensionality. Their size [size_in,size_out] corresponds to these dimensionalities. Consequently, they behave somewhat like matrices of this size. Linear (affine) mappings are almost identical to such matrices, but include a shift operation. In addition, mappings may carry several other types of information, as explained in the mapping definition.
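The relation between an affine mapping and a matrix can be made concrete with a short Python sketch. The function affine_map is hypothetical and does not reflect how PRTools stores affine mappings internally. It computes B = A*R + offset, where R has size [size_in,size_out]. With a zero offset it reduces to plain matrix multiplication, and the result has size [m,size_out].

```python
def affine_map(A, R, offset):
    """Affine mapping B = A*R + offset, with rows of A as objects.

    R is a size_in x size_out matrix (list of rows) and offset has
    size_out entries. With a zero offset this is plain matrix multiplication.
    """
    size_in, size_out = len(R), len(R[0])
    if any(len(row) != size_in for row in A):
        raise ValueError("every object must have size_in features")
    return [[sum(row[i] * R[i][j] for i in range(size_in)) + offset[j]
             for j in range(size_out)]
            for row in A]


A = [[1, 2],
     [3, 4]]            # m = 2 objects, size_in = 2
R = [[1, 0, 1],
     [0, 1, 1]]         # size [2, 3]: maps 2D to 3D
B = affine_map(A, R, [10, 20, 30])
print(B)  # [[11, 22, 33], [13, 24, 37]]
```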

Structure

A full list of all information stored in a mapping can be found by converting a mapping into a structure.

   a = iris;
   w = pcam(a,2);
   struct(w)
%    mapping_file: 'affine'
%    mapping_type: 'trained'
%            data: [1x1 struct]
%          labels: [2x1 double]
%         size_in: 4
%        size_out: 2
%           scale: 1
%        out_conv: 0
%            cost: []
%            name: 'PCA to 2D'
%            user: []
%         version: {[1x1 struct]  '07-Apr-2011 15:21:01'}

Every field has a corresponding set-command (e.g. setdata) to store it and a get-command (e.g. getdata) to retrieve it. In some cases the get-command does not return the exact field, but some derived data. The table below gives more information.

The fields of the mapping structure

Field          Description
mapping_file   The name of the command (m-file) that executes the mapping.
mapping_type   The type of the mapping: fixed, untrained, trained, combiner or generator.
data           A structure or a cell array with all information needed to execute the mapping.
labels         Array with labels used to annotate the features (featlab) of the output dataset.
size_in        Input dimensionality: number of features of the input dataset.
size_out       Output dimensionality: number of features of the output dataset.
scale          A scalar or a vector to scale the output features.
out_conv       Type of desired output conversion.
cost           Classification costs in case the mapping defines a classifier.
name           String with a name, used only for annotation of plots and other outputs.
user           User-defined field.
version        PRTools version number and date of creation of the mapping.

Examples

A very simple example of a mapping is the routine featsel, which selects a predetermined set of features. The following example generates 10 objects in a 5-dimensional space and then selects features 1, 2 and 5. The means before and after selection are computed and displayed to make clear what is going on.

    % generate 10 objects in 5D, mean is [1 2 3 4 5], small variances
    A = gauss(10,[1:5],0.01*eye(5));
    % show the rounded values of the mean of A
    disp(round(mean(A)));
    % select features 1, 2 and 5
    B = featsel(A,[1 2 5]);
    % show the rounded values of the mean of B
    disp(round(mean(B)));

This shows

     1     2     3     4     5
     1     2     5

as B contains only features 1, 2 and 5 of A.

An important property of the way mappings are implemented in PRTools is that the following statements are equivalent:

    B = featsel(A,[1 2 5]);
    B = A*featsel([],[1 2 5]);
    W = featsel([],[1 2 5]); B = A*W;

which is realized by overloading the matrix multiplication operator * for mappings. It should be read as a piping symbol: the dataset A is fed into the mapping procedure and replaces the placeholder [] as its first parameter. This only works for a mapping routine whose first parameter is given as [] instead of a dataset variable.
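The piping mechanism can be emulated in Python by overloading * on a dataset class. In this minimal sketch the classes Dataset and Mapping and the featsel-like routine are hypothetical stand-ins, not PRTools code. Python's None plays the role of the [] placeholder. When the routine receives None, it returns a mapping that remembers the procedure and its parameters. A*W later executes that procedure with A inserted as the first argument, so all three forms give the same result.

```python
class Dataset:
    """Minimal stand-in for a dataset: a list of objects (rows)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __mul__(self, mapping):
        # A*W: pipe the dataset into the mapping procedure stored in W
        return mapping.execute(self)


class Mapping:
    """Stores a procedure together with its parameters (everything but the data)."""

    def __init__(self, procedure, *params):
        self.procedure = procedure
        self.params = params

    def execute(self, dataset):
        # The dataset takes the place of the empty first argument
        return self.procedure(dataset, *self.params)


def featsel(A, features):
    """Hypothetical featsel-like routine; A=None plays the role of PRTools' [].

    Feature indices are 1-based, as in the Matlab example.
    """
    if A is None:
        return Mapping(featsel, features)        # return a mapping, not data
    return Dataset([[row[j - 1] for j in features] for row in A.rows])


A = Dataset([[1, 2, 3, 4, 5],
             [6, 7, 8, 9, 10]])
B1 = featsel(A, [1, 2, 5])
B2 = A * featsel(None, [1, 2, 5])
W = featsel(None, [1, 2, 5])
B3 = A * W
print(B1.rows, B2.rows, B3.rows)  # each is [[1, 2, 5], [6, 7, 10]]
```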

The advantage of the construct in the last line of the above example is that a mapping, which is in fact a procedure, can be stored together with some chosen parameters in a variable, here called W. This variable can then be used as input for other PRTools routines that operate on arbitrary mappings.
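The same idea can be sketched with plain Python functions, again as an illustration rather than PRTools code. Here functools.partial binds the parameters to a procedure. This gives a value that can be stored in a variable and passed to any routine that accepts mappings. The routine chain is a hypothetical example of such a routine, similar in spirit to combining mappings sequentially. It works for any stored mapping without knowing what that mapping does.

```python
from functools import partial


def select(data, features):
    """Keep the given features (1-based indices, as in Matlab)."""
    return [[row[j - 1] for j in features] for row in data]


def scale(data, factor):
    """Multiply all feature values by a constant factor."""
    return [[factor * x for x in row] for row in data]


def chain(*mappings):
    """Hypothetical generic routine: apply arbitrary stored mappings in turn."""
    def combined(data):
        for w in mappings:
            data = w(data)
        return data
    return combined


W1 = partial(select, features=[1, 2, 5])   # procedure + parameters, no data yet
W2 = partial(scale, factor=10)
W = chain(W1, W2)                           # a new mapping built from stored ones

train = [[1, 2, 3, 4, 5]]
test = [[0, 1, 0, 1, 0]]
print([W(A) for A in (train, test)])  # [[[10, 20, 50]], [[0, 10, 0]]]
```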
