Mappings
On this page the mapping is introduced. It is one of the key elements of PRTools.
Mapping background
Mappings pave the way from object observations to the recognition of the pattern class they belong to. They define the normalization of the raw measurements, the extraction of the initial features, the reduction to a small but relevant feature set, the estimation of class or model densities, the transformation to the output space of a classifier (posteriors, confidences or distances), and finally the selection of the most likely class. In every step a mapping procedure defines how input data is transformed into output data. In programming terms:
output_data = mapping_procedure(input_data,parameters)
In PRTools input_data and output_data are usually defined as a dataset or a datafile, but occasionally they may be a cell array or an array of doubles (in which rows represent the objects), or even a set of scalars or a string. There are many mapping procedures available, e.g. procedures for handling mappings, image handling, image operations, image features, feature selection, fixed mappings, trainable mappings and density estimation.
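As a hedged illustration (assuming the standard PRTools routines gendatb and scalem are available), the same mapping can be applied to a dataset as well as to a plain array of doubles:
a = gendatb([20 20]);       % a PRTools dataset with 40 objects in 2D
w = scalem(a,'variance');   % trained scaling mapping
b = a*w;                    % dataset in, dataset out
x = +a;                     % plain array of doubles (rows are objects)
y = x*w;                    % doubles in, doubles out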
Mappings can be of the following types:
- Fixed mappings: they are fully defined by their parameters, as set by the user.
- Trainable mappings: in this case the parameters defining the mapping are optimized for a given dataset. Consequently they can be in an untrained mode as well as in a trained mode (see the sketch after this list).
- Combiners: mappings that accept other mappings as input and combine them.
- Generators: mappings that generate data by sampling or interpolating a given dataset.
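A small sketch of the difference between a fixed and a trainable mapping, assuming the standard routines gendatb, featsel and pcam:
a = gendatb([20 20]);      % dataset with 40 objects in 2D
w_fixed = featsel([],1);   % fixed mapping: fully defined by its parameters
w_untr  = pcam([],1);      % untrained mapping: parameters still to be found
w_tr    = a*w_untr;        % training on a yields a trained mapping
b       = a*w_tr;          % apply the trained mapping to the data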
Classifiers are a special kind of trainable mapping as they map feature data (input_data) on class labels (output_data) or on class confidences. There is a large set of classifiers available: linear and quadratic classifiers, SVMs, neural networks, various others (e.g. decision trees and nearest neighbor rules) and combining classifiers.
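A hedged sketch of a classifier used as a mapping, assuming the standard routines gendatb, ldc and labeld:
a = gendatb([50 50]);   % labeled two-class dataset in 2D
w = ldc(a);             % trained linear classifier
d = a*w;                % map the data on the classifier: class confidences
lab = d*labeld;         % select the most likely class for every object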
Mapping definition
Mappings are typically defined inside functions that compute or apply a mapping, e.g. inside classifiers or routines for feature reduction. This section is therefore just given as background information for the interested reader. There is no need to study it for users who use PRTools from the command line or write simple experimental scripts on the basis of the available routines.
The mapping constructor looks like:
W = prmapping(command,type,data,labels,size_in,size_out)
The parameters are:
- command: a string with the name of the mapping (and thereby also the name of the m-file) that is called to execute commands like B = A*W, in which A is usually a dataset and occasionally another mapping.
- type: the mapping type, 'fixed', 'untrained', 'trained', 'combiner' or 'generator'. See the description of mapping types.
- data: a structure or a cell array with all information that the specific procedure (command) expects and needs to perform the mapping.
- labels: a set of strings or numbers that will be used to annotate the resulting dataset B with feature labels. The typical use is that labels stores the class names of A (its lablist field) in case of a classifier.
- size_in: the dimension of the input space (number of features of A).
- size_out: the dimension of the output space (number of features of B).
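A minimal, hedged sketch of a directly constructed fixed mapping, assuming that featsel serves as the executing m-file and that the empty labels and the zero input size are acceptable (see below):
W = prmapping('featsel','fixed',{[1 2 5]},[],0,3);  % fixed mapping selecting features 1, 2 and 5
A = gauss(10,[1:5],0.01*eye(5));
B = A*W;                                            % internally featsel(A,[1 2 5]) is called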
The defined mapping W will usually be used for mapping a dataset A into another space, resulting in a dataset B:
B = A*W
Consequently, if A has a size of [m,size_in] and W has a size of [size_in,size_out], then the dataset B will have a size of [m,size_out]. Thereby B = A*W is consistent with a Matlab expression in which B, A and W are matrices of the corresponding sizes.
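A short sketch of these sizes, assuming the standard routines gendatd and pcam:
A = gendatd([25 25],8);   % dataset of size [50,8]
W = pcam(A,3);            % trained mapping of size [8,3]
B = A*W;                  % dataset of size [50,3]
size(W)                   % returns 8 3
size(B)                   % returns 50 3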
PRTools uses the values of size_in and size_out mainly for error detection and for returning appropriate error messages. In many cases these values may be replaced by 0, e.g. when the sizes are unknown at the moment of defining the mapping. This does not affect a correct execution, but it may result in error messages that are hard to understand in case a dataset A of a wrong size is used as input.
If W defines an untrained mapping, a dataset A can be used for training W, resulting in a trained mapping V:
V = A*W
In general the size of V is [size_in,k], in which k is determined by the training procedure. In case of a trainable classifier the size of V is [size_in,c], with c the number of classes.
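A hedged sketch, assuming the standard routines gendatb and qdc:
A = gendatb([30 30]);   % two-class dataset in 2D
W = qdc;                % untrained quadratic classifier
V = A*W;                % training results in a trained classifier of size [2,2]
D = A*V;                % apply the trained classifier to the data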
Mapping overload
Mappings define transformations of vector spaces of possibly different dimensionality. Their size [size_in,size_out] corresponds to these dimensionalities. Consequently they behave somewhat similarly to matrices of this size. Linear (affine) mappings are almost identical to such matrices, but include a shift operation. In addition they may carry several other types of information, as explained in the mapping definition.
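A hedged sketch of this matrix-like behavior (using the iris dataset and the pcam routine): two mappings combined by * act like a product of matrices of compatible sizes.
a = iris;           % 150 objects with 4 features
w1 = pcam(a,3);     % trained mapping of size [4,3]
w2 = pcam(a*w1,2);  % trained mapping of size [3,2]
w = w1*w2;          % sequential mapping from 4D to 2D
b = a*w;            % dataset of size [150,2]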
Structure
A full list of all information stored in a mapping can be found by converting a mapping into a structure.
a = iris; w = pcam(a,2); struct(w)
%    mapping_file: 'affine'
%    mapping_type: 'trained'
%            data: [1x1 struct]
%          labels: [2x1 double]
%         size_in: 4
%        size_out: 2
%           scale: 1
%        out_conv: 0
%            cost: []
%            name: 'PCA to 2D'
%            user: []
%         version: {[1x1 struct]  '07-Apr-2011 15:21:01'}
All fields have a corresponding set-command (e.g. setdata) to store them and a get-command (e.g. getdata) to retrieve them. In some cases not the exact field contents are retrieved but some derived data. The table below gives more information.
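A short, hedged sketch of such commands; getname, getlabels and setname are assumed to be available for mappings:
a = iris; w = pcam(a,2);
getname(w)                    % 'PCA to 2D'
getlabels(w)                  % labels for the features of the output dataset
w = setname(w,'My PCA map');  % store another name in the mapping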
The fields of the mapping structure:

mapping_file | The name of the command (m-file) that executes the mapping.
mapping_type | The type of the mapping: fixed, untrained, trained, combiner or generator.
data         | A structure or a cell array with all information needed to execute the mapping.
labels       | Array with feature labels used to annotate the features (featlab) of the output dataset.
size_in      | Input dimensionality. Number of features of the input dataset.
size_out     | Output dimensionality. Number of features of the output dataset.
scale        | A scalar or a vector to scale the output features.
out_conv     | Type of desired output conversion.
cost         | Classification costs in case the mapping defines a classifier.
name         | String with a name, just used for annotation of plots and other outputs.
user         | User defined field.
version      | PRTools version number and date of creation of the mapping.
Examples
A very simple example of a mapping is the routine featsel, which selects a predetermined set of features. In the following example 10 objects in a 5-dimensional space are generated. After that features 1, 2 and 5 are selected. The means before and after selection are computed and shown to make clear what is going on.
% generate 10 objects in 5D, mean is [1 2 3 4 5], small variances
A = gauss(10,[1:5],0.01*eye(5));
% show the rounded values of the mean of A
disp(round(mean(A)));
% select features 1, 2 and 5
B = featsel(A,[1 2 5]);
% show the rounded values of the mean of B
disp(round(mean(B)));
This shows
     1     2     3     4     5
     1     2     5
as B contains just the features 1, 2 and 5 of A.
An important property of the way mappings are implemented in PRTools is that the following statements are equivalent:
B = featsel(A,[1 2 5]);
B = A*featsel([],[1 2 5]);
W = featsel([],[1 2 5]); B = A*W
which is realized by overloading the matrix multiplier * for mappings. It has to be read as a piping symbol: the dataset A is fed into the mapping procedure and replaces the placeholder [] as the first parameter. This only holds when the first parameter of the mapping routine is given as [] instead of a dataset variable.
The advantage of the construct in the last line of the above example is that a mapping, which is in fact a procedure together with some chosen parameters, can be stored in a variable, here called W. This variable can be used as an input for other PRTools routines that operate on arbitrary mappings.
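A hedged sketch of such a reuse of a stored mapping (the second dataset C is just for illustration):
W = featsel([],[1 2 5]);          % store the mapping procedure and its parameters
A = gauss(10,[1:5],0.01*eye(5));
C = gauss(20,[1:5],0.01*eye(5));
B = A*W;                          % apply W to A
D = C*W;                          % apply the same W to another dataset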