Pattern recognition studies and exploits the road from observations to knowledge. In technical systems, observations are encoded as sensory data at the start of the road, and knowledge as pattern classes and their relations at the end. The road itself consists of a set of transformations that gradually generalize the observations into knowledge.
The basic elements of PRTools are data, which travel along the road, and mappings, which are the transformations that constitute the road. This page defines these elements further. Other pages describe the basic operations between data and mappings (say, how traffic can use the road) and the large set of user commands, each offering transport between two stops or just paving the way.
All operations in PRTools act on a few basic elements. These are variables of a special type: structures that contain, next to the data, various types of additional information that annotate the data. Operations between these variables using the standard Matlab operators (+, -, *, ...) are overloaded in such a way that the additional information is taken into account and updated where needed.
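The idea of overloaded operators that keep annotations up to date can be sketched outside Matlab as well. The following is a minimal Python analogy, not PRTools code: the class `AnnotatedData`, its fields and the generic feature names `f1, f2, ...` are all hypothetical and chosen only for illustration.

```python
class AnnotatedData:
    """Hypothetical stand-in for an annotated variable: rows of numbers plus labels."""

    def __init__(self, rows, labels, feature_names):
        self.rows = [list(r) for r in rows]
        self.labels = list(labels)
        self.feature_names = list(feature_names)

    def __add__(self, offset):
        # Adding a scalar changes the values only; annotations are copied unchanged.
        shifted = [[v + offset for v in row] for row in self.rows]
        return AnnotatedData(shifted, self.labels, self.feature_names)

    def __mul__(self, matrix):
        # Right-multiplying by a matrix (a list of rows) projects every object.
        # The labels still describe the same objects, but the old feature names
        # no longer apply, so they are replaced by generic ones.
        n_out = len(matrix[0])
        projected = [
            [sum(row[k] * matrix[k][j] for k in range(len(row))) for j in range(n_out)]
            for row in self.rows
        ]
        names = [f"f{j + 1}" for j in range(n_out)]
        return AnnotatedData(projected, self.labels, names)


a = AnnotatedData([[1.0, 2.0], [3.0, 4.0]], ["apple", "pear"], ["width", "height"])
b = (a + 1.0) * [[1.0], [1.0]]  # shift, then sum the two features into one
print(b.rows, b.labels, b.feature_names)
```

In this sketch the labels survive both operations, while the feature names are updated only where the operation invalidates them.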
Data in PRTools almost always refers to a set of objects (real-world items or events) represented by observations. So any PRTools data item consists of at least one object, and usually a set of objects. They can be stored in four different variable types: datasets, datafiles, matrices of doubles and cell arrays.
- A dataset is a programming class defined by PRTools. It consists of a collection of vectors of the same size, representing objects to be recognized or to be used for training. It is stored in memory and contains class labels, sizes, names for classes and features, a creation date and additional information stored by operations or by the user. PRTools routines know how to handle this information, so users only occasionally have to deal with it themselves, e.g. during the creation of a dataset.
A dataset may contain images as objects as well as features. Images are unfolded to a vector and their original size is stored in the dataset so that they can be handled properly. Other frequently used object representations are histograms, (multi-way) spectra, spectrograms and time signals.
- A datafile is a pre-stage of a dataset. It refers to directories in the file system containing a separate file for every object, e.g. an image. In addition pre-processing of this raw data can be stored as a set of commands to be executed later. Together these commands should be sufficient to create a proper dataset, either from the entire datafile or from a user defined subset.
- The use of datasets causes overhead, because the structure with the additional information is carried around. When there is no such information, handling arrays of doubles is faster. So most PRTools routines are able to handle such arrays next to datasets, provided that they are organized as matrices in which each row is a vector representing a single object.
- The standard Matlab way to organize data of different sizes is by cell arrays. For that reason many routines that are able to handle datasets and arrays of doubles can also handle cells. Usually they return cells as well.
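The row-per-object layout and the unfolding of images can be sketched as follows. This is a Python illustration under stated assumptions, not the PRTools dataset class: the helpers `unfold` and `fold`, the dictionary keys, and the row-major pixel order are hypothetical choices (Matlab itself stores arrays column by column).

```python
def unfold(image):
    """Flatten a 2-D image (a list of rows) into one feature vector, keeping its size."""
    size = (len(image), len(image[0]))
    vector = [pixel for row in image for pixel in row]
    return vector, size


def fold(vector, size):
    """Restore the 2-D image from its unfolded vector and the stored size."""
    n_rows, n_cols = size
    return [vector[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


images = [
    [[0, 1, 2], [3, 4, 5]],
    [[5, 4, 3], [2, 1, 0]],
]
unfolded = [unfold(img) for img in images]

# One row per object, plus the annotations needed to interpret the rows.
dataset = {
    "data": [vec for vec, _ in unfolded],
    "labels": ["up", "down"],
    "object_size": unfolded[0][1],
}

restored = fold(dataset["data"][1], dataset["object_size"])
print(restored == images[1])
```

Because the original size travels with the data, any row can be turned back into an image when it has to be displayed or processed as one.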
The basic idea of a mapping is a transformation between two datasets. As a dataset stores vectors of constant size, a mapping can be interpreted as a transformation between two vector spaces. A PRTools variable of the type mapping stores the name of a routine that can execute the mapping, together with all further information that is needed, such as parameter values and the names of classes and features needed to construct the proper output dataset. There are several mapping types:
- The fixed mapping is fully user defined. Parameters are not automatically optimized by data, but specified on the basis of some prior knowledge by the user.
- A trainable mapping uses some optimization procedure based on a training set of objects to find a good set of parameters. Trainable mappings present themselves in two stages: untrained (in which the training rule parameters can be specified) and trained.
Classifiers are a special type of trainable mapping. Their output space is a set of distances, confidences or class posteriors, from which the best class, and thereby the class label, can be selected.
- Combiners. Mappings can be combined in three ways: stacked (in the same vector space), parallel (in different vector spaces), or sequential (the output space of one mapping is the input space of the next).
- Generators. They generate from one dataset a new one, by interpolation or sampling.
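The mapping types listed above can be made concrete with a small sketch. It is written in Python as a conceptual analogy and does not use the PRTools API: the functions `scale`, `train_nearest_mean`, `sequential`, `stacked` and `parallel` are hypothetical names, and the nearest-mean rule is merely one simple choice of trainable classifier.

```python
def scale(factor):
    """Fixed mapping: the parameter is chosen by the user, not learned from data."""
    return lambda x: [factor * v for v in x]


def train_nearest_mean(objects, labels):
    """Trainable mapping: estimate the class means and return the trained mapping.

    The trained mapping outputs one negative squared distance per class,
    so a larger output means a better match.
    """
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        members = [x for x, y in zip(objects, labels) if y == c]
        means[c] = [sum(col) / len(members) for col in zip(*members)]

    def trained(x):
        return [-sum((a - b) ** 2 for a, b in zip(x, means[c])) for c in classes]

    return trained, classes


def sequential(first, second):
    """The output space of `first` is the input space of `second`."""
    return lambda x: second(first(x))


def stacked(*mappings):
    """All mappings see the same input vector; their outputs are concatenated."""
    return lambda x: [v for m in mappings for v in m(x)]


def parallel(mappings, widths):
    """Each mapping gets its own slice of the input; outputs are concatenated."""
    def apply(x):
        out, start = [], 0
        for m, w in zip(mappings, widths):
            out.extend(m(x[start:start + w]))
            start += w
        return out
    return apply


train_x = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]]
train_y = ["a", "a", "b", "b"]

halve = scale(0.5)  # fixed mapping
classifier, classes = train_nearest_mean([halve(x) for x in train_x], train_y)
pipeline = sequential(halve, classifier)  # sequential combination

scores = pipeline([0.8, 0.9])
predicted = classes[scores.index(max(scores))]  # class label from the best score
print(predicted)
```

The untrained stage here is the function `train_nearest_mean` itself, and the trained stage is the closure it returns. The class label is obtained in a separate final step by selecting the class with the best output.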
operations: datasets datafiles cells and doubles mappings classifiers stacked parallel sequential dyadic.
user commands: datasets representation classifiers evaluation clustering examples support routines.
introductory examples: Introduction Scatterplots Datasets Datafiles Mappings Classifiers Evaluation Learning curves Feature curves Dimension reduction Combining classifiers Dissimilarities.