FAQ Data preparation - Pattern Recognition Tools

How do I prepare my data for PRTools?

This very common question is difficult to answer in general, because the data may be images, videos, time signals, and so on. Potential users should first answer the following questions for themselves:

What are the typical objects to be recognized? Is each of them already given as a single piece of data, e.g. an image, or are the objects of interest hidden somewhere in the data, e.g. a bicycle somewhere in an image? In the latter case they should be isolated (segmented) into single data items before PRTools can be used.
How should the objects be represented? By features? By dissimilarities to other objects? Do you already have clear procedures to measure them?

Once this is clear, one of the following should be created:

A single file in which every line corresponds to an object and every item on the line to a feature. All objects should be represented by the same features. Then follow the FAQ on the construction of a PRTools dataset.
A single file in which every line corresponds to an object and every item on the line to a dissimilarity to another object. All objects should be represented by dissimilarities to the same objects. Then follow the FAQ on the construction of a PRTools dataset.
A set of directories (one per class) with files that each contain a single object (e.g. an image or a time signal). Then create a PRTools datafile (see the datafile section of the manual) and use it for further preprocessing and feature extraction. Finally, convert it to a PRTools dataset.
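
The first two file layouts above can be illustrated with a small sketch. It is written in Python only to show what the files could look like before they are loaded elsewhere; it does not use PRTools itself. The toy objects, the comma separator, the helper names and the choice of Euclidean distance as the dissimilarity are all assumptions made for illustration, not requirements stated in this FAQ.

```python
import csv
import io
import math

# Hypothetical toy data: one feature vector per object, labels kept in a
# parallel list (the text above says every item on a line is a feature).
labels = ["apple", "apple", "banana"]
features = [
    [7.1, 150.0, 0.82],
    [6.8, 142.0, 0.79],
    [18.5, 120.0, 0.31],
]

def write_rows(rows, handle):
    """Write one line per object; refuse rows of unequal length."""
    width = len(rows[0])
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        if len(row) != width:
            raise ValueError("every object needs the same number of items")
        writer.writerow(row)

def dissimilarity_rows(vectors):
    """Row i holds the dissimilarities of object i to every object.

    Euclidean distance is only a stand-in; any measure suited to the
    objects could be used instead.
    """
    return [[math.dist(a, b) for b in vectors] for a in vectors]

# Layout 1: every line is an object, every item a feature.
feature_buffer = io.StringIO()
write_rows(features, feature_buffer)
feature_text = feature_buffer.getvalue()

# Layout 2: every line is an object, every item a dissimilarity
# to the same fixed set of objects.
D = dissimilarity_rows(features)
dissimilarity_buffer = io.StringIO()
write_rows(D, dissimilarity_buffer)
dissimilarity_text = dissimilarity_buffer.getvalue()
```

Writing to `io.StringIO` keeps the sketch free of side effects; replacing the buffers with `open("features.csv", "w", newline="")` (a hypothetical file name) would produce actual files.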

It is important to realize that PRTools can help to construct recognition procedures if a number of objects per class is available (preferably at least tens, better hundreds) and if some specific knowledge exists on how to represent them. See the introductory blog posts of September 2012 and October 2012.