The dataset placeholder [] in mappings
PRTools mappings are functions that specify how a dataset (objects represented in a vector space) should be transformed from its given space to a new one. They come in various types. The most important ones are the fixed mappings, in which the transformation is defined by a programmed procedure, and the trainable mappings. The latter can be either untrained or trained. An untrained mapping learns (optimizes) the transformation from an input dataset and returns a trained mapping.
An example of a fixed mapping is sigm, which executes a sigmoid transformation on the input features of dataset A, returning a dataset B:
B = sigm(A,S)
in which S is a scaling parameter. An example of a trainable mapping is pcam, performing a principal component analysis. It optimizes the desired N components from an input dataset A and returns a trained mapping in W, which may be used for mapping new datasets:
W = pcam(A,N)
Mappings like these are often combined with other mappings for preprocessing, feature reduction, classification and evaluation. For that reason it is useful that they can also be called with the empty placeholder [] in place of the dataset, after which the resulting mapping is applied to the dataset with the * operator. The dataset A can then be replaced by a set of operations resulting in a dataset. Moreover, the calls to sigm and pcam can be stored in variables and supplied to other procedures like cleval and crossval, which are programmed in a general way to handle arbitrary sets of transformations and classifications.
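The calling styles described above can be sketched as follows. This is a minimal illustration, not taken from the PRTools documentation: it assumes PRTools is on the MATLAB path, uses gendatb only to obtain some example data, and picks arbitrary values for S and N.

```matlab
% Assumes PRTools is on the MATLAB path; gendatb is used only for example data.
A = gendatb([50 50]);   % two-class, 2-D example dataset
S = 2;                  % scaling parameter for sigm (illustrative value)
N = 1;                  % number of principal components (illustrative value)

B1 = sigm(A,S);         % direct call: fixed mapping applied to A
B2 = A*sigm([],S);      % placeholder form of the same call

W1 = pcam(A,N);         % direct call: returns a trained mapping
W2 = A*pcam([],N);      % placeholder form: untrained mapping trained by A

U1 = sigm([],S);        % fixed mapping stored in a variable
U2 = pcam([],N);        % untrained mapping stored in a variable
W3 = A*U1*U2;           % apply U1 to A, then train U2 on the result
```

Because U1 and U2 no longer depend on a particular dataset, they can be defined once and reused on different data.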
It appears that the above facility to use dataset placeholders in calls to fixed and untrained mappings is frequently used. Some experimentalists even write their code almost exclusively in this way, as it gives great flexibility in the definition of experiments. So the placeholder [] is always there. To make their life slightly more comfortable and their code somewhat more readable, the need to include the placeholder has been removed for most of the user-callable mappings in PRTools version 5.1. It is thereby possible to omit the placeholder and pass only the parameters, which was already possible for some routines like featsel right from the start. This holds for about 130 mappings, independent of their number of input parameters. For mappings where this facility proved difficult to program, an error is generated. Users should take care not to confuse PRTools by requesting the default value of the first parameter with a [] when calling mappings that take more than one parameter after the dataset. In that case it is unclear whether the [] stands for the dataset or for the parameter. So the rule is: if this shifting of parameters is used, the first one should be given a value.
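A short sketch of the two styles side by side, under some assumptions: PRTools 5.1 or later is on the MATLAB path, sigm and pcam are among the mappings that accept the shortened call, and gendatb is used only to produce example data. The mapping somemap in the closing comments is hypothetical and serves only to illustrate the ambiguity rule.

```matlab
% Assumes PRTools 5.1 or later on the MATLAB path.
A = gendatb([50 50]);   % example data only
S = 2; N = 1;           % illustrative parameter values

U1_old = sigm([],S);    % explicit dataset placeholder
U1_new = sigm(S);       % placeholder omitted (assumed to be supported for sigm)
U2_old = pcam([],N);    % explicit dataset placeholder
U2_new = pcam(N);       % placeholder omitted (assumed to be supported for pcam)

B = A*U1_new;           % apply the fixed mapping to A
W = A*U2_new;           % train the untrained mapping with A

% Ambiguity for a hypothetical mapping somemap(A,P1,P2):
%   somemap([],P2)   % unclear: is [] the dataset or the default for P1?
%   somemap(P1,P2)   % unambiguous: the first parameter has a value
```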