How to handle memory and time available for computations?

Memory and time are limited resources. What is available depends on the computer facilities and on the circumstances and temperament of the user. What is needed depends on the problem under study and the tools that are selected. There are some ways to reconcile what is available with what is needed.

The memory

Directly accessible computer memory is usually limited to 2–8 Gbyte. Disc space is often almost unlimited and is accessible by Matlab, but it is not allocated for Matlab variables. PRTools stores the main data in the PRTools dataset class (defined by prdataset). Routines may create copies or processed versions of the data, so their memory demand can be large. This may result in a Matlab “Out of Memory” error, or even in a crash of Matlab.

To prevent this, some (but not all) PRTools routines process the data in batches. For this to work properly, a maximum batch size should be set. Routines that handle batches check the incoming datasets against this size, which is stored in a global variable called PRMEMORY. The user can set it freely with the prmemory command, as long as it is significantly smaller than the memory available within Matlab (which is not known to PRTools). If a dataset is larger than PRMEMORY (which counts variable elements, not bytes), an error is generated. The user should then increase the size with prmemory or define the problem differently.
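The difference between counting elements and counting bytes can be illustrated with a small sketch. This is Python, not PRTools code: the names check_size and MAX_ELEMENTS and the limit value are hypothetical, chosen only to mirror the role that PRMEMORY plays above.

```python
import numpy as np

# Hypothetical limit, counted in array elements (not bytes).
# The value is an arbitrary example, not a PRTools default.
MAX_ELEMENTS = 1_000_000


def check_size(data, max_elements=MAX_ELEMENTS):
    """Return the element count of `data`, or raise if it exceeds the limit."""
    n_elements = int(data.size)  # rows * columns, independent of the element type
    if n_elements > max_elements:
        raise MemoryError(
            f"dataset has {n_elements} elements, limit is {max_elements}"
        )
    return n_elements
```

Because the check counts elements, a 100 x 50 array of single-precision values and one of double-precision values are treated the same, although the second occupies twice as many bytes.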

Some routines contain their own batch processing loop. There is also a general system inside the prmap routine that handles all calls like A*W, in which A is a dataset and W is the mapping that defines the processing. This system is activated by W = setbatch(W), which sets the batch flag inside W. If this flag is set, the objects in A are processed in batches. The batch size can be set by setbatch as well. Batch processing is not possible for all mapping procedures.
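The idea of applying a mapping to a dataset in batches can be sketched as follows. This is a minimal Python illustration of the principle, not the prmap implementation: the function apply_in_batches and its parameters are hypothetical, and the mapping is assumed to be an ordinary function that treats each object (row) independently.

```python
import numpy as np


def apply_in_batches(data, mapping, batch_size=1000):
    """Apply `mapping` to the rows of `data`, at most `batch_size` rows at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outputs = []
    for start in range(0, data.shape[0], batch_size):
        batch = data[start:start + batch_size]  # a view, no copy of the full data
        outputs.append(mapping(batch))
    if not outputs:  # empty dataset: let the mapping define the output shape
        return mapping(data)
    return np.concatenate(outputs, axis=0)
```

Only one batch of intermediate results is held by the mapping at a time, which bounds the peak memory demand. The sketch also shows why batching cannot be used for every mapping: a mapping that needs all objects at once, for example one that subtracts the mean of the data it receives, would give a different result per batch than on the full dataset.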

The computing time

There is no feasible way to change the speed of the computer, which is expected to run as fast as it can. Many pattern recognition procedures, however, are based on optimization procedures that are stopped when a sufficient accuracy is reached. If the user is only interested in inspecting a result before that point is reached, the optimization may be stopped prematurely. The result will be valid (in the sense of being useful for further processing), but less accurate.
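A time-limited optimization of this kind can be sketched as follows. This is a generic Python illustration, not PRTools code: the function minimize_with_budget, its parameters and the simple gradient descent it runs are assumptions made for the example.

```python
import time


def minimize_with_budget(grad, x0, step=0.1, tol=1e-8, max_seconds=10.0):
    """Scalar gradient descent that stops at convergence or when time runs out.

    Returns (x, converged). If the time budget is exhausted first, x is the
    current estimate: usable for further processing, but less accurate.
    """
    x = x0
    deadline = time.monotonic() + max_seconds
    while time.monotonic() < deadline:
        g = grad(x)
        if abs(g) < tol:      # sufficient accuracy reached
            return x, True
        x = x - step * g      # one ordinary descent step
    return x, False           # stopped prematurely by the time limit
```

The caller always receives a usable value; the returned flag tells whether the accuracy criterion was met or the time limit ended the run.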

A general timing routine, prtime, controls most, but not all, optimization procedures in PRTools. Examples are the neural network classifiers, some clustering routines, nonlinear mapping by mds and tsnem, the optimization of the Parzen kernel, and the general system of parameter optimization controlled by regoptc. It also limits the number of weak classifiers generated by adaboostc and the number of trees generated by randomforestc. A level 2 warning (see prwarning) is generated each time a procedure is stopped prematurely in this way.

The default run time used by prtime is 10 s. For some procedures this is sufficient; for others, depending on the data size, it may be far too low. The value is chosen to allow comfortable interactive use. It may be changed (and should be changed for serious research) by a call to prtime.