This page is in preparation.

This page describes the main commands of the ClusterTools toolbox for cluster analysis. ClusterTools is a Matlab toolbox built on top of PRTools, which must be on the Matlab path. It can also be used for active learning.

ClusterTools is a Matlab toolbox for cluster analysis and the evaluation of clusters. It is built on PRTools, a toolbox for supervised learning. Most PRTools procedures assume that class labels are available; ClusterTools tries to find such labels in an unsupervised way. The evaluation procedures therefore focus on comparing the obtained cluster labels with given true labels.

The toolbox contains about 10 basic cluster procedures. All are feasible for datasets of up to about 5000 objects; some can also handle millions of objects. Feature sizes (dimensionalities) may be on the order of 10^4 to 10^5, with a few exceptions. The toolbox is therefore useful as a source of baseline routines for evaluating new procedures.

A significant feature of the toolbox is that all cluster procedures return pointers to prototypes (objects in the supplied dataset) as cluster labels. Every cluster procedure can therefore also be used for active labeling. For instance, after obtaining 100 clusters in a dataset of 5000 objects, the user may supply the true class labels of the 100 cluster prototypes. The toolbox can then return class label estimates for the 4900 objects that are still unlabeled.
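
The idea of prototype pointers as cluster labels can be illustrated with a small, language-neutral sketch. It is written in plain Python and does not use ClusterTools or PRTools; the function name `propagate_labels` and the toy data are assumptions made only for this illustration. Each object stores the index of its cluster's prototype, so labeling the prototypes is enough to estimate a label for every object.

```python
# Hypothetical illustration only; this is not ClusterTools code.
# cluster_labels[i] holds the index of the prototype object of the
# cluster that object i belongs to (a prototype points to itself).

def propagate_labels(cluster_labels, prototype_classes):
    """Give every object the class label of its cluster prototype."""
    return [prototype_classes[p] for p in cluster_labels]

# Six objects (indices 0..5) in two clusters; objects 1 and 4 are prototypes.
cluster_labels = [1, 1, 1, 4, 4, 4]

# The user supplies true class labels for the two prototypes only.
prototype_classes = {1: "apple", 4: "pear"}

estimated = propagate_labels(cluster_labels, prototype_classes)
print(estimated)
```

With 100 prototypes in a dataset of 5000 objects the same lookup would yield estimates for the remaining 4900 objects. Note that Python indices start at 0, whereas Matlab indices start at 1.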

This is a form of active learning. It should be understood, however, that no classifier is returned: objects are classified on the basis of their cluster memberships. Consequently, new objects cannot be classified in this way. If needed, PRTools may be used to train classifiers on the 100 labeled prototypes, or on all objects in the clustered dataset with their estimated labels.
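
To make the difference between cluster-based labeling and a trained classifier concrete, the following sketch builds a nearest-prototype rule from labeled prototypes and applies it to an object that was not part of the clustered dataset. It is plain Python, not PRTools; the nearest-prototype rule is just one possible classifier chosen for illustration, and the names `nearest_prototype_classifier` and `classify` and the toy coordinates are assumptions.

```python
import math

# Hypothetical illustration only; this is not PRTools code.
def nearest_prototype_classifier(prototypes, prototype_classes):
    """Return a function that labels a point by its nearest labeled prototype."""
    def classify(x):
        # Index of the prototype with the smallest Euclidean distance to x.
        best = min(range(len(prototypes)),
                   key=lambda i: math.dist(x, prototypes[i]))
        return prototype_classes[best]
    return classify

# Two labeled prototypes in a two-dimensional feature space.
prototypes = [(0.0, 0.0), (5.0, 5.0)]
prototype_classes = ["apple", "pear"]

classify = nearest_prototype_classifier(prototypes, prototype_classes)

# A new object that was not in the clustered dataset.
new_label = classify((4.2, 4.9))
print(new_label)
```

Unlike the cluster-membership lookup, this rule depends only on feature values, so it can be applied to objects outside the clustered dataset.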