This page describes the main properties and commands of ClusterTools, a Matlab toolbox for cluster analysis. It is built on top of PRTools, which must be on the Matlab path. ClusterTools can also be used for active learning.

Significant properties:

  • ClusterTools contains routines for traditional algorithms such as KMeans, KCentres and various hierarchical clustering schemes, as well as more advanced algorithms such as MeanShift, KNN-modeseeking and Exemplar.
  • Most algorithms run on feature representations as well as on dissimilarity matrices.
  • By default, multilevel clusterings are returned. These are sets of clusterings that range from a few clusters (2-10) up to thousands of clusters.
  • Each cluster is represented by a prototype; results are given as pointers to the cluster prototypes.
  • Evaluation is always done by various comparisons with given sets of labels.
  • The toolbox contains routines for active learning and semi-supervised learning.
  • Some routines, like KNN-modeseeking, can handle millions of objects described by hundreds of features.
  • All routines have a preclustering option based on KNN-modeseeking, so they can be applied to any dataset of a size that KNN-modeseeking can handle.
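
The prototype-pointer representation and the label-based evaluation listed above can be illustrated with a small sketch. It is written in plain Python rather than Matlab and does not call ClusterTools or PRTools. The data layout (one list of prototype indices per clustering level), the function names `prototypes` and `majority_error`, and the majority-label error measure are assumptions chosen for illustration, not the toolbox's actual interface or evaluation criterion.

```python
from collections import Counter

# Toy multilevel clustering of 6 objects (indices 0..5).
# Assumed layout: each level maps every object to the index of the
# object that serves as the prototype of its cluster.
levels = [
    [0, 0, 2, 2, 4, 4],  # fine level: 3 clusters with prototypes 0, 2, 4
    [0, 0, 0, 0, 4, 4],  # coarse level: 2 clusters with prototypes 0, 4
]

# Given labels, used only for evaluation (hypothetical example data).
true_labels = ["a", "a", "b", "b", "c", "c"]


def prototypes(level):
    """Return the sorted prototype indices used at one clustering level."""
    return sorted(set(level))


def majority_error(level, labels):
    """Fraction of objects whose label differs from the majority label
    of the cluster they are assigned to."""
    members = {}
    for obj, proto in enumerate(level):
        members.setdefault(proto, []).append(labels[obj])
    wrong = 0
    for cluster_labels in members.values():
        majority_count = Counter(cluster_labels).most_common(1)[0][1]
        wrong += len(cluster_labels) - majority_count
    return wrong / len(level)


# One (number of clusters, error) pair per level.
summary = [(len(prototypes(lv)), majority_error(lv, true_labels)) for lv in levels]

for n_clusters, err in summary:
    print(f"{n_clusters} clusters: majority-label error {err:.3f}")
```

Storing pointers to prototypes instead of arbitrary cluster numbers means that every cluster is identified by a real object from the dataset, so the same array answers both "which cluster is this object in" and "which object represents that cluster".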

Documentation

As ClusterTools is based on PRTools, users should be aware of how the two toolboxes relate, as well as of the global settings made by PRTools.

A summary of the user commands to find clusters in unlabeled data and to evaluate them by labeled data:

cluster routines, classification, reclustering, evaluation, support

A set of examples and results