This page describes the main properties and commands of ClusterTools, a Matlab toolbox for cluster analysis. It is built on top of PRTools, which must be on the Matlab path. It may be used for active learning as well.

Significant properties:

  • It contains routines for the traditional algorithms like KMeans, KCentres and various hierarchical clustering schemes, as well as more advanced algorithms like MeanShift, KNN-modeseeking and Exemplar.
  • Most algorithms run on feature representations as well as on dissimilarity matrices.
  • By default, multilevel clusterings are returned. These are sets of clusterings ranging from a small number of clusters (2-10) up to thousands of clusters.
  • Clusters are represented by a prototype; results are given as pointers to the cluster prototypes.
  • Evaluation is always done by various comparisons with given sets of labels.
  • The toolbox contains routines for active learning and semi-supervised learning.
  • Some routines, like KNN-modeseeking, may run on millions of objects given by hundreds of features.
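
The prototype-pointer representation and the multilevel result described above can be illustrated with a small sketch. It uses only core Matlab functions and made-up data; it does not call ClusterTools or PRTools, and the layout (one column per clustering level, entries holding the index of the prototype object) is an assumption for illustration, not a statement of the toolbox's actual output format. Purity is used here as one simple example of a comparison with given labels; the toolbox's own evaluation routines may differ.

```matlab
% Hypothetical multilevel clustering of 8 objects (values are made up).
% lab(i,j) = index of the prototype object of the cluster that object i
% belongs to at level j. Level 1 has 4 clusters, level 2 has 2 clusters.
lab = [2 2
       2 2
       3 2
       3 2
       6 6
       6 6
       8 6
       8 6];

% Given labels, used for evaluation only (also made up).
truelab = [1 1 1 1 2 2 2 2]';

nlevels = size(lab, 2);
nclust  = zeros(1, nlevels);
purity  = zeros(1, nlevels);

for j = 1:nlevels
    protos    = unique(lab(:, j));       % prototype indices at this level
    nclust(j) = numel(protos);

    % A prototype is itself a member of its cluster, so it points to itself.
    assert(all(lab(protos, j) == protos));

    % Purity: fraction of objects carrying the majority label of their cluster.
    hits = 0;
    for p = protos'
        members = truelab(lab(:, j) == p);
        hits = hits + sum(members == mode(members));
    end
    purity(j) = hits / numel(truelab);
end

disp(nclust)
disp(purity)
```

Storing a prototype index rather than an arbitrary cluster number means the label itself identifies a representative object, which can be looked up directly in the data.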

Documentation

As ClusterTools is based on PRTools, users should be aware of how the two toolboxes relate, as well as of the global settings made by PRTools.

A summary of the user commands for finding clusters in unlabeled data and evaluating them against labeled data:

cluster routines, classification, reclustering, evaluation, support