Cluster routines
ClusterTools contains two groups of commands for clustering sets of objects: feature based and dissimilarity based routines. These as well as some backend routines and some PRTools cluster routines are listed below. It is assumed that readers are familiar with the basics of ClusterTools. There is a separate page discussing the relation between PRTools and ClusterTools.
Other pages introducing user commands of ClusterTools: classification, reclustering, evaluation, support
Feature based cluster routines
The below routines all operate on a dataset A
of M
objects represented in a L
dimensional vector space. This can be an M*L
double array or a PRTools dataset. They all result in a M*N
multilevel clustering LABC
. These are N
clusterings of the M
objects. The sizes of these clusterings (desired number of clusters) can be set by a vector K
. Commands are thereby similar to:
LABC = cluste(A,K);
LABC = A*cluste(K);
cluste 
Exemplar clustering by passing messages between objects, Frey and Dueck.  
clustf 
Clustering by the Farthest First Traversal algorithm  
clustw 
Clustering by the Worst First Traversal algorithm  
clusth 
Hierarchical agglomerative clustering. Repeatedly the two nearest clusters are merged. Various types are possible:


clustk 
EM based clustering, various types are possible:


clustkh 
Hierarchical clustering on top of kMeans by clustk in order to make clusth for large datasets feasible. 

clustm 
KNN mode seeking clustering  
clusts 
Mean shift, kernel based mode seeking clustering  
clustr 
Random clustering 
In a 2D example the use and some differences between these procedures will be shown and discussed.
Distance based cluster routines
Cluster analysis often depends on the set of all distances between a given set of objects. Such distances may often be based on a nonEuclidean distance measure, resulting, in PRTools terminology, in a dissimilarity matrix. The below routines all operate on a square dissimilarity matrix D
of size M*M
, if M
is the number of objects. They all result in a M*N
multilevel clustering LABC
. These are N
clusterings of the M
objects. The sizes of these clusterings are always determined by a second parameter K
. As Matlab matrices of size M*M
should fit in memory, dissimilarity matrices of about M = 5000
objects is about the limit.
LABC = cluste(A,K);
LABC = A*cluste(K);
These distance based routines have a unique relation with one of the above feature based routines by computing the Euclidean distances matrix of the set of objects. The reverse is not true for the mean shift procedure (clusts
), nor for kmeans (an option of clustk
) as they necessarily need the feature space. Moreover, the exemplar clustering (dcluste
) as well as hierarchical agglomerative clustering (dclusth
) are in fact defined on given (dis)similarity matrices. Thereby they suffer from the dataset size restrictions, as mentioned above.
dcluste 
Exemplar clustering by passing messages between objects, Frey and Dueck.  
dclustf 
Clustering by the Farthest First Traversal algorithm  
dclusth 
Hierarchical agglomerative clustering. Repeatedly the two nearest clusters are merged. Various types are possible:


dclustk 
EM based clustering, two types are possible:


dclustm 
KNN mode seeking clustering  
dclustr 
Random clustering 
In a 2D example the use and some differences between these procedures will be shown and discussed.
Backend routines
The above routines offer a standard user call. Some of them are based on other, more general routines in which more specific parameters can be set. These on some PRTools cluster routines are listed in the below table.
emclust 
A PRTools clustering routine the uses an arbitrary untrained classifier in an EM algorithm.  
hclust 
The PRTools routine for hierarchical clustering. It is called by dclusth , which is also the basis for clusth , reclusth and reclustk . 

kcentres 
The PRTools version of the KCentres procedure. It is based on a different implementation than clustk and dclustk . 

meanshift 
This is the ClusterTools backend routine for clusts and It offers some additional parameters. 

modeclust 
This is the full, most recent version of the ClusterTools KNN mode seeking procedure. It offers some additional options, a.o. the possibility of a user defined distance measure. The routine is called by modeclustf and thereby by clustm for small datasets. 

modeclustf 
The fast version of modeclust. It approximates the results of that routine depending of a complexity parameter. 

modeseek 
The original PRTools version of KNN mode seeking clustering. It is not used by ClusterTools. It operates on dissimilarity matrices only. The code is very simple and makes thereby clear how the algorithm works.  
prkmeans 
The PRTools KMeans algorithm. It is based on a different implementation than clustk and not used by ClusterTools. 
Other pages introducing user commands of ClusterTools: classification, reclustering, evaluation, support