ct_kmeans Centroid-based clustering

PRTools and ClusterTools should be in the path.

Goto ClusterTools examples for a review of all examples.

Download the m-file from here.

Contents

Prepare environment

prtime(10)                   % restrict iterative optimisation to 10s
delfigs                      % delete existing figures
randreset      ;             % takes care of reproducability
prwarning(2)                 % show warnings

Define a dataset

we use some standard routines to create 8 two-dimensional clusters. They are labeld to compare cluster results with the desired labeling.A

m = 1000;
a = gendatclust2(m);
scattern(a); axis equal
title(getname(a));

Run two clustering procedures after removing the label information.

x = +a; % remove labels
K = [2 3 5 8 12 20 30 50 100]; % desired cluster sizes
procs = {'KMeans','KCentres'};
err_I = cell(1,2);
err_II= cell(1,2);
figure;
for i=1:2
  subplot(2,2,i);
  lab = x*clustk(K,procs{i});  % execute KMeans or KCentres
  scatn(lab(:,4),x,procs{i});  % scatterplot
  markcols(1);                 % show distinguishable colors colors
  axis equal;
  err_I{i}  = clusteval(lab,a,'actl');
  err_II{i} = clusteval(lab,a,'roc');
end

subplot(2,2,3); h1 = plote(err_I,10,'legend',procs);
subplot(2,2,4); h2 = plote(err_II,10,'legend',procs);

Comments

KMeans clustering looks somewhat better in the scatterplots. The bottom left curves show that in comparison with the desired labels (figure 1) this is supported by active labeling by which entire clusters are assigned to the label of their mediod or centre. (A cluster mean has no label, so the label of the nearest object, the cluster mediod, is used).

The bottom right figure shows the trade-off between the two types of errors as a function of the number of clusters. Error I is the fraction of object pairs that has been erroneously assigned to the same cluster. Error II is the fraction of object pairs that has been erroneously assigned to different clusters.