modeclustf

MODECLUSTF

Fast KNN mode-clustering, based on overlapping cells

[LAB,NNLAB,NDIST] = MODECLUSTF(A,C,K,NEST)

Input
A Dataset of M objects.
C Integer, complexity parameter 1 <= C <= M, default C = 2.
K Vector with neighborhood sizes of interest, default is a geometric series between 1 and M.
NEST Logical, if TRUE the output set of clusterings (columns of LAB) will be made nested by RECLUSTN. Default: TRUE.

Output
LAB Indices of mode samples, size [M,N] with N the number of clusterings (NUMEL(K)).
NNLAB [M,1] vector with indices of nearest neighbors.
NDIST Total number of distance calculations.
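
Example

A minimal usage sketch of the call above, not taken from the toolbox documentation. It assumes PRTools is on the MATLAB path and that the data is wrapped with PRDATASET; the data, the choice of K and the variable X are illustrative only. With only 1000 objects the routine may hand the work over to MODECLUST, as explained in the description.

```matlab
% Hypothetical usage sketch; requires PRTools on the MATLAB path.
rng(1);                                  % reproducible random data
X = [randn(500,2); randn(500,2) + 4];    % two Gaussian blobs, M = 1000 objects
A = prdataset(X);                        % assumption: wrap the data as a PRTools dataset
C = 6;                                   % complexity value reported as often sufficient
K = [5 20 80];                           % neighborhood sizes of interest
[LAB, NNLAB, NDIST] = modeclustf(A, C, K, true);   % NEST = true: nested clusterings

% Each column of LAB is one clustering; objects that share a mode index
% belong to the same cluster.
for j = 1:numel(K)
    fprintf('K = %3d: %d clusters\n', K(j), numel(unique(LAB(:,j))));
end
fprintf('distance computations: %d\n', NDIST);
```

Counting the unique entries per column of LAB gives the number of clusters found for each neighborhood size.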

Description

This is a fast version of MODECLUST, intended for very large datasets (more than a million objects). For every object it makes a rough estimate of the set of potential nearest neighbors, which should be larger than max(K). This set grows with C, which makes the procedure slower but more accurate. In many practical problems C = 6 appeared to be sufficient.
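
To make the notion of a mode sample concrete, the sketch below runs a brute-force kNN mode seeking step on a tiny 1-D sample in plain MATLAB. It is an illustration under assumptions, not the MODECLUSTF code. It takes the density of an object to be the inverse of the distance to its k-th neighbor, and it computes all pairwise distances, which is exactly what the fast routine avoids by restricting each object to a rough candidate set. All variable names are hypothetical.

```matlab
% Brute-force kNN mode seeking on a tiny 1-D sample (illustration only).
x = [0 0.1 0.25 0.45 5 5.2 5.3 5.6]';   % two well-separated groups
k = 3;                                   % neighborhood size, counting the object itself
m = numel(x);
D = abs(x - x');                         % full m-by-m distance matrix (needs R2016b or later)
[Ds, I] = sort(D, 2);                    % row i: neighbors of object i, nearest first
dens = 1 ./ Ds(:, k);                    % assumed density: inverse distance to the k-th neighbor

ptr = zeros(m, 1);
for i = 1:m
    nb = I(i, 1:k);                      % the k nearest neighbors of object i
    [~, j] = max(dens(nb));              % densest object in that neighborhood
    ptr(i) = nb(j);                      % object i points to it
end

lab = ptr;
while any(lab ~= ptr(lab))               % follow pointers until every object reaches a mode
    lab = ptr(lab);
end
disp(lab')                               % index of the mode sample for each object
```

Objects that end at the same mode index form one cluster, which corresponds to one column of LAB for a single value of K.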

The computational complexity of this algorithm, measured as the number of distances actually computed, is M x SQRT(M). For small M, however, it is not faster than MODECLUST. Therefore, it falls back to that routine for small values of M as well as for C == 0.
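
As a rough sense of scale, the snippet below compares the stated M x SQRT(M) count with a brute-force count for one million objects. The brute-force figure of M^2 distances is an assumption made for comparison only; the documentation does not state the cost of MODECLUST. For M = 1e6 this gives about 1e9 against 1e12 distance computations.

```matlab
M = 1e6;                     % number of objects
fast  = M * sqrt(M);         % distance count stated for MODECLUSTF
brute = M^2;                 % assumption: all pairwise distances are computed
fprintf('fast: %.1e, brute force: %.1e, ratio: %g\n', fast, brute, brute / fast);
```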

Reference(s)

R.P.W. Duin and S. Verzakov, Fast kNN mode seeking clustering applied to active learning, arXiv:1712.07454, 2017, 1-23.
