Evaluation in ClusterTools is based on datasets with given true labels. Routines intended for unlabeled data are evaluated by applying them to labeled data. The motivation is that in labeled datasets the classes are usually defined by human judgment. Evaluating cluster procedures on such datasets thereby establishes whether the defined classes correspond to the cluster structure. If only a weak correspondence is found, other information may have been used in defining the classes.
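As a minimal illustration of this idea (a stdlib-only Python sketch, not a ClusterTools routine), cluster purity measures how strongly a given clustering corresponds to the true class labels:

```python
from collections import Counter

def cluster_purity(cluster_ids, true_labels):
    """Fraction of objects whose true label equals the majority
    label of their cluster; 1.0 means a perfect correspondence."""
    clusters = {}
    for c, y in zip(cluster_ids, true_labels):
        clusters.setdefault(c, []).append(y)
    # For each cluster, count how many members carry its majority label.
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(true_labels)

# Example: two clusters, one object in the 'wrong' cluster.
print(cluster_purity([0, 0, 0, 1, 1], ['a', 'a', 'b', 'b', 'b']))  # 0.8
```

A low purity signals the weak correspondence mentioned above: the human-defined classes do not follow the cluster structure.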
Traditionally, estimating the correct number of clusters is a significant issue in the evaluation of cluster procedures. In the multilevel clustering approach followed here this is not the case: at some level the ‘correct’ number of clusters may be passed, and the levels with more clusters as well as those with fewer should not conflict with it. Some of the procedures below measure this.
In evaluating clustering results against given true labels, the performance of active learning procedures may be a worthwhile quantity. Dependence on specific supervised classifiers, however, should be avoided in order to obtain an unbiased judgment of the cluster algorithms.
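One classifier-free way to do this is to query the true label of a single prototype object per cluster and propagate that label to all cluster members; the resulting error then reflects the cluster structure alone. A sketch under that assumption (hypothetical names, not the ClusterTools routines listed below):

```python
def active_learning_error(cluster_ids, true_labels, prototypes):
    """Assign every object the true label of its cluster's prototype
    (one queried object per cluster) and return the error rate.
    `prototypes` maps cluster id -> index of the queried object."""
    errors = sum(1 for i, c in enumerate(cluster_ids)
                 if true_labels[i] != true_labels[prototypes[c]])
    return errors / len(cluster_ids)

# Two clusters; the prototype of cluster 0 is object 0, of cluster 1 is object 3.
print(active_learning_error([0, 0, 0, 1, 1],
                            ['a', 'a', 'b', 'b', 'b'],
                            {0: 0, 1: 3}))  # 0.2
```

Repeating this over the levels of a multilevel clustering yields a learning curve: more levels (and hence more queried prototypes) should lower the error if the clusters respect the classes.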
The algorithms below have been developed with these considerations in mind.
||Summarizes and plots various cluster evaluation measures based on comparisons with true labels.|
||Computes the active learning error of a single multilevel clustering.|
||Plots the learning curve for active learning for a single multilevel clustering.|
||Computes the active learning error for combining levels in a single multilevel clustering.|
||Plots the learning curve for active learning for combining levels in a single multilevel clustering.|
||Computes and plots learning curves for active learning by cluster assignments to the classes of the cluster prototypes arising from a set of cluster procedures.|