ct_lcurves Learning curves of active learning for MNIST8
PRTools and ClusterTools should be in the path.
A review of all ClusterTools examples.
Learning curves are computed for active learning in which each cluster prototype is labeled and its label is assigned to the corresponding cluster. The fast modeseeking procedure MODECLUSTF is used for clustering. Three modifications are shown: nesting, interpolation and label propagation.
Prepare environment and read data
prtime(10)      % restrict iterative optimisation to 10 s
delfigs         % delete existing figures
randreset;      % takes care of reproducibility
prwarning(2)    % show warnings
a = mnist8;     % 8x8 version of the MNIST dataset
Find multilevel clusterings
- fast modeseeking
- nested
- interpolated
labm = a*modeclustf(6,[],false);   % avoid nesting
labn = labm*reclustn;              % with nesting
labk = labsort(labm,'upred',100)*reclustk([2 5 10 20 50 100]);
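The nesting property can be illustrated with a small sketch in plain MATLAB. It is not part of ClusterTools and says nothing about how reclustn works internally. The toy label vectors and the helper name is_nested are hypothetical. The helper only checks whether every cluster of a finer level falls entirely inside one cluster of a coarser level.

```matlab
% Hypothetical toy example: two clustering levels for 8 objects.
% Each entry is the cluster index of one object at that level.
fine   = [1 1 2 2 3 3 4 4]';   % 4 clusters (higher resolution)
coarse = [1 1 1 1 2 2 2 2]';   % 2 clusters (lower resolution)

% A fine level is nested in a coarse level if the objects of every fine
% cluster all carry the same coarse cluster index.
is_nested = @(f,c) all(arrayfun(@(k) numel(unique(c(f == k))) == 1, unique(f)));

nested_ok  = is_nested(fine, coarse);            % fine clusters never straddle coarse ones
nested_bad = is_nested(fine, [1 2 2 2 2 2 2 2]'); % fine cluster 1 is split over two coarse clusters
```

A check like this could be applied to successive levels of a multilevel labeling to confirm that a nested version really is nested.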
Compute learning curves
- Nested: the original multilevel clustering is replaced by a nested version: higher resolution clusters are always entirely assigned to a single lower resolution cluster.
- Interpolated: the original multilevel clustering is replaced by a set of clusterings with a user-defined number of clusters, here [2 5 10 20 50 100].
- LabelProp: Prototype labels and class confidences of lower resolution clusterings are propagated to higher resolution clusterings. After that the classes are assigned according to the clusters in the highest level.
em = clustlcurve(labm,a);
en = clustlcurve(labn,a);
ek = clustlcurve(labk,a);
ec = clusteval(labm,a,'comb');
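As a rough illustration of what a single point on such a learning curve represents, the following plain MATLAB sketch labels each cluster by the true label of its prototype and measures the resulting error. It is only a sketch under stated assumptions: the toy data and the variable names (truelab, clust, proto) are hypothetical, and it is not the implementation of clustlcurve or clusteval.

```matlab
% Hypothetical toy data: true class labels of 8 objects.
truelab = [1 1 1 2 2 2 2 1]';

% One clustering level: cluster index per object, and for each cluster
% the index of the object that serves as its prototype.
clust = [1 1 1 1 2 2 2 2]';
proto = [2; 6];               % object 2 represents cluster 1, object 6 cluster 2

% Only the prototypes are labeled (here: looked up in truelab); every
% object then receives the label of the prototype of its cluster.
predlab = truelab(proto(clust));

nlabels = numel(proto);           % labeling cost of this level: 2 labels
err = mean(predlab ~= truelab);   % fraction of wrongly labeled objects: 0.25
```

Repeating this for clusterings with an increasing number of clusters gives error as a function of the number of labeled prototypes, which is the kind of curve plotted in the next section.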
Show results
figure;
plote({em,en,ek,ec},'nolegend');
legend('ModeSeek','Nested','Interpolated','LabelProp')
title(['Active learning curve ' getname(a)])
Comments
- Computing times for the four curves (ModeSeek, Nested, Interpolated, LabelProp) are about 30, 1, 10 and 125 seconds, respectively.
- The Nested, Interpolated and LabelProp results are based on the ModeSeek clustering result only. The data is not reused.
- Results of nesting and label propagation are almost identical. However, label propagation is very slow, whereas nesting is fast.
- Interpolation is based on 2, 5, 10, 20, 50 and 100 clusters.
- As expected, interpolation does not improve the learning curve, but nesting and label propagation do.