Evaluation

This page belongs to the User Guide of the DisTools Matlab package and describes some of its commands. More information can be found in the PRTools User Guide.

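The examples below show how classifiers can be computed and tested for each of the three ways a dissimilarity representation can be handled: directly on the dissimilarity matrix (dismat), in the dissimilarity space (disspace), and in a pseudo-Euclidean (PE) space.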

% Example 1: kNN classification directly on the dissimilarity matrix
[DT,DS] = genddat(D,0.5);   % split the dissimilarity matrix; DS contains dissimilarities to the trainset
W = knndc(DT);              % compute kNN classifier (optimizing k) on the given dissimilarities
E_DM = DS*W*testc           % classification error on the testset

% Example 2: disspace defined on all data
[DT,DS] = gendat(D,0.5);    % split dataset in disspace (based on all data!) into trainset and testset
W = knnc(DT);               % compute classifier in disspace on the trainset
E_DS = DS*W*testc           % classification error on the testset

% Example 3: disspace defined on the trainset
[DT,DS] = genddat(D,0.5);   % split dataset in disspace (based on the trainset) into trainset and testset
W = knnc(DT);               % compute classifier in disspace on the trainset
E_DS = DS*W*testc           % classification error on the testset

% Example 4: PE space computed from all data
X = D*pe_em(D);             % compute PE space from all data
[XT,XS] = gendat(X,0.5);    % split into trainset and testset
W = XT*pe_knnc;             % compute classifier in PE space on the trainset
E_PE = XS*W*testc           % classification error on the testset

% Example 5: PE space computed from the trainset
[DT,DS] = genddat(D,0.5);   % split dataset in disspace (based on the trainset) into trainset and testset
V = pe_em(DT);              % find PE space
XT = DT*V;                  % map trainset on PE space
XS = DS*V;                  % map testset on PE space
W = XT*pe_knnc;             % compute classifier in PE space on the trainset
E_PE = XS*W*testc           % classification error on the testset

In the second and fourth examples the spaces are defined on all data, because the gendat function just selects objects (rows) in a given space. The labels are not used in the construction of that space, so the test results are still fair. They do, however, estimate the performance for the situation in which test sets of a similar size are available and the classifiers are recomputed for each of them. This is also called transductive learning: the classifiers are adapted to the test data.

In the third and fifth examples only the training sets are used for building the spaces, because the genddat function by default uses a representation set (columns) equal to the trainset. As a result, the classifiers are valid for all future data. For some small, severely non-Euclidean datasets, however, projecting the test data onto a PE space computed from the training data alone yields poor results, e.g. complex-valued ones. Therefore many studies are based on transductive learning.
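The difference between the two splitting strategies can be illustrated in plain MATLAB, without PRTools or DisTools, by indexing a dissimilarity matrix directly. This is a minimal sketch under stated assumptions: the toy data, the variable names and the 50/50 split are hypothetical, and the code only mimics which rows and columns end up in the training and test sets. It is not how gendat or genddat are implemented.

```matlab
% Hypothetical toy data: 10 random 2D points and their Euclidean distances.
% Plain MATLAB only; no PRTools/DisTools functions are used here.
X = rand(10,2);
n = size(X,1);
D = zeros(n,n);
for i = 1:n
  for j = 1:n
    D(i,j) = norm(X(i,:) - X(j,:));
  end
end

% Random 50/50 split of the objects (row indices).
p  = randperm(n);
tr = p(1:n/2);          % training objects
ts = p(n/2+1:end);      % test objects

% Transductive split (compare gendat on D): only rows are selected.
% All n objects, including the test objects, stay in the representation
% set, i.e. they define the columns of the space.
DT_trans = D(tr,:);     % 5 x 10
DS_trans = D(ts,:);     % 5 x 10

% Inductive split (compare the genddat default): the representation set
% (columns) is restricted to the training objects, so the space does not
% depend on the test objects at all.
DT_ind = D(tr,tr);      % 5 x 5, square training dissimilarity matrix
DS_ind = D(ts,tr);      % 5 x 5, test-to-train dissimilarities
```

In the inductive case a new object only needs its dissimilarities to the training objects, which is why classifiers built this way can be applied to future data without recomputing the space.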

PRTools contains special routines for crossvalidation, learning curves and feature curves: crossval, cleval and clevalf. For dissimilarity data the special versions crossvald and clevald are sometimes needed, because they use genddat instead of gendat for splitting the datasets. In the transductive approaches this is not needed. Some examples:
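The idea behind a crossvald-style evaluation can be sketched in plain MATLAB: in every fold the test objects are compared only to that fold's training objects, so the representation set changes with the trainset. This is a hypothetical illustration, not the crossvald implementation. The two-class toy data, the 5 folds and the 1-nearest-neighbor rule applied directly to the dissimilarities are all assumptions chosen for brevity (a simplified stand-in for classifiers such as knndc, which additionally optimizes k).

```matlab
% Hypothetical two-class toy problem: 20 objects with labels 1 and 2;
% class 2 is shifted so that the classes are partly separable.
n = 20;
lab = [ones(n/2,1); 2*ones(n/2,1)];
X = randn(n,2);
X(lab==2,:) = X(lab==2,:) + 3;
D = zeros(n,n);
for i = 1:n
  for j = 1:n
    D(i,j) = norm(X(i,:) - X(j,:));
  end
end

nfolds = 5;
p = randperm(n);
fold = zeros(n,1);
fold(p) = mod(0:n-1, nfolds) + 1;   % assign each object to exactly one fold

err = zeros(nfolds,1);
for f = 1:nfolds
  ts = find(fold == f);             % test objects of this fold
  tr = find(fold ~= f);             % training objects of this fold
  % Inductive: the columns (representation set) are restricted to tr,
  % so the test objects never help to define the space.
  DS = D(ts,tr);
  [~, nn] = min(DS, [], 2);         % nearest training object for each test row
  pred = lab(tr(nn));               % 1-NN label assignment
  err(f) = mean(pred ~= lab(ts));
end
E_CV = mean(err)                    % crossvalidated 1-NN error estimate
```

A transductive variant would instead keep all columns, D(ts,:), in every fold, which corresponds to using crossval on the full disspace.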

% Transductive learning
crossval(D,nmc,10,5)                % 10-fold crossvalidation of nmc (5 times) in disspace
crossval(D*pe_em(D),pe_nmc,10,5)    % embedding in PE space and running a PE classifier

% Use of crossvald builds spaces from the trainset only
crossvald(D,nmc,10,20,5)            % disspaces with 20 randomly chosen training objects for representation
crossvald(D,pe_em*pe_nmc,10,[],5)   % PE spaces recomputed for every new trainset

% nmc learning curve in disspace, repset is all data, 5 repeats;
% plot the learning curves of the test error and the apparent error
E = cleval(D,nmc,[],5);
plote(E)

% Compare learning curves of 3 classifiers in disspace with a repset
% of 1 training object per class
E = clevald(D,{nmc,ldc,qdc},[],1,5);
plote(E,'noapperror')

% Learning curves of knndc (on the dismat) and knnc (in disspace),
% repset is the trainset; plot without the apparent error curve
E = clevald(D,{knndc,knnc},[],[],5);
plote(E,'noapperror')

% Compare learning curves of kNN in PE space, disspace and dismat;
% the representation set equals the training set
E = clevald(D,{pe_em*pe_knnc,knnc,knndc},[],[],25);
plote(E,'noapperror')

% Feature curve in disspace using all objects for representation
% and 50% of the objects for training
E = clevalf(D,ldc,[1:50:216],0.5,10);
plote(E,'noapperror')

% Feature curve in PE space using all objects for representation
% and 50% of the objects for training
E = clevalf(D*pe_em(D),ldc,[1:50:215],0.5,10);
plote(E,'noapperror')