PRTools examples: Scatterplots
The main purpose of these examples is to show the possibilities of the scatter plot.
- PRTools 2D datasets
- Sample size
- Multiple classifiers
- High dimensional feature spaces
PRTools 2D datasets
PRTools offers a series of 2D problems. A one-liner to generate a dataset, produce a scatter plot and show a classifier is:
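A minimal sketch of such a one-liner, assuming the banana set gendatb as the dataset (the dataset choice is an assumption; any of the 2D generators can be substituted) and scatterd as the PRTools scatter plot routine:

```matlab
% Generate a 2D dataset, show its scatter plot and draw the Parzen classifier in it.
% gendatb (the banana set) is an assumed choice of dataset.
A = gendatb; scatterd(A); plotc(A*parzenc);
```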
Copy and paste this line to the Matlab window and repeat it a number of times (use the ‘up’ arrow of the keyboard). The classifier, parzenc, estimates an optimal density kernel and uses the Bayes rule to compute the classifier. The plot command plotc needs a corresponding scatter plot for the right axes.
Repeat the above for the following 2D examples:
gendatb, the banana set
gendats, a simple problem, two identical uncorrelated Gaussian distributions
gendatd, the difficult classes, two identical, correlated Gaussian distributions
gendath, the Highleyman classes, two different Gaussian distributions
gendatc, the circular classes
gendatl, the Lithuanian classes
gendatm, a multi-class problem
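Instead of editing the line by hand for every dataset, the generators listed above can be looped over. This is only a sketch: it assumes that each generator can be called without arguments and then uses its default sample size, and the variable name gens is introduced here for illustration.

```matlab
% Loop over the 2D generators listed above, each called with its default settings.
gens = {@gendatb, @gendats, @gendatd, @gendath, @gendatc, @gendatl, @gendatm};
for i = 1:numel(gens)
    A = gens{i}();                  % generate the dataset
    figure; scatterd(A);            % scatter plot in a new figure
    plotc(A*parzenc);               % Parzen classifier drawn in the same axes
    title(func2str(gens{i}));       % label the figure with the generator name
end
```

Every dataset gets its own figure, so the results can be compared side by side.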
Why does the classifier perform so badly for the spirals problem?
Sample size
We repeat the above experiment for one case but with different sample sizes. First, again:
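A sketch of the repeated experiment, assuming the banana set gendatb (the text does not name the dataset, so this is an assumption) and with the sample size of 50 per class written out explicitly:

```matlab
% Banana set with 50 samples per class (assumed dataset).
A = gendatb([50 50]); scatterd(A); plotc(A*parzenc);
```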
Repeat it a few times and observe the variability. Here 50 samples per class are used. Now for 1000 samples per class:
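The larger experiment might look as follows, again assuming the banana set gendatb as the dataset:

```matlab
% Same experiment with 1000 samples per class (assumed dataset: the banana set).
A = gendatb([1000 1000]); scatterd(A); plotc(A*parzenc);
```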
Note that it takes longer to produce the scatter plot and much longer to compute the classifier. The computing time for training as well as for executing the Parzen classifier is highly sensitive to the size of the training set. Note in addition that, even with 1000 samples per class in this 2D problem, the classifier is not stable!
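The dependence of the training time on the sample size can be made visible with tic and toc. This is a rough sketch, assuming the banana set gendatb; the measured times depend on the machine and are not claimed here.

```matlab
% Compare the training time of parzenc for two sample sizes (per class).
for n = [50 1000]
    A = gendatb([n n]);             % assumed dataset: the banana set
    tic; W = A*parzenc; t = toc;    % time the training step only
    fprintf('%4d samples per class: %.2f s\n', n, t);
end
```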
Multiple classifiers
Scatter plots can show a set of classifiers, e.g.:
W1 = A*fisherc;
W2 = A*qdc;
W3 = A*knnc;
W4 = A*dtc;
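To draw these classifiers, a dataset and a scatter plot are needed first. A minimal sketch, assuming the Highleyman set gendath as the 2D dataset (an assumed choice) and assuming that plotc accepts a cell array of trained classifiers:

```matlab
A  = gendath;                       % assumed 2D dataset
W1 = A*fisherc;                     % train the four classifiers
W2 = A*qdc;
W3 = A*knnc;
W4 = A*dtc;
scatterd(A);                        % the scatter plot sets the axes
plotc({W1,W2,W3,W4});               % draw all four decision boundaries
```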
One-liners are sometimes nice:
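Such a one-liner could look like the sketch below. It assumes the Highleyman set gendath as the dataset and relies on PRTools training every untrained classifier in a cell array when it is multiplied by a dataset; treat both as assumptions to check against your PRTools version.

```matlab
% Train four classifiers in one call and plot them in the scatter plot.
A = gendath; scatterd(A); plotc(A*{fisherc,qdc,knnc,dtc});
```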
Note that classifier names are shown in the legend of the plot. They may be changed by the user, e.g.:
W1 = A*fisherc;
W2 = A*qdc; W2 = setname(W2,'qdc');
W3 = A*knnc;
W4 = A*dtc; W4 = setname(W4,'dtc');
Why are in these experiments the classifiers
Some classifiers are based on estimated densities. In addition, there are some specific routines to estimate densities; see the page in the user guide. Let us take a single class:
A = gendatgauss(100,[0 0]); % generate a 2D Gaussian distribution
W = A*gaussm; % estimate the density from the data
scatterd(A); % scatter plot of the data
plotm(W); % plot the estimated density
plotm is similar to plotc. It draws density curves instead of a separation boundary. A corresponding scatter plot is always needed for a proper result. The routine gaussm always estimates a Gaussian density, even when that is not appropriate. In the next examples the class densities related to the classifiers qdc and parzenc are shown.
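delfigs % PRTools routine for deleting all figures
A = gendatb; % a two-class 2D dataset, e.g. the banana set
figure; scatterd(A); plotm(A*qdc); title('Gaussian densities')
figure; scatterd(A); plotm(A*parzenc); title('Parzen densities')
showfigs % PRTools routine for showing all figures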
A special option of plotc is that it can show the class regions filled with colors:
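A sketch of such a filled plot, assuming the banana set gendatb with the Parzen classifier and assuming that the plot string 'col' is what selects the colored regions in plotc:

```matlab
% Fill the class regions with colors instead of drawing only the boundary.
% Dataset, classifier and the 'col' option are assumptions of this sketch.
A = gendatb; scatterd(A); plotc(A*parzenc,'col');
```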
Usually some white, unassigned areas are shown. This is the result of a low resolution of the plotting grid. PRTools uses a low resolution to speed up plotting. The resolution can be increased by the command gridsize. By default it is 30. Increasing it to 100 is often sufficient:
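For example, under the same assumptions as before (banana set, Parzen classifier, 'col' plot string), the plot can be redrawn on a finer grid:

```matlab
gridsize(100);                      % raise the plotting grid resolution from the default 30
A = gendatb; scatterd(A);           % assumed dataset: the banana set
plotc(A*parzenc,'col');             % redraw the filled class regions on the finer grid
```

A finer grid means more points to classify, so plotting becomes slower as the gridsize grows.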
If this is not sufficient, increase the gridsize further to 300.
High dimensional feature spaces
Scatter plots are good for getting some feeling for the distribution of datasets and the properties of classifiers. However, most pattern recognition problems are multidimensional and cannot be shown in a 2D space. Projections of this space can be shown but may give the wrong impression. Here are two examples. We use the Iris dataset. It can be loaded in PRTools by the command prdatasets. This facilitates the loading of a set of standard real world problems from the PRTools website.
A = iris
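A 2D PCA projection of this dataset can be plotted as sketched below. The sketch assumes that prdatasets is installed, so that iris is available, and uses pcam to compute the mapping onto the first two principal components.

```matlab
A = iris;                           % Iris dataset, needs prdatasets
W = pcam(A,2);                      % mapping onto the first two principal components
scatterd(A*W);                      % scatter plot of the projected data
```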
This shows a 2D PCA space. Another option, for spaces that are not too high-dimensional, is to inspect some, or in this case all, combinations of 2 features:
figure; scatterd(A(:,[1 2])); title('1 2');
figure; scatterd(A(:,[1 3])); title('1 3');
figure; scatterd(A(:,[1 4])); title('1 4');
figure; scatterd(A(:,[2 3])); title('2 3');
figure; scatterd(A(:,[2 4])); title('2 4');
figure; scatterd(A(:,[3 4])); title('3 4');
Note that PRTools shows the feature names (called feature labels in PRTools terminology) along the axes (it may be necessary to enlarge a figure to see them).
A much more convenient way to inspect 2D subspaces is by the routine scatterdui. It has clickable axes, which make it possible to select subspaces interactively.
Of course, PCA spaces can be inspected as well.
Play around with arrows adjusting the selected dimensions.
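The two interactive inspections described above can be sketched as follows. The sketch assumes that prdatasets is installed and that pcam(A,4) keeps all four principal components of the Iris data; the exact call in the original page is not known.

```matlab
A = iris;                           % Iris dataset, needs prdatasets
scatterdui(A);                      % interactive scatter plot of the original features
B = A*pcam(A,4);                    % all four principal components (assumed setting)
figure; scatterdui(B);              % interactive scatter plot of the PCA space
```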
An important point to realize is that multi-dimensional classifiers cannot be shown in a 2D plot. See the FAQ about this.