How to generate a multi-dimensional banana set?
The ‘banana set’ is a geometrically defined two-dimensional dataset. Its interesting property is that it requires a non-linear classifier of a degree higher than two. Multi-dimensional versions can be generated by horizontally concatenating two-dimensional datasets, e.g.:
m = [100 100];
a = [gendatb(m) gendatb(m) gendatb(m)];
generates a 6-dimensional problem.
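As a quick sanity check, the size of the concatenated dataset can be inspected. This is a minimal sketch that assumes PRTools is on the MATLAB path; the variable names nobj and nfeat are only illustrative.

```matlab
m = [100 100];                            % objects per class
a = [gendatb(m) gendatb(m) gendatb(m)];   % three independently drawn banana sets, side by side
[nobj, nfeat] = size(a);                  % number of objects and number of features
fprintf('%d objects, %d features\n', nobj, nfeat);
```

With the class sizes above this should report 200 objects and 6 features, two features per concatenated banana set.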
Two points should be noted. First, horizontally concatenating independently generated sets makes the constituent two-dimensional problems independent of each other, and this independence is aligned with the feature axes. A random rotation removes that alignment. For a k-dimensional dataset the mapping
cmapm(k,'randrot')
may be used to this end.
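To illustrate what a random rotation does to the data, here is a sketch in plain MATLAB, without PRTools. It builds a random orthogonal matrix from the QR decomposition of a Gaussian matrix. This is one common construction and is not necessarily how cmapm implements 'randrot'. The matrix x is a hypothetical stand-in for the data of a concatenated dataset.

```matlab
k = 6;                        % dimensionality of the concatenated problem
x = randn(200, k);            % stand-in data matrix, one object per row
[q, r] = qr(randn(k));        % orthogonalize a random Gaussian matrix
q = q * diag(sign(diag(r)));  % fix the column signs left arbitrary by QR
if det(q) < 0                 % reflections have determinant -1
    q(:,1) = -q(:,1);         % flip one axis to obtain a proper rotation
end
y = x * q;                    % rotated data: every new feature mixes all old ones
```

Because q is orthogonal, distances between objects are unchanged; only the orientation of the problem with respect to the feature axes differs.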
Second, with every additional concatenated problem the class overlap decreases. This can be compensated by increasing the overlap of the individual problems: gendatb accepts an additional parameter, a standard deviation, that may be used for this.
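One way to observe this effect is to estimate the error of some non-linear classifier for a growing number of concatenated sets and for two values of the standard deviation. The sketch below assumes PRTools is available and uses a 3-nearest-neighbour classifier with a 50/50 split into training and test sets. Both choices are arbitrary, and the estimates vary from run to run.

```matlab
m = [100 100];                      % objects per class
for s = [1 2]                       % standard deviation passed to gendatb
    for nb = 1:3                    % number of concatenated banana sets
        a = gendatb(m, s);
        for n = 2:nb
            a = [a gendatb(m, s)];  % append another independent 2D problem
        end
        [t, u] = gendat(a, 0.5);    % random split: half training, half test
        e = u * knnc(t, 3) * testc; % test error of a 3-NN classifier
        fprintf('s = %g, %d features: error %.3f\n', s, 2*nb, e);
    end
end
```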
Combining these two modifications yields:
m = [100 100]; % class sizes
k = 7; % desired dimensionality
s = 2; % standard deviation influencing class overlap
a = prdataset;
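for n = 1:ceil(k/2) % each banana set adds two features
  a = [a gendatb(m,s)];
end
a = a(:,1:k); % drop the surplus feature when k is odd
a = a*cmapm(k,'randrot'); % remove the alignment with the axes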