page under construction

PRDisData is a Matlab toolbox that offers a collection of dissimilarity matrices. Some of them are generated, others are taken from the internet. A dissimilarity matrix is a square matrix of dissimilarities between a set of objects. These are not necessarily proper distances: they may be non-Euclidean (not embeddable in a Euclidean space), non-metric (violating the triangle inequality) or asymmetric, and sometimes the diagonal terms (dissimilarities between objects and themselves) are non-zero. Usually all values are non-negative.
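The properties listed above can be checked mechanically for any given matrix. The following is a minimal sketch in plain Python (not Matlab, and not part of PRDisData or DisTools); the function name `dissimilarity_properties` and the tolerance `tol` are illustrative assumptions. It does not test Euclidean embeddability, which would need an eigenvalue analysis.

```python
def dissimilarity_properties(d, tol=1e-12):
    """Report basic properties of a square dissimilarity matrix (list of lists)."""
    n = len(d)
    assert all(len(row) == n for row in d), "matrix must be square"
    idx = range(n)
    return {
        "non_negative": all(d[i][j] >= -tol for i in idx for j in idx),
        "zero_diagonal": all(abs(d[i][i]) <= tol for i in idx),
        "symmetric": all(abs(d[i][j] - d[j][i]) <= tol for i in idx for j in idx),
        # Triangle inequality: d(i,j) <= d(i,k) + d(k,j) for every triple.
        "triangle_inequality": all(
            d[i][j] <= d[i][k] + d[k][j] + tol
            for i in idx for j in idx for k in idx
        ),
    }


# Toy example: d(0,2) = 5 exceeds d(0,1) + d(1,2) = 2, so the matrix is non-metric.
D = [[0, 1, 5],
     [1, 0, 1],
     [5, 1, 0]]
props = dissimilarity_properties(D)
```

The triple loop costs O(n^3) operations, so for the larger datasets a vectorized implementation would be preferable.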

Dissimilarities have the property that the larger the value, the more different the objects. This makes them primarily useful for nearest neighbor rules: the nearest neighbor of an object is expected to be the most similar one in the dataset. The rule (procedure) according to which the dissimilarities are computed is often unknown. It is typically designed by an expert and applied to the available objects in such a way that the above property holds.
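As an illustration of the nearest neighbor rule, a leave-one-out 1-NN error can be estimated directly from a labeled dissimilarity matrix, without knowing how the dissimilarities were computed. This is a hedged sketch in plain Python; the function name, the toy matrix and the labels are made up for the example and are not taken from any of the datasets.

```python
def loo_nearest_neighbor_error(d, labels):
    """Leave-one-out 1-NN error estimated directly from a dissimilarity matrix."""
    n = len(d)
    errors = 0
    for i in range(n):
        # Nearest neighbor of object i according to row i, excluding i itself.
        nearest = min((j for j in range(n) if j != i), key=lambda j: d[i][j])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n


# Toy data: objects 0 and 1 are close to each other, as are objects 2 and 3.
D = [[0.0, 0.2, 0.9, 1.0],
     [0.2, 0.0, 0.8, 0.7],
     [0.9, 0.8, 0.0, 0.1],
     [1.0, 0.7, 0.1, 0.0]]
labels = ["a", "a", "b", "b"]
error = loo_nearest_neighbor_error(D, labels)
```

For an asymmetric matrix the result depends on whether rows or columns are searched; the sketch uses rows.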

In some cases similarities instead of dissimilarities have been published. The routines of PRDisData download the datasets and by default convert them to dissimilarities. Below, more information is given about the available datasets, with links to the help files of the corresponding routines. It is assumed that the DisTools toolbox for handling dissimilarities is available as well. PRTools, on which DisTools is built, should also be in the path.
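How a similarity matrix is turned into dissimilarities depends on the routine, and the conversion PRDisData actually applies is not specified here. One common choice, shown purely as an assumption, is d(i,j) = sqrt(s(i,i) + s(j,j) - 2 s(i,j)), which yields zero self-dissimilarity. The function name `sim_to_dis` is hypothetical.

```python
import math


def sim_to_dis(s):
    """Convert similarities to dissimilarities with
    d(i,j) = sqrt(s(i,i) + s(j,j) - 2*s(i,j)); negative arguments are clipped to 0."""
    n = len(s)
    return [[math.sqrt(max(s[i][i] + s[j][j] - 2.0 * s[i][j], 0.0))
             for j in range(n)]
            for i in range(n)]


# Toy similarity matrix with unit self-similarity.
S = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
D = sim_to_dis(S)
```

With this conversion a larger similarity gives a smaller dissimilarity, so the ordering needed by nearest neighbor rules is preserved, in reverse.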

Some useful links:

The datasets

  • Catcortex: the connection strength between 65 cortical areas of a cat, cite{Graepel2}.
  • Chickenpieces: a dataset consisting of 44 dissimilarity matrices derived from a weighted edit distance between blobs cite{Bunke93}, cite{SpillmannReport}. Table ref{tab:properties} lists the averages of the characteristics over these datasets.
  • CoilDelftDiff, CoilDelftSame, CoilYork: three datasets based on the same sets of SIFT points in COIL images cite{XiaoHancock} compared by different graph distances.
  • Delftgestures: dynamic time warping distances between a set of gestures measured in a sign language study cite{Lichtenauer_Sign}.
  • FlowCyto: histogram based dissimilarities between flow cytometry measurements cite{Rodijnen}, including an automatic calibration correction.
  • Newsgroups: a small part of the so-called 20Newsgroups data, as considered by Roweis cite{Newsgroups}, using a non-metric correlation measure.
  • Pedestrians: dissimilarities between detected objects in street images used in a pedestrian detection study cite{Celik}. They are based on cloud distances between sets of feature points derived from single images.
  • Pendigits: a random subset of one of the Pendigits datasets computed by Bunke and Spillmann cite{Spillmann06}. It is based on a weighted edit distance measure.
  • PolyDisH57: an artificial dataset based on the Hausdorff distances between two sets of polygons (pentagons and heptagons).
  • PolyDisM57: like PolyDisH57 but using a modified Hausdorff distance.
  • Prodom: a distance measure between protein sequences cite{Roth02}.
  • Protein: protein sequence differences measured by an evolutionary distance measure cite{Graepel1}.
  • WoodyPlants: a subset of the shape dissimilarities between leaves of woody plants cite{Jacobs}. Only classes with more than 50 objects are used.
  • Zongker: dissimilarities computed by deformable template matching between handwritten digits cite{Jain2}.
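Several of the datasets above rest on set distances. As an illustration of the measures named for PolyDisH57 and PolyDisM57, here is a minimal sketch of the Hausdorff distance and the modified Hausdorff distance of Dubuisson and Jain between two finite point sets. Treating a polygon as the set of its vertices is an assumption made for this example; the datasets themselves may have been computed differently.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def directed_mean(a, b):
    """Mean distance from the points in a to their nearest points in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Modified Hausdorff distance: the max is replaced by a mean per direction."""
    return max(directed_mean(a, b), directed_mean(b, a))


# Two quadrilaterals given by their vertices; only the last vertex differs.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 3.0)]
h = hausdorff(square, shifted)            # 2.0
mh = modified_hausdorff(square, shifted)  # 0.5
```

The single displaced vertex determines the Hausdorff distance completely, while the modified version averages it with the three matching vertices, which makes it less sensitive to outliers.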

References

  • Bunke, H., Bühler, U.: Applications of approximate string matching to 2D shape recognition. Pattern Recognition 26(12), 1797–1812 (1993)
  • Celik, H., Hanjalic, A., Hendriks, E.A.: Towards unsupervised learning for automatic multi-class object detection in surveillance videos. In: ICASSP. pp. 3521–3524. IEEE (2009), http://dx.doi.org/10.1109/ICASSP.2009.4960385
  • Dubuisson, M., Jain, A.: Modified Hausdorff distance for object matching. In: Int. Conference on Pattern Recognition. vol. 1, pp. 566–568 (1994)
  • Duin, R., Pękalska, E., Loog, M.: Non-Euclidean dissimilarities: Causes, embedding and informativeness. In: Pelillo, M. (ed.) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition, Springer London (2013), http://dx.doi.org/10.1007/978-1-4471-5628-4_2
  • Graepel, T., Herbrich, R., Bollmann-Sdorra, P., Obermayer, K.: Classification on pairwise proximity data. In: Advances in Neural Information Processing Systems 11. pp. 438–444 (1999)
  • Graepel, T., Herbrich, R., Schölkopf, B., Smola, A., Bartlett, P., Müller, K.R., Obermayer, K., Williamson, R.: Classification on proximity data with LP-machines. In: ICANN. pp. 304–309 (1999)
  • Jacobs, D., Weinshall, D., Gdalyahu, Y.: Classification with Non-Metric Distances: Image Retrieval and Class Representation. IEEE TPAMI 22(6), 583–600 (2000)
  • Jain, A.K., Zongker, D.E.: Representation and recognition of handwritten digits using deformable templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(12), 1386–1391 (1997)
  • Lichtenauer, J.F., Hendriks, E.A., Reinders, M.J.T.: Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Analysis and Machine Intelligence 30(11), 2040–2046 (Nov 2008)
  • Pękalska, E., Duin, R.: The Dissimilarity Representation for Pattern Recognition. Foundations and Applications. World Scientific, Singapore (2005)
  • van Rodijnen, N.M., Pieters, M., Hoop, S., Nap, M.: Data-driven compensation for flow cytometry of solid tissues. Advances in Bioinformatics 2011 (2011)
  • Roth, V., Laub, J., Buhmann, J.M., Müller, K.R.: Going metric: Denoising pairwise data. In: Becker, S., Thrun, S., Obermayer, K. (eds.) NIPS. pp. 817–824. MIT Press (2002), http://books.nips.cc/papers/files/nips15/AA43.pdf
  • Roweis, S.: 20 newsgroups, http://www.cs.nyu.edu/~roweis/data.html
  • Spillmann, B., Neuhaus, M., Bunke, H., Pękalska, E., Duin, R.P.W.: Transforming strings to vector spaces using prototype selection. In: SSPR/SPR. LNCS, vol. 4109, pp. 287–296. Springer (2006)
  • Spillmann, B.: Description of the distance matrices. Tech. rep. (2004), http://www.iam.unibe.ch/fki/databases/string-edit-distance-matrices/dmdocu.pdf