Photo by Lisa Travers

Features are defined such that they focus on isolated aspects of objects. They may neglect other relevant aspects, leading to class overlap. Pixels in images describe everything but pixel-based vector spaces tear the objects apart because their structure is not consciously encoded in the representation. Structural descriptions are rich and describe the structure well, yet they do not construct a vector space. As a result, we lack a proper representation for learning from a set of examples.

Is there a way out, or are we trapped?

Pairwise dissimilarities may cover all the significant differences between objects. Every detail may be expressed as a contribution to the dissimilarity. However, experts often construct a dissimilarity measure (for their domain of expertise) which cannot be interpreted as arising from an Euclidean space. Objects may thereby be represented in a vector space, but it has to be equipped with a non-Euclidean distance measure.

What if we interpret the dissimilarities D_{i,j} = d (o_i,o_j) between an object o_i and all other objects o_j, just as features representing o_i? This means that the object o_i is represented by a dissimilarity-based vector [d(o_i,o_1), d(o_i,o_2),...,d(o_i,o_n)]^T for which we choose to represent it in an Euclidean space. Now, a feature is an object-dependent measurable property that tells us something about its class membership.

Such dissimilarity-based vectors can be used as properties both if these objects belong to the same or another class in the same classification problem. What we need is a suitable measure defined on the sensor inputs for two objects that is zero if these sensor inputs are identical, and thereby, hopefully the objects as well. Moreover, every difference in the inputs should contribute to a positive increase of the measure.

Such dissimilarity-based features make sense because they are able to discriminate between the classes, at least in a weak way if the measure is poor. If a good dissimilarity measure is chosen, then the dissimilarities between the objets o_i and o_j should be small if they share class membership and large if they don’t. The quality of the measure directly relates to the discriminative power of the dissimilarity-based feature.

Let us interpret dissimilarities as features!  We first compute suitable dissimilarities and use the results to construct a vector space and may equip it with an Euclidean norm.  In doing so, we forget they are dissimilarities.

Assume n objects, O = {o_1,o_2,...,o_n} and an n \times n dissimilarity matrix D:=D(O,O).

D_{i,j} = d(o_i,o_j) is the dissimilarity between the sensor inputs for objects o_i and o_j.

D(O,o_k) := [d(o_1,o_k),\ldots,d(o_n,o_k)]^T is a feature defined by pairwise dissimilarities to o_k.

x_i:=D(o_i,O) = [d(o_i,o_1), \ldots,d(o_i,o_n)]^T is a dissimilarity-based representation for o_i.

X := D, defines an n-dimensional vector space in which the dimension k is defined by D(O,o_k) and the vector x_i represents o_i.

In this way we postulate the dissimilarity space X. More formally, a dissimilarity representation D is addressed as a data-dependent mapping D(*, O) : Y \to R^n from an initial representation Y (or a collection of objects Y:=O) to the so-called dissimilarity space. Each dimension describes a dissimilarity to a given object from O. We endow it with the traditional inner product and the Euclidean metric. Since dissimilarities are nonnegative, all data are mapped as vectors to a nonnegative orthotope of a dissimilarity space.

In this space classifiers can be constructed as in any other feature space. This space is however different in comparison with traditional feature spaces. Features such as color and shape will be expressed by unrelated measures and may thereby have incomparable ranges. This causes a scaling problem: how to scale different features such that they contribute their discriminative information in a similar way? Dissimilarities, on the other hand, are in the same range if the objects to which we compare cover the entire domain of objects.

A second difference with the feature representation is that, initially, a dissimilarity space contains as many objects as it has dimensions. For some classifiers this is bad and it will result in overtraining. But there is no need to take them all! It is to be expected that dissimilarities to similar objects are similar. So, for large data sets with many similar objects, there will be high correlations. The representation set, the set of objects to which we compare, can thereby be a subset of the available set of objects. Even a random subset may do. It has been shown that a careful selection based on some optimization criterion is only needed to obtain a good, small representation set.

The dissimilarity space seems to be an attractive new representation. As dissimilarities may be defined in a variety of ways, including comparisons between structural object descriptions such as graphs and strings, the dissimilarity space may constitute a bridge between structural and statistical approaches to pattern recognition. This path seems to go into a promising direction, but as we lack the map, we have to step into the darkness.

Personal history on the dissimilarity representation
Non-Euclidean and non-metric dissimilarities

Filed under: HistoryRepresentation