A Matlab toolbox for pattern recognition Imported pages from 37Steps


Non-metric dissimilarities are all around

Post - Monday, March 4th, 2013


A big advantage of representing objects in a dissimilarity space, compared to the use of kernels, is that it has no problems with non-Euclidean dissimilarity measures. More specifically, it can handle non-metric measures as well. Here we show with some common examples that such dissimilarities arise easily, both in daily life and in science.

Lack of information

The largest glacier in continental Europe is the Jostedalbreen in central Norway. Today it covers almost 500 square km and ends in about 50 glacier outlets. The road around it is several hundred km long. During the “little ice age” in the 18th and 19th centuries the Norwegian glaciers grew significantly. As a result, people living on different sides of the glacier were unable to cross it. For instance, the farmers in Vetledalen (V) got separated from their relatives in Jostedalen (J). They had to use the 200 km long road around the glacier, which would cost them about a week.

When the ice started to melt again, the Jostedalbreen became more and more accessible. At some moment youngsters from both sides discovered that they could reach the same point X. So they found that D(V,J) = 5 days, but D(V,X) = 6 hours and D(X,J) = 6 hours. This is a non-metric situation, as it clearly violates the triangle inequality:

D(V,J) \le D(V,X) + D(X,J)

As a result, some came to prefer the path over the glacier to the long road around it. Farmer boys from Vetledalen now crossed the glacier on Saturday night (in summer there is sufficient light) to visit the church (and new girlfriends :-)) on Sunday in Jostedalen, and were able to return in time on Monday morning for their work on the fields in Vetledalen.

In this case the violation of the triangle inequality is caused by a lack of information. The information is missing as long as the distances are only given pairwise. The violation can be resolved as soon as the three pairs are known simultaneously: the route via X then shows that V and J are only 12 hours apart.
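The glacier example can be checked numerically. The following is a minimal sketch, not part of the original post: it assumes that 5 days equals 120 hours, and the helper names `triangle_violations` and `shortest_path_repair` are made up for this illustration. The repair step is the standard Floyd-Warshall shortest-path update, which is one possible way to use the three distances simultaneously.

```python
# Travel times in hours between Vetledalen (V), the point X and Jostedalen (J).
# Assumption: "5 days" is counted as 5 * 24 = 120 hours.
names = ["V", "X", "J"]
d = {
    "V": {"V": 0, "X": 6, "J": 120},
    "X": {"V": 6, "X": 0, "J": 6},
    "J": {"V": 120, "X": 6, "J": 0},
}

def triangle_violations(dist, labels):
    """Return all triples (a, b, c) with dist[a][b] > dist[a][c] + dist[c][b]."""
    found = []
    for a in labels:
        for b in labels:
            for c in labels:
                if len({a, b, c}) == 3 and dist[a][b] > dist[a][c] + dist[c][b]:
                    found.append((a, b, c))
    return found

def shortest_path_repair(dist, labels):
    """Floyd-Warshall: replace every distance by the length of the shortest path."""
    repaired = {a: dict(dist[a]) for a in labels}
    for c in labels:          # intermediate point
        for a in labels:
            for b in labels:
                repaired[a][b] = min(repaired[a][b], repaired[a][c] + repaired[c][b])
    return repaired

violations = triangle_violations(d, names)
repaired = shortest_path_repair(d, names)

print(violations)             # the triples in which the road distance V-J is too long
print(repaired["V"]["J"])     # 12 hours: the route over the glacier via X
```

After the repair no triple violates the triangle inequality any more. This only works because the three distances describe routes that can be chained; it does not help in the examples that follow.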

Objects have a size

Here is an example in which the violation of the triangle inequality is not easy to resolve. One and the same person, say a lady, may feel close to two friends. Each of them may, separately, share these feelings for her. We assume in this example that these relations are symmetric.

For the two friends, however, being close to one and the same lady does not imply at all that there is also a good relation between them. On the contrary, the closer each of them is to the lady, the larger the distance between them may become. Eventually it may even end up in a fight!

One may think that in this case the violation of the triangle inequality is caused by the inherent inconsistency of the human soul and will not happen for well-defined distances. But look at this:

On the left there is a non-metric example we discussed before. There are three objects: a book, a mug and a table. The book and the mug are both touching the table. In spite of the fact that these two objects both have a zero distance to one and the same table, their mutual distance is not zero. For a big table it could even be very large. The “touch distance measure” used here is the same as the single-linkage distance sometimes used in cluster analysis: the distance between two clusters is the smallest distance between any two members, one from each cluster. Distances to different clusters are thereby computed from different members: the distance between the table and the book is computed from a different point on the table than the distance between the table and the mug.
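The touch distance can be sketched in a few lines. This is an illustration under our own assumptions: the objects are represented by a handful of hypothetical 2D points, and the function name `touch_distance` is ours.

```python
import math
from itertools import product

def touch_distance(obj_a, obj_b):
    """Smallest Euclidean distance between any point of obj_a and any point of obj_b
    (the single-linkage distance between two point sets)."""
    return min(math.dist(p, q) for p, q in product(obj_a, obj_b))

# Hypothetical coordinates: a table top of length 2, with the book standing on
# its left end and the mug on its right end. Touching means sharing a point.
table = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
book = [(0.0, 0.0), (0.0, 0.1)]
mug = [(2.0, 0.0), (2.0, 0.1)]

d_book_table = touch_distance(book, table)
d_mug_table = touch_distance(mug, table)
d_book_mug = touch_distance(book, mug)

print(d_book_table, d_mug_table, d_book_mug)
# The triangle inequality would require d_book_mug <= d_book_table + d_mug_table.
print(d_book_mug <= d_book_table + d_mug_table)
```

Both distances to the table are zero, while the distance between book and mug equals the length of the table. Making the table longer increases the violation without changing the two zero distances.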

Fisher criterion

Another example is the Fisher criterion, illustrated on the left. It is used to measure how suitable a single variable is for separating two classes A and B. The distance between the class means is related to the average class variance in the following way:

J_F = \frac{|\mu_A-\mu_B|^2}{\sigma_A^2+\sigma_B^2}

This criterion is suitable for class distributions with about the same variance. In a situation like the one in the picture on the right, however, it is not appropriate. If the criterion value is interpreted as a distance, then the distance between A and C is zero. The distances between A and B, and between C and B, however, are not equal. For a metric they should be equal: if D(A,C) = 0, the triangle inequality implies D(A,B) = D(C,B). The violation is caused here by the fact that the distance between A and C is zero while the corresponding distributions are not equal.
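A small numerical sketch of this situation is given below. The means and variances are hypothetical values chosen by us, not taken from the picture: A and C have the same mean but different variances, and B lies at some distance from both.

```python
def fisher_criterion(mean_a, var_a, mean_b, var_b):
    """Fisher criterion J_F = |mu_A - mu_B|^2 / (sigma_A^2 + sigma_B^2) for one variable."""
    return (mean_a - mean_b) ** 2 / (var_a + var_b)

# Hypothetical classes, each given as (mean, variance).
A = (0.0, 1.0)
B = (4.0, 1.0)
C = (0.0, 9.0)   # same mean as A, but a much larger variance

d_ac = fisher_criterion(*A, *C)
d_ab = fisher_criterion(*A, *B)
d_cb = fisher_criterion(*C, *B)

print(d_ac, d_ab, d_cb)
# A metric would need d_ab <= d_ac + d_cb, which fails for these values.
print(d_ab <= d_ac + d_cb)
```

With these values the distance between A and C is zero, while A is five times as far from B as C is. Any pair of classes with equal means and unequal variances produces the same effect.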

Physical example

The final example is based on a physical phenomenon. The speed of sound in the earth is about a factor of 15 larger than in the air. Consequently, an acoustic signal that travels from a speaker S through the air to the earth E, then through the earth, and finally through the air to a microphone M, can arrive sooner than a signal over the direct connection through the air. In terms of distances measured as travel times, we have: D(S,M) > D(S,E) + D(E,M). This violates the triangle inequality.
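To get a feeling for the numbers, here is a rough sketch with a geometry we made up: speaker and microphone both hang 1 m above the ground and are 100 m apart. The speed of sound in air is assumed to be 340 m/s; only the factor of 15 comes from the text. Refraction and coupling losses at the surface are ignored.

```python
V_AIR = 340.0           # m/s, assumed speed of sound in air
V_EARTH = 15 * V_AIR    # m/s, using the factor of 15 mentioned in the text
HEIGHT = 1.0            # m, assumed height of speaker and microphone above the ground
SEPARATION = 100.0      # m, assumed horizontal distance between speaker and microphone

def travel_time(length_m, speed_m_per_s):
    """Time in seconds needed to cover length_m at the given speed."""
    return length_m / speed_m_per_s

# Direct connection through the air.
t_direct = travel_time(SEPARATION, V_AIR)

# Down to the ground through the air, along the ground through the earth,
# and up to the microphone through the air again.
t_via_earth = (
    travel_time(HEIGHT, V_AIR)
    + travel_time(SEPARATION, V_EARTH)
    + travel_time(HEIGHT, V_AIR)
)

print(t_direct, t_via_earth)
print(t_via_earth < t_direct)
```

Under these assumptions the detour through the earth is faster than the direct path. The two air segments alone, which correspond to D(S,E) + D(E,M) when the earth is treated as a single object, take even less time.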


Distances between objects may be non-metric (and consequently non-Euclidean) if the objects are not vectors or points in a vector space, but have a size and a shape. There is an inner world that is captured by this shape. In addition, there is an outer world in which the objects are related. These two worlds are different, and this difference causes the triangle inequality to be violated. To phrase it more poetically: object distances tend to become non-metric when the objects have an inner life.