||Generalization by dissimilarities
Post - Monday, January 7th, 2013 a
Dissimilarities have the advantage over features that they potentially consider the entire objects and thereby may avoid class overlap. Dissimilarities have the advantage over pixels that they potentially consider the objects as connected totalities, where pixels tear them apart in thousands of pieces. Consequently, the use of dissimilarities may result in better classification performances and may require smaller training sets. But how should this be realized? How to generalize from dissimilarities?
The nearest neighbor rule
Traditionally researchers used the nearest neighbor rule once they have collected a training set of objects to show the performance of their approach. The computation of dissimilarities however is time consuming if the appropriate measures are used needed for the above advantages. For large training sets this may become prohibitive and the dissimilarities to all training objects have to be computed for every new object to be classified.
A straight forward use of the nearest neighbor rule can hardly be considered as a generalization. It requires the storage and usage of all objects without any attempt to learn from them. It does not involve an analysis of the training set itself. How can such an analysis improve the performance and/or the computational effort needed to classify new objects?
An obvious choice to reduce the computational complexity of classification is to reduce the training set in such a way that the expected classification error is not increased. Many techniques are available. Most of them assume that the objects are represented in a vector space or that a set of Euclidean distances between them are available. The performance of these approaches usually remains about equal after reducing the training set. Two questions arise for further research.
First it is of interest what approaches have to be followed in case the given dissimilarities are non-euclidean or even asymmetric. In many applications such measures arise and are preferred over euclidean ones. A second point is whether it is not possible to use the objects that removed from the training set are really fully redundant and do not contain anymore useful, additional information. Another type of classifier then the nearest neughbor rule would be needed to avoid that the computational complexity again significantly increases. Moreover, as it appeared to be possible to remove these objects without an increasing error for the nearest neighbor rule, they are not useful for that rule.
What is lacking in the above is the flexibility of a vector space. Just dissimilarities do not give us much possibilities to apply other classifiers. Statistical pattern recognition, machine learning and multi-variate statistics offer many alternatives, but then the dissimilarities have to be converted into a vector space. Two ways to do this are embedding and the postulation of an euclidean dissimilarity space. The first tries to respect as good as possible the dissimilarity characteristic of the measured values. The second way rather uses these values as a type of features.
Embedding is computational intensive during training. Moreover, the inclusion (‘projection’) of new objects needed for testing is not well defined for non-euclidean dissimilarities. The construction of the dissimilarity space as well as the inclusion of new objects in that space are very straight forward and do not require any optimization. As classification results for the dissimilarity space are usually not worse than for embedding, that approach may be preferred in spite of the fact that it does not use the dissimilarity origin of the data.
As classifiers in an embedded space as well as in the dissimilarity space need for classifying new objects the dissimilarities to all training objects, a reduction of the computational complexity is hardly possible after representation. So the order of approaches should be: first prototype selection, then the construction of a vector space (this is now of a reduced dimensionality) and finally the computation of a classifier. For the embedded space the problem of the inclusion of new objects needs more research.
Before the classifier is computed use can be made of the discarded objects in the prototype selection procedure. They may be included in the vector spaces after their construction. Thereby classification procedures may be improved as they will profit from the increased training set. As a result an approach is defined for which the computational complexity is controlled by the size of the set of selected prototypes and for which the classification performance can be improved by using larger training sets.