A Matlab toolbox for pattern recognition Imported pages from 37Steps

Home   Guide   Software   Glossary   FAQ   Blog   About

<< Random Representations

Post - Wednesday, January 8th, 2014 a


The goal of representation is in pattern recognition to map objects into a domain in which they can be compared. Usually, this domain is a vector space. It might also be a graph or a symbolic sequence or any other modality that allows the computation of distances between the represented objects. The quality of a representation is determined by the performance of recognition schemes that can be applied on it.

Often, second levels of representation are considered to improve the original one. Subspaces of a vector space are used to obtain a lower dimensionality, e.g. by a principal component analysis (PCA), a Kohonen map or an auto-encoder neural network. If kernels or dissimilarities are used the dimensionality of the representation may also grow significantly. In this case the purpose is to introduce some non-linearity.

The number of possibilities to construct representations in this way is virtually unlimited. Optimization procedures may thereby become unfeasible.  To overcome this,  proposals have been made to introduce some randomness in making choices or in the setting of parameters. When and why will this be good?

Coverage versus dimensionality

The are two phenomena to be aware of in creating a random representation: the coverage of the input and the dimensionality of the resulting output representation. In taking a random representation there is the danger that not all information is preserved. Some aspects, groups of objects or subspaces might not be taken into account. The probability that this happens may be reduced by generating more random functions. This will result, however, into a large output dimensionality. Some classifiers may suffer from that, requiring some dimension reduction on top of the representation. A good alternative solution is to use a sparse classifier that automatically selects the appropriate random functions. In both cases the danger of overtraining has to be faced.

Some examples

Why and when does it work?

Random choices in building a representation may perform well if the original information is sampled well. It may thereby reduce the dimensionality as well as enlarge it if (non-)linear functions of the inputs are generated. In the generalization step that follows the representation the possibly redundant or insignificant elements of the representation can be removed if the size of the training set is sufficiently large. This gives a key to the choice of the random representation. It should generate a space with a dimensionality that this is an lower (e.g. a factor 10) than the size of the training set.

A great advantage of a random representation is that it is fast: no optimization. If sufficient time is available optimization may be better, but this is not guaranteed as it also includes adaptation to the data and will thereby overtrain more easily. So, optimized representation should result in lower dimensional spaces to avoid this.


  1. W.F. Schmidt, M.A. Kraaijveld, and R.P.W. Duin, Feed forward neural networks with random weights, Proc. 11th IAPR Int. Conf. on Pattern Recognition, Volume II, 1992, 1-4.
  2. Guang-Bin Huang, Dianhui Wang: Advances in Extreme Learning Machines (ELM2011). Neurocomputing, vol 102, 2013, 1-2
  3. Tin Kam Ho: The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no 8, 1998, 832-844
  4. M. Skurichina and R.P.W. Duin, Bagging, Boosting and the Random Subspace Method for Linear Classifiers, Pattern Analysis and Applications, vol. 5, no. 2, 2002, 121-135.
  5. M. van Breukelen and R.P.W. Duin, Neural Network Initialization by Combined Classifiers, ICPR’98, Proc. 14th Int. Conference on Pattern Recognition, 1998, 215-218.
  6. E. Pekalska, R.P.W. Duin, and P. Paclik, Prototype selection for dissimilarity-based classification, Pattern Recognition, vol. 39, no. 2, 2006, 189-208.