Dimension reduction is often used, and sometimes necessary, to decrease the computational complexity and to increase the classification performance. Dimension reduction itself, however, may be time consuming. We will present some examples. It is assumed that the reader is familiar with the introductory sections of the user guide.
Dimension reduction can be realized by feature selection or by extraction. Extraction is traditionally based on finding linear subspaces. Non-linear procedures may be based on neural networks, kernels or dissimilarities.
There is usually a criterion to be optimized and a strategy for optimizing it.
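As an illustration (just a sketch, assuming some labeled dataset A), the two routes look as follows in PRTools:
Ws = featself(A,'maha-s',2);   % selection: forward selection of 2 of the original features
We = pcam(A,2);                % extraction: mapping onto the first 2 principal components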
In the case of feature selection one distinguishes filters and wrappers. Filters use criteria that differ from the classifier to be applied in the final space. The wrapper approach simply minimizes the final classification error over the family of considered subspaces. To avoid overtraining by the wrapper approach, cross-validation may be applied. The filter approach may be faster, and as filter criteria are simpler than classifiers, there is a smaller danger of overtraining.
Finally, criteria may be supervised, i.e. depending on the class labels, or unsupervised.
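The difference between filter and wrapper criteria can be sketched with the criterion evaluation routine feateval, which is also used by the selection routines below. This is only a sketch; it assumes a labeled training set T and an independent test set S, and the numbers are arbitrary.
J1 = feateval(T,'maha-s');   % filter criterion: Mahalanobis between / within scatter on T
J2 = feateval(T,'NN',S);     % filter criterion: 1-NN performance, evaluated on the test set S
J3 = feateval(T,ldc,5);      % wrapper criterion: ldc performance estimated by 5-fold cross-validation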
In PRTools terms, procedures for dimension reduction are trainable mappings. Here are some examples for selection and extraction.
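A minimal sketch of this mapping behaviour, assuming some labeled dataset A:
u = featself([],'maha-s',2);   % untrained forward selection mapping
w = A*u;                       % train it: the 2 best features are determined from A
B = A*w;                       % apply it: A reduced to the selected features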
There are routines for different selection strategies like individual, forward, backward and floating selection. They are based on criteria computed by feateval. Some are supervised, others are unsupervised. The criteria can be computed on the training set, by cross-validation or on a fixed test set. This is important if the classification error in a wrapper approach is used as a criterion. Experiments are usually very time consuming, so we will restrict ourselves to very simple examples based on selecting 4 features for the Malaysia dataset, which has in total 8 features and 291 objects in 20 classes.
A = malaysia;
A = setprior(A,getprior(A));   % avoid heaps of warnings
[T,S] = gendat(A,0.5);         % generate train / test set
W1 = A*featseli([],'NN',4);     W1 = setname(W1,'ISel NN');
W2 = A*featseli([],'maha-s',4); W2 = setname(W2,'ISel maha-s');
W3 = A*featseli([],'maha-m',4); W3 = setname(W3,'ISel maha-m');
W4 = A*featseli([],'in-in',4);  W4 = setname(W4,'ISel in-in');
W5 = A*featseli([],ldc,4,5);    W5 = setname(W5,'ISel wrapper');
disp([+W1;+W2;+W3;+W4;+W5])     % show selected features
V = T*({W1,W2,W3,W4,W5}*ldc);   % train all selectors and classifiers
S*V*testc                       % show test result
Note that the criteria maha-s and in-in behave very similarly. Moreover, they match well with the classifier ldc, as all are based on maximizing the between scatter while minimizing the within scatter. We will use this criterion to compare several selection strategies.
U1 = featseli([],'maha-s',4); U1 = setname(U1,'ISel maha-s');
U2 = featself([],'maha-s',4); U2 = setname(U2,'FSel maha-s');
U3 = featselb([],'maha-s',4); U3 = setname(U3,'BSel maha-s');
U4 = featselo([],'maha-s',4); U4 = setname(U4,'OSel maha-s');
W = T*{U1,U2,U3,U4};            % use train set for feature selection
disp([+W{1};+W{2};+W{3};+W{4}]) % show selected features
V = T*(W*ldc);                  % use train set also for classifier training
S*V*testc                       % show test results
These results are based on a single split of the data. A full 8-fold cross-validation can be executed by:
prcrossval(A,{U1,U2,U3,U4}*ldc,8,'DPS')
In feature extraction, (non)linear functions of an existing vector representation are determined as the basis for a new representation. Three simple linear feature extractors offered by PRTools are:
- Principal component analysis (pcam), based on the eigenvectors found for the entire dataset. This approach is unsupervised. However, if it is applied to a labeled dataset, the class priors are used to weight the object contributions to the overall covariance matrix. If the class priors are equal to the class frequencies, all objects have equal weights.
- Fisher mapping (fisherm), also known as LDA (Linear Discriminant Analysis). It finds the linear subspace that maximizes the between-class scatter w.r.t. the within-class scatter.
- The Karhunen-Loeve mapping (klm), which computes the eigenvectors of the averaged class covariance matrices.
In the case of images, the axes of such a linear subspace point into directions of the original, pixel-based feature space. These directions thereby constitute images themselves: the so-called eigen-images.
delfigs
prdatafiles;        % make sure that prdatafiles is in the path
a = faces;          % the ORL database of faces
a = prdataset(a)    % convert the datafile into a dataset
w1 = pcam(a,10);
figure; show(w1);   % show the first 10 eigenfaces
w2 = fisherm(a,10);
figure; show(w2);   % show the first 10 fisherfaces
w3 = klm(a,10);
figure; show(w3);   % show the first 10 KL-faces
showfigs
Compare in this way the ldc as well as the qdc classifier in 5-dimensional spaces of the three feature representations.
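A possible sketch of such a comparison, assuming the dataset a of the above example and a random 50/50 train / test split:
[T,S] = gendat(a,0.5);                      % train / test split
U = {pcam([],5),fisherm([],5),klm([],5)};   % untrained 5-dimensional extractors
V1 = T*(U*ldc);                             % train ldc in each of the three subspaces
V2 = T*(U*qdc);                             % train qdc in each of the three subspaces
S*V1*testc                                  % test results for ldc
S*V2*testc                                  % test results for qdc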