PRTools: a Matlab toolbox for pattern recognition. Imported pages from 37Steps.

### PRTools examples: Feature Curves

Feature curves evaluate classifiers as a function of the number of features, i.e. the dimensionality of the dataset. It is assumed that the reader is familiar with the introductory sections of the user guide.

The main tool we will use is `clevalf`, a modification of `cleval`, the routine used for studying learning curves. It computes, by cross-validation, the classification error for a single training-set size, but for multiple sizes of the feature set, using the given ranking of the features. As with `cleval`, the result may be averaged over a number of runs, and several classifiers can be studied simultaneously. Here is a simple example:

`delfigs`
`A = sonar;  % 60-dimensional dataset`
`% compute feature curve for the original feature ranking`
`E = clevalf(A,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);`
`figure; plote(E); title('Original'); axis([1 60 0 0.5])`
`% Compute feature curve for a randomized feature ranking`
`R = randperm(60);`
`E = clevalf(A(:,R),{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);`
`figure; plote(E); title('Random'); axis([1 60 0 0.5])`
`% Compute feature curve for an optimized feature ranking`
`W = A*featself('maha-m',60);`
`E = clevalf(A*W,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);`
`figure; plote(E); title('Optimized (Maha)'); axis([1 60 0 0.5])`
`showfigs`
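Instead of plotting the result of `clevalf` with `plote`, the returned structure can also be inspected directly. The field names used below (`error`, `xvalues`, `names`) are assumptions based on the usual `cleval` result conventions; check `help clevalf` for the authoritative list:

```matlab
% Hedged sketch: inspect the error structure computed by clevalf.
% Field names (error, xvalues, names) are assumed from the cleval
% conventions; verify with 'help clevalf'.
E = clevalf(A,{nmc,ldc},[1 2 5 10 20 60],0.7,25);
disp(E.names)                          % one classifier name per row
[emin,imin] = min(E.error,[],2);       % minimum averaged error per classifier
fprintf('nmc: best error %.3f at %d features\n', ...
        emin(1),E.xvalues(imin(1)));
```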

In this experiment the entire dataset `A` is used for computing the feature ranking by `featself`. Strictly, only the training set should be used for this. However, a call like:

`U = featself('',60);E = clevalf(A,U*{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);`

will first take the desired number of features from `A` and only then use this reduced set for training the feature selection and the classifier, so the selection never sees the full feature set. A solution is offered by `clevalfs`, an extension of `clevalf`:

`U = featself('',60);E = clevalfs(A,U,{nmc,ldc,qdc,knnc},[1:5 7 10 15 20 30 45 60],0.7,25);`

which first splits `A` in sets for training and testing, then trains `U` and finally computes feature curves for the specified classifiers.
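To make the protocol explicit, the split-then-rank procedure that `clevalfs` automates can be sketched for a single feature size. `gendat` is the standard PRTools routine for random splits; the exact internals of `clevalfs` may differ from this sketch:

```matlab
% Hedged sketch of the unbiased protocol for one feature size:
% split first, then train the feature ranking on the training part only.
[T,S] = gendat(A,0.7);            % random split: 70% train, 30% test
W = T*featself('maha-m',10);      % select 10 features using T only
V = T*W*ldc;                      % train ldc in the reduced feature space
e = S*W*V*testc;                  % test error on the held-out set S
```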

### Exercise

1. Why do the feature curves show rather noisy behavior, in spite of averaging over 25 repetitions?
2. Why isn't it always true that more features yield a better result?
3. Why are the results for using all the 60 features not exactly the same over the 3 experiments?
4. Extend the set of experiments with a 4th one in which the forward feature selection is based on the nearest neighbor criterion ‘NN’, see `feateval`.

There is a post about the well-known example by Trunk showing the peaking phenomenon in feature curves.