PRTools offers more than 300 Matlab routines for building pattern recognition systems. They can be used for preprocessing raw data, representation of objects in vector spaces, classification and evaluation.
The power of PRTools is based on the carefully designed operations between variables of three specific programming classes:
- dataset. A dataset is defined as a set of objects represented by vectors of the same size. Besides these vectors, it stores additional information on the individual objects (e.g. class labels), the features (e.g. domains and names), the classes (e.g. priors) and the entire dataset (e.g. the original size of the images that the object vectors refer to).
- datafile. It defines the way a specific set of raw objects (e.g. images of different size) stored on disk or in cell arrays can be transformed into a dataset.
- mapping. A mapping stores the names of the routines that define the transformation of objects from one space to another. In addition, it stores the parameter values and various other types of information like space dimensions and class names.
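The relation between a dataset and a mapping can be sketched as follows. This is a minimal illustration rather than an excerpt from the PRTools documentation. It assumes PRTools 5 is on the Matlab path, where the dataset constructor is named prdataset (PRTools 4 calls it dataset). The data and labels are random numbers made up for the example.

```matlab
% Minimal sketch; assumes PRTools 5 is on the Matlab path.
% In PRTools 4 the constructor is called dataset instead of prdataset.
X    = [randn(20,3); randn(20,3) + 2];   % 40 made-up objects with 3 features each
labs = [ones(20,1); 2*ones(20,1)];       % numeric class labels, 20 objects per class

A = prdataset(X, labs);   % dataset: object vectors plus their class labels
W = pcam(A, 2);           % trained mapping: PCA from 3 to 2 dimensions
B = A*W;                  % applying the mapping yields a new dataset

disp(size(B));            % number of objects and features after the mapping
```

The product A*W applies the trained mapping to every object in A, so B holds the same 40 objects with their labels, now represented by 2 features.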
Various procedures can be used for defining datafiles, transforming datafiles into datasets, changing representations (e.g. dimension reduction), classification or evaluation. Here is a selection.
- preprocessing. Matlab itself and various public domain Matlab toolboxes offer an enormous set of routines for processing images, signals in general, strings, graphs, etc. PRTools contains general-purpose routines that can automatically apply arbitrary routines to all files stored in sets of disk directories. Some important ones are made available explicitly as PRTools routines, e.g. morphological operations that use the DIP-Image package. Other examples are the extraction of multiple patches from images, the selection of blobs for use as individual objects, and the resizing of images to make their dimensions equal.
- feature extraction. Patch statistics, blob dimensions, various moments (e.g. Zernike and Hu), histograms, 1-D and 2-D spectra, Harris points.
- feature spaces. Scaling, feature selection (individual, forward, backward, floating, branch and bound), PCA, LDA, Fisher mapping, Chernoff mapping, Mahalanobis distances, various proximity mappings and kernels, multi-dimensional scaling.
- density estimation. Various Gaussian models, mixture of Gaussians, Parzen and nearest neighbor density estimation.
- classifiers. 1-NN, k-NN, Parzen, various Gaussian models, nearest mean, logistic, Fisher, SVM, AdaBoost, decision trees, random forest, perceptron, feed-forward neural network, radial basis neural network, dissimilarity space classifier. PRTools offers an interface to LIBSVM for users who have downloaded that package.
- combining classifiers. Fixed and trained rules, bagging, boosting, random subspaces.
- evaluation. Crossvalidation, learning curves, confusion matrices, reject options, ROC curves.
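As a rough illustration of the classification and evaluation routines listed above, the sketch below trains a 3-NN classifier on generated data and estimates its error on a held-out set. It assumes PRTools is on the Matlab path. The banana-shaped dataset from gendatb, the 50/50 split and the choice k = 3 are arbitrary choices made for this example.

```matlab
% Minimal sketch; assumes PRTools is on the Matlab path.
A = gendatb([100 100]);     % generated two-class, 2-D "banana" dataset
[T, S] = gendat(A, 0.5);    % random split: half for training, half for testing

W = T*knnc([], 3);          % train a 3-nearest-neighbor classifier on T
e = S*W*testc;              % estimated classification error on S (between 0 and 1)

confmat(S*W);               % display the confusion matrix for the test set
fprintf('test error: %.3f\n', e);
```

Because both the data and the split are random, the reported error differs from run to run.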
In addition there are numerous routines for data generation, basic clustering and regression. Some specific properties of PRTools are:
- automatic cross-validation-based optimization of classifier parameters such as the number of neighbors (k-NN), the smoothing parameter (Parzen) and the regularization parameters (Gaussian classifiers, SVM).
- soft labels and multiple labeling systems.
- sequential, parallel and stacked combining of classifiers.
- simple training and testing of complex classifier systems. For example, an untrained combiner of three classifiers, each operating in a different subspace, can be specified, trained and evaluated by:
    % definition of an untrained classifier
    U = [pcam([],5)*fisherc fisherm([],5)*loglc knnc]*maxc;
    % training by a previously defined dataset TRAINSET
    W = TRAINSET*U;
    % testing by a previously defined dataset TESTSET
    TESTSET*W*testc
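The automatic parameter optimization listed among the properties above can be sketched in the same style. The example assumes PRTools is on the Matlab path and that, as described in the PRTools help texts, knnc and parzenc select the number of neighbors and the smoothing parameter themselves when no value is supplied. The generated dataset and the 50/50 split are arbitrary choices for illustration.

```matlab
% Minimal sketch; assumes PRTools is on the Matlab path.
A = gendatb([100 100]);     % generated two-class dataset, used only for illustration
[T, S] = gendat(A, 0.5);    % random 50/50 split into training and test sets

Wk = T*knnc;                % no k given: k is chosen by PRTools during training
Wp = T*parzenc;             % no smoothing parameter given: chosen during training

ek = S*Wk*testc;            % test error of the k-NN classifier
ep = S*Wp*testc;            % test error of the Parzen classifier
fprintf('k-NN: %.3f, Parzen: %.3f\n', ek, ep);
```

Supplying an explicit value instead, for example knnc([], 3), skips the optimization and fixes the parameter.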
There are hundreds of references in the scientific literature to PRTools by researchers who used it for their experiments and analyses. The most recent version as well as information on related packages can be found here. For more see the documentation page.