Post - Monday, November 26th, 2012
Scientists should build their own instruments. Or at least, be able to open, investigate and understand the tools they are using. If, however, the tools are provided as a black box there should be a manual or literature available that fully explains the ins and outs. In principle, scientists should be able to create their measurement devices from scratch, otherwise the progress in science has no foundations. An interesting paper on this topic was published in the Journal of the Learning Sciences.
This was what my teachers taught me when I was a student in the field of physical instrumentation. Around 1990 I decided to leave the design of parallel computers for image analysis and returned to my original PhD topic: statistical pattern recognition. Should I build my own computer, like my group did for image analysis? That is not needed as any computer is a Turing machine and could be exactly simulated by any other computer. Should I start to build my own basic software?
Some interesting packages were already around, e.g. LINPACK and later Numerical Recipes in Fortran. Sources were available and the numerical analysis they were based on had been very well described. However, somebody pointed me to Matlab, originally built on top of LINPACK. This was really “standing on the shoulders of giants”! In spite of the fact that the basic routines were black boxes and not accessible, I decided that this should become my platform. Easy, interactive programming and the availability of a fast-growing set of basic routines in linear algebra, signal and image processing and later a beautiful neural network toolbox were very tempting.
In the beginning PRTools was just a collection of routines, without a well-defined style and specified interaction. Many were straightforwardly translated versions of older routines written in Fortran and even Algol60. I proved the well-known statement that physicists are able to write Fortran programs in any language ;). My preference for one-character variables is still based on this ability. Moreover, I still find it hard to resist the temptation to write recursive routines, as that was the way to do things in Algol60, although it is not the preferred approach in Matlab.
Early users: Ludmilla Kuncheva and Sarunas Raudys
Some master students contributed interesting routines, like the decision tree classifier treec, designed after a study of Breiman’s CART system. It was originally a complete toolbox in itself, but has now been put together in a single m-file. My PhD students still preferred C. For them, Matlab was a programming language for middle-aged scientists for whom the real work had become too difficult. As a result, the first colleagues in science using PRTools were from abroad: Sarunas Raudys and Ludmilla Kuncheva. In our cooperation their comments, both critical and encouraging, have been very helpful.
During a sabbatical leave in 1995 with the CVSSP group of Surrey University headed by Josef Kittler, PRTools was entirely rewritten and a first attempt was made to create a uniform style. All information of classifiers was packed in order to prevent frequent retyping of parameters. In the years thereafter Marina Skurichina helped me create the first lab courses. The availability of an extensive toolbox changed my teaching entirely. Pattern recognition was transformed from a theoretical course into an applied one. Marina became my first PhD student performing her experiments in Matlab. Her mathematical way of thinking and programming was a great help in defining PRTools in a structured manner. The way the combining classifiers have been implemented is to a large extent thanks to our cooperation.
The next Matlab version that became available to me brought object-oriented programming. It was a natural, but also challenging, step to use it for PRTools. Around 2000 a new setup was ready that included the object classes ‘dataset’ and ‘mapping’. This version attracted new users, like my own PhD students, but early users didn’t like it at all, as a number of basic operations, e.g. the use of prior probabilities, were now entirely hidden.
The increased number of users boosted the package with, for instance, support vector machine based tools. They also offered to go through the entire package, create a uniform style for programming and help comments, add further comments and unravel unnecessarily complicated constructs. This was a great group effort in defining and designing a large software package. The fact that the team included professional computer scientists, mathematicians and physicists made it a rich experience.
Dick de Ridder, David Tax, Pavel Paclik, Elzbieta Pekalska and Piotr Juszczak
The PRTools redesign team consisted of David Tax, Pavel Paclik, Elzbieta Pekalska, Piotr Juszczak and Marina Skurichina, and was headed by Dick de Ridder. Later Sergey Verzakov joined and contributed many proposals for better implementations. PRTools4 was finished in 2004. The package could now be used by PhDs from other universities and by users in applied institutes and industry. It became the basic programming toolbox of a textbook to which we contributed at the invitation of Ferdi van der Heijden. Additional toolboxes, based on PRTools, were designed by David Tax for one-class classifiers (DD-Tools), Pavel Paclik and Sergey Verzakov for hyperspectral data analysis (HyperTools) and Elzbieta Pekalska for dissimilarity based pattern recognition (DisTools). Pavel Paclik founded his own company (perClass) inspired by these efforts. For a number of years he continued to support PRTools by facilitating a forum.
Two major steps made later, in cooperation with Pavel Paclik and David Tax, are the design of a multi-labeling system and the creation of a third object class, the datafile. This child of the dataset makes it possible to integrate raw data, e.g. images or time signals of varying sizes, as well as their (pre-)processing, as files inside the PRTools setup of mappings and datasets. The addition of the datafile class was a complicated major upgrade whose implementation is still not fully stable. It completes PRTools as a package for representation and generalization. See the manual for PRTools4.1.
State of the art
In 2010 PRTools was very successfully used in a classification competition organized during the ICPR 2010 in Istanbul. A combination of about 20 classifiers was used to classify a benchmark of 300 datasets. Progress still continues in 2012. Recent contributions are an implementation of the random decision forest (randomforestc, David Tax) and another, very good decision tree (dtc, Sergey Verzakov).
PRTools has been used in many courses and PhD projects and has received hundreds of citations. It is especially useful for researchers and engineers who need a complete package for prototyping recognition systems, as it includes tools for representation. It offers most traditional and state-of-the-art off-the-shelf procedures for transformation, classification and evaluation. It is thereby well suited for comparative studies. It is the purpose of this website to present examples and background information that are also useful for researchers outside the domains of pattern recognition and machine learning.
Added in July 2013: PRTools5 was created in order to solve conflicts with Stats, Matlab’s statistical toolbox. It offers some Stats classifiers that can be called from PRTools.
Added in July 2019: PRTools is now distributed without p-code. It can thereby be used under Octave as well. For that reason a number of checks are included to avoid crashes when MathWorks built-in code is called. See also the update page for changes.
How does PRTools relate to the initial desire that scientists should build their own tools? For me personally it works to a large extent, but not entirely. In some places routines for optimization, eigenvalue decomposition or regularization are used that I do not fully understand. They come with Matlab or make use of Matlab routines that are still black boxes for me. Maybe it would have been better for my own peace of mind if I had taken the effort to rewrite them myself. I doubt whether it would have improved the performance, though.
For users other than myself, and even for myself after a few months without using it, the available code has to be studied to understand fully what is going on. I realize that many students and colleagues take the code for granted. Sometimes, however, I receive emails from people who really went into the darkest corners of the sources. For 90% or perhaps even 99% of the users, large parts of the code appear to be hardly accessible. This is to be regretted, for sure, but it would have been an enormous effort, and certainly beyond my original skills, to write it in a completely transparent way.
How bad is this? Is PRTools a black box? As long as it is used for engineering it is OK. A hammer can still be used if it is unknown how it was made. It may take some effort to learn how to handle it, and maybe we find out that another hammer is handier, but it is still responsible to use it without knowing everything about it. Such tools can also be applied in scientific studies as a comparison, as long as proper references are given.
The problem arises if somebody wants to claim that a particular proposal from literature, e.g. LDA, has specific properties and uses PRTools to show it. In that case the researcher should fully understand the tool he is using and may prefer to write it himself instead of using a tool designed by somebody else.
In short, PRTools can be used for applications, for teaching, in comparative studies with proper references, but for scientific investigations into the properties of particular algorithms the researcher should program these algorithms himself.