Prototype Methods

Here is a quotation from one of my papers that explains what prototype methods are. Minimal code sketches illustrating each of the described categories follow the quotation.

  1. "The prototype methods. The prototype selection methods constitute a family within a broader area, known as reference vector selection algorithms (Garcia, 2012; Jankowski and Grochowski, 2004; Grochowski and Jankowski, 2004). If n is the number of training samples, a reference vector selection method can be defined as a process of finding the smallest optimal set S (out of 2n − 1 subsets) of cases representing the same population as the original training set T and leading to the correct classification of the unseen cases. Quite often, a statistically indistinguishable difference in the results of classification between the prototype selection methods and the ones, in which the original unpruned training sets are taken, is obtained. Sometimes, usually for noisy data, prototype instance selection algorithms may lead to statistically significant improvement of generalization ability. The improvement often depends on the size of the datasets under study. For very large datasets, which consist of hundreds of thousands of samples, the standard deviation of classification accuracy is often lower than one percent. In such cases, there is a higher chance for a prototype method to arrive at a statistically significant improvement of generalization over the one attained on the unpruned data. One obviously has to have a sufficiently fast prototype method at hand in order to cope with such large data. Prototype generation (Triguero et al., 2011) extends the prototype selection framework by letting the samples change their positions. The best known example of the prototype generation algorithm is the LVQ system (Hyninen, 1996; Kohonen, 2001). A certain number of techniques sharing the same strategies can be identified in the family of reference vector selection algorithms.

    1. Noise filters. Noise filters, or editing rules, constitute a category of reference vector selection algorithms based on rejecting noisy cases or outliers from a training set T. These techniques are employed as the first data preprocessing step, and the rate of data pruning is usually low. After this step, other methods come into play. ENN, RENN (Wilson, 1972), All k-NN (Tomek, 1976) and ENRBF (Jankowski, 2000) are the key examples of the algorithms belonging to this group.

    2. Data condensation algorithms. Data condensation algorithms, or data pruning (compression) techniques, aim at attaining the highest possible training data reduction with minimal sacrifice of generalization. The algorithms belonging to this group find and remove the training vectors that have little influence on learning and whose presence therefore only increases the classifier's time and memory requirements. This is usually accomplished by discarding the instances that are located far from decision borders. CNN (Hart, 1968), RNN (Gates, 1972), GA, RNGE (Bhattacharya et al., 1981), ICF (Brighton and Mellish, 2002) and DROP1-DROP5 (Wilson and Martinez, 1997-2000) are the main systems that can be included in this group.

    3. Prototype selection methods. This group of reference vector selection algorithms is aimed at finding an extremely low number of samples that carry a particularly large amount of information and are capable of representing a large number of training cases. The difference between data condensation algorithms and prototype selection methods is admittedly a very subtle one. In our opinion, the prototype selection methods push the reduction of the training set to the extreme, thus allowing for a slightly larger degradation of generalization ability than is the case for the data condensation algorithms. It should not be surprising that some of the algorithms, particularly those that permit controlling the number of samples to be retained, may be assigned either to the data condensation or to the prototype selection group of reference vector selection methods. MC1 and RMHC (Skalak, 1994), IB3 (Aha, 1991), ELH, ELGrow and Explore (Cameron-Jones, 1995) and our own methods PM-M (Grudzinski, 2004) and EkP (Grudzinski, 2010) belong to the prototype selection group of methods".
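
To make the categories above concrete, here are a few minimal Python sketches. The first illustrates prototype generation as the quotation describes it: the retained samples are allowed to change their positions. This is a bare-bones update rule in the spirit of Kohonen's LVQ1, not the reference LVQ_PAK implementation; the function name lvq1, the per-class random initialization and the linearly decaying learning rate are illustrative assumptions.

    import numpy as np

    def lvq1(X, y, prototypes_per_class=2, epochs=20, lr=0.1, seed=0):
        # Minimal LVQ1 sketch: each prototype moves toward samples of its
        # own class and away from samples of other classes. X is assumed
        # to be a float array of shape (n, d), y an integer label vector.
        rng = np.random.default_rng(seed)
        protos, labels = [], []
        for c in np.unique(y):
            idx = rng.choice(np.flatnonzero(y == c), prototypes_per_class,
                             replace=False)
            protos.append(X[idx].copy())
            labels.append(np.full(prototypes_per_class, c))
        P, L = np.vstack(protos), np.concatenate(labels)
        for epoch in range(epochs):
            alpha = lr * (1.0 - epoch / epochs)  # decaying rate (an assumption)
            for i in rng.permutation(len(X)):
                j = np.argmin(np.linalg.norm(P - X[i], axis=1))  # nearest prototype
                step = alpha * (X[i] - P[j])
                P[j] += step if L[j] == y[i] else -step  # attract or repel
        return P, L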
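
Next, a sketch of a noise filter in the sense of the first sub-group: Wilson's ENN removes every training sample that is misclassified by its own k nearest neighbours. The default k=3 and the brute-force distance matrix are simplifications for clarity; class labels are assumed to be small non-negative integers so that np.bincount applies. RENN simply repeats this pass until no sample is removed.

    import numpy as np

    def enn(X, y, k=3):
        # Edited Nearest Neighbour (Wilson, 1972) sketch: drop every sample
        # whose k nearest neighbours (itself excluded) vote for another class.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
        np.fill_diagonal(D, np.inf)  # a sample is not its own neighbour
        keep = np.ones(len(X), dtype=bool)
        for i in range(len(X)):
            nn = np.argsort(D[i])[:k]
            if np.bincount(y[nn]).argmax() != y[i]:
                keep[i] = False
        return X[keep], y[keep]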
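
For the data condensation sub-group, the classical starting point is Hart's CNN. The sketch below grows a "store" of retained samples: any sample that the current store misclassifies with 1-NN is added, and passes repeat until a full sweep adds nothing. The random presentation order and single-sample seeding are simplifying assumptions.

    import numpy as np

    def cnn(X, y, seed=0):
        # Condensed Nearest Neighbour (Hart, 1968) sketch: retain only the
        # samples needed for 1-NN on the store to classify all of X correctly.
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(X))
        store = [int(order[0])]  # seed the store with one sample
        changed = True
        while changed:
            changed = False
            for i in order:
                if i in store:
                    continue
                S = np.asarray(store)
                nearest = S[np.argmin(np.linalg.norm(X[S] - X[i], axis=1))]
                if y[nearest] != y[i]:  # the current store misclassifies i
                    store.append(int(i))
                    changed = True
        return X[store], y[store]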
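
Finally, a sketch in the spirit of the prototype selection sub-group, loosely following RMHC (Skalak, 1994): a fixed, very small set of prototype indices is kept, one index is mutated at random, and the mutation is accepted whenever 1-NN accuracy on the training set does not drop. The fixed-size index encoding and the "accept if not worse" rule are simplifications of Skalak's bit-string formulation.

    import numpy as np

    def nn1_accuracy(X, y, P, L):
        # Accuracy of a 1-NN classifier that uses the prototypes (P, L).
        d = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
        return float(np.mean(L[np.argmin(d, axis=1)] == y))

    def rmhc(X, y, n_prototypes=5, n_iters=200, seed=0):
        # Random Mutation Hill Climbing sketch: swap one prototype index at
        # random, keep the swap if training-set 1-NN accuracy is not worse.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), n_prototypes, replace=False)
        best = nn1_accuracy(X, y, X[idx], y[idx])
        for _ in range(n_iters):
            cand = idx.copy()
            cand[rng.integers(n_prototypes)] = rng.integers(len(X))  # mutate one slot
            acc = nn1_accuracy(X, y, X[cand], y[cand])
            if acc >= best:
                idx, best = cand, acc
        return X[idx], y[idx]

These sketches favour clarity over speed. As the quotation notes, a real reference vector selection pipeline would typically chain such steps: a noise filter first, followed by condensation or prototype selection.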