In supervised classification, learning vector quantization (LVQ) methods are widely used because of their intuitive structure based on prototypical instances, which considerably reduces the computational cost of classification. Several heuristic improvements of LVQ have been proposed, including LVQ3 and GLVQ. All of these methods use the Euclidean distance to evaluate the similarity between prototypes and objects, which may be inappropriate if features are not equally scaled. Metric adaptation techniques try to alleviate this problem by learning discriminative distance measures from the training data; generalized relevance learning vector quantization (GRLVQ) is one such improvement. However, in big data problems LVQ algorithms require incremental learning mechanisms. This paper introduces an LVQ algorithm based on granular computing for prototype-based classification, equipped with incremental learning mechanisms. The proposed algorithm groups entities with similar features and, at the same time, proposes new prototypes to better cover the class distribution. Two steps for the automatic control of prototypes are proposed: the first controls the number of prototypes through a usage-frequency indicator, whereas the second learns the relevance of data dimensions, automatically pruning useless dimensions, which avoids a high computational load and increases the interpretability of the resulting model. The proposed method is evaluated on benchmark data and obtains performance competitive with state-of-the-art solutions. On big data sets, we obtained the best accuracy rate, about 72%, together with a good compression rate of around 94%.
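The abstract names two mechanisms: a relevance-weighted distance that learns the importance of each data dimension, and a usage-frequency indicator that controls the number of prototypes. The following is a minimal sketch of how such an incremental, GRLVQ-flavoured learner could look; the class name, learning rates, update rules, and pruning threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of relevance-learning LVQ with usage-frequency pruning.
# Illustrative only: update rules and hyperparameters are assumptions,
# not the algorithm proposed in the paper.
import numpy as np

class RelevanceLVQ:
    def __init__(self, prototypes, labels, lr_w=0.05, lr_lam=0.005):
        self.W = np.asarray(prototypes, dtype=float)   # prototype vectors
        self.c = np.asarray(labels)                    # prototype class labels
        d = self.W.shape[1]
        self.lam = np.full(d, 1.0 / d)                 # dimension relevances
        self.usage = np.zeros(len(self.W))             # usage-frequency counters
        self.lr_w, self.lr_lam = lr_w, lr_lam

    def _dist(self, x):
        # Relevance-weighted squared Euclidean distance:
        # d(x, w) = sum_i lambda_i * (x_i - w_i)^2
        return ((self.W - x) ** 2) @ self.lam

    def partial_fit(self, x, y):
        # One incremental update per incoming sample (big-data setting).
        j = int(np.argmin(self._dist(x)))              # best-matching prototype
        self.usage[j] += 1
        sign = 1.0 if self.c[j] == y else -1.0
        # Attract the winner if its class is correct, repel it otherwise
        self.W[j] += sign * self.lr_w * (x - self.W[j])
        # Heuristic relevance update: dimensions with large error on correct
        # matches lose weight; renormalize so the relevances stay a simplex.
        self.lam -= sign * self.lr_lam * (x - self.W[j]) ** 2
        self.lam = np.clip(self.lam, 0.0, None)
        self.lam /= self.lam.sum()

    def prune(self, min_usage):
        # Drop prototypes whose usage frequency fell below a threshold;
        # this is the kind of automatic prototype control the abstract describes.
        keep = self.usage >= min_usage
        self.W, self.c, self.usage = self.W[keep], self.c[keep], self.usage[keep]

    def predict(self, x):
        return self.c[int(np.argmin(self._dist(x)))]
```

Dimensions whose relevance decays toward zero contribute nothing to the distance, which is one simple way to realize the automatic pruning of useless dimensions mentioned in the abstract.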