Pack or product classification is a common task in market research, particularly for sales tracking audits and related services. Electronic data sources have increased both the sales volume being tracked and the number of packs (or stock keeping units) that need to be classified. These packs must be classified accurately and quickly. Traditional solutions that rely on people can be costly because of the large number of staff required to process the classifications in a timely and accurate manner. Reducing manual work is therefore a priority for audit-based market research businesses, which has led to interest in automation through machine learning techniques. In this article, we apply such methods: support vector machine, decision tree, XGBoost, AdaBoost, random forest, and neural network–based methods, all trained on the textual descriptions of already classified packs. We also implement a hierarchical classification method to take advantage of the class structure of the products. Once trained, the models can be applied to unclassified data; where a method is not confident in its classification, the pack can be referred to a human for classification. The hope is that the methods classify accurately enough to reduce manual workloads to manageable levels. This article reviews various methods and then outlines tests of these methods on two datasets collected by Nielsen, showing good performance.
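The workflow described above (train on textual descriptions of classified packs, classify new packs automatically, and refer low-confidence cases to a human) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pack descriptions, the class labels, and the 0.8 confidence threshold are hypothetical, and a small bag-of-words naive Bayes model stands in for the classifiers named in the article (it is not one of them), chosen only because it fits in a few lines of standard-library Python.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """Count words per class from (description, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict_proba(model, text):
    """Return a dict of class probabilities for one description."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for w in text.lower().split():
            if w in vocab:  # tokens never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalise log scores into probabilities
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def route(model, text, threshold=0.8):
    """Accept the prediction if confident, otherwise refer to a human."""
    probs = predict_proba(model, text)
    label = max(probs, key=probs.get)
    if probs[label] >= threshold:
        return ("auto", label)
    return ("manual", None)

# Hypothetical, already classified pack descriptions
labelled = [
    ("cola can 330ml", "soft drinks"),
    ("lemonade bottle 1l", "soft drinks"),
    ("orange soda can 330ml", "soft drinks"),
    ("milk chocolate bar 100g", "confectionery"),
    ("dark chocolate bar 90g", "confectionery"),
    ("fruit gums bag 150g", "confectionery"),
]
model = train(labelled)

for description in ["cola can 500ml", "orange chocolate"]:
    print(description, "->", route(model, description))
```

The threshold controls the trade-off the abstract alludes to: raising it sends more packs to manual classification but lowers the risk of accepting a wrong automatic label.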