The effectiveness of any machine learning process depends on the accuracy of the annotated data used to train a learner. However, manual annotation is expensive. Hence, researchers adopt a semi-supervised approach called active learning, which aims to achieve state-of-the-art performance using a minimal number of labeled samples. Although active learning boosts classifier performance, the underlying query strategies are unable to eliminate redundancy in the selected samples. Redundant samples lead to increased annotation cost and sub-optimal performance of the learner. Motivated by this challenge, the study proposes a new representation-based query strategy that selects highly informative and representative subsets of samples for manual annotation. The data comprise messages sent by a set of customers to a service provider. A series of experiments is conducted to analyze the effectiveness of the proposed query strategy, called "Entropy-based Min Max Similarity" (E-MMSIM), in the context of topic classification for churn prediction. The foundation of E-MMSIM is an algorithm that is widely used to sequence proteins in protein databases; the algorithm is modified to select the most representative and informative samples. Performance is evaluated using F1-score, AUC, and accuracy. It is observed that E-MMSIM outperforms popular query strategies and improves the performance of topic classifiers for each of the four topics of churn prediction. The trained topic classifiers are used to derive qualitative features, which are then integrated with structured variables for the same group of customers to predict churn. Experiments provide evidence that the inclusion of qualitative features derived using E-MMSIM enhances the performance of churn classifiers by 5%.
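The abstract does not give the details of E-MMSIM, but the general idea it describes — ranking unlabeled samples by entropy (informativeness) while rejecting candidates that are too similar to already-selected ones (min-max similarity, for representativeness) — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function name, the use of cosine similarity, and the `sim_threshold` parameter are assumptions.

```python
import numpy as np

def entropy_min_max_select(probs, features, budget, sim_threshold=0.8):
    """Greedy sketch of an entropy + min-max-similarity query strategy.

    probs:    (n, k) predicted class probabilities for the unlabeled pool
    features: (n, d) feature vectors for the same samples
    budget:   number of samples to pick for manual annotation
    """
    eps = 1e-12
    # Informativeness: predictive entropy of each unlabeled sample.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)

    # Representativeness: cosine similarity between feature vectors.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    sim = unit @ unit.T

    selected = []
    for idx in np.argsort(-entropy):  # most uncertain samples first
        if not selected:
            selected.append(int(idx))
        elif sim[idx, selected].max() < sim_threshold:
            # Keep a candidate only if its *maximum* similarity to the
            # already-selected set stays below the threshold, i.e. it is
            # not redundant with anything chosen so far.
            selected.append(int(idx))
        if len(selected) == budget:
            break
    return selected
```

Lowering `sim_threshold` enforces more diversity at the cost of possibly skipping some high-entropy samples; raising it approaches plain uncertainty sampling.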
               