Sparse Multinomial Logistic Regression (SMLR) is widely used in the field of image classification, multi-class object recognition, and so on, because it has the function of embedding feature selection during… Click to show full abstract
Sparse Multinomial Logistic Regression (SMLR) is widely used in the field of image classification, multi-class object recognition, and so on, because it has the function of embedding feature selection during classification. However, it cannot meet the time and memory requirements for processing large-scale data. We have reinvestigated the classification accuracy and running efficiency of the algorithm for solving SMLR problems using the Alternating Direction Method of Multipliers (ADMM), which is called fast SMLR (FSMLR) algorithm in this paper. By reformulating the optimization problem of FSMLR, we transform the serial convex optimization problem to the distributed convex optimization problem, i.e., global consensus problem and sharing problem. Based on the distributed optimization problem, we propose two distribute parallel SMLR algorithms, sample partitioning-based distributed SMLR (SP-SMLR), and feature partitioning-based distributed SMLR (FP-SMLR), for a large-scale sample and large-scale feature datasets in big data scenario, respectively. The experimental results show that the FSMLR algorithm has higher accuracy than the original SMLR algorithm. The big data experiments show that our distributed parallel SMLR algorithms can scale for massive samples and large-scale features, with high precision. In a word, our proposed serial and distribute SMLR algorithms outperform the state-of-the-art algorithms.
               
Click one of the above tabs to view related content.