Random Partition Based Adaptive Distributed Kernelized SVM for Big Data

In this paper, we present a distributed classification technique for big data that efficiently uses the distributed storage architecture and data processing units of a cluster. When handling such large data, existing approaches rely on specific data partitioning techniques that require the complete dataset to be processed before partitioning, which incurs excessive computation and data communication overhead. The proposed method does not require any pre-structured data partitioning technique and is readily adaptable to big data mining tools. We hypothesize that effectively aggregating the information that subprocesses of the complete learning process generate from data partitions can yield accurate predictions while reducing the overall time complexity. We build three SVM-based classifiers: one-phase voting SVM (1PVSVM), two-phase voting SVM (2PVSVM), and similarity-based SVM (SIMSVM). Each classifier uses the support vectors as local information to construct a synthesized learner, reducing training time while keeping communication between processing units minimal. An extensive empirical analysis on several benchmark datasets demonstrates the effectiveness of our classifiers compared to existing approaches. Among the existing methods and our three proposed methods (1PVSVM, 2PVSVM, and SIMSVM), SIMSVM is the most efficient. On the MNIST dataset, SIMSVM achieves an average speedup ratio of 0.78 and a minimum scalability of 73% when the data size is scaled up to 10 times, while retaining high accuracy (99%) comparable to centralized approaches.
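To make the aggregation idea concrete, below is a minimal sketch of the general workflow in Python with scikit-learn: training data is split into random partitions with no pre-structuring, a kernel SVM is trained on each partition, and only the local support vectors are pooled to train a synthesized global model. The specific voting and similarity-based aggregation rules of 1PVSVM, 2PVSVM, and SIMSVM are defined in the paper; the pooling step here is an illustrative assumption, not the authors' exact algorithm.

```python
# Illustrative sketch (not the paper's exact algorithm): train kernel SVMs on
# random partitions, then synthesize a global model from the local support vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: random (unstructured) partitioning -- no preprocessing of the full
# dataset is needed before the split.
rng = np.random.default_rng(0)
n_parts = 4
part_ids = rng.integers(n_parts, size=len(X_train))

# Step 2: train one kernel SVM per partition (in practice, one per worker node).
local_svs_X, local_svs_y = [], []
for p in range(n_parts):
    mask = part_ids == p
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train[mask], y_train[mask])
    # Only the support vectors travel to the aggregator.
    local_svs_X.append(X_train[mask][clf.support_])
    local_svs_y.append(y_train[mask][clf.support_])

# Step 3: synthesize a global learner from the pooled local support vectors.
global_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(
    np.vstack(local_svs_X), np.concatenate(local_svs_y)
)
print("held-out accuracy:", global_clf.score(X_test, y_test))
```

Because only support vectors leave each partition, the communication cost scales with the number of support vectors rather than the partition size, which is the property the paper's synthesized learners exploit.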

Keywords: big data; random partition; adaptive distributed kernelized SVM

Journal Title: IEEE Access
Year Published: 2022
