A parallel computation of skyline using multiple regression analysis-based filtering on MapReduce

In the last decade, skyline query processing has become widely important because of its usefulness in decision-making applications. Since the datasets used for skyline query processing are huge, algorithms for MapReduce-based skyline query processing have been widely studied. However, existing algorithms suffer from low filtering efficiency in local skyline computation, and they unrealistically assume both uniform data distributions and dimensional independence. In this paper, we propose a parallel skyline query processing algorithm for MapReduce using multiple regression analysis. The goal of our algorithm is to efficiently find the set of skyline points in a large dataset by reducing the number of candidates prior to the skyline computation. To support skyline computation on anti-correlated datasets, we computed a data filtering threshold line based on a multiple regression analysis of the sampled dataset. To guarantee the accuracy of the skyline result, we considered both the filtering threshold line and a grid-based cell dominance condition. Thus, only relevant data are passed to the actual skyline computation step. For local skyline computation, we utilized an angle-based partitioning of the data space that effectively eliminates non-promising points in partitions. For global skyline computation, we used the dominance relationship among grid-based partitions to prune unnecessary skyline points. Performance analyses showed that our parallel skyline query processing algorithm outperformed existing algorithms under various settings.
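
The local/global structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes 2-D points with non-negative coordinates, assumes that smaller values are better in every dimension, and uses a naive quadratic skyline routine. The function names (`dominates`, `skyline`, `angle_partition`, `partitioned_skyline`) are hypothetical, and the regression-based threshold line and grid-based cell dominance check are omitted because the abstract does not give their details.

```python
import math

def dominates(p, q):
    """True if p dominates q: no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def angle_partition(points, num_partitions):
    """Split 2-D points into equal-angle sectors of the first quadrant."""
    parts = [[] for _ in range(num_partitions)]
    width = (math.pi / 2) / num_partitions
    for x, y in points:
        angle = math.atan2(y, x)  # lies in [0, pi/2] for non-negative coordinates
        idx = min(int(angle / width), num_partitions - 1)
        parts[idx].append((x, y))
    return parts

def partitioned_skyline(points, num_partitions=3):
    """Local skyline per partition (map side), then a global merge (reduce side)."""
    local = [skyline(part) for part in angle_partition(points, num_partitions)]
    candidates = [p for part in local for p in part]
    return skyline(candidates)

points = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1), (5, 5), (8, 8), (3, 8)]
result = partitioned_skyline(points)
```

Because a point dominated inside its own partition is also dominated in the full dataset, discarding it locally never removes a true skyline point; the global merge then eliminates candidates that are dominated only by points from other partitions.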

Keywords: skyline computation; computation; skyline; skyline query; multiple regression; query processing

Journal Title: Distributed and Parallel Databases
Year Published: 2017
