High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded its practical application. In this paper, we present a Python package and its Spark library, both implemented efficiently in parallel for use on real-world problems, which also provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO's superior performance through extensive experiments in practical settings. With these packages, Hi-LASSO can be applied to feature selection efficiently and easily. The Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. The packages can be installed with pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark.

Author summary

We briefly present Hi-LASSO as described in the literature and compare it to Random LASSO. We then describe Hi-LASSO's open-source packages in Python and Apache Spark and specify their parameters. The open-source packages improve the efficiency and scalability of Hi-LASSO, so that the time-consuming bootstrapping-based parametric statistical test can be applied in practice to high-dimensional data. We conducted extensive experiments to assess the performance of the packages with the parametric statistical test using simulation data, semi-real datasets, and a TCGA cancer dataset. The Hi-LASSO packages showed outstanding and robust performance in feature selection. The packages are available through PyPI and can be installed with pip.
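To make the idea of a bootstrapping-based parametric test for feature selection more concrete, the following is a minimal, standard-library-only sketch. It is not the Hi-LASSO package's API or its exact procedure. It assumes a simplified scheme: fit a plain LASSO on bootstrap samples with random feature subsets, count how often each feature receives a nonzero coefficient, and compare each count against a binomial null whose success probability is the average selection rate. All function names (`lasso_cd`, `selection_counts`, `binom_tail`), the toy data, and the parameter values (`B`, `q`, `lam`) are hypothetical choices made for illustration.

```python
import math
import random


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam*||b||_1.

    No intercept is fitted, so the data are assumed to be roughly centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (feature j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # Soft-thresholding update.
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def selection_counts(X, y, q, B, lam, rng):
    """Count nonzero coefficients over B bootstraps, each using q random features."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(B):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap rows (with replacement)
        cols = rng.sample(range(p), q)               # random feature subset
        Xb = [[X[i][j] for j in cols] for i in rows]
        yb = [y[i] for i in rows]
        for j, b in zip(cols, lasso_cd(Xb, yb, lam)):
            if b != 0.0:
                counts[j] += 1
    return counts


def binom_tail(k, n, pi):
    """Upper tail P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(math.comb(n, i) * pi ** i * (1 - pi) ** (n - i) for i in range(k, n + 1))


# Toy data (assumption): 12 features, only the first two carry signal.
rng = random.Random(0)
n, p = 100, 12
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[1] + rng.gauss(0, 1) for row in X]

B, q, lam = 100, 6, 0.8  # illustrative values, not package defaults
counts = selection_counts(X, y, q=q, B=B, lam=lam, rng=rng)

# Null hypothesis (assumption): every feature is selected at the average rate.
pi = sum(counts) / (B * p)
pvals = [binom_tail(c, B, pi) for c in counts]
selected = [j for j, pv in enumerate(pvals) if pv < 0.05]
print("selected feature indices:", selected)
```

In this sketch the bootstrap fits are independent of one another, which is the part of such a procedure that lends itself to parallel execution; here they run sequentially for simplicity.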