On what to permute in test-based approaches for variable importance measures in Random Forests

In bioinformatics applications, it is currently customary to permute the outcome variable in order to produce inference on covariates when testing novel methods or statistics whose distributions are poorly known. The seminal publication of Altmann et al. (2010) in Bioinformatics uses the same permutation scheme to obtain p-values that can be treated as a corrected measure of feature importance, rectifying the bias of the Gini variable importance in Random Forests. Since then, this method has also been used in applied work to draw statistical conclusions on variable importance measures (VIMs) from the resulting p-values. In this paper, we show that permuting the outcome may produce unexpected results, including p-values with undesirable properties, and illustrate how more refined permutation schemes can be appropriate to obtain desirable results, including high power in discovering relevant variables.

Supplementary Information: Supplementary data are available at Bioinformatics online.
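To make the outcome-permutation scheme described in the abstract concrete, the following is a minimal sketch, not the authors' or Altmann et al.'s implementation. It shuffles the outcome repeatedly, recomputes an importance score for every covariate, and reports an empirical p-value per covariate. Several pieces are assumptions made for illustration: the function names are hypothetical, the importance score is a simple absolute Pearson correlation standing in for a Random Forest Gini importance so that the example stays dependency-free, and the (1 + count) / (1 + n_perm) p-value convention is one common choice rather than something stated in the source.

```python
import random


def abs_correlation_importance(X, y):
    """Stand-in importance (assumption): |Pearson correlation| of each column with y.

    In the setting of the abstract this would be a Random Forest VIM instead.
    """
    n = len(y)
    mean_y = sum(y) / n
    sd_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        sd_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov / (sd_x * sd_y)) if sd_x > 0 and sd_y > 0 else 0.0)
    return scores


def outcome_permutation_pvalues(X, y, importance_fn, n_perm=200, seed=0):
    """Empirical p-value per covariate, obtained by permuting the outcome y."""
    rng = random.Random(seed)
    observed = importance_fn(X, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        # Shuffling y breaks its link to ALL covariates at once.
        rng.shuffle(y_perm)
        null_scores = importance_fn(X, y_perm)
        for j, (null, obs) in enumerate(zip(null_scores, observed)):
            if null >= obs:
                exceed[j] += 1
    # Add-one convention (assumption) keeps every p-value strictly positive.
    return [(1 + c) / (1 + n_perm) for c in exceed]


# Toy data (fabricated for illustration only): column 0 drives y, column 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
y = [row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]

pvals = outcome_permutation_pvalues(X, y, abs_correlation_importance)
print(pvals)
```

Because a single shuffle of y severs the association with every covariate simultaneously, the null distribution here corresponds to no covariate being related to the outcome. A scheme that permutes something other than the outcome would change only the shuffling step inside the loop.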

Keywords: test based; importance; permute test; variable importance; random forests; based approaches

Journal Title: Bioinformatics
Year Published: 2019
