In this paper, a novel single-index model tree (SIMTree) is proposed. It adopts a recursive partitioning strategy in which each data segment is modeled by a single-index model (SIM), a flexible extension of linear regression with a nonparametric link function. The proposed SIMTree has two major advantages: a) with only a few leaf nodes, it can achieve predictive performance competitive with that of complicated black-box models; b) the SIMs fitted on each local data segment are intrinsically interpretable. However, building such a SIMTree with conventional techniques can be extremely time-consuming. SIM estimation typically involves iterative optimization via Newton-type algorithms, and this resource-intensive procedure must be repeated both for fitting the leaf-node SIMs and for searching for optimal splits. To make the computational burden affordable, an efficient training algorithm is proposed that exploits Stein's lemma together with several acceleration strategies in the tree construction algorithm. Moreover, a new Python package, simtree, is developed with visualization modules that further facilitate model interpretation. Numerical results on a wide range of regression datasets show that SIMTree is an accurate and interpretable machine learning model.
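To illustrate why Stein's lemma can replace Newton-type iterations when fitting a leaf SIM, the sketch below uses the classical first-order Stein identity: if x ~ N(mu, Sigma) and y = f(beta^T x) + noise, then Cov(x, y) = Sigma beta E[f'(beta^T x)], so Sigma^{-1} Cov(x, y) is parallel to beta and can be computed in closed form from sample moments. This is a minimal sketch under stated assumptions, not the paper's estimator or the simtree package API: it assumes Gaussian features, the function names `stein_direction` and `fit_sim` are hypothetical, and a cubic polynomial stands in for the nonparametric (e.g. spline) link smoother.

```python
import numpy as np


def stein_direction(X, y):
    """Estimate the single-index direction (up to scale) with first-order Stein's lemma.

    Assumes the rows of X are Gaussian. Stein's lemma then gives
    Cov(x, y) = Sigma @ beta * E[f'(beta^T x)], so Sigma^{-1} Cov(x, y) is parallel to beta.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xc = X - X.mean(axis=0)                 # center features
    cov_xy = Xc.T @ (y - y.mean()) / n      # sample Cov(x, y)
    sigma = Xc.T @ Xc / n                   # sample covariance of x
    beta = np.linalg.solve(sigma, cov_xy)   # Sigma^{-1} Cov(x, y), no iterative optimization
    return beta / np.linalg.norm(beta)      # scale is not identifiable, so normalize


def fit_sim(X, y, degree=3):
    """Fit a toy SIM: Stein direction plus a polynomial link (assumed stand-in for a spline smoother)."""
    beta = stein_direction(X, y)
    z = np.asarray(X, dtype=float) @ beta   # projected single index
    coefs = np.polyfit(z, y, degree)        # 1-D least-squares fit of the link function
    return beta, lambda Xnew: np.polyval(coefs, np.asarray(Xnew, dtype=float) @ beta)


# Usage on synthetic Gaussian data with a nonlinear link f(z) = z + sin(z).
rng = np.random.default_rng(0)
n, d = 5000, 5
X = rng.standard_normal((n, d))
true_beta = np.array([1.0, 2.0, 0.0, 0.0, -1.0])
true_beta /= np.linalg.norm(true_beta)
z_true = X @ true_beta
y = z_true + np.sin(z_true) + 0.1 * rng.standard_normal(n)

beta_hat, predict = fit_sim(X, y)
cosine = float(beta_hat @ true_beta)
r2 = 1.0 - np.mean((y - predict(X)) ** 2) / np.var(y)
print(f"cosine(beta_hat, beta) = {cosine:.3f}, in-sample R^2 = {r2:.3f}")
```

Because the direction estimate needs only a mean, a covariance, and one linear solve, it is cheap enough to evaluate for many candidate child nodes during split search; how closely this matches the acceleration strategies in the paper is an assumption, and for non-Gaussian features the estimate can be biased.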