Ensemble methods can be used to identify causal relationships in data for a better understanding and taking the right decision in processes that involve high risk. This paper explores the… Click to show full abstract
Ensemble methods can be used to identify causal relationships in data for a better understanding and taking the right decision in processes that involve high risk. This paper explores the idea of a causal decision tree forest and proposes a regularized ensemble method by integrating optimal causal trees for improved prediction accuracy while not compromising on accurately estimating heterogeneous treatment effects. The proposed method is based on selecting a subset of the most accurate causal trees from a sufficiently large pool based on their out-of-sample error estimates. The selected trees are integrated to form an ensemble that is used for estimating heterogeneous treatment effect and predicting unseen data. The proposed method is applied on Pakistan’s income function consisting of 27964 observations on wages of workers age 10 and above as an example dataset. The paper gives a detailed simulation study where datasets are generated under 5 different designs. The proposed method is assessed against ordinary least square (OLS), least absolute shrinkage and selection operator (LASSO), Ridge, Causal Tree and the standard decision trees forest via mean square error (MSE), root mean square error (RMSE), mean absolute deviation (MAD) and Pearson correlation (r) as performance metrics. The analyses given in the paper reveal that the proposed method can be used effectively for estimating heterogeneous treatment effects and achieves better prediction performance and as compared to the rest of the methods given in the paper.
               
Click one of the above tabs to view related content.