Learning-to-rank (LtR) has become an integral part of modern ranking systems, and random forest–based rank-learning algorithms are among the top performers in this field. Traditionally, each tree of a random forest is learnt from a bootstrapped copy of the training set, in which approximately 63% of the examples are unique. Training on a bootstrapped copy rather than the original training set reduces the correlation between individual trees, thereby making the ensemble's predictions more accurate. This raises a natural question: how can we leverage the correlation between trees to improve the performance and scalability of a random forest–based LtR algorithm? In this article, we investigate whether the correlation between trees can be decreased further while maintaining, or possibly improving, accuracy. Among several potential ways to achieve this, we study the size of the subsample used to learn each individual tree, and we examine the performance of a random forest–based LtR algorithm as the correlation is controlled through this parameter. Experiments on LtR data sets reveal that, for small- to moderate-sized data sets, a substantial reduction in training time can be achieved using only a small amount of training data per tree. Moreover, because the variability across trees is positively correlated with random forest performance, we observe an increase in accuracy while maintaining the same level of model stability as the baseline. For big data sets, our experiments did not show an increase in accuracy (with larger training sets, the variance of individual trees is already comparatively small), but the technique remains applicable because it allows for greater scalability.
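To make the idea concrete, the sketch below illustrates how the per-tree subsample size can be varied in a pointwise LtR-style setup. This is not the authors' implementation: it uses scikit-learn's RandomForestRegressor, whose real `max_samples` parameter caps the fraction of training examples drawn for each tree, and the synthetic query-document features, relevance labels, and the specific subsample fractions are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above, not the paper's algorithm):
# shrink the per-tree subsample of a random forest to reduce inter-tree
# correlation and per-tree training cost, in a pointwise-LtR style where
# trees regress relevance labels from query-document feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))              # synthetic query-document features
y = (X[:, :5].sum(axis=1) > 0).astype(float)   # synthetic relevance labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# `max_samples` sets the fraction of training examples drawn (with
# replacement) for each tree. A value of 1.0 is the standard bootstrap,
# in which ~63% of examples are unique; smaller values decorrelate the
# trees further and make each tree cheaper to train.
for frac in (1.0, 0.3, 0.05):
    forest = RandomForestRegressor(
        n_estimators=200, bootstrap=True, max_samples=frac, random_state=0
    )
    forest.fit(X_tr, y_tr)
    print(f"max_samples={frac}: held-out R^2 = {forest.score(X_te, y_te):.3f}")
```

Sweeping `max_samples` this way mirrors the trade-off the abstract describes: smaller subsamples cut training time roughly in proportion to the fraction used, while the added variability across trees can preserve, and in some regimes improve, ensemble accuracy.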