
Reducing correlation of random forest–based learning‐to‐rank algorithms using subsample size



Learning‐to‐rank (LtR) has become an integral part of modern ranking systems. In this field, random forest–based rank‐learning algorithms have been shown to be among the top performers. Traditionally, each tree of a random forest is learnt from a bootstrapped copy of the training set, in which approximately 63% of the examples are unique. The goal of using a bootstrapped copy instead of the original training set is to reduce the correlation between individual trees, thereby making the prediction of the ensemble more accurate. This raises the question: how can we leverage the correlation between the trees to improve the performance and scalability of a random forest–based LtR algorithm? In this article, we investigate whether we can further decrease the correlation between the trees while maintaining, or possibly improving, accuracy. Among several options for achieving this, we focus on the size of the subsample used to learn each individual tree, and we examine the performance of a random forest–based LtR algorithm as this parameter is used to control the correlation. Experiments on LtR data sets reveal that, for small‐ to moderate‐sized data sets, a substantial reduction in training time can be achieved using only a small amount of training data per tree. Moreover, because greater variability across the trees is positively associated with the performance of a random forest, we observe an increase in accuracy while maintaining the same level of model stability as the baseline. For big data sets we did not observe an increase in accuracy (with larger data sets, the variance of individual trees is already comparatively small), but the technique is still applicable because it allows for greater scalability.
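The central knob in this study is the per-tree subsample size. As an illustrative sketch only (not the paper's rank-learning algorithm), the snippet below shows the idea in a pointwise setting with scikit-learn's RandomForestRegressor, whose max_samples parameter sets the fraction of the training set drawn for each tree; the synthetic data, the 10% subsample fraction, and the other parameter values are assumptions made for illustration.

```python
# Illustrative sketch only: controlling per-tree subsample size in a random
# forest via scikit-learn's `max_samples` (pointwise regression on graded
# relevance labels). Data and parameter values below are hypothetical.
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                   # synthetic query-document feature vectors
y = rng.integers(0, 5, size=10_000).astype(float)   # synthetic graded relevance labels (0-4)

# Standard bootstrap: each tree is fit on n examples drawn with replacement,
# of which roughly 1 - 1/e (about 63%) are unique.
baseline = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=0)

# Smaller subsample per tree: each tree sees only 10% of the training set,
# which lowers both the correlation between trees and the per-tree training cost.
subsampled = RandomForestRegressor(
    n_estimators=100,
    bootstrap=True,
    max_samples=0.10,  # fraction of the training set drawn for each tree
    random_state=0,
)

t0 = time.perf_counter()
baseline.fit(X, y)
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
subsampled.fit(X, y)
t_small = time.perf_counter() - t0

print(f"full bootstrap: {t_full:.1f}s   10% subsample: {t_small:.1f}s")
```

In an actual LtR pipeline, documents within each query would then be ranked by their predicted scores; the paper evaluates its rank-learning forest on standard LtR benchmark data sets rather than on synthetic data like this.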

Keywords: correlation; random forest; forest‐based; learning‐to‐rank

Journal Title: Computational Intelligence
Year Published: 2019



