Multitask multiple kernel learning (MKL) algorithms combine two capabilities: incorporating different data sources into the prediction model and using the data from one task to improve the accuracy on others. However, these methods do not necessarily produce interpretable results. Restricting the solutions to the set of interpretable solutions increases the computational burden of the learning problem significantly, leading to computationally prohibitive run times for some important biomedical applications. We therefore propose a multitask MKL formulation with a clustering of tasks and develop a highly time-efficient solution approach for it. Our solution method is based on Benders decomposition and on treating the clustering problem as finding a given number of tree structures in a graph; hence, it is called the forest formulation. We use our method to discriminate early-stage from late-stage cancers using genomic data and gene sets, and we compare our algorithm against two other algorithms. Those two algorithms are based on different approaches to linearizing the problem, while all three algorithms make use of the cutting-plane method. Our results indicate that as the number of tasks and/or the number of desired clusters increases, the forest formulation becomes increasingly favorable in terms of computational performance.
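To illustrate the idea of clustering as "a given number of tree structures in a graph", the following minimal sketch shows how an acyclic edge set over tasks induces clusters: a forest on T nodes with K trees has exactly T - K edges, and each tree is one cluster. This is only an illustration of that graph-theoretic fact, not the formulation or solver from the paper; the function name `forest_to_clusters` and the example edge list are hypothetical.

```python
def forest_to_clusters(num_tasks, edges):
    """Return the clusters (connected components) induced by an acyclic edge set.

    Hypothetical helper for illustration; tasks are labeled 0..num_tasks-1.
    """
    parent = list(range(num_tasks))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # Both endpoints are already in the same tree, so this edge closes a cycle.
            raise ValueError(f"edge ({u}, {v}) closes a cycle; not a forest")
        parent[ru] = rv

    clusters = {}
    for t in range(num_tasks):
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


# Assumed toy instance: 6 tasks and 2 desired clusters, hence 6 - 2 = 4 edges.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
clusters = forest_to_clusters(6, edges)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Under this view, choosing K clusters amounts to selecting T - K edges that contain no cycle, which is the kind of structural constraint a tree-based formulation can exploit.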