Heterogeneous defect prediction (HDP) is a promising research area in the software defect prediction domain to handle the unavailability of the past homogeneous data. In HDP, the prediction is performed… Click to show full abstract
Heterogeneous defect prediction (HDP) is a promising research area in the software defect prediction domain to handle the unavailability of the past homogeneous data. In HDP, the prediction is performed using source dataset in which the independent features (metrics) are entirely different than the independent features of target dataset. One important assumption in machine learning is that independent features of the source and target datasets should be relevant to each other for better prediction accuracy. However, these assumptions do not generally hold in HDP. Further in HDP, the selected source dataset for a given target dataset may be of small size causing insufficient training. To resolve these issues, we have proposed a novel heterogeneous data preprocessing method, namely, Transfer of Data from Target dataset to Source dataset selected using Relevance score (TDTSR), for heterogeneous defect prediction. In the proposed approach, we have used chi-square test to select the relevant metrics between source and target datasets and have performed experiments using proposed approach with various machine learning algorithms. Our proposed method shows an improvement of at least 14% in terms of AUC score in the HDP scenario compared to the existing state of the art models.
               
Click one of the above tabs to view related content.