Predicting Duplicate in Bug Report Using Topic-Based Duplicate Learning With Fine Tuning-Based BERT Algorithm

As the usage and coverage of software increase, requests for functional improvements and reports of bugs increase as well. The Eclipse and Mozilla open-source projects receive roughly 300 or more bug reports per day. Usually, when a user finds a bug, they write a bug report. The developer assigned to the bug reads the report, and if the bug has already been fixed, the developer marks the report as a duplicate. However, when duplicate bug reports are submitted, the developer must identify the matching bug manually, which requires considerable effort. If duplicate bug reports can be identified automatically, this unnecessary effort can be reduced. To address this problem, this paper predicts duplicates using the BERT (Bidirectional Encoder Representations from Transformers) algorithm and topic-based duplicate/non-duplicate feature extraction. First, bug reports are extracted from the bug repository and grouped by bug status, and a topic model is built for each status by applying topic modeling to the reports with that status. Within each topic, feature selection is performed using the duplicate and non-duplicate statuses. The extracted features are then used as inputs to the BERT algorithm, which is fine-tuned to predict duplicate bug reports. The proposed model was evaluated with Precision, Recall, F-measure, and Accuracy on bug reports from the Eclipse, Mozilla, Apache, and KDE open-source projects. It shows about 87.67%, 89.85%, 87.03%, and 88.95% performance on Eclipse, Mozilla, Apache, and KDE, respectively. In addition, a comparison with baselines (Naïve Bayes, Random Forest, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNN-LSTM) showed improvements of about 36.33%, 44.46%, 47.77%, and 45.17% on Eclipse, Mozilla, Apache, and KDE, respectively, indicating that the proposed model detects duplicates better than the baselines.
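For readers unfamiliar with the four evaluation measures named in the abstract, the sketch below computes them for a binary duplicate/non-duplicate prediction using the standard textbook definitions. The function name `duplicate_metrics`, the label encoding (1 = duplicate, 0 = non-duplicate), and the sample labels are assumptions made for illustration; the paper's exact evaluation protocol (for example, how scores are averaged across projects) is not described in the abstract and may differ.

```python
from typing import Dict, Sequence


def duplicate_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure, and accuracy for binary labels.

    Assumed encoding (hypothetical, not taken from the paper):
    1 = duplicate bug report, 0 = non-duplicate bug report.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(duplicate_metrics(truth, predicted))
```

Precision penalizes non-duplicates wrongly flagged as duplicates, while recall penalizes duplicates that were missed; the F-measure is their harmonic mean, which is why it is commonly reported alongside accuracy when the two classes are imbalanced.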

Keywords: bug report; developer; duplicate; bug; duplicate bug

Journal Title: IEEE Access
Year Published: 2022
