Bug tracking systems contain large numbers of duplicate issues, which are time-consuming to detect and resolve (deduplicate) manually. Many approaches to automating this process exist: more than 80 papers each present a different algorithm, evaluated on different datasets and in different settings. This leads to two challenges: comprehending the multitude of existing approaches and comparing them accurately and meaningfully.
The project's goal is to address both challenges by conducting a comprehensive literature review and implementing a system for accurate, unified evaluation of existing models.
The purpose of the literature review is to index the existing approaches and record them, together with their previously reported results, in a format suitable for analysis.
The system should provide a unified interface to the various issue deduplication approaches and the means of evaluating them, and it should promote the reproducibility of results and the availability of the approaches' source code.
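As an illustration of what such a unified interface might look like, the following Python sketch defines one abstract model type and one shared evaluation routine. All names here (`Issue`, `DeduplicationModel`, `JaccardBaseline`, `recall_at_k`) are hypothetical and not taken from the project. The token-overlap baseline and the recall@k metric are assumptions chosen only to keep the example small; the project may use different models and metrics.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Issue:
    """Hypothetical minimal issue record."""
    issue_id: str
    text: str
    duplicate_of: Optional[str] = None  # id of the original issue, if labelled


class DeduplicationModel(ABC):
    """Hypothetical common interface that every approach would implement."""

    @abstractmethod
    def fit(self, issues: Sequence[Issue]) -> None:
        """Train on historical issues (may be a no-op)."""

    @abstractmethod
    def rank(self, query: Issue, candidates: Sequence[Issue]) -> List[str]:
        """Return candidate ids ordered from most to least likely duplicate."""


class JaccardBaseline(DeduplicationModel):
    """Illustrative baseline: rank candidates by token-set Jaccard similarity."""

    def fit(self, issues: Sequence[Issue]) -> None:
        pass  # nothing to learn for this baseline

    def rank(self, query: Issue, candidates: Sequence[Issue]) -> List[str]:
        query_tokens = set(query.text.lower().split())

        def score(candidate: Issue) -> float:
            tokens = set(candidate.text.lower().split())
            union = query_tokens | tokens
            return len(query_tokens & tokens) / len(union) if union else 0.0

        ranked = sorted(candidates, key=score, reverse=True)
        return [c.issue_id for c in ranked]


def recall_at_k(model: DeduplicationModel, queries: Sequence[Issue],
                candidates: Sequence[Issue], k: int) -> float:
    """Fraction of labelled queries whose original appears in the top k."""
    labelled = [q for q in queries if q.duplicate_of is not None]
    if not labelled:
        return 0.0
    hits = sum(q.duplicate_of in model.rank(q, candidates)[:k] for q in labelled)
    return hits / len(labelled)


candidates = [
    Issue("1", "app crashes on startup"),
    Issue("2", "login button misaligned on mobile"),
]
queries = [Issue("3", "crashes on startup after update", duplicate_of="1")]

model = JaccardBaseline()
model.fit(candidates)
score = recall_at_k(model, queries, candidates, k=1)
```

Because every approach is evaluated through the same `rank` method and the same metric function, results obtained on one dataset would be directly comparable across models, which is the property the unified system aims for.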