A Semi-Supervised Clustering Tool for Crowdsourced Test Reports with Deep Image Understanding
Because crowdsourced testing is open to anyone, crowdsourced testing of mobile apps produces many duplicate reports. Previous methods extract textual features from the test reports, combine them with shallow image analysis, and apply unsupervised clustering to identify the duplicates. However, these methods ignore the semantic connection between textual descriptions and screenshots, which makes the clustering results unsatisfactory and the deduplication less accurate.
This paper proposes SemCluster, a semi-supervised clustering tool for crowdsourced test reports with deep image understanding. SemCluster exploits the semantic connection between textual descriptions and screenshots by constructing semantic binding rules and performing semi-supervised clustering. In the experiment, SemCluster improves six clustering metrics over the state-of-the-art method, which verifies that it achieves a good deduplication effect.
downloading URL: link
The demo file has four columns. The "index" column holds the index of the test report. The "description" column holds the textual description of the test report. The "img_url" column holds the download URL of the test report's screenshot. The "tag" column holds the ground-truth cluster label of the test report.
downloading URL: link
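The four-column layout of the demo file can be illustrated with a short loading sketch. This is a minimal example, not part of SemCluster itself: it assumes the demo file is a comma-separated file with a header row, which the description above does not state, and the sample rows, URLs, and the helper names `load_reports` and `group_by_tag` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real demo file's format and contents may differ.
SAMPLE = """index,description,img_url,tag
0,App crashes when tapping the login button,https://example.com/0.png,1
1,Tapping login closes the app,https://example.com/1.png,1
2,Profile picture is not displayed,https://example.com/2.png,2
"""

EXPECTED_COLUMNS = {"index", "description", "img_url", "tag"}


def load_reports(handle):
    """Read test reports from a CSV handle and check the four demo columns."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def group_by_tag(reports):
    """Map each ground-truth tag to the indices of the reports in that cluster."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["tag"]].append(report["index"])
    return dict(groups)


reports = load_reports(io.StringIO(SAMPLE))
clusters = group_by_tag(reports)
```

To read an actual file instead of the in-memory sample, pass a handle opened with `open(path, newline="", encoding="utf-8")` to `load_reports`. Reports that share a "tag" value are duplicates of one another according to the ground truth, so `clusters` gives the reference grouping against which a clustering result could be compared.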