Text Similarity Detection

using Machine Learning algorithms with Character-based similarity measures

by Emil Kalbaliyev and Samir Rustamov

Project description

Text similarity detection is a significant research problem in natural language processing. In this project, we propose an approach that combines machine learning models with character-based similarity measures to classify pairs of texts by similarity. The model was trained on news articles collected from Azerbaijani news websites. The proposed method outperforms the results obtained from each character-based similarity measure used on its own.
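As a rough illustration of the idea, the sketch below computes a few character-based similarity scores for a pair of texts and collects them into a feature vector that a classifier could take as input. The three measures chosen here (character n-gram Jaccard, normalized Levenshtein, and the Ratcliff/Obershelp ratio from Python's difflib) and all function names are assumptions made for the example. They are not necessarily the measures or the implementation used in the paper.

```python
from difflib import SequenceMatcher


def char_ngrams(text, n=3):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of a and b, in [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two texts too short to have n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def levenshtein_similarity(a, b):
    """One minus the edit distance divided by the longer length, in [0, 1]."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def ratcliff_similarity(a, b):
    """Ratcliff/Obershelp-style ratio as implemented by difflib, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_features(a, b):
    """Hypothetical feature vector for one text pair (illustrative measures only)."""
    return [
        jaccard_similarity(a, b),
        levenshtein_similarity(a, b),
        ratcliff_similarity(a, b),
    ]
```

Under this assumption, each labeled pair of news articles would be turned into one such vector, and a standard classifier would be trained on the vectors rather than on any single score.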

Azerbaijani News Similarity Detection

The similarity score between two Azerbaijani news articles can be determined with the following application. To compare two articles, paste the text of each into the dedicated text boxes, select the desired similarity measure, and click the submit button.

Paper

E. Kalbaliyev, S. Rustamov. Text Similarity Detection Using Machine Learning Algorithms with Character-Based Similarity Measures. In Proceedings of MIDI’2020 – 8th Machine Intelligence and Digital Interaction Conference, December 9-10, 2020, Warsaw, Poland (online). Advances in Intelligent Systems and Computing, vol 1376. Springer, Cham. (Best Paper Award)

PDF

Video

Code

Data

Authors

Emil Kalbaliyev

MSc student,

Eötvös Loránd University

Samir Rustamov

Assistant Professor,

ADA University

Acknowledgement

This work has been carried out at the Center of Data Analytics and Research at ADA University. We express our deep gratitude to Mardan Safarov, Vasif Vahidov, and Ulvi Mammadli for their assistance in this research work.
