using Machine Learning algorithms with Character-based similarity measures
by Emil Kalbaliyev and Samir Rustamov
Text similarity detection is a significant research problem in natural language processing. In this project, we propose an approach that feeds character-based similarity measures into machine learning models to classify pairs of texts by similarity. The model was trained on news articles collected from Azerbaijani news websites. The proposed method outperforms each character-based similarity measure used on its own.
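The idea of combining several character-based measures in one learned model can be illustrated with a minimal sketch. It is not the implementation from the paper: this page does not list the measures or models used, so the three measures (normalized Levenshtein similarity, character trigram Jaccard overlap, and the `difflib` matching ratio), the small logistic regression, the binary similar/dissimilar labels, and the English toy sentences are all assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher


def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - (edit distance / longer length); 1.0 means identical."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the sets of character n-grams of both texts."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def matching_ratio(a: str, b: str) -> float:
    """Ratio of matching characters as computed by difflib."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def features(a: str, b: str) -> list:
    """One feature per character-based measure, each in [0, 1]."""
    return [levenshtein_similarity(a, b), ngram_jaccard(a, b), matching_ratio(a, b)]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (similar) or 0 (dissimilar) for one feature vector."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical toy pairs (not the Azerbaijani news data): 1 = similar, 0 = dissimilar.
pairs = [
    ("the central bank raised interest rates today",
     "the central bank raised the interest rate today", 1),
    ("heavy rain flooded several streets in the capital",
     "heavy rains flooded several streets of the capital", 1),
    ("the national team won the final match",
     "the national team has won the final match", 1),
    ("the central bank raised interest rates today",
     "the national team won the final match", 0),
    ("heavy rain flooded several streets in the capital",
     "a new museum opened its doors to visitors", 0),
    ("oil prices fell sharply on monday",
     "the festival attracted thousands of tourists", 0),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
weights, bias = train_logistic(X, y)

query = features("the central bank raised rates", "the central bank raises rates")
print("features:", [round(v, 3) for v in query])
print("predicted label:", predict(weights, bias, query))
```

The learned weights let the classifier trade the measures off against each other rather than relying on a hand-picked threshold for a single measure, which is the intuition behind comparing the combined model with individual measures.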
The application below computes the similarity score between two Azerbaijani news articles. To compare two articles, paste the text of each into its dedicated text box, select the desired similarity measure, and click the submit button.
E. Kalbaliyev, S. Rustamov. Text Similarity Detection Using Machine Learning Algorithms with Character-Based Similarity Measures. In Proceedings of MIDI’2020 – 8th Machine Intelligence and Digital Interaction Conference, December 9-10, 2020, Warsaw, Poland (online). Advances in Intelligent Systems and Computing, vol 1376. Springer, Cham. (Best Paper Award)
MSc student, Eötvös Loránd University
Assistant Professor, ADA University
This work has been carried out at the Center of Data Analytics and Research at ADA University. We express our deep gratitude to Mardan Safarov, Vasif Vahidov, and Ulvi Mammadli for their assistance in this research work.