Rest-Mex: Research on Sentiment Analysis Task for Mexican Tourist Texts

At IberLEF 2023

REST-MEX 2023

Sentiment analysis of tourist texts has gained relevance in the last decade; however, the most significant research efforts have focused on English. Although some studies have addressed Spanish, few cover varieties of Spanish spoken outside Spain. These approaches are also typically applied to collections taken from social networks, such as tweets, so tourist texts have not been directly addressed.

This is also the first time within the framework of Rest-Mex that an unsupervised classification task has been proposed. This task is named Thematic Unsupervised Classification. For this task, 50,000 news items were collected on 5 different topics related to tourism. Given all the collected texts, a participating system must generate 5 groups. The system whose grouping is most similar to the ideal classification (Gold Standard) obtains the highest score. All data was obtained from Google News: news items published over the last two years on the 5 tourism themes were carefully downloaded and tagged (for reasons of competition, these themes will not be revealed).

Challenges to solve

For the Sentiment Analysis task, the problem is defined as follows:

"Given an opinion about a Mexican tourist place, the goal is to determine the polarity of the text (between 1 and 5), the type of opinion (hotel, restaurant, or attraction), and the country of the place the opinion refers to (Mexico, Cuba, or Colombia)."
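
To make the three outputs of this definition concrete, here is a minimal Python sketch of what a single system prediction could look like. The class name `Prediction`, its field names, and the assumption that polarity is an integer are illustrative choices, not part of the official submission format; only the label sets come from the definition above.

```python
from dataclasses import dataclass

# Label sets taken from the task definition above.
OPINION_TYPES = {"hotel", "restaurant", "attraction"}
COUNTRIES = {"Mexico", "Cuba", "Colombia"}


@dataclass(frozen=True)
class Prediction:
    """One system output for one opinion (hypothetical structure)."""

    polarity: int      # assumed to be an integer from 1 to 5
    opinion_type: str  # one of OPINION_TYPES
    country: str       # one of COUNTRIES

    def __post_init__(self):
        # Reject values outside the ranges given in the task definition.
        if not 1 <= self.polarity <= 5:
            raise ValueError(f"polarity must be in 1..5, got {self.polarity}")
        if self.opinion_type not in OPINION_TYPES:
            raise ValueError(f"unknown opinion type: {self.opinion_type!r}")
        if self.country not in COUNTRIES:
            raise ValueError(f"unknown country: {self.country!r}")


# Example: a very positive hotel review about a place in Mexico.
example = Prediction(polarity=5, opinion_type="hotel", country="Mexico")
```

Validating at construction time is a design choice that catches out-of-range labels before they reach an evaluation script.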

For the Thematic Unsupervised Classification, the problem is defined as follows:

"Given a set of texts related to Mexican tourism, the goal is to build 5 groups from this set such that each group represents an important tourism topic."
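
Because group labels in an unsupervised task are arbitrary, comparing a system's grouping with the Gold Standard requires a measure that ignores label names. The sketch below uses the Rand index as one possible such measure; this is an assumption for illustration, since the official evaluation measure is not stated here, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(predicted, gold):
    """Fraction of item pairs on which two groupings agree.

    A pair counts as an agreement when both groupings put the two items
    in the same group, or both put them in different groups. Only group
    membership matters, not the label names.
    """
    if len(predicted) != len(gold):
        raise ValueError("both groupings must cover the same items")
    agreements = 0
    total = 0
    for i, j in combinations(range(len(gold)), 2):
        same_predicted = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        agreements += same_predicted == same_gold
        total += 1
    # With fewer than two items there are no pairs to disagree on.
    return agreements / total if total else 1.0


# The same partition under different label names scores 1.0.
score = rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
```

This pairwise loop is quadratic in the number of texts, so it is suitable only for small samples; for a collection of 50,000 items, a contingency-table formulation would be needed.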