Rest-Mex 2022: Recommendation System, Sentiment Analysis and Covid Semaphore Prediction for Mexican Tourist Texts


2022

The importance of NLP in Tourism in Mexican Spanish Text Data

Tourism is a social, cultural, and economic phenomenon related to the movement of people to places outside their usual place of residence for personal or business/professional purposes. This activity is vital for many countries, including Mexico, where it represents 8.7% of the national GDP and generates around 4.5 million direct jobs.

Tourism was one of the sectors most affected by the pandemic caused by the SARS-CoV-2 virus, which began in Mexico in mid-March 2020. The sector is now trying to re-establish itself by improving the quality and safety of tourist products and services.

Natural Language Processing (NLP) is an area of artificial intelligence that can help restore tourism by providing mechanisms that detect problems through identifying the polarity of tourists' opinions on online platforms. NLP can also support systems that take user and destination information into account to recommend the places where the user is likely to have a better tourist experience. In this way, NLP can support both the tourism sector and tourists themselves.

REST-MEX 2022

Few recommendation systems for tourist sites are based on the affinity between a user's profile and each place's description. Moreover, the data collections used to train these systems come from users and places in English-speaking countries. Considering the importance of Ibero-American countries for tourism, it is vital to create Spanish-language resources that enable the development of intelligent systems for tourism.


On the other hand, sentiment analysis of tourist texts has gained relevance in the last decade; however, the most significant scientific efforts have focused on English. Although some studies have addressed Spanish, few address varieties of Spanish other than that spoken in Spain. These approaches are typically applied to collections taken from social networks, such as tweets, so tourist texts have not been directly addressed.


Finally, the epidemiological semaphore is a system implemented by the Mexican government to determine which activities are allowed according to the severity of the COVID-19 pandemic. The system consists of four colors. Tourist activities are the most affected by any change in the course of the pandemic. Interestingly, many of the factors taken into account to determine the color of the semaphore are published in the news (hospital capacity, contagion curve, number of oxygen tanks, vaccinated population, etc.). This raises the question of whether the future color of the epidemiological semaphore can be determined from published COVID-19 news.

Challenges to solve

"Given a TripAdvisor tourist and a Mexican tourist place, the goal is to automatically obtain the degree of satisfaction (between 1 and 5) that the tourist will have when visiting that place."

"Given an opinion about a Mexican tourist place, the goal is to determine the polarity of the text (between 1 and 5) and the type of opinion (hotel, restaurant, or attraction)."

"Given COVID-19-related news about a Mexican region, the goal is to determine the semaphore color 0, 2, 4, and 8 weeks into the future."
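To make the output space of each task concrete, the following is a minimal Python sketch of the three prediction targets. All names (PlaceType, SemaphoreColor, check_score, and the prediction classes) are hypothetical and not part of any official Rest-Mex submission format. The sketch assumes that the satisfaction degree and the polarity are integers from 1 to 5, and it uses the four colors of Mexico's epidemiological semaphore (green, yellow, orange, red).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

# Hypothetical label types for the three Rest-Mex 2022 tasks (illustrative names only).


class PlaceType(Enum):
    """Type of opinion in the sentiment analysis task."""
    HOTEL = "hotel"
    RESTAURANT = "restaurant"
    ATTRACTION = "attraction"


class SemaphoreColor(Enum):
    """The four colors of Mexico's epidemiological semaphore."""
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"


# Prediction horizons, in weeks, for the semaphore task.
HORIZONS_WEEKS = (0, 2, 4, 8)


def check_score(score: int) -> int:
    """Assumption: satisfaction and polarity are integers in [1, 5]."""
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score must be an integer in [1, 5], got {score!r}")
    return score


@dataclass(frozen=True)
class RecommendationPrediction:
    """Degree of satisfaction a tourist is expected to have at a place."""
    satisfaction: int

    def __post_init__(self) -> None:
        check_score(self.satisfaction)


@dataclass(frozen=True)
class SentimentPrediction:
    """Polarity of an opinion plus the type of place it refers to."""
    polarity: int
    place_type: PlaceType

    def __post_init__(self) -> None:
        check_score(self.polarity)
        if not isinstance(self.place_type, PlaceType):
            raise ValueError(f"place_type must be a PlaceType, got {self.place_type!r}")


@dataclass(frozen=True)
class SemaphorePrediction:
    """One semaphore color for each future horizon (in weeks)."""
    colors: Dict[int, SemaphoreColor]

    def __post_init__(self) -> None:
        # Every horizon must be present, and no others.
        if set(self.colors) != set(HORIZONS_WEEKS):
            raise ValueError(f"expected horizons {HORIZONS_WEEKS}, got {sorted(self.colors)}")
        for week, color in self.colors.items():
            if not isinstance(color, SemaphoreColor):
                raise ValueError(f"week {week}: expected a SemaphoreColor, got {color!r}")
```

For example, `SentimentPrediction(polarity=4, place_type=PlaceType.HOTEL)` is accepted, while a polarity of 6 or a semaphore prediction missing the 8-week horizon raises `ValueError`. Validating in `__post_init__` keeps malformed predictions from being constructed at all.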