The dataset used in this study comprises 543,351 tweets collected between August 30, 2020, and June 8, 2021. Tweets were gathered from ten major Italian cities: Milan, Turin, Bologna, Venice, Florence, Rome, Naples, Bari, Palermo, and Cagliari. The cities were chosen to cover Italy's main geographical areas: the North, the Center, the South, and the Islands.
Before analysis, each tweet was preprocessed. Links, mentions, and special characters were stripped out so that the analysis focuses on the textual content, and punctuation was removed and the text normalized to produce a standardized dataset.
Emojis were replaced with their corresponding textual descriptions to retain their emotional content, and hashtags embedded within sentences were cleaned while preserving their meaning.
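The paper does not specify the exact patterns or tools used for these steps, so the following is only a minimal sketch of such a pipeline under stated assumptions. The function name `preprocess_tweet` and all regular expressions are hypothetical. Emoji descriptions are taken from the standard Unicode character names via Python's `unicodedata` module rather than from a dedicated emoji library. Hashtags are "cleaned" by dropping the `#` and splitting CamelCase words, and "normalization" is assumed to mean lowercasing and collapsing whitespace. In this sketch, emojis are converted before special characters are stripped, because the stripping step would otherwise delete them.

```python
import re
import unicodedata

# Hypothetical patterns: the study does not report the ones it used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Zero-width split point between a lowercase letter and an uppercase one,
# e.g. "RomaCentro" -> "Roma Centro" (assumed hashtag-cleaning rule).
CAMEL_RE = re.compile(r"(?<=[a-zàèéìòù])(?=[A-Z])")
# Anything that is not a letter, digit, or whitespace, plus underscores.
SPECIAL_RE = re.compile(r"[^\w\s]|_")


def describe_emojis(text: str) -> str:
    """Replace each emoji with its lowercased Unicode name.

    Assumption: every character in Unicode category 'So' (Symbol, other)
    is treated as an emoji; zero-width joiners and variation selectors
    are dropped.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            out.append(f" {name.lower()} " if name else " ")
        elif ch in ("\u200d", "\ufe0f"):
            continue
        else:
            out.append(ch)
    return "".join(out)


def preprocess_tweet(text: str) -> str:
    """Hypothetical cleaning pipeline for a single tweet."""
    text = URL_RE.sub(" ", text)        # strip links
    text = MENTION_RE.sub(" ", text)    # strip @mentions
    text = describe_emojis(text)        # emojis -> descriptions (before stripping)
    # Keep the hashtag word, drop '#', and split CamelCase.
    text = HASHTAG_RE.sub(lambda m: CAMEL_RE.sub(" ", m.group(1)), text)
    text = SPECIAL_RE.sub(" ", text)    # strip punctuation and special characters
    text = text.lower()                 # assumed normalization
    return " ".join(text.split())       # collapse whitespace


if __name__ == "__main__":
    sample = "Che bella giornata a #RomaCentro! 😀 @amico https://t.co/abc"
    print(preprocess_tweet(sample))
    # prints: che bella giornata a roma centro grinning face
```

Unicode character names are in English, and multi-codepoint sequences such as flags are described code point by code point. A dedicated emoji library would give richer descriptions, but the standard-library approach keeps the sketch dependency-free.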