Jammoussi and her team developed a method of sentiment analysis, a branch of Natural Language Processing (NLP) that determines the emotional tone a text conveys, whether happy, sad, or neutral. The researchers used auto-encoders, a type of neural network, to help computers recognise emotions in text, specifically whether it sounds positive or negative. Their method, the Contextual Recursive Auto-Encoder (CoRAE), analyses words together with their surrounding context within a sentence to better capture the sentiment expressed. As the team explains, sentiment analysis has traditionally followed two main approaches: 1) dictionaries (lexicons) that associate words with specific emotions; or 2) mathematical and computational techniques that infer sentiment from word usage patterns. Both can be challenging, as they require careful selection of words and rules.
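To make the first, dictionary-based approach concrete, here is a minimal sketch in Python. The toy lexicon and its scores are hypothetical stand-ins for real resources such as SentiWordNet, and the whitespace tokenisation is deliberately naive:

```python
# A minimal sketch of the lexicon-based approach (approach 1 above).
# TOY_LEXICON is a hypothetical stand-in for a real sentiment dictionary;
# the scores are illustrative only.

TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "happy": 1.5,
    "bad": -1.0, "terrible": -2.0, "sad": -1.5,
}

def lexicon_sentiment(text: str) -> str:
    """Sum per-word scores and map the total to a coarse label."""
    score = sum(TOY_LEXICON.get(tok, 0.0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("what a great and happy day"))  # positive
print(lexicon_sentiment("this movie was terrible"))     # negative
```

The limitation the researchers point to is visible even in this toy version: every word and score must be chosen by hand, and anything outside the lexicon, including constructions such as "not good", is simply missed.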
In this context, Jammoussi introduced a new approach based on Point-wise Mutual Information (PMI), which she called PMI-SA. The model produces a distributed vector representation of words, letting computers capture sentiment effectively without relying heavily on predefined rules or dictionaries. It also processes the words within a sentence in a distinctive way, improving the detection of emotions expressed in text, particularly on social media: the method was evaluated on datasets drawn from platforms such as Twitter and Facebook. Ultimately, the main motivation of the work was to propose a deep compositional model that generates a reduced vector representation that respects word order without resorting to a parse tree structure.
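The general PMI idea can be sketched as follows. A word's vector is built from how strongly it co-occurs with every other word, using PMI(w, c) = log(P(w, c) / (P(w) · P(c))). This is an illustrative construction over a toy corpus, not Dr Jammoussi's exact PMI-SA formulation; in particular, the sentence-level co-occurrence window is an assumption:

```python
# A sketch of distributed word vectors built from point-wise mutual
# information (PMI). Illustrates the general idea behind PMI-based
# representations, not the exact PMI-SA construction.

import math
from collections import Counter
from itertools import combinations

corpus = [
    "i love this happy song".split(),
    "i hate this sad song".split(),
    "what a happy great day".split(),
]

word_counts = Counter(w for sent in corpus for w in sent)
pair_counts = Counter()
for sent in corpus:
    # Assumption: any two words in the same sentence count as co-occurring.
    for pair in combinations(sorted(set(sent)), 2):
        pair_counts[pair] += 1

total_words = sum(word_counts.values())
total_pairs = sum(pair_counts.values())

def pmi(w: str, c: str) -> float:
    """PMI(w, c) = log(P(w, c) / (P(w) * P(c))), estimated from counts."""
    joint = pair_counts.get(tuple(sorted((w, c))), 0) / total_pairs
    if joint == 0:
        return 0.0  # common convention: unseen pairs contribute nothing
    return math.log(joint / ((word_counts[w] / total_words) *
                             (word_counts[c] / total_words)))

vocab = sorted(word_counts)
# Each word's vector is its PMI with every other word in the vocabulary.
vectors = {w: [pmi(w, c) for c in vocab] for w in vocab}
print(vectors["happy"])
```

The compositional step can be pictured with a similarly simplified sketch: an auto-encoder that folds word vectors together two at a time, so that word order matters and no parse tree is needed. The parameters below are random and untrained (a real model learns them by minimising reconstruction error), and the strict left-to-right folding is an assumption for illustration, not the exact CoRAE architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding size

# Random, untrained parameters; training would tune these to minimise
# the reconstruction error defined below.
W_enc = rng.normal(size=(d, 2 * d)) * 0.1
W_dec = rng.normal(size=(2 * d, d)) * 0.1

def compose(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Encode two child vectors into one parent vector of the same size."""
    return np.tanh(W_enc @ np.concatenate([left, right]))

def reconstruction_error(left: np.ndarray, right: np.ndarray) -> float:
    """How much information the parent vector loses about its children."""
    rebuilt = W_dec @ compose(left, right)
    return float(np.sum((rebuilt - np.concatenate([left, right])) ** 2))

# Stand-in word vectors for a five-word sentence (e.g. from a PMI model).
sentence = [rng.normal(size=d) for _ in range(5)]

# Fold the sentence left to right: order matters, no parse tree required.
vec = sentence[0]
for word in sentence[1:]:
    vec = compose(vec, word)

print(vec)                                             # one fixed-size sentence vector
print(reconstruction_error(sentence[0], sentence[1]))  # what training would minimise
```

The point of the reduced representation is that a whole sentence, whatever its length, ends up as a single fixed-size vector that a classifier can then label as positive or negative.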
On the eve of the 2016 presidential election in the United States, there were two competing readings of public opinion. On the one hand, political pundits considered Donald J. Trump's candidacy inconsequential, almost a joke, and saw the Democratic contest between Bernie Sanders and Hillary Clinton as the one that would produce the next president. On the other hand, computer scientists at data-focused companies such as Brandwatch and Lexalytics saw something different: Twitter posts showed that Trump was far more popular among the platform's users. Polls, which had long predicted in great detail who was likely to win an election, no longer worked as they once did; many respondents polled by Public Policy Polling (PPP), for example, said they would vote for Harambe, a gorilla recently killed in a zoo that had become a popular meme. It was in this context that the importance of data became even more evident. As Clive Humby had said a decade earlier, "Data is the new oil."

Those scientists analysed billions of rows of raw and processed data and got a better idea of who would become the next US president. Much like the methods Dr Jammoussi tests, with their reduced vector representations, this kind of analysis has become essential to understanding an electorate's emotions. So, while pundits and some pollsters doubted Trump, data scientists analysed social media texts to understand trends and found that he was the most popular candidate on social media. The rest is history.
Some companies, such as X (formerly Twitter), have made accessing data via their APIs more complex and expensive. In this new context, how can researchers access such datasets?
Do you use programming languages such as Python to carry out this type of research? If so, what libraries do you use and why?