Applying Sentiment and Network Analyses to Taiwan's Election-Period Information Space
Taiwan's democratic functioning is a matter of heated debate and hotter geopolitical tension, in light of China's documented efforts to interfere in Taiwan's presidential elections. To study Taiwan's media space during election periods, I explore methods for illustrating the volume of online content featuring preselected keywords. I replicate Chang's 2021 framework for observing digital civic participation and misinformation during Taiwan's 2020 presidential election via the Cofacts database of LINE posts. Further, I perform basic time-series analysis on these data to observe the co-occurrence of two keywords, "Lai" and "illegitimate," in the period after the conclusion of the 2024 election.
Taiwan is a self-governing island of 23 million people with a vibrant democracy. Its January 2024 presidential election pitted pro-independence and pro-unification camps against each other under the shadow of Beijing’s insistence that Taiwan must be reunited with China. That geopolitical tension made the campaign a prime target for influence operations designed to undermine confidence in Taiwan’s institutions.
China has long deployed state-backed networks and proxy accounts to spread strategic falsehoods about Taiwan's leaders and key industries. A notable example is a campaign that accused TSMC, the island's global semiconductor champion, of compromising national security. Attacks on Taiwan's chip sector carry extra weight because the island's geographic position and economic role make it central to international technology supply chains.
At the same time, Taiwan’s media landscape mixes traditional outlets with peer-to-peer messaging apps such as LINE. Sensational rumors can ignite on private chats, trigger algorithmic promotion, and then spill into public forums. Small, targeted perturbations, such as questions about energy policy or public health, can rapidly morph into widespread controversies when amplified by platform incentives.
Global experience underscores the stakes. After the 2016 election in the United States, researchers documented how automated bots, micro-targeted ads, and sensational content can erode trust in elections. The concept of "information disorder" highlights disinformation as a systemic threat to democratic legitimacy when citizens cannot agree on basic facts. Taiwan has already mapped bot networks and launched fact-checking initiatives, but tactics evolve with AI-driven deep fakes and generative text models.
To keep pace, we must treat the information space as a dynamic system shaped by user behavior and external interventions. This line of research could quantify precisely how narrative clusters, such as posts mentioning the DPP candidate’s surname, Lai, propagate and dissipate, revealing the mathematical dynamics behind disinformation in a high-stakes election, as opposed to simple econometric tracking.
Researchers have explored several approaches to studying how false or misleading stories spread online. In this section we review the most influential techniques without assuming you already know them.
Label-Propagation and Veracity Tagging
Chang’s 2021 study introduces the Cofacts dataset, which contains over a hundred thousand posts from the Taiwanese messaging app LINE. Volunteers and fact-checking groups tag each post with one of several labels: true, rumor, opinion, or misinformation.
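As a minimal illustration of working with these labels, the sketch below tallies them with pandas. The file path and column names are assumptions for illustration, not the actual Cofacts schema:

```python
import pandas as pd

# Hypothetical export of the Cofacts dataset; the path and the
# "label" column name are illustrative assumptions.
posts = pd.read_csv("cofacts_posts.csv")

# Tally the volunteer-assigned veracity labels described above.
print(posts["label"].value_counts())

# Share of posts tagged as misinformation (assumed label string).
misinfo_share = (posts["label"] == "misinformation").mean()
print(f"misinformation share: {misinfo_share:.1%}")
```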
Statistical Linking of Exposure to Behavior
Wang’s 2020 work examined whether seeing fake news actually changes people’s votes. He used regression analysis, a statistical technique that measures how changes in one factor are associated with changes in another, such as support for a particular candidate. By comparing regions or demographic groups with different exposure levels, he could estimate how much fake news shifted vote shares while controlling for other influences such as age or income.
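A minimal sketch of this kind of regression, assuming a hypothetical region-level table with illustrative column names (this is not Wang's actual specification):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical region-level data: fake-news exposure, vote share,
# and demographic controls. All file and column names are assumptions.
df = pd.read_csv("region_exposure.csv")

# Regress vote share on exposure while holding age and income fixed;
# the exposure coefficient estimates the association described above.
model = smf.ols("vote_share ~ exposure + median_age + median_income", data=df)
results = model.fit()
print(results.summary())
```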
Thematic Content Coding
Tatsumi’s 2019 research focused on energy and cybersecurity narratives. This work first defined a set of themes such as “nuclear risk” or “renewable cost.” Human coders then read each article or post and assigned it to one or more themes. Over time, this manual coding revealed which themes appeared most often in disinformation campaigns. Although it requires significant human effort, thematic coding provides a clear map of the storylines that malicious actors prefer.
Sentiment Analysis Tools
Many projects use off-the-shelf software, such as VADER or TextBlob, to measure how positive or negative a piece of text feels. These tools scan each sentence for words like “good,” “bad,” “critical” or “amazing” and then assign a score from very negative to very positive. While not designed specifically for political content, they offer a quick way to see if disinformation tends to come wrapped in anger, fear or praise.
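A quick sketch of both tools on an English sentence. Note that off-the-shelf lexicons such as VADER's are English-centric, so Chinese-language LINE posts would need translation or a different lexicon:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

text = "This policy is a disaster and officials keep lying about it."

# VADER: negative/neutral/positive proportions plus a compound score
# ranging from -1 (most negative) to +1 (most positive).
print(SentimentIntensityAnalyzer().polarity_scores(text))

# TextBlob: polarity in [-1, 1] and subjectivity in [0, 1].
blob = TextBlob(text)
print(blob.sentiment.polarity, blob.sentiment.subjectivity)
```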
Basic Network Metrics
When posts are shared or retweeted, they form a network of connections. Two simple but powerful measures help identify key players. Degree centrality counts how many direct links a user has. Betweenness centrality measures how often a user sits on the shortest sharing path between two others, like finding the most common bridge between two islands. High values in these metrics often point to “superspreaders” who can inject content into many parts of the network.
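On a toy sharing network, both measures take only a few lines with networkx; the graph below is an illustrative assumption, not real sharing data:

```python
import networkx as nx

# Toy sharing network: an edge means one user shared another's post.
G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("e", "f")])

# Degree centrality: fraction of other users each user links to directly.
degree = nx.degree_centrality(G)

# Betweenness centrality: how often a user lies on the shortest sharing
# path between two others -- the "bridge between islands" intuition.
betweenness = nx.betweenness_centrality(G)

# Users high on both measures are candidate superspreaders.
for user in G:
    print(user, round(degree[user], 2), round(betweenness[user], 2))
```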
These well-established methods create a strong foundation. They show us how to tag large volumes of posts, link exposure to real-world effects, map story themes, gauge emotional tone and spot influential accounts. Yet each treats the information space as a collection of isolated posts or actors. In our work we move beyond these building blocks to capture the dynamic, system-wide patterns that emerge when narratives compete for attention.
I have not yet built every tool described below, but I have explored these approaches to understand how narratives flow through Taiwan’s information space:
Transformer-Based Theme Classification
I looked into using modern “transformer” language models to tag each LINE post by theme. For example, instead of just saying whether a post is true or false, the model could tell me if it’s about a candidate, economic policy, or another topic. I experimented with a pre-trained model and confirmed it can learn these themes when fine-tuned on a labeled subset of posts.
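A lightweight stand-in for that fine-tuned classifier is zero-shot classification with a multilingual transformer; the model choice, theme labels, and sample post below are all illustrative assumptions:

```python
from transformers import pipeline

# Zero-shot theme tagging: no fine-tuning required, at some cost in
# accuracy. Model name and theme labels are illustrative assumptions.
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

themes = ["candidate", "economic policy", "energy policy", "public health"]
post = "賴清德的能源政策會讓電價上漲"  # made-up LINE-style post

result = classifier(post, candidate_labels=themes)
print(list(zip(result["labels"], [round(s, 2) for s in result["scores"]])))
```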
Temporal Clustering of Keyword Spikes
I sketched out how to plot the daily frequency of a search term and then apply a clustering algorithm to highlight sudden jumps ("spikes") in usage. In practice, this means I could flag days when the name “Lai” appears much more often than usual, suggesting a burst of attention I should investigate further. I ran small tests on sample data to verify that clear spikes emerge around key events.
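A minimal version of this pipeline, substituting a rolling z-score threshold for a full clustering algorithm (the file and column names are assumptions):

```python
import pandas as pd

# Assumed input: one row per post, with a timestamp and the post text.
posts = pd.read_csv("line_posts.csv", parse_dates=["created_at"])

# Daily frequency of the keyword; 賴 is Lai's surname in Chinese.
hits = posts[posts["text"].str.contains("賴", na=False)]
daily = hits.set_index("created_at").resample("D").size()

# Flag days more than three rolling standard deviations above the
# two-week rolling mean -- a simple stand-in for spike clustering.
mean = daily.rolling(14, min_periods=7).mean()
std = daily.rolling(14, min_periods=7).std()
spikes = daily[(daily - mean) / std > 3]
print(spikes)
```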
Community Detection in Sharing Networks
I reviewed methods like the Louvain algorithm that can sift through repost and reply connections to find tightly knit user groups. Although I haven’t yet coded the full pipeline, I confirmed on a toy network that these algorithms reliably partition users into distinct clusters based on their sharing patterns.
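The toy-network check can be reproduced in a few lines, assuming networkx 2.8 or later for the built-in Louvain implementation:

```python
import networkx as nx

# Two triangles joined by a single bridge edge: an idealized pair of
# tightly knit sharing communities.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"),
              ("x", "y"), ("y", "z"), ("z", "x"),
              ("c", "x")])

# Louvain community detection; seed fixed for reproducibility.
communities = nx.community.louvain_communities(G, seed=42)
print(communities)  # expected: the two triangles recovered as clusters
```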
Emergent Dynamics Metrics
Finally, I explored mathematical measures to quantify how narrative clusters form and spread; a short sketch follows this list. For example:
Clustering Coefficient: I tested formulas that estimate how interconnected users are within a cluster, indicating the tightness of an echo chamber.
Growth Exponent: I simulated simple growth curves to see how rapidly a cluster’s size can expand over consecutive days.
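A minimal sketch of both measures, using a toy graph for the clustering coefficient and simulated daily sizes for the growth exponent (all values below are made up for illustration):

```python
import networkx as nx
import numpy as np

# Clustering coefficient: how many of a user's neighbors also link to
# each other, a proxy for echo-chamber tightness.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(nx.average_clustering(G))

# Growth exponent: fit log(size) against log(day); the slope k in
# size ~ day**k summarizes how fast a cluster expands over consecutive
# days. The sizes below are simulated, not observed.
days = np.arange(1, 8)
sizes = np.array([5, 12, 30, 55, 90, 140, 200])
k, _ = np.polyfit(np.log(days), np.log(sizes), 1)
print(f"estimated growth exponent: {k:.2f}")
```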
Each of these methods remains at the exploration stage. I intend to integrate them into a coherent workflow once I secure the cleaned dataset and finalize my analysis plan.
My research to this point has involved the following methods:
Accessing the Cofacts dataset of LINE posts. This dataset was compiled by the Taiwanese think-tank CoLabs. The dataset allowed me to perform basic time-series analysis of keyword frequencies across the election period.