URLs serve as bridges between social media platforms and the broader web, linking user-generated content to external information resources. While prior studies have examined the motivations of authors who share URLs, such author-centered intentions are difficult to observe in practice. To enable broader downstream use, this work investigates reader-centered interpretations—how users perceive the intentions behind hyperlinks included in posts. We develop an intent taxonomy for including hyperlinks in social posts through a hybrid approach that begins with a bottom-up, data-driven process using large-scale crowdsourced annotations, and is then refined using large language model (LLM) assistance to generate descriptive category names and precise definitions. We further compare our taxonomy with existing taxonomies and demonstrate its utility in a microblog retrieval task by incorporating intent as an additional feature.
We discuss the importance of understanding readers' interpretations of intentions behind including URLs in posts and describe how a well-structured taxonomy enables systematic investigation of these interpretations.
We present our process of developing an intent taxonomy for categorizing the intentions behind URLs in social posts. It leverages a hybrid approach that combines human annotation and LLM-based assistance. The resulting taxonomy comprises 6 broad intention categories and 26 specific intention classes.
We present two user studies that examine how users interpret the intentions behind URL-sharing posts. The first study analyzes the distribution of intentions for including URLs across a large collection of tweets. The second study examines how annotation quality changes when expert annotators are involved and additional contextual information is provided, aiming to improve the labeling quality of low-agreement posts.
We compare our intent taxonomy with those proposed in prior work and present a potential application in microblog retrieval, where intent information is incorporated as an additional feature for reranking and filtering retrieval results.
The complete taxonomy with representative examples can be found at https://github.com/lanfangping/intent-taxonomy-for-hyperlinked-posts.
URL-sharing intent can improve the quality of microblog retrieval by:
Filtering out semantically mismatched results that would otherwise rank highly on lexical similarity alone
Improving retrieval quality by ordering results by their relevance to the query intent (e.g., Information Sharing > Information Provision)
Figure: An example of intent-aware retrieval.
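As a rough illustration of how intent could act as an additional feature for reranking and filtering, the sketch below combines a first-stage lexical score with an intent-affinity score. It is a minimal sketch under stated assumptions, not the method used in this work: the `Post` fields, the `rerank` function, the interpolation weight, the affinity table, and the exact label strings are all hypothetical, and lexical scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    lexical_score: float  # assumed normalized to [0, 1], e.g. from a first-stage retriever
    intent: str           # reader-perceived intent label for the post's URL (hypothetical labels)


def rerank(posts, query_intent, intent_affinity, weight=0.5, min_affinity=None):
    """Rerank posts by interpolating lexical score with intent affinity.

    intent_affinity maps (query_intent, post_intent) pairs to a value in [0, 1];
    missing pairs default to 0.0. If min_affinity is set, posts whose affinity
    falls below it are filtered out before ranking.
    """
    scored = []
    for post in posts:
        affinity = intent_affinity.get((query_intent, post.intent), 0.0)
        if min_affinity is not None and affinity < min_affinity:
            continue  # filtering step: drop intent-mismatched results
        combined = (1 - weight) * post.lexical_score + weight * affinity
        scored.append((combined, post))
    # Sort on the combined score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]


# Illustrative data only; scores and affinities are made up for the example.
posts = [
    Post("Buy now, 50% off", lexical_score=0.9, intent="Advertising"),
    Post("Official report is here", lexical_score=0.8, intent="Information Provision"),
    Post("Interesting read on this topic", lexical_score=0.7, intent="Information Sharing"),
]
affinity = {
    ("Information Sharing", "Information Sharing"): 1.0,
    ("Information Sharing", "Information Provision"): 0.6,
}

ranked = rerank(posts, "Information Sharing", affinity)
filtered = rerank(posts, "Information Sharing", affinity, min_affinity=0.5)
```

With these made-up numbers, the post with the highest lexical score but a mismatched intent drops to the bottom of `ranked` and is removed entirely from `filtered`, which mirrors the two uses listed above. In practice the affinity table and weight would have to be learned or tuned on retrieval data.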
URLs serve as bridges between social media platforms and the broader web, linking user-generated content to external information resources. On Twitter (X), approximately one in five tweets contains at least one URL, underscoring their central role in information dissemination. While prior studies have examined the motivations of authors who share URLs, such author-centered intentions are difficult to observe in practice. To enable broader downstream use, this work investigates reader-centered interpretations—how users perceive the intentions behind hyperlinks included in posts. We develop an intent taxonomy for including hyperlinks in social posts through a hybrid approach that begins with a bottom-up, data-driven process using large-scale crowdsourced annotations, and is then refined using large language model (LLM) assistance to generate descriptive category names and precise definitions. The final taxonomy comprises 6 top-level categories and 26 fine-grained intention classes, capturing diverse communicative purposes. Applying this taxonomy, we annotate and analyze 1,000 user posts, revealing that advertising, arguing, and sharing are the most prevalent intentions. We further compare our taxonomy with existing taxonomies and demonstrate its utility in a microblog retrieval task by incorporating intent as an additional feature. Overall, our taxonomy provides a foundation for intent-aware information retrieval and NLP applications, enabling more accurate retrieval, recommendation, and interpretation of social media content.
Contact edragut@temple.edu for more information on the project.