Code & Data
Debatepedia dataset Download
Reference: Swapna Gottipati, Minghui Qiu, Yanchuan Sim, Jing Jiang, and Noah A. Smith. Learning Topics and Positions from Debatepedia. EMNLP'13.
Topic Expertise Model
Code: Java Code (Github TEM)
This package implements Gibbs sampling for the Topic Expertise Model (TEM), which jointly models topics and expertise in CQA sites.
Reference: CQARank: Jointly Model Topics and Expertise in Community Question Answering, CIKM'13.
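The full TEM sampler (which also models expertise) lives in the repository above; as a minimal illustration of the collapsed Gibbs sampling technique it is built on, the sketch below samples topic assignments for a plain LDA model. All names and hyperparameter values are ours, not the package's.

```python
import random

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA (illustrative sketch only;
    the actual TEM sampler additionally models user expertise and votes).
    docs: list of documents, each a list of word ids in [0, V)."""
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # topic totals
    z = []                              # topic assignment per token
    for d, doc in enumerate(docs):      # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]             # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # unnormalized full conditional p(z = j | rest)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                r = rng.random() * sum(weights)
                for j, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = j
                        break
                z[d][i] = k             # record new assignment
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw
```

The doc-topic counts `ndk` (plus `alpha`) and topic-word counts `nkw` (plus `beta`) give the usual posterior estimates of the topic mixtures and topic-word distributions.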
PMF Model for Mining User Relations
Description: Probabilistic Matrix Factorization with both user-user and user-item relations
Code: Code (Github)
Data: 6 data sets from CreateDebate Download
Reference: Mining User Relations from Online Discussions using Sentiment Analysis and Probabilistic Matrix Factorization, NAACL'13.
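As a rough sketch of the core idea, sharing one set of user factors between a user-item matrix and a user-user relation matrix, the snippet below fits both with plain SGD on squared error. This is a deliberate simplification of the NAACL'13 model (which derives the user-user relations from sentiment analysis and uses a probabilistic formulation); all names and hyperparameters here are illustrative.

```python
import random

def pmf_with_relations(R_ui, S_uu, n_users, n_items, k=5, lr=0.01,
                       reg=0.1, lam=1.0, iters=200, seed=0):
    """Matrix factorization with shared user factors U fit jointly to a
    user-item matrix R (dict {(u, i): rating}) and a user-user relation
    matrix S (dict {(u, v): strength}). Sketch only, not the paper's
    exact objective. lam weights the user-user term."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    for _ in range(iters):
        for (u, i), r in R_ui.items():       # user-item observations
            e = r - dot(U[u], V[i])
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (e * vf - reg * uf)
                V[i][f] += lr * (e * uf - reg * vf)
        for (u, v), s in S_uu.items():       # user-user relations
            e = s - dot(U[u], U[v])
            for f in range(k):
                uf, vf = U[u][f], U[v][f]
                U[u][f] += lr * lam * (e * vf - reg * uf)
                U[v][f] += lr * lam * (e * uf - reg * vf)
    return U, V
```

Because the same `U` appears in both loss terms, the learned user factors are pushed to explain both what users engage with and whom they agree or disagree with.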
B-LDA (Joint Behavior-Topic Model)
Code: Java Code (Github B-LDA)
We propose an LDA-based behavior-topic model (B-LDA) that jointly models user topic interests and behavioral patterns. We focus on online social network settings such as microblogs like Twitter, where the textual content is relatively short but user interactions are rich.
Reference: It's Not What We Say But How We Say Them: LDA-based Behavior-Topic Model, Minghui Qiu, Feida Zhu and Jing Jiang, SDM'13, Austin, Texas, USA, May, 2013.
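To make the "jointly models topics and behaviors" idea concrete, here is a toy generator following a B-LDA-style story: each user has a topic distribution, each post draws a topic, and both the words and a behavior type (e.g. tweet / retweet / reply) are then drawn conditioned on that topic. The structure and names are our illustrative reading, not the paper's exact model.

```python
import random

def generate_blda_corpus(n_users, n_posts, post_len, K, V, B, seed=0):
    """Toy sampler for a B-LDA-style generative story (sketch only).
    K topics, V word types, B behavior types."""
    rng = random.Random(seed)

    def dirichlet(n, conc=0.5):  # symmetric Dirichlet draw via gammas
        xs = [rng.gammavariate(conc, 1.0) for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    def draw(p):                 # sample an index from distribution p
        r, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    theta = [dirichlet(K) for _ in range(n_users)]  # user-topic dists
    phi = [dirichlet(V) for _ in range(K)]          # topic-word dists
    psi = [dirichlet(B) for _ in range(K)]          # topic-behavior dists
    corpus = []
    for u in range(n_users):
        for _ in range(n_posts):
            z = draw(theta[u])                      # topic for this post
            words = [draw(phi[z]) for _ in range(post_len)]
            behavior = draw(psi[z])                 # behavior tied to topic
            corpus.append((u, z, words, behavior))
    return corpus
```

Inference (the repository's Gibbs sampler) inverts this story: it recovers `theta`, `phi`, and `psi` from observed words and behaviors.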
Twitter-LDA
In the original Latent Dirichlet Allocation (LDA) setting, each word carries its own topic label. This may not work well for Twitter, since tweets are short and a single tweet is likely to discuss only one topic. Twitter-LDA (T-LDA) was proposed to address this issue in: Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li. Comparing Twitter and Traditional Media Using Topic Models. ECIR'11. T-LDA also addresses the noisy nature of tweets by explicitly capturing background words.
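The two modifications described above can be sketched as a toy generator: each tweet gets exactly one topic from its author's topic distribution, and each word is either a background word (with some switch probability) or drawn from the tweet's topic. Names and hyperparameter values below are illustrative, not taken from the paper or code.

```python
import random

def generate_twitter_lda(n_users, tweets_per_user, tweet_len, K, V,
                         pi_bg=0.3, seed=0):
    """Toy sampler for the Twitter-LDA generative story (sketch only):
    one topic per tweet, plus a background word distribution.
    pi_bg is the (assumed) probability a word is a background word."""
    rng = random.Random(seed)

    def dirichlet(n, conc=0.5):  # symmetric Dirichlet draw via gammas
        xs = [rng.gammavariate(conc, 1.0) for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    def draw(p):                 # sample an index from distribution p
        r, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    theta = [dirichlet(K) for _ in range(n_users)]  # user-topic dists
    phi = [dirichlet(V) for _ in range(K)]          # topic-word dists
    phi_bg = dirichlet(V)                           # background word dist
    tweets = []
    for u in range(n_users):
        for _ in range(tweets_per_user):
            z = draw(theta[u])                      # ONE topic per tweet
            words = []
            for _ in range(tweet_len):
                if rng.random() < pi_bg:            # background switch
                    words.append(draw(phi_bg))
                else:
                    words.append(draw(phi[z]))
            tweets.append((u, z, words))
    return tweets
```

Compared with word-level LDA, tying all words of a tweet to a single `z` is what makes the model robust to short documents, and the `pi_bg` switch is what soaks up common, topic-free words.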