Code & Data

  • Debatepedia dataset Download

  • Topic Expertise Model

    • Code: Java Code (Github TEM)

    • This package implements Gibbs sampling for the Topic Expertise Model (TEM), which jointly models topics and expertise in community question answering (CQA) sites.

    • Reference: CQARank: Jointly Model Topics and Expertise in Community Question Answering, CIKM'13.

  • PMF Model for Mining User Relations

  • B-LDA (Joint Behavior-Topic Model)

    • Code: Java Code (Github B-LDA)

    • We propose an LDA-based behavior-topic model (B-LDA) that jointly models user topic interests and behavioral patterns. We study the model in online social network settings, such as microblogs like Twitter, where the textual content is relatively short but user interactions around it are rich.

    • Reference: It's Not What We Say But How We Say Them: LDA-based Behavior-Topic Model, Minghui Qiu, Feida Zhu and Jing Jiang, SDM'13, Austin, Texas, USA, May, 2013.

  • Twitter-LDA

    • Code: Java Code (Github Twitter-LDA)

    • In standard Latent Dirichlet Allocation (LDA), each word has its own topic label. This setting may not work well for Twitter, because tweets are short and a single tweet is more likely to be about one topic. Twitter-LDA (T-LDA) was proposed to address this issue in: Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li. Comparing Twitter and traditional media using topic models. In Proceedings of the 33rd European Conference on Information Retrieval (ECIR'11). T-LDA also accounts for the noisy nature of tweets by capturing background words.
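
The Twitter-LDA description above can be illustrated with a minimal Java sketch of the forward (generative) process only: one topic is drawn per tweet, and each word is then drawn either from that topic or from a shared background distribution. This is not the code in the linked repository and it does not implement the Gibbs sampler. The class name TwitterLdaSketch, the methods sampleDiscrete and generateTweet, the parameter pTopicWord, and all toy distributions are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

public class TwitterLdaSketch {

    /** Draws an index from a discrete distribution whose entries sum to 1. */
    public static int sampleDiscrete(double[] probs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (u < cumulative) {
                return i;
            }
        }
        return probs.length - 1; // guard against floating-point rounding
    }

    /**
     * Generates the word ids of one tweet. The topic is fixed for the whole
     * tweet; each word comes from the topic's word distribution with
     * probability pTopicWord, otherwise from the background distribution.
     */
    public static int[] generateTweet(int topic, double[][] topicWord,
                                      double[] backgroundWord, double pTopicWord,
                                      int length, Random rng) {
        int[] words = new int[length];
        for (int n = 0; n < length; n++) {
            boolean fromTopic = rng.nextDouble() < pTopicWord;
            double[] dist = fromTopic ? topicWord[topic] : backgroundWord;
            words[n] = sampleDiscrete(dist, rng);
        }
        return words;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);

        // Toy parameters (hypothetical): 2 topics over a 4-word vocabulary.
        double[] userTopic = {0.7, 0.3};          // one user's topic mixture
        double[][] topicWord = {
            {0.6, 0.3, 0.1, 0.0},                 // topic 0
            {0.0, 0.1, 0.3, 0.6}                  // topic 1
        };
        double[] backgroundWord = {0.25, 0.25, 0.25, 0.25};

        // One topic per tweet, drawn from the user's topic mixture.
        int tweetTopic = sampleDiscrete(userTopic, rng);
        int[] tweet = generateTweet(tweetTopic, topicWord, backgroundWord, 0.8, 6, rng);

        System.out.println("topic=" + tweetTopic + " words=" + Arrays.toString(tweet));
    }
}
```

The contrast with standard LDA is in where the topic is drawn: here it is sampled once per tweet rather than once per word, and the per-word choice is only between the tweet's topic and the background distribution.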