Data & Code

Data Sets Developed in Our Research

Open Source Project Code & Software

  • Sentiment analysis for online discussion forums: Java code (GitHub: NLPForumPostOTE)

    • This package constructs the opinion matrices that serve as input to the PMF model. Its main features are aspect identification, opinion expression identification, and opinion relation extraction based on dependency path rules.

  • Twitter-LDA: Java code (GitHub: Twitter-LDA)

    • In the original Latent Dirichlet Allocation (LDA) setting, each word has its own topic label. This may not work well for Twitter, because tweets are short and a single tweet is more likely to discuss one topic. Twitter-LDA (T-LDA) was proposed to address this issue in: Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li. "Comparing Twitter and traditional media using topic models." In Proceedings of the 33rd European Conference on Information Retrieval (ECIR'11). T-LDA also handles the noisy nature of tweets by capturing the background words that appear in them.

  • MatchZoo: MatchZoo is a toolkit for deep neural text matching. It was developed to facilitate the design, comparison, and sharing of deep text matching models. The implemented models include ARC-I/ARC-II, DSSM, CDSSM, MatchPyramid, DRMM, aNMM, MV-LSTM, and Duet, among others.

  • NeuralResponseRanking: NeuralResponseRanking is an open-source package providing several neural matching models for response ranking in information-seeking conversations.
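
The Twitter-LDA idea described in the list above (one topic label per tweet, plus a separate distribution for background words) can be illustrated with a small generative sketch. This is a toy illustration under stated assumptions, not the code from the Twitter-LDA repository: the function name generate_tweet, the vocabularies, and all probabilities are hypothetical, and fixed hand-written distributions stand in for the priors and inference that a real topic model would use.

```python
import random

# Hypothetical toy distributions: (vocabulary, weights) per topic.
TOPIC_WORDS = [
    (["match", "goal", "team"], [0.5, 0.3, 0.2]),      # topic 0
    (["vote", "senate", "policy"], [0.4, 0.4, 0.2]),   # topic 1
]
# Hypothetical background distribution shared by all tweets.
BACKGROUND = (["lol", "today", "rt"], [0.4, 0.3, 0.3])


def generate_tweet(topic_weights, topic_words, background, p_topic_word, length, rng):
    """Sample one toy tweet: a single topic for the whole tweet,
    then a per-word switch between topic words and background words."""
    # One topic label per tweet (standard LDA assigns one per word instead).
    topic = rng.choices(range(len(topic_weights)), weights=topic_weights, k=1)[0]
    words = []
    for _ in range(length):
        # Per-word switch: topic word with probability p_topic_word,
        # otherwise a background word.
        if rng.random() < p_topic_word:
            vocab, weights = topic_words[topic]
        else:
            vocab, weights = background
        words.append(rng.choices(vocab, weights=weights, k=1)[0])
    return topic, words


if __name__ == "__main__":
    rng = random.Random(0)
    topic, words = generate_tweet([0.5, 0.5], TOPIC_WORDS, BACKGROUND, 0.7, 8, rng)
    print(topic, " ".join(words))
```

Setting p_topic_word to 1.0 removes the background component, so every word then comes from the tweet's single topic; setting it to 0.0 yields only background words.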


Technical Notes