Past and Current Projects
RumorSleuth: Joint Detection of Rumor Veracity and User Stance
The penetration of social media has had deep and far-reaching consequences for information production and consumption. Widespread use of social media platforms has enabled malicious users and attention seekers to spread rumors and fake news. This trend is particularly evident on microblogging platforms, where news becomes viral in a matter of hours and can lead to mass panic and confusion. One intriguing fact about rumors and fake news is that rumor stories very often prompt users to adopt different stances toward the rumor posts. Understanding user stances in rumor posts is thus very important for identifying the veracity of the underlying content. While rumor veracity and stance detection have been viewed as disjoint tasks, we demonstrate here how jointly learning both of them can be fruitful. In this paper, we propose RumorSleuth, a multi-task deep learning model that leverages both textual information and user profile information to jointly identify the veracity of a rumor along with users' stances. Tests on two publicly available rumor datasets demonstrate that RumorSleuth outperforms current state-of-the-art models and achieves up to 14% performance gain in rumor veracity classification and around 6% improvement in user stance classification.
DeepDiffuse: Predicting the ‘Who’ and ‘When’ in Cascades
Cascades are an accepted model for capturing how information diffuses across social network platforms. A large body of research has focused on dissecting the anatomy of such cascades and forecasting their progression. One recurring theme involves predicting the next stage(s) of cascades utilizing pertinent information such as the underlying social network, structural properties of nodes (e.g., degree), and (partial) histories of cascade propagation. However, such granular information is rarely available in practice. We study in this paper the problem of cascade prediction utilizing only two types of (coarse) information, viz. which node is infected and its corresponding infection time. We first construct several simple baselines to solve this cascade prediction problem. Then we describe the shortcomings of these methods and propose a new solution leveraging recent progress in embeddings and attention models from representation learning. We also perform an exhaustive analysis of our methods on several real-world datasets. Our proposed model outperforms the baselines and several other state-of-the-art methods.
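The abstract does not say which simple baselines were used. Purely as an illustration of the problem setting (only node identities and infection times are observed), the sketch below assumes one plausible baseline: predict the next infected node from first-order transition counts, and the next infection time from the mean gap observed after the current node. The function names and the toy cascades are hypothetical, and this is not the DeepDiffuse model.

```python
from collections import Counter, defaultdict

def fit_baseline(cascades):
    """Fit transition counts and per-node mean time gaps.

    Each cascade is a time-ordered list of (node, infection_time) pairs.
    """
    transitions = defaultdict(Counter)  # transitions[u][v] = times v followed u
    gap_sum = defaultdict(float)        # total time elapsed after u
    gap_count = Counter()               # number of gaps observed after u
    for cascade in cascades:
        for (u, t_u), (v, t_v) in zip(cascade, cascade[1:]):
            transitions[u][v] += 1
            gap_sum[u] += t_v - t_u
            gap_count[u] += 1
    mean_gap = {u: gap_sum[u] / gap_count[u] for u in gap_count}
    return transitions, mean_gap

def predict_next(model, last_node, last_time):
    """Return (next_node, next_time), or None if last_node was never followed."""
    transitions, mean_gap = model
    if last_node not in transitions:
        return None
    next_node = transitions[last_node].most_common(1)[0][0]
    return next_node, last_time + mean_gap[last_node]

# Hypothetical toy data: three short cascades over nodes a, b, c, d.
toy_cascades = [
    [("a", 0.0), ("b", 1.0), ("c", 3.0)],
    [("a", 0.0), ("b", 2.0), ("d", 3.0)],
    [("a", 5.0), ("c", 6.0)],
]
model = fit_baseline(toy_cascades)
prediction = predict_next(model, "a", 10.0)
```

A baseline of this kind ignores everything except the most recent infection, which is one kind of shortcoming that sequence models with embeddings and attention are designed to address.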
SIGNet: Scalable Embeddings for Signed Networks
Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Such network embedding methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast, scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches and adds a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real-world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
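As a small illustration of the social balance principle mentioned above (not of SIGNet itself), a triangle in a signed network is usually called balanced when the product of its three edge signs is positive. The sketch below counts balanced and unbalanced triangles in a toy undirected graph; the function name, the edge format, and the toy data are assumptions made for this example.

```python
from itertools import combinations

def triangle_balance(edges):
    """Count balanced and unbalanced triangles in an undirected signed graph.

    `edges` maps a node pair (u, v) to a sign, +1 or -1.
    Returns a tuple (balanced, unbalanced).
    """
    sign = {}
    nodes = set()
    for (u, v), s in edges.items():
        sign[frozenset((u, v))] = s
        nodes.update((u, v))

    balanced = unbalanced = 0
    for a, b, c in combinations(sorted(nodes), 3):
        keys = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(k in sign for k in keys):
            # A triangle is balanced when it has an even number of negative edges.
            product = sign[keys[0]] * sign[keys[1]] * sign[keys[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced

# Hypothetical toy network on four nodes.
toy_edges = {
    ("a", "b"): +1,
    ("b", "c"): +1,
    ("a", "c"): +1,  # a, b, c are mutual friends
    ("c", "d"): -1,
    ("b", "d"): -1,  # b and c share a common enemy d
    ("a", "d"): +1,  # a is friends with d, which conflicts with b and c
}
counts = triangle_balance(toy_edges)
```

Enumerating all node triples is cubic in the number of nodes, so this check is only meant for small examples.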
Inferring Multi-dimensional Ideal Points for US Supreme Court Justices
In Supreme Court parlance and the political science literature, an ideal point positions a justice in a continuous space and can be interpreted as a quantification of the justice's policy preferences. We present an automated approach to infer such ideal points for justices of the US Supreme Court. This approach combines topic modeling over case opinions with the voting (and endorsing) behavior of justices. Furthermore, given a topic of interest, say the Fourth Amendment, the topic model can be optionally seeded with supervised information to steer the inference of ideal points. Application of this methodology over five years of cases provides interesting perspectives into the leaning of justices on crucial issues, coalitions underlying specific topics, and the role of swing justices in deciding the outcomes of cases.
Topic Modeling:
Can we predict the root of a topic in a series of documents? In 1700, physics was a scientific topic (term); by 1900, quantum physics had emerged as a new topic; and now the Large Hadron Collider is a more popular topic in scientific papers. Can we construct a timeline of these topics and subtopics from scientific documents, showing when each came to light? We are currently investigating how Latent Dirichlet Allocation (LDA) can be used to solve this problem.
Model user experience from online reviews:
Can we predict how expert a reviewer is from the reviews they have written on sites like Yelp or Amazon? Can we determine whether a review is useful or not? We are trying to build a better recommender system by resolving these issues.
Past Projects:
Detect novel classes in continuous data streams:
The project was to construct a classification system that detects novel classes as well as recurring classes in a data stream where new data arrives at fixed time intervals. An ensemble of classifiers was constructed and updated regularly to fit the model to the newest data.
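The project description does not give the classifier or the novelty test, so the following is only a minimal sketch under stated assumptions: each chunk of labeled data trains a nearest-centroid model, the ensemble keeps the most recent few models, and a point is flagged as novel when every model places it outside its nearest class region. The class names, the `slack` parameter, and the toy chunks are all hypothetical.

```python
import math
from collections import Counter, deque

NOVEL = "novel"  # sentinel returned when no model recognizes the point

class CentroidModel:
    """Nearest-centroid classifier trained on one labeled chunk."""

    def __init__(self, points, labels):
        groups = {}
        for p, y in zip(points, labels):
            groups.setdefault(y, []).append(p)
        self.centroids = {}
        self.radius = {}
        for y, pts in groups.items():
            c = tuple(sum(col) / len(pts) for col in zip(*pts))
            self.centroids[y] = c
            # Radius = distance from the centroid to its farthest training point.
            self.radius[y] = max(math.dist(c, p) for p in pts)

    def predict(self, x, slack=1.5):
        """Return the nearest class, or None if x is farther from that
        class's centroid than slack times the class radius."""
        distances = ((y, math.dist(c, x)) for y, c in self.centroids.items())
        label, d = min(distances, key=lambda t: t[1])
        return label if d <= slack * self.radius[label] + 1e-9 else None

class StreamEnsemble:
    """Keeps the most recent `max_models` chunk models and lets them vote."""

    def __init__(self, max_models=3):
        self.models = deque(maxlen=max_models)  # oldest model is evicted

    def update(self, points, labels):
        self.models.append(CentroidModel(points, labels))

    def predict(self, x):
        votes = [m.predict(x) for m in self.models]
        known = [v for v in votes if v is not None]
        if not known:
            return NOVEL  # no model in the ensemble claims this point
        return Counter(known).most_common(1)[0][0]

# Hypothetical chunks: class B appears in chunk 1 and is absent from chunk 2.
chunk1_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
chunk1_labels = ["A", "A", "A", "B", "B", "B"]
chunk2_points = [(0, 0), (1, 1), (1, 0)]
chunk2_labels = ["A", "A", "A"]

ens = StreamEnsemble(max_models=3)
ens.update(chunk1_points, chunk1_labels)
ens.update(chunk2_points, chunk2_labels)
```

Because the older model stays in the ensemble, a point near class B is still recognized after a chunk in which B did not occur, which is the recurring-class case; a point far from every stored centroid is reported as `NOVEL`.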
Function optimization:
A particle swarm optimizer (PSO) was constructed for function optimization. To ensure diversity among particles, multiple layers of neighborhoods were constructed using K-means clustering. Experimental results show the effectiveness of the method compared to other variations of PSO.
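For readers unfamiliar with PSO, here is a minimal sketch of the standard global-best variant, minimizing a simple test function. It deliberately omits the K-means-based multi-layer neighborhoods described above, and the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) and the sphere test function are assumptions chosen for illustration, not the settings used in the project.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: returns (best_position, best_value) for minimizing f."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Sum of squares; its minimum value 0 is at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a neighborhood-based variant, the `gbest` term would be replaced by the best position within each particle's neighborhood, which slows premature convergence at the cost of extra bookkeeping.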