Schedule: Every Wednesday, 4:00 p.m. to 5:30 p.m.
Venue: Meeting Room 4-4, School of Information Systems, Singapore Management University
Downloads of papers and slides on this page are ONLY available to authorized group members.
posted Jan 12, 2011 2:10 AM by Aek Palakorn Achananuparp (updated Jan 16, 2011 6:50 PM)
Time is of the Essence: Improving Recency Ranking Using Twitter Data (paper) (slides)
Authors: Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Bai Jing, Yi Chang, Fernando Diaz, Zhaohui Zheng, Hongyuan Zha. In Proceedings of WWW'10
Abstract: Twitter is a social network and microblogging service that has become very popular. People use Twitter to exchange messages that contain fresh and useful information. This paper proposes a ranking system for web search that uses Twitter data to improve ranking results, especially their freshness. We treat URLs that have been referred to by Twitter users (called Twitter URLs) differently from regular URLs. A challenging problem for Twitter URLs is that, because they are fresh, they lack click information and anchor-text information, which keeps them from being promoted appropriately in ranking results. We analyze the unique characteristics of the Twitter microcosm, such as Twitter users' following relationships and the text of tweets, and use them as new evidence for ranking Twitter URLs appropriately in web search. We then use a compositional modeling algorithm to make full use of the available data and the different categories of ranking features. This approach resolves the dilemma in recency ranking that fresh documents cannot be promoted appropriately because they lack favorable ranking features that must be aggregated over time. To evaluate ranking results, we not only incorporate recency demotion into discounted cumulative gain (DCG) for stale documents, but also use discounted cumulative freshness (DCF) to evaluate the freshest documents in ranking results. The efficacy of this approach is illustrated by experiments on real data.
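To make the DCG-based evaluation concrete, here is a minimal sketch using the standard DCG formula, sum of (2^g - 1) / log2(rank + 1). The recency demotion rule is an assumption for illustration only: the hypothetical helper `demoted_grades` lowers a stale document's grade by a fixed number of levels (`demotion`), which is not necessarily how the paper demotes stale results, and DCF is not shown.

```python
import math


def dcg(grades, k=None):
    """Standard discounted cumulative gain over a ranked list of grades."""
    if k is not None:
        grades = grades[:k]
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))


def demoted_grades(grades, is_stale, demotion=1):
    """Hypothetical recency demotion: stale documents lose `demotion`
    grade levels, never dropping below 0."""
    return [max(g - demotion, 0) if stale else g
            for g, stale in zip(grades, is_stale)]


# Example: relevance grades for the top 5 results, two of which are stale.
grades = [3, 2, 3, 0, 1]
stale = [False, True, False, False, True]
print(dcg(grades))
print(dcg(demoted_grades(grades, stale)))
```

Demoting the grade before computing DCG keeps the metric's form unchanged, so a ranker that places stale pages high simply earns less gain at those positions.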
posted Oct 12, 2010 2:16 AM by Cane Leung
Group Formation in Large Social Networks: Membership, Growth, and Evolution (paper) by Lars Backstrom, Dan Huttenlocher, Jon Kleinberg and Xiangyang Lan, in Proceedings of KDD'06
Abstract:
The processes by which communities come
together, attract new members, and develop over time are a central
research issue in the social sciences: political movements,
professional organizations, and religious denominations all provide
fundamental examples of such communities. In the digital domain,
on-line groups are becoming increasingly prominent due to the growth of
community and social networking sites such as MySpace
and LiveJournal. However, the challenge of collecting and analyzing
large-scale time-resolved data on social groups and communities has
left most basic questions about the evolution of such
groups largely unresolved: what are the structural features that
influence whether individuals will join communities, which communities
will grow rapidly, and how do the overlaps among pairs of
communities change over time? Here we address these questions using two
large sources of data: friendship links and community membership
on LiveJournal, and co-authorship and conference publications
in DBLP. Both of these datasets provide explicit user-defined
communities, where conferences serve as proxies for communities in DBLP.
We study how the evolution of these communities relates
to properties such as the structure of the underlying social
networks. We find that the propensity of individuals to join
communities, and of communities to grow rapidly, depends in subtle ways on
the underlying network structure. For example, the tendency of an
individual to join a community is influenced not just by the number of
friends he or she has within the community, but also crucially by
how those friends are connected to one another. We use
decision-tree techniques to identify the most significant structural determinants
of these properties. We also develop a novel methodology for
measuring movement of individuals between communities, and show
how such movements are closely aligned with changes in the
topics of interest within the communities.
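A basic measurement behind the observation that joining depends on friends already inside a community is the empirical probability of joining as a function of k, the number of such friends. The sketch below estimates that curve from two membership snapshots; the input layout (`friends` as a dict of sets, `members_t1`/`members_t2` as sets) and the function name are assumptions, and it does not capture the paper's richer features such as how those friends are connected to one another.

```python
from collections import defaultdict


def join_probability_by_k(friends, members_t1, members_t2):
    """Estimate P(join by t2 | k friends in the community at t1).

    friends: dict mapping each user to a set of friends
    members_t1, members_t2: sets of community members at two snapshots
    Returns a dict k -> fraction of eligible users with k friends inside
    the community at t1 who had joined by t2.
    """
    eligible = defaultdict(int)
    joined = defaultdict(int)
    for user, their_friends in friends.items():
        if user in members_t1:
            continue  # only users outside the community at t1 can join
        k = len(their_friends & members_t1)
        if k == 0:
            continue  # users with no friends inside are left out here
        eligible[k] += 1
        if user in members_t2:
            joined[k] += 1
    return {k: joined[k] / eligible[k] for k in sorted(eligible)}


# Toy example: x and y are members at t1; b and d have joined by t2.
friends = {"a": {"x"}, "b": {"x", "y"}, "c": {"x"}, "d": {"y"},
           "x": {"a", "b", "c"}, "y": {"b", "d"}}
print(join_probability_by_k(friends, {"x", "y"}, {"x", "y", "b", "d"}))
```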
posted Oct 1, 2010 8:14 PM by Meiqun Hu
Characterizing Microblogs with Topic Models (paper) (slides) (demo by the authors) (recorded talk by the first author)
by Daniel Ramage, Susan Dumais and Dan Liebling, in ICWSM '10
Abstract
As microblogging grows in popularity, services like Twitter are coming to support information gathering needs above and beyond their traditional roles as social networks. But most users’ interaction with Twitter is still primarily focused on their social graphs, forcing the often inappropriate conflation of “people I follow” with “stuff I want to read.” We characterize some information needs that the current Twitter interface fails to support, and argue for better representations of content for solving these challenges. We present a scalable implementation of a partially supervised learning model (Labeled LDA) that maps the content of the Twitter feed into dimensions. These dimensions correspond roughly to substance, style, status, and social characteristics of posts. We characterize users and tweets using this model, and present results on two information consumption oriented tasks.
posted Aug 31, 2010 2:22 AM by jianshu weng
How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? (paper) (slides)
Authors: Munmun De Choudhury, Yu-Ru Lin, Hari Sundaram, K. Selçuk Candan, Lexing Xie, and Aisling Kelliher
Appeared in the Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (ICWSM 2010)
Abstract
Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate at which new information is produced: researchers are therefore often forced to analyze a judiciously selected “sample” of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute-based and topology-based sampling strategies on the discovery of an important social media phenomenon, information diffusion. We examine several widely adopted sampling methods that select nodes based on attributes (random, location, and activity) and topology (forest fire), and we also study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of a sample, based on user activity (e.g., volume, number of seeds), topological characteristics (e.g., reach, spread) and temporal characteristics (e.g., rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g., location, activity) can improve on naive methods by a significant margin of ~15-20%.
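Forest fire is the one topology-based sampler named above. The sketch below follows the commonly described formulation (Leskovec and Faloutsos): from a random seed node, each burning node spreads to a geometrically distributed number of unvisited neighbours with mean p/(1-p), and a new fire is started whenever one burns out. The graph representation, the default `p_forward=0.7`, and the function name are assumptions, not settings taken from this paper.

```python
import random
from collections import deque


def forest_fire_sample(graph, target_size, p_forward=0.7, seed=None):
    """Forest fire sampling of an undirected graph.

    graph: dict mapping node -> set of neighbours
    target_size: number of nodes to sample (capped at the graph size)
    p_forward: burning probability; each burning node spreads to a
        geometrically distributed number of neighbours, mean p/(1-p)
    """
    rng = random.Random(seed)
    nodes = list(graph)
    target_size = min(target_size, len(nodes))
    sampled = set()
    while len(sampled) < target_size:
        # Start a new fire from a random node not yet sampled.
        start = rng.choice([n for n in nodes if n not in sampled])
        sampled.add(start)
        queue = deque([start])
        while queue and len(sampled) < target_size:
            node = queue.popleft()
            # Count successes before the first failure: geometric, mean p/(1-p).
            n_burn = 0
            while rng.random() < p_forward:
                n_burn += 1
            # Sort first so results are reproducible for a fixed seed.
            candidates = sorted(v for v in graph[node] if v not in sampled)
            rng.shuffle(candidates)
            for v in candidates[:n_burn]:
                if len(sampled) >= target_size:
                    break
                sampled.add(v)
                queue.append(v)
    return sampled


# Example: a 30% sample of a 100-node ring graph.
ring = {i: {(i - 1) % 100, (i + 1) % 100} for i in range(100)}
print(len(forest_fire_sample(ring, 30, seed=1)))
```

An attribute-based seed selection, as studied in the paper, would replace the uniform `rng.choice` of the starting node with a choice restricted by location or activity.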
posted Aug 23, 2010 10:37 PM by Byung-Won On (updated Aug 31, 2010 2:22 AM by jianshu weng)
Detecting Leaders in Behavioral Networks (paper) (slides)
Authors: Ilham Esslimani, Armelle Brun, Anne Boyer
Appeared in the Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010)
Abstract
The development of the Web has led to the emergence of virtual communities. Analyzing information flows and discovering leaders in these communities has thus become a major challenge in many application areas. In this paper, we present an algorithm that aims to detect leaders in behavioral networks. The algorithm considers both high connectivity and the potential to propagate accurate appreciations in order to detect reliable leaders in these networks. The approach is evaluated in terms of precision on a real usage dataset. The experimental results show the value of our approach for detecting the top-N behavioral leaders, who accurately predict the preferences of the other users. Moreover, our approach can be applied in other application areas concerned with the role of leaders.
posted Aug 19, 2010 8:08 PM by Freddy Chua (updated Aug 19, 2010 8:17 PM)
Learning to Detect Events with Markov-Modulated Poisson Processes (paper) (slides)
by Alexander Ihler, Jon Hutchins and Padhraic Smyth (all at the University of California, Irvine). Published in ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 1, Issue 3 (December 2007).
Abstract
Time-series of count data occur in many different contexts, including Internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this article we describe a framework for detecting anomalous events in such data using an unsupervised learning approach. Normal periodic behavior is modeled via a time-varying Poisson process model, which in turn is modulated by a hidden Markov process that accounts for bursty events. We outline a Bayesian framework for learning the parameters of this model from count time-series. Two large real-world datasets of time-series counts are used as testbeds to validate the approach, consisting of freeway traffic data and logs of people entering and exiting a building. We show that the proposed model is significantly more accurate at detecting known events than a more traditional threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about different aspects of events such as number of vehicles or people involved. The results indicate that the Markov-modulated Poisson framework provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
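The generative story in the abstract (periodic Poisson counts plus bursts switched on by a hidden Markov process) can be sketched as a simulator. This is a simplified, assumed version: a single two-state event chain (`p_enter`, `p_exit`) and a constant `event_rate` for extra counts during events, with a rate of lambda0 times a day-of-week factor times a day-and-hour factor for normal counts. The paper's full model, with its Bayesian priors and possibly additional event states, may differ; inference (learning the parameters from data) is not shown.

```python
import numpy as np


def simulate_mmpp(n_days, lambda0, day_effect, hour_effect,
                  p_enter, p_exit, event_rate, seed=0):
    """Simulate hourly counts from a simplified Markov-modulated Poisson process.

    Normal counts:  N0(t) ~ Poisson(lambda0 * day_effect[d] * hour_effect[d, h])
    Event state:    z(t) in {0, 1}, a two-state Markov chain with
                    P(0 -> 1) = p_enter and P(1 -> 0) = p_exit
    Event counts:   NE(t) ~ Poisson(event_rate) when z(t) == 1, else 0
    Observed count: N(t) = N0(t) + NE(t)
    """
    rng = np.random.default_rng(seed)
    T = n_days * 24
    counts = np.zeros(T, dtype=int)
    z = np.zeros(T, dtype=int)
    state = 0
    for t in range(T):
        d, h = (t // 24) % 7, t % 24
        # Advance the hidden event chain by one hour.
        if state == 0 and rng.random() < p_enter:
            state = 1
        elif state == 1 and rng.random() < p_exit:
            state = 0
        z[t] = state
        rate = lambda0 * day_effect[d] * hour_effect[d, h]
        counts[t] = rng.poisson(rate) + (rng.poisson(event_rate) if state else 0)
    return counts, z


# Flat weekly and daily profiles, purely for illustration.
day_effect = np.ones(7)
hour_effect = np.ones((7, 24))
counts, z = simulate_mmpp(14, 5.0, day_effect, hour_effect,
                          p_enter=0.01, p_exit=0.2, event_rate=20.0)
```

Detection in the paper runs the other way: given observed counts, infer the posterior over z(t) and flag hours where an event is probable.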
posted Aug 5, 2010 8:33 PM by Meiqun Hu
PET: A Statistical Model for Popular Events Tracking in Social Communities (paper) (slides)
by Cindy Xide Lin, Bo Zhao, Qiaozhu Mei and Jiawei Han
in Proceedings of KDD '10
Abstract
User-generated information in online communities is characterized by a mixture of a text stream and a network structure, both changing over time. A good example is a blogging community with daily blog posts and a social network of bloggers. An important task in analyzing an online community is to observe and track the popular events, or topics, that evolve over time in the community. Existing approaches usually focus on either the burstiness of topics or the evolution of networks, ignoring the interplay between textual topics and network structures. In this paper, we formally define the problem of popular event tracking (PET) in online communities, focusing on the interplay between textual content and social networks. We propose a novel statistical method that models the popularity of events over time, taking into consideration the burstiness of user interest, information diffusion in the network structure, and the evolution of textual topics. Specifically, a Gibbs Random Field is defined to model the influence of the historical status of actors in the network and the dependency relationships among them; a topic model then generates the words in the text content of the event, regularized by the Gibbs Random Field. We prove that two classical models of information diffusion and text burstiness are special cases of our model under certain conditions. Empirical experiments on two different communities and datasets (i.e., Twitter and DBLP) show that our approach is effective and outperforms existing methods.
posted Jul 20, 2010 1:58 AM by Cane Leung
Suggesting Friends Using the Implicit Social Graph (paper) (slides) by M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias & R. Merom, in KDD'10
Abstract:
Although users of online communication tools rarely categorize their contacts into groups such as "family", "co-workers", or "jogging buddies", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.
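The abstract does not give the affinity metric's formula, so the sketch below shows one plausible shape for an interaction-based score: each interaction with a contact group contributes a weight (outgoing messages weighted above incoming ones) that halves every `half_life_days`. The weights, half-life, data layout and the `suggest_friends` expansion step are all hypothetical illustrations, not the published Friend Suggest algorithm.

```python
def interaction_affinity(interactions, now, half_life_days=30.0,
                         w_out=1.0, w_in=0.5):
    """Score a contact group by its time-decayed interactions.

    interactions: list of (timestamp_seconds, outgoing) pairs, where
        outgoing is True if the user sent the message to the group.
    Each interaction contributes weight * 0.5 ** (age / half_life), so an
    interaction one half-life old counts half as much as a new one.
    """
    half_life = half_life_days * 86400.0
    score = 0.0
    for ts, outgoing in interactions:
        weight = w_out if outgoing else w_in
        score += weight * 0.5 ** ((now - ts) / half_life)
    return score


def suggest_friends(group_scores, seed):
    """Hypothetical expansion step: every group that overlaps the seed set
    passes its affinity score to its non-seed members; contacts are then
    ranked by total score."""
    scores = {}
    for group, score in group_scores.items():
        if group & seed:
            for contact in group - seed:
                scores[contact] = scores.get(contact, 0.0) + score
    return sorted(scores, key=scores.get, reverse=True)


# Groups are frozensets of contacts mapped to their affinity scores.
groups = {frozenset({"ann", "bob"}): 2.0,
          frozenset({"ann", "cat"}): 1.0,
          frozenset({"dan", "eve"}): 5.0}
print(suggest_friends(groups, {"ann"}))
```

Groups with no overlap with the seed (here `dan` and `eve`) contribute nothing, however strong their affinity, which matches the idea of growing a friend group outward from contacts the user has already labeled.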
posted Jun 15, 2010 2:24 AM by jianshu weng
Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors (paper) (slides)
Appeared in WWW '10
Abstract: Twitter, a popular microblogging service, has received much attention recently. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many Twitter posts (tweets) related to the earthquake, which enables prompt detection of the earthquake simply by observing the tweets. As described in this paper, we investigate the real-time interaction of events such as earthquakes in Twitter, and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location. We consider each Twitter user as a sensor and apply Kalman filtering and particle filtering, which are widely used for location estimation in ubiquitous/pervasive computing. The particle filter works better than the other compared methods in estimating the centers of earthquakes and the trajectories of typhoons. As an application, we construct an earthquake reporting system in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability by monitoring tweets (96% of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected). Our system detects earthquakes promptly and sends e-mails to registered users. Notification is delivered much faster than the announcements that are broadcast by the JMA.
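To illustrate the "users as sensors" idea, here is a generic sequential importance resampling (SIR) particle filter that estimates a single, static event center from geotagged tweets. It assumes each tweet location is the center plus isotropic Gaussian noise (`obs_sigma`) and treats latitude/longitude as planar coordinates; these assumptions, the parameter values and the function name are illustrative and are not the paper's exact spatiotemporal model.

```python
import numpy as np


def estimate_center(observations, lat_range, lon_range, n_particles=2000,
                    obs_sigma=0.5, jitter=0.05, seed=0):
    """SIR particle filter for a static 2-D event center.

    observations: array of shape (n, 2) holding (lat, lon) of event tweets
    Each tweet is treated as a noisy reading of the center with isotropic
    Gaussian noise of standard deviation obs_sigma (an assumption).
    """
    rng = np.random.default_rng(seed)
    # Initialise particles uniformly over the region of interest.
    particles = np.column_stack([
        rng.uniform(*lat_range, n_particles),
        rng.uniform(*lon_range, n_particles),
    ])
    for obs in observations:
        # Weight each particle by the Gaussian likelihood of this tweet.
        sq_dist = np.sum((particles - obs) ** 2, axis=1)
        log_w = -sq_dist / (2 * obs_sigma ** 2)
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()
        # Resample in proportion to the weights, then add small jitter so
        # the particle set does not collapse onto a few points.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx] + rng.normal(0.0, jitter, particles.shape)
    return particles.mean(axis=0)


# Synthetic example: 50 tweets scattered around a center near Tokyo.
gen = np.random.default_rng(42)
true_center = np.array([35.0, 139.0])
tweets = true_center + gen.normal(0.0, 0.5, size=(50, 2))
print(estimate_center(tweets, (30.0, 40.0), (134.0, 144.0)))
```

Tracking a moving event such as a typhoon would add a motion step (propagating each particle before weighting), which is where a particle filter differs most clearly from simple averaging.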
posted Jun 8, 2010 7:37 PM by Byung-Won On (updated Jun 8, 2010 7:50 PM)
Preferential Behavior in Online Groups (paper) (slides), which appeared in the ACM International Conference on Web Search and Data Mining (WSDM), 2008.
by L. Backstrom (Cornell University) and R. Kumar, C. Marlow, J. Novak, A. Tomkins (Yahoo! Research)
Abstract
Online communities in the form of message boards, listservs, and newsgroups continue to represent a considerable amount of the social activity on the Internet. Every year thousands of groups flourish while others decline into relative obscurity; likewise, millions of members join a new community every year, some of whom will come to manage or moderate the conversation while others simply sit on the sidelines and observe. These processes of group formation, growth, and dissolution are central in social science, and in an online venue they have ramifications for the design and development of community software.
In this paper, we explore a large corpus of thriving online communities. These groups vary widely in size, moderation and privacy, and cover an equally diverse set of subject matter. We present a broad range of descriptive statistics of these groups. Using metadata from groups, members, and individual messages, we identify users who post and are replied to frequently by multiple group members; we classify these high-engagement users based on the longevity of their engagements. We show that users who will go on to become long-lived, highly engaged users experience significantly better treatment than other users from the moment they join the group, well before there is an opportunity for them to develop a long-standing relationship with members of the group.
We present a simple model explaining long-term heavy engagement as a combination of user-dependent and group-dependent factors. Using this model as an analytical tool, we show that properties of the user alone are sufficient to explain 95% of all memberships, but introducing a small amount of per-group information dramatically improves our ability to model users belonging to multiple groups.