Learn how to work with Twitter and other web 2.0 data

Post date: May 05, 2012 9:47:54 PM

SPRING WORKSHOP ON COMPUTATIONAL SOCIAL SCIENCE

May 30 - June 1, 2012

The Institute for Quantitative Social Science, Harvard University

Contact m.lee@neu.edu to register (note: the registration fee is $50/day)

Space is limited!

Sponsored by:

The Northeastern Centers for Computational Social Science and Digital Humanities

The Institute for Quantitative Social Science, Harvard

The Human Dynamics Lab, MIT

May 30

8:30am-9am: Registration

9-10am: Opportunities and challenges in the study of digital traces

David Lazer, welcome and introductory remarks

10am-6pm (with breaks)

Workshop 1: From Tweets to Results: How to obtain, mine, and analyze Twitter data

Derek Ruths (McGill University)

Since its creation in 2006, Twitter has become one of the most popular microblogging platforms in the world. By virtue of its popularity, the relative structural simplicity of its posts, and a tendency towards relaxed privacy settings, Twitter has also become a popular data source for research on a range of topics in sociology, psychology, political science, and anthropology. Despite its widespread use in the research community, however, there are many pitfalls when working with Twitter data.

In this day-long workshop, we will lead participants through the entire Twitter-based research pipeline: from obtaining Twitter data all the way through performing some of the sophisticated analyses that have been featured in recent high-profile publications. In the morning, we will cover the nuts and bolts of obtaining and working with a Twitter dataset including: using the Twitter API, the firehose, and rate limits; strategies for storing and filtering Twitter data; and how to publish your dataset for other researchers to use. In the afternoon, we will delve into techniques for analyzing Twitter content including constructing retweet, mention, and follower networks; measuring the sentiment of tweets; and inferring the gender of users from their profiles and unstructured text.

We assume that participants will have little to no prior experience with mining Twitter or other social network datasets. As the workshop will be interactive, participants are encouraged to bring a laptop. Code examples and exercises will be given in Python, so participants should have some familiarity with the language. However, all concepts and techniques covered will be language-independent, so anyone with some background in scripting or programming will benefit from the workshop.
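To give a flavor of the afternoon material, a mention network can be built from tweet text roughly as follows. This is a minimal sketch and not the workshop's own code: the sample tweets, the function name build_mention_network, and the simple @username pattern are all assumptions made for illustration, and real Twitter data would normally come from the API rather than a hand-written list.

```python
import re
from collections import Counter

# Assumed pattern: an "@" followed by letters, digits, or underscores.
MENTION_RE = re.compile(r"@(\w+)")


def build_mention_network(tweets):
    """Count directed edges (author -> mentioned user) in a list of tweets.

    `tweets` is an iterable of (author, text) pairs. The result is a Counter
    mapping (author, mentioned_user) to the number of mentions.
    """
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            src, dst = author.lower(), mentioned.lower()
            if src != dst:  # ignore self-mentions
                edges[(src, dst)] += 1
    return edges


# Hypothetical sample data, for illustration only.
sample_tweets = [
    ("alice", "Great talk by @bob and @carol today"),
    ("bob", "Thanks @alice!"),
    ("alice", "RT @bob: slides are online"),
]

edges = build_mention_network(sample_tweets)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, weight)
```

Each weighted edge can then be handed to a network library or visualization tool. Note that this sketch treats a retweet prefix such as "RT @bob" as an ordinary mention; a retweet network would need to separate those cases.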

May 31

9am-5pm (with breaks): Workshop 2: Network Visualization

Yu-Ru Lin (Northeastern/Harvard Universities)

The recent availability of new cutting-edge datasets such as open government data, cell phone call records, and social media communication streams offers unprecedented opportunities to study human behaviors and their relationship to the social system. Relationships between various types of entities arise naturally in the study of social networks, as well as in many applications such as information retrieval and business intelligence. This interrelated information can be effectively represented as networks, where nodes are various types of entities and edges are relationships. Network visualization serves as a powerful tool to build intuitions, to systematically explore the structures or peculiar patterns of the data, and to communicate findings.

This tutorial aims to provide practical knowledge of network visualization, using the open-source tool Gephi. The tutorial will cover three components:

(1) Understand visual complexity and effective ways of communicating networked data.

(2) Convey network properties and structure through Gephi’s functionality.

(3) Use Gephi’s advanced features to explore the networks of political contributions, political texts, etc.

The tutorial is intended for scholars and researchers who wish to learn how to incorporate network visualization to speed up data exploration and to communicate insights from data.

Requirements: Familiarity with basic network concepts is preferred but not essential. Participants should bring their own laptop with Gephi installed (installation instructions will be sent to participants before the tutorial).

June 1

10am-12pm: Self-organized discussions

This will be an opportunity for workshop participants to organize into groups to discuss particular opportunities and challenges in specific substantive domains.

1pm-5pm: Workshop 3: Studying the dynamics of human proximity

Human Dynamics Lab, MIT / Prof. Alex Pentland, Director

During the last decade, we have developed measurement toolkits based on electronic badges, smartphones, and signal processing that allow us to accurately quantify human behavior in everyday situations on a continuous basis over long time periods. In this tutorial, we will describe the sociometric badges and the Android-based sociometric software that we have developed, covering their function, capability, and typical use. These tools will be made available to interested participants. We will also cover the mathematical toolkit we have developed, describing its theory, capability, and typical use. These tools will also be made available to participants.

Finally, we will illustrate the use of our sociometric measurement tools together with our mathematical analysis tools on a variety of problems at the level of individuals (e.g., passive screening for health problems), small groups (e.g., providing a real-time performance meter for groups), organizations (e.g., reengineering communication patterns for greater productivity), and large-scale sociocultural outcomes (e.g., diabetes risk, crime risk). For additional information, see http://hd.media.mit.edu