Welcome to the homepage of the SUM (Social User Mining) project at Penn State University.

The SUM project aims to develop novel algorithmic solutions, working prototypes, and innovative applications for mining information about users from their "publicly available" social media data. Note that no illegally crawled or scraped data that violates users' privacy or the terms of service of social network sites will be used. Some of the interesting information that we aim to mine about social users includes:
  • Demographics (e.g., gender, ethnicity, age, marital/parental status)
  • Profile (e.g., personality, job, religion, hobby, political opinions)
  • Temporal Pattern (e.g., daily/weekly/monthly pattern)
  • Spatial Pattern (e.g., home location, human traffic pattern, frequent POI types)
While many social network sites allow users to provide their demographic and profile information upon registration, only a small percentage of users do so, for various reasons (e.g., privacy concerns, laziness). As such, being able to automatically discover the missing profile information of social media users has important practical implications in real settings. For instance, social network sites themselves can use the home location of users to personalize user accounts or content accordingly. Similarly, companies can use the gender or age of users to target their advertising campaigns more precisely.

In the SUM project, in particular, we plan to study the following problems:
  • What is the technical landscape and solution space of the problem in general?
  • How can we create good-quality ground-truth data sets for various kinds of profile information?
  • Which social media data (e.g., user-generated content, social network features, metadata) are the most useful for discovering user profile information?
  • How can different social media data be combined to improve the overall performance of solutions? What would a holistic framework look like?
  • What are effective and scalable data mining solutions for mining such social user information?