Location-Based Social Network Data Generation Based on Patterns of Life

Authors

Joon-Seok Kim (George Mason University), Hyunjee Jin (George Mason University), Hamdi Kavak (George Mason University), Ovi Chris Rouly (Tulane University), Andrew Crooks (George Mason University), Dieter Pfoser (George Mason University), Carola Wenk (Tulane University), Andreas Züfle (George Mason University)

Abstract

Location-based social networks (LBSNs) have been studied extensively in recent years. However, using real-world LBSN data sets in such studies has several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground truth. To overcome these weaknesses, we leverage a large-scale geospatial simulation to create a framework that simulates human behavior and generates synthetic but realistic LBSN data based on human patterns of life. Such data captures not only the location of users over time but also their social interactions via their social networks. Patterns of life are simulated by giving agents (i.e., people) an array of "needs" that they aim to satisfy. For instance, agents go home when they are tired, go to restaurants when they are hungry, go to work to fulfill their financial needs, and go to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different real-world urban environments obtained from OpenStreetMap. These data sets, which comprise gigabytes of spatio-temporal and temporal social network data taken at 5-minute intervals, are made available to the research community.
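The needs-driven behavior described in the abstract can be illustrated with a minimal sketch. This is not the framework's implementation: the class `Agent`, the need names, the 0-to-1 satisfaction scale, and the per-step decay rates are all assumptions chosen for illustration. The sketch only shows the general idea that an agent heads to the kind of place that satisfies its most urgent need.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a need to the kind of place that satisfies it,
# following the examples given in the abstract.
NEED_TO_PLACE = {
    "sleep": "home",
    "food": "restaurant",
    "money": "workplace",
    "social": "recreation site",
}


@dataclass
class Agent:
    # Satisfaction level per need in [0, 1]; 1.0 means fully satisfied
    # (an assumed scale, not taken from the framework).
    needs: dict = field(
        default_factory=lambda: {need: 1.0 for need in NEED_TO_PLACE}
    )

    def decay(self, rates):
        """Lower each need by its per-step rate, clamped at 0."""
        for need, rate in rates.items():
            self.needs[need] = max(0.0, self.needs[need] - rate)

    def most_urgent_need(self):
        """Return the need with the lowest satisfaction level."""
        return min(self.needs, key=self.needs.get)

    def next_destination(self):
        """Return the place type that satisfies the most urgent need."""
        return NEED_TO_PLACE[self.most_urgent_need()]

    def satisfy(self, need):
        """Reset a need to fully satisfied after visiting a suitable place."""
        self.needs[need] = 1.0


agent = Agent()
# Assumed decay rates for one simulation step.
agent.decay({"sleep": 0.1, "food": 0.4, "money": 0.2, "social": 0.3})
destination = agent.next_destination()
```

In this toy run, hunger decays fastest, so the agent's next destination is a restaurant. Once that need is satisfied, the next most urgent need determines where the agent goes.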

Overview of Location-Based Social Network (LBSN)

Why Synthetic Data?

Publicly available real-world data sets have been the driving force for LBSN research in recent years, but such data sets exhibit certain weaknesses:

  • Data sparsity: LBSN data exhibits an extreme long-tail distribution of user behavior. In all existing available data sets, the vast majority of users have fewer than ten check-ins. Besides, the number of locations at which a user checks in is usually only a small portion of all locations that user has visited. As a result, the density of the data used in experimental studies on LBSNs is usually only around 0.1%.
  • Small data sets: Existing data sets used to train models are small. They tend to only cover a short period of time, a small number of users, or a small number of check-ins.
  • Privacy concerns: Most LBSN data was published by users who consented to its public use. However, some users may revoke this consent, for instance, by deleting their LBSN account. Such changes are not reflected in existing LBSN data sets, which creates severe privacy concerns.
  • No ground-truth: There is no way to assess, in existing LBSN data, whether check-ins are missing or if the social network is correct and complete. Without knowing the ground truth, it is difficult to assess the accuracy and robustness of existing experimental results using LBSN data.
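To make the density figure in the data sparsity point concrete, the following sketch computes the density of a user-location check-in matrix, under the assumption that density means the share of (user, location) pairs with at least one check-in. The function name `checkin_density` and the toy records are hypothetical. The sets of users and locations are derived from the check-ins themselves, which is a simplification.

```python
def checkin_density(checkins):
    """Share of (user, location) cells with at least one check-in.

    `checkins` is an iterable of (user_id, location_id) pairs. Repeated
    check-ins by the same user at the same location count once.
    """
    pairs = set(checkins)
    if not pairs:
        return 0.0
    users = {user for user, _ in pairs}
    locations = {location for _, location in pairs}
    return len(pairs) / (len(users) * len(locations))


# Toy data (hypothetical): 3 users, 3 locations, 4 distinct pairs.
toy_checkins = [
    ("u1", "cafe"),
    ("u1", "cafe"),  # repeat visit, counted once
    ("u2", "gym"),
    ("u3", "cafe"),
    ("u3", "park"),
]
density = checkin_density(toy_checkins)
```

The toy example has a density of 4/9, far above what real LBSN data shows. A density around 0.1% means that roughly one in a thousand user-location cells is filled.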

Socio-spatial Simulation Settings

Social network and data visualization

The following demo videos visualize social network evolution over time with basic statistics.

Number of Agents: 1,000 / Maps: TownS (virtual city)

Number of Agents: 1,000 / Maps: TownL (virtual city)

Number of Agents: 1,000 / Maps: NOLA (New Orleans, Louisiana)

Number of Agents: 1,000 / Maps: GMU (George Mason University, Fairfax, Virginia)

Maps used for Location-Based Social Network Simulation

New Orleans, Louisiana (NOLA), Mississippi River, Lake Pontchartrain, and the French Quarter

Maps for NOLA dataset

Virtual City (TownL)

Maps for TownL dataset

George Mason University (GMU), Fairfax, VA.

Maps for GMU dataset

Virtual City (TownS)

Maps for TownS dataset

Analysis of LBSN Datasets

The following graphs compare how the average social network degree changes over time across the different scenarios.

avgNetworkDegree-GMU.pdf

GMU scenarios


avgNetworkDegree-NOLA.pdf

NOLA scenarios


avgNetworkDegree-TownS.pdf

TownS scenarios


avgNetworkDegree-TownL.pdf

TownL scenarios


avgNetworkDegree-1K.pdf

1K scenarios


avgNetworkDegree-3K.pdf

3K scenarios


avgNetworkDegree-5K.pdf

5K scenarios


all-avgNetworkDegree.pdf

All scenarios
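The quantity plotted above, the average degree of the social network over time, can be computed from network snapshots as sketched below. This assumes an undirected network stored as an edge list per time step. The snapshot layout, the function name `average_degree`, and the toy values are hypothetical and do not reflect the file format of the published data sets.

```python
def average_degree(num_agents, edges):
    """Average degree of an undirected graph: 2 * |E| / |V|.

    Duplicate edges (in either direction) are counted once, and
    self-loops are ignored.
    """
    if num_agents == 0:
        return 0.0
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(unique_edges) / num_agents


# Hypothetical snapshots keyed by simulation time in minutes, for 4 agents.
snapshots = {
    0: [(1, 2)],
    5: [(1, 2), (2, 3)],
    10: [(1, 2), (2, 3), (3, 4), (4, 3)],  # (4, 3) duplicates (3, 4)
}
series = {t: average_degree(4, edges) for t, edges in snapshots.items()}
```

Plotting `series` against time gives one curve. Repeating the computation per scenario gives a comparison like the ones shown in the graphs.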


Resources:

  • Source code is publicly available at GitHub.
  • Generated LBSN data is publicly available at OSF. When using the dataset, please cite:
    • J.-S. Kim, H. Jin, H. Kavak, O. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Network Data Generation Based on Patterns of Life,” IEEE International Conference on Mobile Data Management (MDM 2020)
  • An accepted version of the paper is available (PDF).

Related Research:

  • J.-S. Kim, H. Kavak, C. O. Rouly, H. Jin, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Simulation for Prescriptive Analytics of Disease Spread,” SIGSPATIAL Special, March 2020, Volume 12, Issue 1, pp. 53-61
  • H. Kavak, J.-S. Kim, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Simulation,” In Proceedings of International Symposium on Spatial and Temporal Databases (SSTD 2019), August 2019, pp. 218-221
  • J.-S. Kim, H. Kavak, U. Manzoor, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Simulating Urban Patterns of Life: A Geo-Social Data Generation Framework,” In Proceedings of ACM SIGSPATIAL GIS ’19, November 2019, pp. 576-579