Location-Based Social Network Data Generation Based on Patterns of Life
Joon-Seok Kim (George Mason University), Hyunjee Jin (George Mason University), Hamdi Kavak (George Mason University), Ovi Chris Rouly (Tulane University), Andrew Crooks (George Mason University), Dieter Pfoser (George Mason University), Carola Wenk (Tulane University), Andreas Züfle (George Mason University)
Location-based social networks (LBSNs) have been studied extensively in recent years. However, utilizing real-world LBSN data sets in such studies yields several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large-scale geospatial simulation to create a framework that simulates human behavior and generates synthetic but realistic LBSN data based on human patterns of life. Such data captures not only the location of users over time but also their social interactions via their social networks. These patterns of life are simulated by giving agents (i.e., people) an array of "needs" that they aim to satisfy. For instance, agents go home when they are tired, go to restaurants when they are hungry, go to work to fulfill their financial needs, and go to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different real-world urban environments obtained from OpenStreetMap. These data sets, which comprise gigabytes of spatio-temporal and temporal social network data taken at 5-minute intervals, are made available to the research community.
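To make the needs-driven behavior described above more concrete, the following is a minimal Python sketch of a single agent deciding where to go. It is not the framework's implementation (the actual source code is on GitHub). The need names, decay rates, the 0.5 trip threshold, and the rule "visit the place that serves the least-satisfied need" are hypothetical simplifications, and one step is assumed to stand for one 5-minute interval to match the resolution of the released data.

```python
from dataclasses import dataclass, field

# Hypothetical needs, mapped to the kind of place that satisfies each one
# (following the examples in the text above).
DESTINATION_FOR_NEED = {
    "rest": "home",
    "food": "restaurant",
    "financial": "workplace",
    "social": "recreational site",
}
STEP_MINUTES = 5       # assumption: one simulation step = one 5-minute interval
TRIP_THRESHOLD = 0.5   # hypothetical: leave once a need drops below this level


@dataclass
class Agent:
    agent_id: int
    # Satisfaction per need in [0, 1]; 1.0 means fully satisfied.
    needs: dict = field(default_factory=lambda: {n: 1.0 for n in DESTINATION_FOR_NEED})

    def decay(self, rates):
        """Reduce each need by its per-step rate, never going below 0."""
        for need, rate in rates.items():
            self.needs[need] = max(0.0, self.needs[need] - rate)

    def most_urgent_need(self):
        """Return the need with the lowest satisfaction level."""
        return min(self.needs, key=self.needs.get)

    def choose_destination(self):
        """Return the kind of place that serves the most urgent need."""
        return DESTINATION_FOR_NEED[self.most_urgent_need()]


def simulate(agent, rates, steps):
    """Run one agent for `steps` steps; return (minute, destination) visits."""
    visits = []
    for step in range(steps):
        agent.decay(rates)
        need = agent.most_urgent_need()
        if agent.needs[need] < TRIP_THRESHOLD:
            visits.append((step * STEP_MINUTES, DESTINATION_FOR_NEED[need]))
            agent.needs[need] = 1.0  # hypothetical: one visit fully satisfies the need
    return visits


# Hunger grows fastest with these rates, so the first trip is to a restaurant.
agent = Agent(agent_id=1)
rates = {"rest": 0.01, "food": 0.05, "financial": 0.02, "social": 0.03}
print(simulate(agent, rates, 288)[:3])  # first three visits of a simulated day (288 x 5 min)
```

Choosing the least-satisfied need is only one possible decision rule; a full simulation would also model travel along the road network, opening hours, and meeting friends at the chosen site.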
Overview of Location-Based Social Network (LBSN)
Why Synthetic Data?
Publicly available real-world data sets have been the driving force for LBSN research in recent years, but such data sets exhibit certain weaknesses:
Data sparsity: LBSN data exhibits an extreme long-tail distribution of user behavior. In all existing available data sets, the vast majority of users have fewer than ten check-ins. Moreover, the number of locations visited by a user is usually only a small portion of all locations in the data set. As a result, the density of the user-location data used in experimental studies on LBSNs is usually only around 0.1%.
Small data sets: Existing data sets used to train models are small. They tend to cover only a short period of time, a small number of users, or a small number of check-ins.
Privacy concerns: Most LBSN data was published by users who consented to its public use. However, some users may later revoke this consent, for instance, by deleting their LBSN account. Such changes are not reflected in existing LBSN data sets, which creates severe privacy concerns.
No ground-truth: There is no way to assess, in existing LBSN data, whether check-ins are missing or if the social network is correct and complete. Without knowing the ground truth, it is difficult to assess the accuracy and robustness of existing experimental results using LBSN data.
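The density figure mentioned under data sparsity refers to the share of non-empty cells in the user-location matrix. The following minimal sketch shows how that density, and the share of users below the ten-check-in mark, could be computed from a list of check-ins. It assumes each check-in is a hypothetical (user_id, location_id) pair. The function names are illustrative and do not come from any particular library.

```python
from collections import Counter


def user_location_density(checkins, num_users=None, num_locations=None):
    """Share of non-zero cells in the user x location visit matrix.

    checkins: iterable of (user_id, location_id) pairs; repeat visits to the
    same location by the same user fill only one cell. If num_users or
    num_locations is omitted, the number of distinct IDs seen is used.
    """
    pairs = set(checkins)
    n_users = num_users if num_users is not None else len({u for u, _ in pairs})
    n_locations = num_locations if num_locations is not None else len({loc for _, loc in pairs})
    if n_users == 0 or n_locations == 0:
        return 0.0
    return len(pairs) / (n_users * n_locations)


def share_of_sparse_users(checkins, threshold=10):
    """Fraction of users with fewer than `threshold` check-ins."""
    per_user = Counter(u for u, _ in checkins)
    if not per_user:
        return 0.0
    return sum(1 for c in per_user.values() if c < threshold) / len(per_user)


# Toy data: user 1 checks in 12 times at "cafe", user 2 twice at "gym".
toy = [(1, "cafe")] * 12 + [(2, "gym")] * 2
print(user_location_density(toy))   # 0.5 (2 of 4 cells filled)
print(share_of_sparse_users(toy))   # 0.5 (only user 2 is below 10)
```

For scale: with 1,000 users and 10,000 locations the matrix has 10 million cells, so a density of 0.1% corresponds to 10,000 distinct user-location pairs, or 10 distinct locations per user on average.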
Socio-spatial Simulation Settings
Social network and data visualization
The following demo videos visualize how the social network evolves over time, along with basic statistics.
Number of Agents: 1,000 / Maps: TownS (virtual city)
Number of Agents: 1,000 / Maps: TownL (virtual city)
Number of Agents: 1,000 / Maps: NOLA (New Orleans, Louisiana)
Number of Agents: 1,000 / Maps: GMU (George Mason University, Fairfax, Virginia)
Maps used for Location-Based Social Network Simulation
New Orleans, Louisiana (NOLA), Mississippi River, Lake Pontchartrain, and the French Quarter
Maps for NOLA dataset
Virtual City (TownL)
Maps for TownL dataset
George Mason University (GMU), Fairfax, VA.
Maps for GMU dataset
Virtual City (TownS)
Maps for TownS dataset
Analysis of LBSN Datasets
The following graphs compare how the average social network degree changes over time across different scenarios.
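For reference, the average degree of an undirected social network with |V| agents and |E| friendship edges is 2|E|/|V|. The sketch below computes it per time step. It assumes, hypothetically, that the network is available as a mapping from timestamp to a list of undirected (agent, agent) edges. The actual layout of the published data files may differ.

```python
def average_degree(num_agents, edges):
    """Average degree 2|E|/|V| of an undirected graph.

    Self-loops are dropped and (a, b) / (b, a) duplicates are counted once.
    """
    if num_agents == 0:
        return 0.0
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    return 2 * len(unique_edges) / num_agents


def degree_over_time(num_agents, snapshots):
    """Return (timestamp, average degree) pairs sorted by timestamp.

    snapshots: hypothetical dict {timestamp: [(agent_a, agent_b), ...]}.
    """
    return [(t, average_degree(num_agents, snapshots[t])) for t in sorted(snapshots)]


# Toy example: 4 agents whose network grows between two 5-minute snapshots.
series = degree_over_time(4, {0: [(0, 1)], 5: [(0, 1), (1, 2), (2, 1)]})
print(series)  # [(0, 0.5), (5, 1.0)]
```

Dividing by the full population rather than by the number of agents with at least one friend keeps values comparable across time steps, since the number of agents in each scenario (1,000) stays fixed.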
Source code is publicly available at GitHub.
The generated LBSN data is publicly available at OSF. When using the dataset, please cite:
J.-S. Kim, H. Jin, H. Kavak, O. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Network Data Generation Based on Patterns of Life,” In Proceedings of the 21st IEEE International Conference on Mobile Data Management (MDM 2020), pp. 158-167
An accepted version of the paper is available (PDF).
J.-S. Kim, H. Kavak, C. O. Rouly, H. Jin, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Simulation for Prescriptive Analytics of Disease Spread,” SIGSPATIAL Special, March 2020, Volume 12, Issue 1, pp. 53-61
Project link: https://geosocial.joonseok.org
H. Kavak, J.-S. Kim, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Simulation,” In Proceedings of the International Symposium on Spatial and Temporal Databases (SSTD 2019), August 2019, pp. 218-221
J.-S. Kim, H. Kavak, U. Manzoor, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Simulating Urban Patterns of Life: A Geo-Social Data Generation Framework,” In Proceedings of ACM SIGSPATIAL GIS ’19, November 2019, pp. 576-579
Project link: http://sigspatial19demo.joonseok.org