Joon-Seok Kim (George Mason University), Hyunjee Jin (George Mason University), Hamdi Kavak (George Mason University), Ovi Chris Rouly (Tulane University), Andrew Crooks (George Mason University), Dieter Pfoser (George Mason University), Carola Wenk (Tulane University), Andreas Züfle (George Mason University)
Location-based social networks (LBSNs) have been studied extensively in recent years. However, the real-world LBSN data sets used in such studies have several weaknesses: they are sparse and small, they raise privacy concerns, and they lack authoritative ground truth. To overcome these weaknesses, we leverage a large-scale geospatial simulation to build a framework that simulates human behavior and creates synthetic but realistic LBSN data based on human patterns of life. Such data captures not only the location of users over time but also their social interactions via their social networks. Patterns of life are simulated by giving agents (i.e., people) an array of "needs" that they aim to satisfy. For instance, agents go home when they are tired, go to restaurants when they are hungry, go to work to fulfill their financial needs, and go to recreational sites to meet friends and satisfy their social need. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source of massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity with perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different real-world urban environments obtained from OpenStreetMap. These data sets, which comprise gigabytes of spatio-temporal and temporal social network data taken at 5-minute intervals, are made available to the research community.
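The need-driven behavior described above can be illustrated with a minimal sketch. It is not the framework's implementation: the need names, the site types, the decay and refill rates, and the rule "serve the lowest need first" are all assumptions chosen only to mirror the examples in the text.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a need to the kind of site that satisfies it.
# These names are illustrative assumptions, not the framework's vocabulary.
NEED_TO_SITE = {
    "sleep": "home",
    "food": "restaurant",
    "money": "workplace",
    "social": "recreation",
}


@dataclass
class Agent:
    # Each need level lies in [0, 1]; 1.0 means fully satisfied.
    needs: dict = field(default_factory=lambda: {n: 1.0 for n in NEED_TO_SITE})
    location: str = "home"

    def step(self, decay: float = 0.05, refill: float = 0.5) -> str:
        """Advance one simulation tick and return the site the agent visits."""
        # Every need decays a little on each tick (assumed uniform rate).
        for name in self.needs:
            self.needs[name] = max(0.0, self.needs[name] - decay)
        # The agent moves to the site that serves its most urgent (lowest) need.
        urgent = min(self.needs, key=self.needs.get)
        self.location = NEED_TO_SITE[urgent]
        # Visiting that site partially restores the need.
        self.needs[urgent] = min(1.0, self.needs[urgent] + refill)
        return self.location


# Usage: a hungry agent heads to a restaurant on its next tick.
agent = Agent(needs={"sleep": 0.9, "food": 0.2, "money": 0.8, "social": 0.7})
visited = agent.step()
```

Logging `agent.location` at every tick would produce a check-in-like trace, which is the kind of record the benchmark data sets contain at 5-minute resolution.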
Overview of Location-Based Social Networks (LBSNs)
Publicly available real-world data sets have been the driving force for LBSN research in recent years, but such data sets exhibit certain weaknesses:
Data sparsity: LBSN data exhibits an extreme long-tail distribution of user behavior. In all existing available data sets, the vast majority of users have fewer than ten check-ins. Moreover, the locations at which a user checks in are usually only a small portion of all locations that user has visited. As a result, the density of the data used in experimental studies on LBSNs is usually only around 0.1%.
Small data sets: Existing data sets used to train models are small. They tend to only cover a short period of time, a small number of users, or a small number of check-ins.
Privacy concerns: Most LBSN data was published by users who consented to its public use. However, some users may later revoke this consent, for instance, by deleting their LBSN account. Such changes are not reflected in existing LBSN data sets and thus create severe privacy concerns.
No ground-truth: There is no way to assess, in existing LBSN data, whether check-ins are missing or if the social network is correct and complete. Without knowing the ground truth, it is difficult to assess the accuracy and robustness of existing experimental results using LBSN data.
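To make the data sparsity figure concrete, the sketch below computes the density of a check-in data set. It assumes density means the fraction of (user, location) pairs that have at least one check-in, out of all possible pairs; the source does not spell out its definition, and the function name and input format here are hypothetical.

```python
def checkin_density(checkins):
    """Return the fraction of non-empty cells in the user-location matrix.

    checkins: iterable of (user_id, location_id) pairs, one per check-in.
    Repeated check-ins by the same user at the same location count once.
    """
    pairs = set(checkins)
    if not pairs:
        return 0.0
    users = {user for user, _ in pairs}
    locations = {location for _, location in pairs}
    return len(pairs) / (len(users) * len(locations))


# Usage: three users, three locations, three distinct (user, location) pairs.
sample = [("u1", "a"), ("u1", "a"), ("u2", "b"), ("u3", "c")]
density = checkin_density(sample)
```

Under this definition, a density of 0.1% means that only about one in a thousand possible user-location pairs is ever observed.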
The following demo videos visualize social network evolution over time with basic statistics.
Number of Agents: 1,000 / Maps: TownS (virtual city)
Number of Agents: 1,000 / Maps: TownL (virtual city)
Number of Agents: 1,000 / Maps: NOLA (New Orleans, Louisiana)
Number of Agents: 1,000 / Maps: GMU (George Mason University, Fairfax, Virginia)
Maps for NOLA dataset
Maps for TownL dataset
Maps for GMU dataset
Maps for TownS dataset
This Research:
Kim, Joon-Seok, Hyunjee Jin, Hamdi Kavak, Ovi Chris Rouly, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "Location-based social network data generation based on patterns of life." In 2020 21st IEEE International Conference on Mobile Data Management (MDM), pp. 158-167. IEEE, 2020.
Resources:
Source code is publicly available at GitHub.
Generated LBSN data is publicly available at OSF. When using the dataset, please cite:
J.-S. Kim, H. Jin, H. Kavak, O. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle, “Location-Based Social Network Data Generation Based on Patterns of Life,” in Proceedings of the 21st IEEE International Conference on Mobile Data Management (MDM 2020), pp. 158-167.
An accepted version of the paper is available (PDF).
Related Research:
Amiri, Hossein, Will Kohn, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "The Patterns of Life Human Mobility Simulation." In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, pp. 653-656. 2024.
Amiri, Hossein, Shiyang Ruan, Joon-Seok Kim, Hyunjee Jin, Hamdi Kavak, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "Massive Trajectory Data Based on Patterns of Life." In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, pp. 1-4. 2023.
Züfle, Andreas, Carola Wenk, Dieter Pfoser, Andrew Crooks, Joon-Seok Kim, Hamdi Kavak, Umar Manzoor, and Hyunjee Jin. "Urban life: a model of people and places." Computational and Mathematical Organization Theory 29, no. 1 (2023): 20-51.
Kim, Joon-Seok, Hamdi Kavak, Chris Ovi Rouly, Hyunjee Jin, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "Location-Based Social Simulation for Prescriptive Analytics of Disease Spread." SIGSPATIAL Special 12, no. 1 (2020): 53-61.
Project link: https://geosocial.joonseok.org
Kavak, Hamdi, Joon-Seok Kim, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "Location-Based Social Simulation." In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, pp. 218-221. 2019.
Kim, Joon-Seok, Hamdi Kavak, Umar Manzoor, Andrew Crooks, Dieter Pfoser, Carola Wenk, and Andreas Züfle. "Simulating Urban Patterns of Life: A Geo-Social Data Generation Framework." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 576-579. 2019.
Project link: https://sigspatial19demo.joonseok.org