Datasets

List of Datasets

  • Flickr User-POI Visits Dataset: List of users and their visits to various points-of-interest (POIs) in eight cities, based on Flickr.
  • Melbourne User-POI Visits Dataset: List of users and their visits to various points-of-interest (POIs) in Melbourne, based on Flickr.
  • Theme Park Attraction Visits Dataset: List of users and their visits to various attractions/rides in five theme parks, based on Flickr.

Flickr User-POI Visits Dataset

Dataset Information:

This dataset comprises a set of users and their visits to various points-of-interest (POIs) in eight cities. The user-POI visits are determined based on geo-tagged YFCC100M Flickr photos that are: (i) mapped to specific POI locations and POI categories; and (ii) grouped into individual travel sequences (consecutive user-POI visits that are less than 8 hours apart). Other associated datasets are the "List of POIs" dataset ("POI-{cityName}.csv" files from "poiList-ijcai15.zip") and the "POI Cost-Profit Table" dataset ("costProfCat-{cityName}POI-all.csv" files from "costProf-ijcai15.zip").
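The grouping rule in (ii) can be illustrated with a short Python sketch. This is not the authors' code: it assumes the gap is measured between successive photos of the same user, that timestamps are in seconds, and the function name assign_sequences and the sample records are made up for illustration.

```python
MAX_GAP_SECONDS = 8 * 60 * 60  # the "<8hrs" threshold from the description


def assign_sequences(photos):
    """Tag (user_id, timestamp, poi_id) records with a travel sequence number.

    Assumption: a new sequence starts whenever the user changes or the gap
    to that user's previous photo is 8 hours or more.
    """
    ordered = sorted(photos, key=lambda p: (p[0], p[1]))
    result = []
    seq_id = 0
    prev_user, prev_time = None, None
    for user_id, taken, poi_id in ordered:
        if user_id != prev_user or taken - prev_time >= MAX_GAP_SECONDS:
            seq_id += 1
        result.append((user_id, taken, poi_id, seq_id))
        prev_user, prev_time = user_id, taken
    return result


# Hypothetical records: (userID, unix timestamp, poiID)
photos = [
    ("userA", 1_000_000, 3),
    ("userA", 1_000_000 + 2 * 3600, 5),   # 2 h after the previous photo
    ("userA", 1_000_000 + 12 * 3600, 7),  # 10 h after the previous photo
    ("userB", 1_000_500, 3),              # a different user
]
sequenced = assign_sequences(photos)
```

Here the first two photos of userA share a sequence, the third starts a new one because of the 10-hour gap, and userB always gets a separate sequence. The released files already contain a seqID column, so a sketch like this is only needed to understand or re-derive the grouping.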

File Description and Dataset Statistics:

All user-POI visits in each city are stored in a single csv file that contains the following columns/fields:

- photoID: identifier of the photo based on Flickr.

- userID: identifier of the user based on Flickr.

- dateTaken: the date/time that the photo was taken (unix timestamp format).

- poiID: identifier of the point-of-interest (Flickr photos are mapped to POIs based on their lat/long).

- poiTheme: category of the POI (e.g., Park, Museum, Cultural).

- poiFreq: number of times this POI has been visited.

- seqID: travel sequence no. (consecutive POI visits by the same user that differ by <8hrs are grouped as one travel sequence).
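As a rough illustration of how these fields might be read, the Python sketch below parses a few made-up rows with the standard csv module. The ";" delimiter, the sample values, and the helper name read_visits are assumptions, not taken from the dataset; check the downloaded files before relying on them. Interpreting dateTaken as UTC is also an assumption.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up rows that follow the column list above; the delimiter is an assumption.
SAMPLE = """photoID;userID;dateTaken;poiID;poiTheme;poiFreq;seqID
111;userA;1346844688;6;Cultural;120;1
112;userA;1346845000;6;Cultural;120;1
113;userB;1142731848;21;Park;45;2
"""


def read_visits(handle, delimiter=";"):
    """Read user-POI visit rows and convert the numeric fields."""
    visits = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        visits.append({
            "photoID": row["photoID"],
            "userID": row["userID"],
            # Unix timestamp converted to a timezone-aware datetime (UTC assumed).
            "dateTaken": datetime.fromtimestamp(int(row["dateTaken"]), tz=timezone.utc),
            "poiID": int(row["poiID"]),
            "poiTheme": row["poiTheme"],
            "poiFreq": int(row["poiFreq"]),
            "seqID": int(row["seqID"]),
        })
    return visits


visits = read_visits(io.StringIO(SAMPLE))
```

For a real file, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline="").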

Download:

Please click here to download this dataset.

References / Citations:

If you use this dataset, please cite the following papers:

- Kwan Hui Lim, Jeffrey Chan, Christopher Leckie and Shanika Karunasekera. "Personalized Tour Recommendation based on User Interests and Points of Interest Visit Durations". In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15). Pg 1778-1784. Jul 2015.

- Kwan Hui Lim, Jeffrey Chan, Christopher Leckie and Shanika Karunasekera. "Towards Next Generation Touring: Personalized Group Tours". In Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS'16). Pg 412-420. Jun 2016.

Melbourne User-POI Visits Dataset

Dataset Information:

This dataset comprises a set of users and their visits to various points-of-interest (POIs) in Melbourne, with a total of 3,975 tours and 17,087 visits. The user-POI visits are determined based on geo-tagged YFCC100M Flickr photos that are: (i) mapped to specific POI locations and POI categories; and (ii) grouped into individual travel sequences (consecutive user-POI visits that are less than 8 hours apart).

All user-POI visits are stored in a single csv file that contains the following columns/fields:

- photoID: identifier of the photo based on Flickr.

- userID: identifier of the user based on Flickr.

- dateTaken: the date/time that the photo was taken (unix timestamp format).

- poiID: identifier of the point-of-interest (Flickr photos are mapped to POIs based on their lat/long).

- poiTheme: category of the POI (e.g., Park, Museum, Cultural).

- poiFreq: number of times this POI has been visited.

- seqID: travel sequence no. (consecutive POI visits by the same user that differ by <8hrs are grouped as one travel sequence).

In addition, the list of POIs can be found in "POI-Melb.csv", along with their POI category (theme), sub-category (subTheme) and lat/long coordinates.
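A minimal Python sketch of combining the visits with the POI list follows. Only theme and subTheme are named in the description above; the other POI column names (poiID, poiName, lat, long), the ";" delimiter, and all sample values are hypothetical and should be checked against the actual "POI-Melb.csv" file.

```python
import csv
import io

# Hypothetical stand-in for POI-Melb.csv; column names other than
# theme/subTheme are assumptions.
POI_SAMPLE = """poiID;poiName;lat;long;theme;subTheme
1;Example Garden;-37.81;144.97;Parks;Gardens
2;Example Gallery;-37.82;144.96;Museums;Art
"""

# Hypothetical visit rows following the column list above.
VISIT_SAMPLE = """photoID;userID;dateTaken;poiID;poiTheme;poiFreq;seqID
201;userA;1346844688;1;Parks;10;1
202;userA;1346848288;2;Museums;7;1
"""


def load_pois(handle, delimiter=";"):
    """Index POI rows by their integer poiID."""
    return {int(r["poiID"]): r for r in csv.DictReader(handle, delimiter=delimiter)}


def attach_poi_details(visit_handle, pois, delimiter=";"):
    """Add subTheme and coordinates to each visit whose POI is known."""
    enriched = []
    for row in csv.DictReader(visit_handle, delimiter=delimiter):
        poi = pois.get(int(row["poiID"]))
        if poi is None:
            continue  # visit refers to a POI missing from the list
        enriched.append({
            **row,
            "subTheme": poi["subTheme"],
            "lat": float(poi["lat"]),
            "long": float(poi["long"]),
        })
    return enriched


pois = load_pois(io.StringIO(POI_SAMPLE))
enriched = attach_poi_details(io.StringIO(VISIT_SAMPLE), pois)
```

Keeping the POI list as a dictionary keyed by poiID makes each lookup constant-time, which is convenient when the visits file is much larger than the POI list.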

Download:

Please click here to download this dataset.

References / Citations:

If you use this dataset, please cite the following paper:

- Xiaoting Wang, Christopher Leckie, Jeffrey Chan, Kwan Hui Lim and Tharshan Vaithianathan. "Improving Personalized Trip Recommendation to Avoid Crowds Using Pedestrian Sensor Data". In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM'16). Pg 25-34. Oct 2016.

Theme Park Attraction Visits Dataset

Dataset Information:

This dataset comprises a set of users and their visits to various attractions in five theme parks (Disneyland, Epcot, California Adventure, Disney Hollywood and Magic Kingdom). The user-attraction visits are determined based on geo-tagged Flickr photos that are: (i) posted from Aug 2007 to Aug 2017 and retrieved using the Flickr API; (ii) then mapped to specific attraction locations and attraction categories; and (iii) then grouped into individual travel sequences (consecutive user-attraction visits that are less than 8 hours apart). Other associated datasets are the "List of Attractions/POIs" dataset ("POI-{themeParkName}.csv" files from "poiList-sigir17.zip") and the "Attraction/POI Cost-Profit Table" dataset ("costProfCat-{themeParkName}POI-all.csv" files from "costProf-sigir17.zip").

All user-attraction visits in each theme park are stored in a single csv file that contains the following columns/fields:

- photoID: identifier of the photo based on Flickr.

- userID: identifier of the user based on Flickr.

- dateTaken: the date/time that the photo was taken (unix timestamp format).

- poiID: identifier of the attraction (Flickr photos are mapped to attractions based on their lat/long).

- poiTheme: category of the attraction (e.g., Roller Coaster, Family, Water).

- poiFreq: number of times this attraction has been visited.

- rideDuration: the normal ride duration of this attraction.

- seqID: travel sequence no. (consecutive attraction visits by the same user that differ by <8hrs are grouped as one travel sequence).
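One way these fields could be used is to estimate how long a user stayed at each attraction within a travel sequence, taken here as the time between the first and last photo at that attraction. The Python sketch below is an illustration under that assumption, not necessarily the authors' method; the function name time_at_attractions and the sample tuples are made up.

```python
def time_at_attractions(visits):
    """Return {(seqID, poiID): seconds between first and last photo}.

    Assumption: all photos of one attraction within a sequence are treated
    as a single stay, so a return visit later in the same sequence is
    merged with the first one.
    """
    first_last = {}
    for seq_id, poi_id, taken in visits:
        key = (seq_id, poi_id)
        earliest, latest = first_last.get(key, (taken, taken))
        first_last[key] = (min(earliest, taken), max(latest, taken))
    return {key: latest - earliest for key, (earliest, latest) in first_last.items()}


# Hypothetical records: (seqID, poiID, dateTaken as unix timestamp)
sample_visits = [
    (1, 4, 1000),
    (1, 4, 1600),
    (1, 9, 2500),
    (2, 4, 90000),
]
durations = time_at_attractions(sample_visits)
```

An attraction with a single photo gets a duration of zero, so such entries say little about the actual stay. The unit of rideDuration is not stated above, so it is left out of this sketch rather than guessed.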

Download:

Please click here to download this dataset.

References / Citations:

If you use this dataset, please cite the following paper:

- Kwan Hui Lim, Jeffrey Chan, Shanika Karunasekera and Christopher Leckie. "Personalized Itinerary Recommendation with Queuing Time Awareness". In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'17). Pg 325-334. Aug 2017.