Pin-Yu Chen's Webpage

Twitter event propagation datasets can be found here

Using the Twitter API, we collected traces of three events on Twitter, each over a period of two weeks. The datasets contain event propagation patterns on the Twitter follower network and cover both retweeters and non-retweeters. Each event is identified by a URL or a hashtag, as follows.

Obama FB: tweets including the URL "http://Facebook.com/POTUS" from November 9th to November 23rd in 2015. The URL links to U.S. President Obama's personal Facebook page, and was first posted by his personal Twitter account on November 9th, 2015.
Premier 12: tweets including the hashtag "#premier12" from November 19th to December 3rd in 2015. Premier 12 is a flagship international baseball tournament organized by the World Baseball Softball Confederation (WBSC), featuring the twelve best-ranked national baseball teams in the world.
AlphaGo: tweets including the hashtag "#AlphaGo" from January 27th to February 10th in 2016. AlphaGo is a computer program developed by Google DeepMind in London to play the board game Go. On January 27th, 2016, AlphaGo's victory over a European Go champion was announced, and the algorithm was published in Nature.

If you use these datasets, please cite the following paper:

[1] P.-Y. Chen, C.-C. Tu, P.-S. Ting, Y.-Y. Luo, D. Koutra, and A. O. Hero, “Identifying Influential Links for Event Propagation on Twitter: A Network of Networks Approach,” IEEE Transactions on Signal and Information Processing over Networks, 2018

Exam datasets for research on crowdsourcing can be found here

We collected the exam dataset from one junior high school and one senior high school in Taiwan.

It is released for research purposes only. The answers provided by students can be viewed as labels on exam questions.

If you use this dataset, please cite the following papers:

[1] P.-Y. Chen, C.-W. Lien, F.-J. Chu, P.-S. Ting, and S.-M. Cheng, “Supervised Collective Classification for Crowdsourcing,” IEEE GLOBECOM Workshop, 2015

[2] P.-Y. Chen, S.-M. Cheng, P.-S. Ting, C.-W. Lien, and F.-J. Chu, “When Crowdsourcing Meets Mobile Sensing: A Social Network Perspective,” IEEE Communications Magazine, 2015

Traces of actual lateral movement attacks can be found here

We collected this dataset from a real enterprise network. It contains heterogeneous connectivity patterns in terms of host-application information. There are two files in the dataset: one containing normal traffic and lateral movement traces, and the other containing propagation paths of lateral movements.

If you use this dataset, please cite our technical report: “Enterprise Cyber Resiliency Against Lateral Movement: A Graph Theoretic Approach”

Temporal collaboration network of Jure Leskovec and Andrew Ng (with ground-truth community labels) can be found here

This dataset was collected by Baichuan Zhang. It contains the coauthors of Prof. Jure Leskovec or Prof. Andrew Ng at Stanford University from 1995 to 2014. We partition this 20-year co-authorship record into four 5-year intervals, which yields a 4-layer multilayer graph. In each layer, there is an edge between two researchers if they co-authored at least one paper in the corresponding 5-year interval. For every edge in each layer, we adopt the temporal collaboration strength proposed in [2,3] as the edge weight. We manually label each researcher as either “Leskovec's collaborator” or “Ng's collaborator” based on collaboration frequency, and use the labels as the ground-truth cluster assignment. The ground-truth clusters with researcher names and collaboration strengths are displayed below.
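The layer construction described above can be illustrated with a minimal Python sketch. It assumes a hypothetical input format, a list of (year, author list) records, which is not necessarily how the released files are organized. As a placeholder edge weight it uses the number of co-authored papers in the interval; this is not the temporal collaboration strength of [2,3], which is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Interval boundaries taken from the dataset description: 1995-2014 in 5-year layers.
START_YEAR, END_YEAR, SPAN = 1995, 2014, 5
NUM_LAYERS = (END_YEAR - START_YEAR + 1) // SPAN  # 4 layers


def layer_index(year):
    """Return the layer (0..3) a publication year falls into, or None if out of range."""
    if not START_YEAR <= year <= END_YEAR:
        return None
    return (year - START_YEAR) // SPAN


def build_layers(papers):
    """Build one weighted edge dictionary per 5-year layer.

    papers: iterable of (year, authors) pairs (hypothetical input format).
    Returns a list of dicts mapping a sorted author pair (a, b) to the number
    of papers the pair co-authored in that interval (placeholder weight).
    """
    layers = [defaultdict(int) for _ in range(NUM_LAYERS)]
    for year, authors in papers:
        idx = layer_index(year)
        if idx is None:
            continue  # skip papers outside 1995-2014
        # An edge exists once a pair shares at least one paper in the interval.
        for a, b in combinations(sorted(set(authors)), 2):
            layers[idx][(a, b)] += 1
    return [dict(layer) for layer in layers]


# Toy records with made-up author names, for illustration only.
papers = [
    (1996, ["A", "B"]),
    (1998, ["A", "B", "C"]),
    (2003, ["B", "C"]),
    (2014, ["A", "C"]),
    (2020, ["A", "B"]),  # outside the covered period, ignored
]
layers = build_layers(papers)
for i, layer in enumerate(layers):
    first = START_YEAR + i * SPAN
    print(f"{first}-{first + SPAN - 1}: {layer}")
```

Each layer is kept as a separate edge dictionary so that a different weighting scheme can be substituted without changing how papers are assigned to intervals.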

If you use this dataset, please cite the following papers ([2] and [3] define the temporal collaboration strength used as the edge weight):

[1] P.-Y. Chen and A. O. Hero, “Multilayer Spectral Graph Clustering via Convex Layer Aggregation: Theory and Algorithms,” IEEE Transactions on Signal and Information Processing over Networks, 2017

[2] B. Zhang, T. K. Saha, and M. Al Hasan, “Name disambiguation from link data in a collaboration graph,” in IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2014

[3] T. K. Saha, B. Zhang, and M. Al Hasan, “Name disambiguation from link data in a collaboration graph using temporal and topological features,” Social Network Analysis and Mining, 2015