Knowledge Discovery from Temporal Social Networks

Fabíola S. F. Pereira, Shazia Tabassum and João Gama

Federal University of Uberlândia, Brazil

INESC TEC, University of Porto, Portugal

TUTORIAL - SDM 2018

May 3-5, San Diego, USA

Summary

When data is structured in the form of networks, how should it be analyzed? Extracting knowledge from network data is not a simple task and requires appropriate tools and techniques, especially in scenarios that must account for the volume and evolving nature of the network. There is a vast literature on how to collect, process, and model social media data as networks, as well as on key centrality metrics. However, much remains to be discussed regarding the analysis of the underlying network. In this tutorial we assume that the data has already been collected and structured as a network. The goal is to discuss techniques for analyzing network data, with particular attention to the time perspective. First, we present concepts related to problem definition, temporal networks, and metrics for network analysis. Next, on the practical side, we show techniques for visualizing and processing temporal networks. Finally, we discuss applications with real data, illustrating how knowledge extraction from network data works from start to finish.

Content

1 Introduction & Motivation

    • Fundamental Concepts
    • Challenges in Mining Evolving Networks
    • Tutorial Goals

2 Evolving Centralities in Temporal Graphs

    • Temporal Networks
    • Temporal Metrics
    • Application: evolving temporal centralities in Twitter network

3 Change Detection in Evolving Networks

    • Aggregating Mechanisms to compute Evolving Centralities
    • Application: preference change detection in Twitter network

4 Evolving Communities

    • Tracking Dynamics of Evolving Communities

5 Online Sampling

    • Online Sampling Graph Streams
    • Algorithms for Sampling High Speed Evolving Networks
    • Sampling Evolving Ego-Networks with Forgetting Factor

6 Final Remarks

    • SNA Tools
    • Evolving Networks Research Topics
    • Conclusions

Presenters

Fabíola S. F. Pereira is a Ph.D. student in Computer Science at the Federal University of Uberlândia, Brazil, and a visiting researcher at LIAAD, a group belonging to INESC TEC, Portugal. She has served as a reviewer/sub-reviewer for the following journals and conferences: SBBD’15, BRACIS’15, ECML/PKDD’16, KDD’16, BigMine’16, MobDM’16, JDSA, TKDE and New Generation Computing. In recent years, she has authored papers in areas related to Temporal Networks, Social Network Analysis and User Preferences. She served as co-chair of the special session on Evolving Networks (EvoNets) at DSAA’17.

Shazia Tabassum is a researcher at LIAAD, INESC TEC, Porto, and is pursuing her Ph.D. in Informatics Engineering at the Faculty of Engineering, University of Porto. In recent years, she has authored papers in the areas of Networked Data Streams, Evolutionary Social Graphs and Social Network Analysis. She is a member of ACM and the IEEE Computer Society. She has served as a reviewer/sub-reviewer for journals and conferences including the Knowledge and Information Systems Journal, IEEE ICMLA’16, ECML/PKDD’16, KDD’16 and IEEE BigDataSE’15. She served as co-chair of the special session on Evolving Networks (EvoNets) at DSAA’17.

João Gama received his Ph.D. degree in Computer Science from the Faculty of Sciences of the University of Porto, Portugal, in 2000. He joined the Faculty of Economy, where he holds the position of Associate Professor. He is also a senior researcher and vice-director of LIAAD, a group belonging to INESC TEC. He has worked on several national and European projects on Incremental and Adaptive Learning Systems, Ubiquitous Knowledge Discovery, Learning from Massive and Structured Data, etc. He served as Program Co-chair of ECML’2005, DS’2009, ADMA’2009, IDA’2011, and ECML-PKDD’2015. He served as track chair on Data Streams at ACM SAC from 2007 to 2017. He organized a series of workshops on Knowledge Discovery from Data Streams with the ECML/PKDD conferences and on Knowledge Discovery from Sensor Data with ACM SIGKDD. He is the author of several books on Data Mining (in Portuguese) and of a monograph on Knowledge Discovery from Data Streams. He has authored more than 250 peer-reviewed papers in areas related to machine learning, data mining, and data streams. He is a member of the editorial boards of the international journals ML, DMKD, TKDE, IDA, NGC, and KAIS. His main research interest is learning from data streams.

Slides

Slides now available: tutorial-sdm18-kdnets-V9-final.pdf

References

[1] S. Tabassum, F. Pereira, S. Fernandes, and J. Gama, “Social Network Analysis: An Overview,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2018.

[2] F. Pereira, S. Tabassum, J. Gama, S. de Amo and G. Oliveira, “Processing Evolving Social Networks for Event Detection based on Centrality Measures,” Large-scale Learning from Data Streams in Evolving Environments. Springer, 2017.

[3] S. Tabassum and J. Gama, “Evolution analysis of call ego-networks,” in International Conference on Discovery Science. Springer, 2016, pp. 213–225.

[4] R. Sarmento, M. Oliveira, M. Cordeiro, S. Tabassum, and J. Gama, “Social network analysis in streaming call graphs,” in Big Data Analysis: New Algorithms for a New Society. Springer, 2016, pp. 239–261.

[5] S. Tabassum and J. Gama, “Sampling evolving ego-networks with forgetting factor,” in 2016 17th IEEE International Conference on Mobile Data Management (MDM), vol. 2. IEEE, 2016, pp. 55–59.

[6] S. Tabassum, “Social network analysis of mobile streaming networks,” in 2016 17th IEEE International Conference on Mobile Data Management (MDM), Ph.D. Forum, vol. 2. IEEE, 2016, pp. 20–25.

[7] S. Tabassum and J. Gama, “Sampling massive streaming call graphs,” in ACM Symposium on Applied Computing, 2016, pp. 923–928.

[8] A. Metwally, D. Agrawal, and A. El Abbadi, “Efficient computation of frequent and top-k elements in data streams,” in Database Theory-ICDT 2005. Springer, 2005, pp. 398–412.

[9] J. S. Vitter, “Random sampling with a reservoir,” ACM Transactions on Mathematical Software (TOMS), vol. 11, no. 1, pp. 37–57, 1985.

[10] Gama, J., Sebastião, R., & Rodrigues, P. P. (2013). “On evaluating stream learning algorithms”. Machine Learning, 90(3), 317-346.

[11] Pereira, F. S. F., Amo, S., Gama, J. Evolving Centralities in Temporal Graphs: a Twitter Network Analysis. In 17th IEEE International Conference on Mobile Data Management, Porto, Portugal, 2016 (MDM'16)

[12] Pereira, F. S. F., Amo, S., Gama, J. On Using Temporal Networks to Analyze User Preferences Dynamics. In 19th International Conference on Discovery Science, Bari, Italy, 2016 (DS'16)

[13] Pereira, F. S. F., Amo, S., Gama, J. Detecting Events in Evolving Social Networks through Node Centrality Analysis. ECML/PKDD workshop on Large-scale Learning from Data Streams in Evolving Environments 2016, Italy (StreamEvolv'16)

[14] P. Holme and J. Saramäki. Temporal networks. Physics reports, 519(3):97--125, 2012.

[15] Sandra de Amo, Mouhamadou Saliou Diallo, Cheikh Talibouya Diop, Arnaud Giacometti, Dominique Li, Arnaud Soulet. Contextual preference mining for user profile construction. Information Systems, vol. 49, pp. 182–199, April 2015.

[16] Claudio D. G. Linhares, Bruno A. N. Travençolo, Jose Gustavo S. Paiva, and Luis E. C. Rocha. 2017. DyNetVis: a system for visualization of dynamic networks. In Proceedings of the Symposium on Applied Computing (SAC '17). ACM, New York, NY, USA, 187-194.

[17] Wu, H., Cheng, J., Huang, S., Ke, Y., Lu, Y., Xu, Y.: Path problems in temporal graphs. Proc. VLDB Endowment 7(9), 721–732 (2014)

[18] Nicosia, V., Tang, J., Mascolo, C., Musolesi, M., Russo, G., Latora, V.: Graph metrics for temporal networks. In: Holme, P., Saramäki, J. (eds.) Temporal Networks. Understanding Complex Systems, pp. 15–40. Springer, Heidelberg (2013)

[19] Zafarani, R., Abbasi, M.A., Liu, H.: Social Media Mining: An Introduction. Cambridge University Press, New York (2014)

[20] G Rossetti, L Pappalardo, D Pedreschi, F Giannotti. Tiles: an online algorithm for community discovery in dynamic social networks. Machine Learning, 1-29 (2016). https://link.springer.com/article/10.1007/s10994-016-5582-8

[21] G Rossetti, R Cazabet. Community Discovery in Dynamic Networks: A Survey. ACM Computing Surveys (CSUR) 51 (2), 35 (2018) https://arxiv.org/pdf/1707.03186.pdf