Graph Representation Learning in the Presence of Community Outliers


Tutorial at ECAI 2020

Slides used in the tutorial can be found here.

Introduction

Graph representation learning (also known as graph embedding) has received a significant amount of interest from the network analysis and general artificial intelligence community. Here, the goal is to learn vector representation of the nodes of a graph (or the entire graph itself) which can then be used with any standard machine learning algorithms for different graph mining tasks. However, real world networks come with community outliers, which violate the overall community structure of the graph. The presence of the outlier nodes are not known and they are often difficult to detect. As these outlier nodes are connected to other nodes in the graph randomly, state-of-the-art graph representation algorithms suffer heavily in the presence of even a small number of community outlier nodes in a graph. Thus, it is important to minimize the effect of outlier nodes while generating the representation of the other nodes in the network.

In this tutorial, we start with giving a high level overview of the state-of-the-art node representation learning algorithms such as random walk based unsupervised algorithms, graph neural network based semi-supervised algorithms, etc. We introduce different types of community outliers in a graph and show their adverse effect on the node embedding algorithms. We will cover in details about all the state-of-the-art recently proposed node representation algorithms which minimize the adverse effect of the outlier nodes in the representation learning framework. All these algorithms integrate outlier detection and node embedding into a single optimization framework. We also cover detailed experimental analysis and comparisons before concluding the tutorial.

As part of this tutorial we will be covering following points

  • Motivation
    We motivate the paradigm of graph representation learning, node embedding and outlier detection in graphs.

  • Introduction to Graph Representation Learning
    We establish the problem definition for learning node embedding in a graph and discuss upon popular techniques in the area. We cover methods based on random walks (node2vec), matrix factorization (TADW ), deep learning SDNE, semi-supervised and unsupervised graph neural networks (GCN and DGI), etc.

  • Community Outliers in Graphs and Classical Methods of Outlier Detection
    We formulate the notion of communities and community outliers in a graph and address the problem of outlier detection via classical methods like ALTQP-Inc and more advanced methods like Dominant.

  • Node Embedding and Community Outliers
    Community outliers can affect the node representation in a graph in undesirable ways. Decoupled approaches propose to remove outliers prior to learning node embedding. We provide an alternate to this by combining the task of graph representation learning and community outlier detection while explicitly minimizing the effect of outliers.

  • Recent Developments
    Finally, we discuss some recent advancements in the area which include SEANO (SDM 2018), ONE (AAAI 2019), DONE & AdONE (WSDM 2020) and DMGD (ECAI 2020).

About Speakers:

  • Sambaran Bandyopadhyay (sambband@in.ibm.com) is an Advisory Research Engineer at IBM Research, India. He is also a final year PhD student (Aug, 2016 to July, 2020) at the department of Computer Science and Automation at Indian Institute of Science, Bangalore. His areas of research span over graph representation learning, graph neural networks, optimization and computational sustainability. Sambaran has published his research works in top tier AI conferences such as AAAI, IJCAI, KDD, WSDM, ICAPS, ECAI etc. and also has 8 patents filed to USPTO. He has been a member of the technical program committee of the top AI conferences such as ICML, AAAI, ECML-PKDD, etc.

  • Saley Vishal Vivek (vishalsaley@iisc.ac.in) has completed Master of Technology from the department of Computer Science and Automation at Indian Institute of Science, Bangalore. He is currently working as an Applied Scientist at Amazon Machine Learning, India.

  • M. Narasimha Murty (mnm@iisc.ac.in) received the PhD degree from the Indian Institute of Science, Bangalore, India, in 1982. He is currently a professor in the Department of Computer Science and Automation at the Indian Institute of Science. He has guided about 20 PhD students in the areas of pattern recognition and data mining. He has also published around 125 papers in various journals and conference proceedings in these areas. His coauthored work, “Data Clustering: A Review” (with Anil Jain and P.J. Flynn, ACM Computing Surveys, 1999), is among the most popular data mining and computing survey articles downloaded (more than 16,000 citations according to Google Scholar). He is also an elected fellow of the Indian National Academy of Engineering (FNAE), New Delhi, and the National Academy of Sciences (FNASc), Allahabad. In the past, he has worked on several Indo-US projects and visited Michigan State University, East Lansing, and University of Dauphine, Paris. His current research interests primarily span data mining.