Workshop Schedule

1st Session, 1:00 - 5:00pm on Aug. 15 (Singapore Time, GMT+8)


01:00 -- 01:10 Welcome


01:10 -- 01:50

Keynote 1

Finding a Needle in a Haystack: The Case with Software Bugs

Professor David Lo, Singapore Management University


01:50 -- 02:30

Keynote 2

OOD Example and New Label under Weakly Supervised Scenario

Professor Yu-Feng Li, Nanjing University

02:30 -- 02:40 Short Break


02:40 -- 03:20

Keynote 3

Federated learning-based one class classification for active authentication

Professor Vishal Patel, Johns Hopkins University


03:20 -- 04:20

Presentation of Accepted Papers (20 minutes per paper, including 5 minutes for Q&A):

  1. Gregor Kasieczka, Benjamin Nachman and David Shih. New Methods and Datasets for Group Anomaly Detection From Fundamental Physics.

Universität Hamburg; Lawrence Berkeley National Laboratory; Rutgers University

  2. Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini and Stefano Teso. Human-in-the-loop Handling of Knowledge Drift.

University of Trento, Italy

  3. Tal Reiss, Niv Cohen, Liron Bergman and Yedid Hoshen. PANDA: Adapting Pretrained Features for Anomaly Detection and Segmentation.

The Hebrew University of Jerusalem, Israel



2nd Session, 4:00 - 8:00am on Aug. 16 (Singapore Time, GMT+8)


04:00 -- 04:40

Keynote 4

Towards Transparent (Fair and Explainable) Outlier Detection

Professor Ian Davidson, University of California, Davis


04:40 -- 05:20

Keynote 5

Novelty Detection for Planetary and Space Exploration

Professor Kiri Wagstaff, Jet Propulsion Laboratory


05:20 -- 05:30 Short Break


05:30 -- 06:10

Keynote 6

Fault Detection in Machines, can we do it?

Professor Osmar Zaiane, University of Alberta


06:10 -- 07:10

Invited Talks & Presentation of Accepted Papers (20 minutes per talk, including 5 minutes for Q&A):

1. Egawati Panjei, Le Gruenwald, Eleazar Leal and Christopher Nguyen. Micro-clusters-based Outlier Explanations for Data Streams.

University of Oklahoma, USA

2. (Invited Talk) Shen Wang. Graph Neural Networks on Anomaly Detection.

Amazon, USA

3. (Invited Talk) Hongzuo Xu. Beyond Outlier Detection: Outlier Interpretation by Attention-Guided Triplet Deviation Network. In Web Conference (WWW) 2021

National University of Defense Technology, China

07:10 -- 07:20 Short Break


07:20 -- 08:00

Panel Discussion on "Anomaly and novelty detection: Challenges ahead"

Professor Nitesh Chawla, University of Notre Dame

Professor Sanjay Chawla, Qatar Computing Research Institute

Professor Rob J. Hyndman, Monash University

Professor Jian Pei, Simon Fraser University

Professor Kai Ming Ting, Nanjing University

Professor Thomas G. Dietterich (moderator), Oregon State University


Keynote Talks

Keynote 1: Finding a Needle in a Haystack: The Case with Software Bugs

Abstract: Software bugs come in various shapes and sizes. They affect businesses and end-users, and many have serious consequences. Fortunately, bugs are anomalies -- the majority of software code is correct. However, this fact makes bug detection hard. This talk will present how data science can be used to identify such anomalous and undesirable code. The talk will highlight one of our latest works that identifies API misuses through active learning and discriminative pattern mining. It will also present our recent work that learns source code representations helpful for a number of downstream tasks on bug identification and management. This talk will also describe open problems and opportunities, with the goal of encouraging more research in this exciting topic at the intersection of data science and software engineering.

Speaker: David Lo, Singapore Management University

Bio: David Lo is a Professor of Computer Science at Singapore Management University, leading the Software Analytics Research (SOAR) group. His research interest is in the intersection of software engineering, cybersecurity, and data science, encompassing socio-technical aspects and analysis of different kinds of software artifacts, with the goal of improving software quality and security and developer productivity. His work has been published in major and premier conferences and journals in the areas of software engineering, AI, and cybersecurity. He has won more than 15 international research and service awards including 6 ACM SIGSOFT Distinguished Paper awards and the 2021 IEEE TCSE Distinguished Service Award. More information about him and his research group is available at: http://www.mysmu.edu/faculty/davidlo/ and https://soarsmu.github.io/.


Keynote 2: OOD Example and New Label under Weakly Supervised Scenario

Abstract: Machine learning algorithms tend to fail when the training and test data contain examples from an unknown distribution, e.g., out-of-distribution (OOD) examples or examples from a new label, which is a major challenge in deploying machine learning models in real-world tasks. Previous studies mainly focused on supervised or unsupervised scenarios, while efforts on weakly supervised scenarios remain limited. In this talk, we present some recent research on weakly supervised learning in the presence of examples from an unknown distribution. First, we present more accurate semi-supervised learning and label-noise learning algorithms for data affected by OOD examples. Then, we present two attempts at detecting examples from new labels in streaming data. Experimental results verify the superiority of our proposed approaches and reveal possible future research directions.

Speaker: Yu-Feng Li, Nanjing University

Bio: Yu-Feng Li is an associate professor at the National Key Laboratory for Novel Software Technology, Nanjing University. He received the BSc and PhD degrees in computer science from Nanjing University, China, in 2006 and 2013, respectively. His research interests include semi-supervised learning, weakly supervised learning, and optimization. He has published more than 50 papers in top-tier journals and conference proceedings. He is an action or associate editor of journals including Machine Learning and Neural Networks. He served as program co-chair of IEEE BigComp 2020, CCML 2021, MLA 2020, journal track co-chair of ACML 2021, workshop co-chair of ACML 2018, tutorial co-chair of ACML 2019, etc., and area chair/senior PC member of ICML, IJCAI, AAAI, ACML, PAKDD, etc.


Keynote 3: Federated learning-based one class classification for active authentication

Abstract: User active authentication on mobile devices aims to learn a model that can correctly recognize the enrolled user based on device sensor information. Due to the lack of negative class data, it is often modeled as a one-class classification problem. In practice, mobile devices are connected to a central server, e.g., all Android-based devices are connected to Google servers through the internet. This device-server structure can be exploited by the recently proposed Federated Learning (FL) and Split Learning (SL) frameworks to perform collaborative learning over the data distributed among multiple devices. Using FL/SL frameworks, one can alleviate the lack of negative data by training a user authentication model over multiple users' data distributed across devices. To this end, we propose a novel user active authentication training approach, termed Federated Active Authentication (FAA), that utilizes the principles of FL/SL. We first show that existing FL/SL methods are suboptimal for FAA as they rely on the data being distributed homogeneously (i.e., IID) across devices, which is not true in the case of FAA. Subsequently, we propose a novel method that is able to tackle heterogeneous/non-IID distribution of data in FAA. Specifically, we first extract feature statistics, such as mean and variance, from each user's data; these are later combined in a central server to learn a multi-class classifier, which is sent back to the individual devices.

Speaker: Vishal Patel, Johns Hopkins University

Bio: Vishal M. Patel is an Associate Professor in the Department of Electrical and Computer Engineering (ECE) at Johns Hopkins University. Prior to joining Hopkins, he was an A. Walter Tyson Assistant Professor in the Department of ECE at Rutgers University and a member of the research faculty at the University of Maryland Institute for Advanced Computer Studies (UMIACS). He completed his Ph.D. in Electrical Engineering from the University of Maryland, College Park, MD, in 2010. He has received a number of awards including the 2021 NSF CAREER Award, the 2016 ONR Young Investigator Award, the 2016 Jimmy Lin Award for Invention, A. Walter Tyson Assistant Professorship Award, Best Paper Awards at IEEE AVSS 2017 and 2019, Best Paper Award at IEEE BTAS 2015, Honorable Mention Paper Award at IAPR ICB 2018, two Best Student Paper Awards at IAPR ICPR 2018, and Best Poster Awards at BTAS 2015 and 2016. He is an Associate Editor of the IEEE Signal Processing Magazine, Pattern Recognition Journal, and serves on the Machine Learning for Signal Processing (MLSP) Committee of the IEEE Signal Processing Society. He serves as the vice president of conferences for the IEEE Biometrics Council.


Keynote 4: Towards Transparent (Fair and Explainable) Outlier Detection

Abstract: Outlier detection (OD) is perhaps the highest-stakes data mining task, as it involves identifying unusual behavior for policing/auditing. When OD is applied to humans, the need for transparency becomes paramount. We overview recent work by ourselves and others exploring two tenets of transparency: fairness and explanation. In the former (fairness), we explore measuring whether an OD algorithm is fair and prove that this is an intractable problem. This means adding fairness criteria to OD algorithms is a challenging computational task. We outline several deep outlier methods and how fairness can be added to them. In the latter (explanation), we explore the approach of post-processing an OD algorithm's output to try to explain what characterizes outliers vs. inliers. We discuss two directions, one where the explanation is given in terms of the features on which OD is performed and another using auxiliary information. We formalize the outlier description problem and present complexity results and declarative formulations that can be easily implemented.

Speaker: Ian Davidson, University of California, Davis

Bio: Ian Davidson is a Professor of Computer Science at the University of California at Davis. He has worked on a wide variety of problems involving unsupervised learning, and this latest work on fairness and explanation is funded by the National Science Foundation, the National Institutes of Health, and Google research grants.


Keynote 5: Novelty Detection for Planetary and Space Exploration

Abstract: Machine learning provides the ability to quickly sift through large data sets to highlight unexpected observations that could lead to new discoveries. For example, we have developed a system to detect and explain anomalies in galaxy observations from the Dark Energy Survey. The explanations help scientists determine whether the anomalies (1) indicate an upstream data collection or processing issue or (2) are of scientific interest (new discovery). We have also developed visual explanations of novelty in Mars rover image archives. In addition, I will describe how Mars rovers can use novelty measures to autonomously select observation targets. All three systems aim to accelerate the process of scientific discovery by efficiently directing attention to where it is most needed.

Speaker: Kiri Wagstaff, Jet Propulsion Laboratory

Bio: Dr. Kiri L. Wagstaff is a Principal Researcher in machine learning at NASA's Jet Propulsion Laboratory and an associate research professor at Oregon State University. Her research focuses on developing new machine learning methods for use onboard spacecraft and in data archives for planetary science, astronomy, cosmology, and more. She earned a Ph.D. in Computer Science from Cornell University followed by an M.S. in Geological Sciences and a Master's degree in Library and Information Science (MLIS). She received the Lew Allen Award for Excellence in Research and two NASA Exceptional Technology Achievement Medals, and she is a Senior Member of the Association for the Advancement of Artificial Intelligence. She is passionate about keeping machine learning relevant to real-world problems.


Keynote 6: Fault Detection in Machines, can we do it?

Abstract: During the lifetime of any machine, components will at some point break down and fail due to wear and tear. One of the greatest challenges to the automated production of goods is equipment malfunction. Ideally, machines should be able to automatically predict and detect operational faults in order to minimize downtime and plan for timely maintenance.

Can data-driven outlier detection be used to avoid costly traditional ad hoc maintenance? Detecting the onset of machine failure using anomaly detection methods is an interesting research topic.

We propose data-driven approaches to anomaly detection for the early detection of faults for condition-based maintenance. Successfully detecting failures as they begin to occur promises to address key issues in machine maintenance like safety and cost effectiveness.

Speaker: Osmar Zaiane, University of Alberta

Bio: Osmar R. Zaïane is a Professor in Computing Science at the University of Alberta, Canada, Fellow of the Alberta Machine Intelligence Institute (Amii), and Canada CIFAR AI Chair. Dr. Zaïane obtained his Ph.D. from Simon Fraser University, Canada, in 1999. He has published more than 330 papers in refereed international conferences and journals. He is an Associate Editor of many international journals on data mining and data analytics and has served as program chair and general chair for scores of international conferences in the field of knowledge discovery and data mining. Dr. Zaïane has received numerous awards, including the 2010 ACM SIGKDD Service Award from the ACM Special Interest Group on Data Mining, which runs the world’s premier data science, big data, and data mining association and conference.


Invited Talk 1: Graph Neural Networks on Anomaly Detection

Abstract: Anomaly detection is an important task, which tackles the problem of discovering “different from normal” signals or patterns by analyzing a massive amount of data, thereby identifying and preventing major faults. Anomaly detection is applied to numerous high-impact applications in cyber-security, finance, e-commerce, social networks, industrial monitoring, and many more mission-critical tasks. While multiple techniques have been developed in past decades to address unstructured collections of multi-dimensional data, graph-structure-aware techniques have recently attracted considerable attention. A number of novel techniques have been developed for anomaly detection by leveraging the graph structure. Recently, graph neural networks (GNNs), a powerful deep-learning-based graph representation technique, have demonstrated superiority in leveraging the graph structure and have been used in anomaly detection. In this talk, we provide a general, comprehensive, and structured overview of the existing works that apply GNNs in anomaly detection.

Speaker: Shen Wang, Research Scientist, Amazon


Invited Talk 2: Beyond Outlier Detection: Outlier Interpretation by Attention-Guided Triplet Deviation Network

Abstract: Outlier detection is an important task in many domains and has been intensively studied over the past decade. Explaining outliers, i.e., outlier interpretation, is even more significant, as it can provide valuable insights for analysts to better understand, solve, and prevent these detected outliers. However, only limited studies consider this problem. Most of the existing methods follow a score-and-search approach. They select a feature subspace as the interpretation for each queried outlier by estimating outlying scores of the outlier in searched subspaces. Due to the tremendous search space, they have to utilize pruning strategies and set a maximum subspace length, often resulting in suboptimal interpretation results. Accordingly, this paper proposes a novel Attention-guided Triplet deviation network for Outlier interpretatioN (ATON). Instead of searching a subspace, ATON directly learns an embedding space and learns how to attach attention to each embedding dimension (i.e., capturing the contribution of each dimension to the outlierness of the queried outlier). Specifically, ATON consists of a feature embedding module and a customized self-attention learning module, which are optimized by a triplet deviation-based loss function. We obtain an optimal attention-guided embedding space with expanded high-level information and rich semantics, and thus outlying behaviors of the queried outlier can be better unfolded. ATON finally distills a subspace of original features from the embedding module and the attention coefficient. With its good generality, ATON can be employed as an additional step of any black-box outlier detector. A comprehensive suite of experiments is conducted to evaluate the effectiveness and efficiency of ATON. The proposed ATON significantly outperforms state-of-the-art competitors on 12 real-world datasets and obtains good scalability w.r.t. both data dimensionality and data size.

Speaker: Hongzuo Xu, National University of Defense Technology, China