SIGKDD 2023 

SoCal Data Science Day

8:30am - 5:00pm, August 7th, 2023

Long Beach Convention & Entertainment Center, Room 204

Long Beach, California, USA

Register for this special day using the link below!

Overview

Southern California is home to a number of world-class academic institutions and industrial research and development labs, spearheading fundamental and applied AI and Data Science research. Our goals, as part of this special day, are to highlight and celebrate the research and development efforts happening in the broader Southern California region and to broaden the participation of underrepresented groups at KDD.

Keynote Speakers

Stefano Soatto

Vice President, Amazon Web Services (AWS); Professor, UCLA

Barry Barish

Professor, Caltech/UCR; Nobel Laureate in physics

Program

We have an exciting line-up of 2 keynote talks, 7 invited talks, and 39 poster presentations. See the schedule below (all times are in the Pacific Time zone).

Day 1, Aug 7th

8:30 am - 8:35 am

Welcome and Introduction

8:35 am - 8:55 am 

Invited Talk by Yan Liu, USC, "Fairness and Interpretability of AI Models for Health Applications"

Abstract: The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep-learning models for healthcare applications. However, due to the black-box nature of these models, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV, the largest publicly available healthcare dataset, and conduct comprehensive analyses of interpretability as well as dataset representation bias and prediction fairness of deep learning models for in-hospital mortality prediction.

Bio: Yan Liu is a Professor in the Computer Science Department and the Director of the Machine Learning Center at the University of Southern California. She received her Ph.D. degree from Carnegie Mellon University. Her research interest is machine learning and its applications to climate science, health care, and sustainability. She has received several awards, including the NSF CAREER Award, the Okawa Foundation Research Award, New Voices of the Academies of Science, Engineering, and Medicine, the Biocom Catalyst Award, the ACM Dissertation Award Honorable Mention, and the Best Paper Award at the SIAM Data Mining Conference. She served as general chair of KDD 2020 and ICLR 2023, program co-chair of SDM 2020 and KDD 2022, and senior program chair of ICLR 2022.

8:55 am - 9:15 am

Invited Talk by Cho-Jui Hsieh, UCLA, "Towards Automated Optimizer Discovery"

Abstract: This talk discusses our recent attempts at automating the design and tuning of optimizers. We will introduce an efficient optimizer design framework and an AI-discovered optimizer, Lion, which outperforms Adam in many real applications. We will then discuss several open directions in automating neural network training, including learning rate scheduling and batch selection.

Bio: Cho-Jui Hsieh is an associate professor in the Computer Science Department at UCLA. His work primarily focuses on enhancing the efficiency and robustness of machine learning systems, and he has made significant contributions to multiple widely used machine learning packages. He has been honored with the NSF CAREER Award, Samsung AI Researcher of the Year, the Google Research Scholar Award, and the Frontiers of Science Award, and his work has been recognized with several paper awards at ICLR, KDD, ICDM, ICPP, and SC.

9:15 am - 9:35 am

Poster Lightning Talks (2 mins/poster) (IDs 1-11)

Paper ID Poster Slot Paper Title Authors

9:35 am - 10:00 am

Coffee Break

10:00 am - 10:20 am

Invited Talk by Yao Qin, UCSB, "Improving Robustness through Safe Data Augmentation"

Abstract: There are many robustness issues arising in a variety of forms when deploying ML systems in the real world. For example, neural networks suffer from sensitivity to distributional shift, when a model is tested on a data distribution different from what it was trained on. Such a shift is frequently encountered in practical deployments and can lead to a substantial degradation in performance. In addition, neural networks are vulnerable to adversarial examples – small perturbations to the input can successfully fool classifiers into making incorrect predictions. In this talk, we will introduce safe data augmentation to improve robustness via 1) assigning effective labels to augmented data, and 2) negative data augmentation to mitigate non-robust features.

Bio: Dr. Qin is an Assistant Professor in the Department of Electrical and Computer Engineering at UC Santa Barbara, affiliated with the Department of Computer Science. She is also a senior research scientist at Google Research. She obtained her Ph.D. in Computer Science at UC San Diego in 2020 and worked at Google Research afterward. Her research interests primarily focus on robustness in multi-modality models, fairness in generative modeling, and AI for healthcare, particularly for diabetes. She has served as Area Chair for ICLR 2023 and ICCV 2023 and local co-chair for KDD 2023. In addition, she was recognized as an EECS Rising Star at MIT in 2021.

10:20 am - 10:40 am

Invited Talk by Vagelis Papalexakis, UC Riverside, "Low-rank approximation and robustness"

Abstract: In this talk we are going to explore the connections between low-rank approximation and robustness. In particular, focusing on graph mining and learning as our motivating application, we are going to discuss recent results where we demonstrate the power of low-rank methods in (a) consolidating and improving the performance of potentially noisy node embeddings, and (b) defending against adversarial attacks. We will conclude with results that point to the generality of our observations and discuss future directions.

Bio: Evangelos (Vagelis) Papalexakis is an Associate Professor in the CSE Department at the University of California, Riverside. He received his Ph.D. from the School of Computer Science at Carnegie Mellon University (CMU). Prior to CMU, he obtained his Diploma and M.Sc. in Electronic & Computer Engineering at the Technical University of Crete, in Greece. Broadly, his research interests span the fields of Data Science, Machine Learning, Artificial Intelligence, and Signal Processing. His research involves designing interpretable models and scalable algorithms for extracting knowledge from large multi-aspect datasets, with specific emphasis on tensor factorization models, and applying those algorithms to a variety of real-world problems, including detection of misinformation on the Web, explainable AI, and gravitational wave detection. His work has appeared in top-tier conferences and journals and has attracted a number of distinctions, including the 2017 SIGKDD Dissertation Award (runner-up), a number of paper awards, the National Science Foundation CAREER Award, the 2021 IEEE DSAA Next Generation Data Scientist Award, and the ICDM 2022 Tao Li Award, which recognizes excellence in early-career researchers.

10:40 am - 11:00 am

Invited Talk by Yue Dong, UCR, "Safeguarding the Potential: Exploring Safety and Vulnerability in Large Language Models"

Abstract: Artificial Intelligence (AI), particularly Large Language Models (LLMs), is dramatically altering our landscape of possibilities, capable of tasks such as writing insightful essays and generating executable code. However, every coin has two sides, and AI is no exception; the intellectual limitations of LLMs bring both safety and vulnerability concerns that we need to carefully examine before autonomous deployment. In this talk, we will traverse the path of the latest innovations that have propelled these models to their current state of impressive performance. While we revel in the impressive potential of LLMs, we will also critically examine the concerns they raise. We will delve into the ongoing debate surrounding the future of AI and the need to align AI's progress with human values. Additionally, we will explore recent discoveries of vulnerabilities through adversarial attacks in LLMs that could lead to unsafe behavior. By doing so, we will emphasize the importance of thorough scrutiny, continuous refinement, and ethical considerations in this rapidly evolving field.

Bio: Yue Dong is an assistant professor of computer science and engineering at the University of California Riverside. Her research interests include natural language processing, machine learning, and artificial intelligence. She leads the Natural Language Processing group, which develops natural language understanding and generation systems that are controllable, trustworthy, and efficient.

11:00 am - 11:55 am

Poster Lightning Talks (2 mins/poster) (IDs 12-39)

Paper ID Poster Slot Paper Title Authors

11:55 am - 1:30 pm

Lunch Break

1:30 pm - 2:30 pm

Keynote Talk by Stefano Soatto, UCLA/Amazon "Taming AI Bots: Controllability of Neural States in Large Language Models"

Abstract: I will present a view of large language models (LLMs) as stochastic dynamical systems, for which the notion of controllability is well established. From this view, it is easy to see that the "state of mind" of an LLM can be easily steered by a suitable choice of input, given enough time and memory. However, the space of interest for an LLM is not that of words, but rather the set of "meanings" expressible as sentences that a human could have spoken, and would understand. Unfortunately, unlike controllability, the notions of "meaning" and "understanding" are not usually formalized in a way that is relatable to LLMs in use today.

I will propose a simplistic definition of meaning that is compatible with at least some theories found in Epistemology, and relate it to functional characteristics of trained LLMs. Then, I will describe both necessary and sufficient conditions for controllability in the space of meanings. I will show that a well-trained LLM establishes a topology and geometry in the space of meanings, whose embedding space has words (tokens) as coordinate axes. In this space, meanings are equivalence classes of trajectories (complete sentences). 

I will then argue that meaning attribution requires an external grounding mechanism, and relate LLMs with models of the physical scene inferred from images. There, I will highlight the analogy between meanings inferred from sequences of words, and the "physical scene" inferred from collections of images. But while the entity that generates textual meanings (the human brain) is not accessible for experimentation, the physical scene can be probed and falsified.

Bio: Professor Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical and Biomedical Engineering at Washington University, and Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998 he was also Ricercatore in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. His general research interests are in Computer Vision and Nonlinear Estimation and Control Theory. In particular, he is interested in ways for computers to use sensory information (e.g. vision, sound, touch) to interact with humans and the environment. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka and S. Sastry of U.C. Berkeley) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett of Harvard). He received the National Science Foundation CAREER Award and the Okawa Foundation Grant. He is Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a Member of the Editorial Board of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision.

2:30 pm - 2:50 pm

Invited Talk by Yunfei Hou, CSUSB "Empowering Future Data Scientists: Building New Data Science Programs at CSUSB"

Abstract: TBD

Bio: Dr. Yunfei Hou is an associate professor in the School of Computer Science and Engineering at California State University, San Bernardino, and also serves as the associate director of the Leonard Transportation Center. He received his Ph.D. in Computer Science and Engineering from the University at Buffalo, SUNY, and his B.S. in Computer Science from Xi'an Jiaotong University. His current research interests include applications in transportation cyber-physical systems, data and information analysis for transportation engineering, and STEM education. His recent projects span areas such as vehicular sensing in smart cities, traffic management with connected vehicle technologies, transportation cybersecurity, and data science education. These projects have been funded by the NSF, DOT, and NREL.

2:50 pm - 3:30 pm 

Coffee Break

3:30 pm - 4:30 pm

Keynote Talk by Barry Barish, Caltech/UCR "Scientific Discovery through Technical Innovation and Data Mining"

Abstract: 

Bio: Barry Clark Barish is an American experimental physicist and Nobel Laureate. He is the Linde Professor of Physics, Emeritus, at the California Institute of Technology and a leading expert on gravitational waves. In 2017, Barish was awarded the Nobel Prize in Physics along with Rainer Weiss and Kip Thorne "for decisive contributions to the LIGO detector and the observation of gravitational waves". In 2018, he joined the faculty at the University of California, Riverside, becoming the second Nobel Prize winner on the university's faculty. In the fall of 2023, he will serve as the inaugural President's Distinguished Endowed Chair in Physics at Stony Brook University.

4:30 pm - 4:50 pm 

Invited Talk by Yue Zhao, USC, "Enable Security Applications by Machine Learning with Noisy Inputs"

Abstract: In security applications, conventional machine learning (ML) methodologies rely heavily on obtaining clean labels from human annotators, a process that often proves costly and impractical. This talk will introduce an innovative approach called ADMoE (Anomaly Detection with Mixture of Experts, AAAI '23), designed to utilize weak or noisy labels, such as risk scores generated by machine rules for malware detection, which are more economical and practical to obtain. ADMoE represents a novel framework that allows anomaly detection, a key ML subfield for security, to learn effectively from noisy labels. Utilizing a Mixture-of-Experts (MoE) architecture, it promotes specialized learning at scale from multiple noisy sources. The method shares most model parameters to capture commonalities among the noisy labels, while fostering specialization through the creation of "expert" sub-networks. Our comprehensive testing has demonstrated that ADMoE can deliver up to a 34% performance boost over not using it. Additionally, ADMoE outperforms 13 leading baselines when assessed with equivalent network parameters and FLOPS. Importantly, ADMoE's design is model-agnostic, allowing any neural-network-based detection method to handle noisy labels effectively.

Bio: Yue Zhao is an Assistant Professor of Computer Science at the University of Southern California. His research focuses on creating automated and scalable ML algorithms and systems, and he has published over 30 papers in top ML and systems venues such as VLDB, MLSys, NeurIPS, TKDE, and JMLR. His open-source systems (https://github.com/yzhao062) have been widely deployed at organizations such as NASA, Morgan Stanley, and Tesla, and have received over 15,000 GitHub stars and 20 million downloads. Yue received his Ph.D. from Carnegie Mellon University with the support of the CMU Presidential Fellowship and the Norton Graduate Fellowship.

6:30 pm - 8:30 pm

Poster Session