# Past Talks

## Title: Scaling Optimal Transport for High dimensional Learning

Abstract: Optimal transport (OT) has recently gained lot of interest in machine learning. It is a natural tool to compare in a geometrically faithful way probability distributions. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity. More information and references can be found on the website of our book "Computational Optimal Transport" optimaltransport.github.io/

Bio: Gabriel Peyré is CNRS senior researcher and professor at the Ecole Normale Supérieure, Paris. He works at the interface between applied mathematics, imaging and machine learning. He obtained 2 ERC grants (starting in 2010 and consolidator in 2017), the Blaise Pascal prize from the French academy of sciences in 2017, the Magenes Prize from the Italian Mathematical Union in 2019 and the silver medal from CNRS in 2021. He is invited speaker at the European Congress for Mathematics in 2020. He is the deputy director of the Prairie Institute for artificial intelligence, the director of the ENS center for data science and the former director of the GdR CNRS MIA. He is the head of the ELLIS (European Lab for Learning & Intelligent Systems) Paris Unit (https://ellis-paris.github.io/). He is engaged in reproducible research and code education, in particular through the platform www.numerical-tours.com.

optimaltransport.pdf

## Title: Pandora's Box with Correlations: Learning and Approximation

Abstract: In the Pandora's Box problem, the algorithm is provided with a number of boxes with unknown (stochastic) rewards contained inside them. The algorithm can open any box at some cost, discover the reward inside, and based on these observations can choose one box and keep the reward contained in it. Given the distributions from which the rewards are drawn, the algorithm must determine an order in which to open the boxes as well as when to stop and accept the best reward found so far. In general, an optimal algorithm may make both decisions adaptively based on instantiations observed previously. The Pandora's Box problem and its extensions capture many kinds of optimization problems with stochastic input where the algorithm can obtain instantiations of input random variables at some cost. Previous work on these problems assumes that different random variables in the input are distributed independently. As such it does not capture many real-world settings. In this work, we provide the first algorithms for Pandora's Box-type problems with correlations. In the independent setting, optimal algorithms are non-adaptive and based on the notion of the Gittins index. These techniques fail to extend to the correlated case. We assume that the algorithm has access to samples drawn from the joint distribution on input and provide solutions that require few samples; are computationally efficient; and guarantee approximate optimality.
This is joint work with Evangelia Gergatsouli, Yifeng Teng, Christos Tzamos, and Ruimin Zhang and appeared in FOCS'20.

Bio: Shuchi Chawla holds an Endowed Professorship in Computer Science at UT-Austin and is an Amazon Scholar. Shuchi is a theoretical computer scientist specializing in the areas of algorithm design, and economics and computation. Shuchi received a Ph.D. from Carnegie Mellon University and a B.Tech. from the Indian Institute of Technology, Delhi. Prior to joining UT-Austin, she spent 15 years as a professor of CS at the University of Wisconsin-Madison. She has also previously held visiting positions at the University of Washington and Microsoft Research. Shuchi is the recipient of an NSF Career award, a Sloan Foundation fellowship, and several awards for her research and teaching at UW-Madison. Shuchi recently served as the PC Chair of SODA'20 and EC'21, and currently serves on the editorial boards of the ACM Transactions on Algorithms and the ACM Transactions on Economics and Computation. She serves as a member and the current chair of CATCS.

pandoras-box-correlations-DataScience-4-6-22.pdf

## Title: An Introduction to Proximal Causal Learning

Abstract: A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariates strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on one’s ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely if ever, be learned with certainty from measured covariates. One can therefore only ever hope that covariate measurements are at best proxies of true underlying confounding mechanisms operating in an observational study, thus invalidating causal claims made on basis of standard exchangeability conditions.

Causal learning from proxies is a challenging inverse problem which has to date remained unresolved. In this paper, we introduce a formal potential outcome framework for proximal causal learning, which while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability based on measured covariates fails. Sufficient conditions for nonparametric identification are given, leading to the proximal g-formula and corresponding proximal g-computation algorithm for estimation, both generalizations of Robins’ foundational g-formula and g-computation algorithm, which account explicitly for bias due to unmeasured confounding. Both point treatment and time-varying treatment settings are considered, and an application of proximal g-computation of causal effects is given for illustration.

Bio: Eric Tchetgen Tchetgen’s primary area of interest is in semi-parametric efficiency theory with application to causal inference and missing data problems. In general, he works on the development and application of statistical and epidemiologic methods that make efficient use of the information in data collected by scientific investigators, while avoiding unnecessary assumptions about underlying data generating mechanisms.

In 2018, Eric Tchetgen Tchetgen joined The Wharton School, University of Pennsylvania as the Luddy Family President’s Distinguished Professor and Professor of Statistics. Prior to that he was Professor of Biostatistics and Epidemiologic Methods at Harvard University. He completed his PhD in Biostatistics at Harvard University in 2006 received his B.S. in Electrical Engineering from Yale University in 1999.

## Title: Learning-Based Sampling and Streaming

Abstract: Classical algorithms typically provide "one size fits all" performance, and do not leverage properties or patterns in their inputs. A recent line of work aims to address this issue by developing algorithms that use machine learning predictions to improve their performance. In this talk I will present two examples of this type, in the context of streaming and sampling algorithms. In particular, I will show how to use machine learning predictions to improve the performance of (a) low-memory streaming algorithms for frequency estimation (ICLR’19), and (b) sampling algorithms for estimating the support size of a distribution (ICLR’21). Both algorithms use an ML-based predictor that, given a data item, estimates the number of times the item occurs in the input data set.

The talk will cover material from papers co-authored with T Eden, CY Hsu, D Katabi, S Narayanan, R Rubinfeld, S Silwal, T Wagner and A Vakilian.

Bio: Piotr Indyk is a Thomas D. and Virginia W. Cabot Professor in the Department of Electrical Engineering and Computer Science at MIT. He joined MIT in 2000, after earning PhD from Stanford University. Earlier, he received Magister degree from Uniwersytet Warszawski in 1995. Piotr’s research interests are in the design of efficient algorithms for massive data analysis. His specific interests include nearest neighbor search in high-dimensional spaces, streaming algorithms, sparse recovery, fine-grained complexity and learning-based algorithms. He has received Sloan Fellowship (2003), Packard Fellowship (2003), Simons Investigator Award (2013) and ICML Best Paper Award (2015). His work on sparse Fourier sampling has been named to Technology Review TR10 in 2012, while his work on locality-sensitive hashing has received the 2012 ACM Kanellakis Theory and Practice Award. He is a co-director of Foundations of Data Science Institute (fodsi.us), an NSF-funded project focused on foundations of data science.

FDS-learning.pdf

## Title: Communicating with Anecdotes

Abstract: Classic models of communication in economics typically assume agents can communicate any message. However, many important communications, such as those in newspapers or politicians' speeches, use data to convey information. In this talk, we explore how the reliance on data impacts communication. In our model, there are two Bayesian agents (a sender and a receiver) who wish to communicate. The receiver must take an action whose payoff depends on their personal preferences and an unknown state of the world. The sender has access to a collection of data points correlated with the state of the world and can send exactly one of these to the receiver in order to influence her choice of action. Importantly, the sender's personal preferences may differ from the receiver's, which affects the sender's strategic choice of what to send. We show that in a Nash equilibrium even a small difference in preferences can lead to a significant bias in the communicated datum. This can significantly reduce informativeness of the communication, leading to substantial utility loss for both sides. One implication is informational homophily: a receiver can rationally prefer to obtain data from a poorly-informed sender with aligned preferences, rather than a knowledgeable expert whose preferences may differ from her own.

Joint work with Nika Haghtalab, Brendan Lucier, Markus Mobius and Divya Mohan.

Bio: Nicole Immorlica's research lies broadly within the field of economics and computation. Using tools and modeling concepts from both theoretical computer science and economics, Nicole hopes to explain, predict, and shape behavioral patterns in various online and offline systems, markets, and games. Her areas of specialty include social networks and mechanism design. Nicole received her Ph.D. from MIT in Cambridge, MA in 2005 and then completed three years of postdocs at both Microsoft Research in Redmond, WA and CWI in Amsterdam, Netherlands before accepting a job as an assistant professor at Northwestern University in Chicago, IL in 2008. She joined the Microsoft Research New England Lab in 2012.

anecdotes.pdf

## Title: Neural SDEs: Deep Generative Models in the Diffusion Limit

Abstract: In deep generative models, the latent variable is generated by a time-inhomogeneous Markov chain, where at each time step we pass the current state through a parametric nonlinear map, such as a feedforward neural net, and add a small independent Gaussian perturbation. In this talk, based on joint work with Belinda Tzen, I will discuss the diffusion limit of such models, where we increase the number of layers while sending the step size and the noise variance to zero. I will first provide a unified viewpoint on both sampling and variational inference in such generative models through the lens of stochastic control. Then I will show how we can quantify the expressiveness of diffusion-based generative models. Specifically, I will prove that one can efficiently sample from a wide class of terminal target distributions by choosing the drift of the latent diffusion from the class of multilayer feedforward neural nets, with the accuracy of sampling measured by the Kullback-Leibler divergence to the target distribution. Finally, I will briefly discuss a scheme for unbiased, finite-variance simulation in such models. This scheme can be implemented as a deep generative model with a random number of layers.

Bio: Maxim Raginsky received the B.S. and M.S. degrees in 2000 and the Ph.D. degree in 2002 from Northwestern University, all in Electrical Engineering. He has held research positions with Northwestern, the University of Illinois at Urbana-Champaign (where he was a Beckman Foundation Fellow from 2004 to 2007), and Duke University. In 2012, he has returned to the UIUC, where he is currently an Associate Professor and William L. Everitt Fellow with the Department of Electrical and Computer Engineering, the Coordinated Science Laboratory, and the Department of Computer Science. His research interests cover probability and stochastic processes, deterministic and stochastic control, machine learning, optimization, and information theory. Much of his recent research is motivated by fundamental questions in modeling, learning, and simulation of nonlinear dynamical systems, with applications to advanced electronics, autonomy, and artificial intelligence.

diffusions.pdf

## Title: How does the brain beget the mind?

Abstract: How do molecules, cells and synapses effect reasoning, intelligence, planning, language? Despite dazzling progress in experimental neuroscience, as well as in cognitive science at the other extreme of scale, we do not seem to be making progress in the overarching question: the gap is huge and a completely new approach seems to be required. As Richard Axel recently put it: "We don't have a logic for the transformation of neural activity into thought [...]."

What kind of formal system would qualify as this "logic"?

I will introduce the Assembly Calculus (AC), a computational system which appears to be a promising bridge between neurons and cognition. Through this programming framework, a Parser was recently implemented which (a) can handle reasonably complex sentences of English and other languages, and (b) works exclusively through the spiking of neurons.

Bio: One of world’s leading computer science theorists, Christos Papadimitriou is best known for his work in computational complexity, helping to expand its methodology and reach. He has also explored other fields through what he calls the algorithmic lens, having contributed to biology and the theory of evolution, economics, and game theory (where he helped found the field of algorithmic game theory), artificial intelligence, robotics, networks and the Internet, and more recently the study of the brain.

He authored the widely used textbook Computational Complexity, as well as four others, and has written three novels, including the best-selling Logicomix and his latest, Independence. He considers himself fundamentally a teacher, having taught at UC Berkeley for the past 20 years, and before that at Harvard, MIT, the National Technical University of Athens, Stanford, and UC San Diego.

Papadimitriou has been awarded the Knuth Prize, IEEE’s John von Neumann Medal, the EATCS Award, the IEEE Computer Society Charles Babbage Award, and the Gödel Prize. He is a fellow of the Association for Computer Machinery and the National Academy of Engineering, and a member of the National Academy of Sciences.

He received his BS in Electrical Engineering from Athens Polytechnic in 1972. He has a MS in Electrical Engineering and a PhD in Electrical Engineering/Computer Science from Princeton, received in 1974 and 1976, respectively.

ESSLLI.pdf

## Title: Learning Robust Models: How does the Geometry of Perturbations Play a Role?

Abstract: In this talk, we will focus on the emerging field of (adversarially) robust machine learning. The talk will be self-contained and no particular background on robust learning will be needed. Recent progress in this field has been accelerated by the observation that despite unprecedented performance on clean data, modern learning models remain fragile to seemingly innocuous changes such as small, norm-bounded additive perturbations. Moreover, recent work in this field has looked beyond norm-bounded perturbations and has revealed that various other types of distributional shifts in the data can significantly degrade performance. However, in general our understanding of such shifts is in its infancy and several key questions remain unaddressed.

The goal of this talk is to explain why robust learning paradigms have to be designed — and sometimes rethought — based on the geometry of the input perturbations. We will cover a wide range of perturbation geometries from simple norm-bounded perturbations, to sparse, natural, and more general distribution shifts. As we will show, the geometry of the perturbations necessitates fundamental modifications to the learning procedure as well as the architecture in order to ensure robustness. In the first part of the talk, we will discuss our recent theoretical results on robust learning with respect to various geometries, along with fundamental tradeoffs between robustness and accuracy, phase transitions, etc. The remaining portion of the talk will be about developing practical robust training algorithms and evaluating the resulting (robust) deep networks against state-of-the-art methods on naturally-varying, real-world datasets.

Bio: Hamed Hassani is currently an assistant professor of department of Electrical and Systems Engineering as well as the department of Computer and Information Sciences at the University of Pennsylvania. Prior to that, he was a research fellow at Simons Institute for the Theory of Computing (UC Berkeley) affiliated with the program of Foundations of Machine Learning, and a post-doctoral researcher in the Institute of Machine Learning at ETH Zurich. He received a Ph.D. degree in Computer and Communication Sciences from EPFL, Lausanne. He is the recipient of the 2014 IEEE Information Theory Society Thomas M. Cover Dissertation Award, 2015 IEEE International Symposium on Information Theory Student Paper Award, 2017 Simons-Berkeley Fellowship, 2018 NSF-CRII Research Initiative Award, 2020 Air Force Office of Scientific Research (AFOSR) Young Investigator Award, 2020 National Science Foundation (NSF) CAREER Award, and 2020 Intel Rising Star Award.

## Title: Discovering low-dimensional manifolds in high-dimensional data sets

Abstract: This talk reviews diffusion methods to identify low-dimensional manifolds underlying high-dimensional datasets, and illustrates that by pinpointing additional mathematical structure, improved results can be obtained. Much of the talk draws on a case study from a collaboration with biological morphologists, who compare different phenotypical structures to study relationships of living or extinct animals with their surroundings and each other. This is typically done from carefully defined anatomical correspondence points (landmarks) on e.g. bones; such landmarking draws on highly specialized knowledge. To make possible more extensive use of large (and growing) databases, algorithms are required for automatic morphological correspondence maps, without any preliminary marking of special features or landmarks by the user.

Bio: Ingrid Daubechies is the James Duke Professor of Mathematics and Computer Engineering at Duke University. She is well known for her work with wavelets in image compression. Daubechies is recognized for her study of the mathematical methods that enhance image-compression technology. She is a member of the National Academy of Engineering, the National Academy of Sciences, and the American Academy of Arts and Sciences. Her research accomplishments have garnered her a MacArthur Fellowship, NAS Mathematics Prize, Steele Prize, Nemmers Prize, to name a few. She is a strong supporter of women in science, and a leading figure in the mathematics community. For example, she was the International Mathematical Union President (2011-2014), and currently serves on the NAS US National Committee for Mathematics.

ID_talk.pdf

## Title: Data-Driven Algorithm Design

Abstract: The best algorithm for a computational problem generally depends on the “relevant inputs”, a concept that depends on the application domain and often defies formal articulation. While there is a large literature on empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly little theoretical analysis of the problem.

We adapt concepts from statistical and online learning theory to reason about application-specific algorithm selection. Our models are straightforward to understand, but also expressive enough to capture several existing approaches in the theoretical computer science and AI communities, ranging from self-improving algorithms to empirical performance models. We present one framework that models algorithm selection as a statistical learning problem, and our work here shows that dimension notions from statistical learning theory, historically used to measure the complexity of classes of binary- and real-valued functions, are relevant in a much broader algorithmic context. We also study the online version of the algorithm selection problem, and give possibility and impossibility results for the existence of no-regret learning algorithms.

Joint work with Rishi Gupta.

Bio: Tim Roughgarden is a Professor of Computer Science at Columbia University. Prior to joining Columbia, he spent 15 years on the computer science faculty at Stanford, following a PhD at Cornell and a postdoc at UC Berkeley. His research interests include the many connections between computer science and economics, as well as the design, analysis, applications, and limitations of algorithms. For his research, he has been awarded the ACM Grace Murray Hopper Award, the Presidential Early Career Award for Scientists and Engineers (PECASE), the Kalai Prize in Computer Science and Game Theory, the Social Choice and Welfare Prize, the Mathematical Programming Society's Tucker Prize, and the EATCS-SIGACT Gödel Prize. He was an invited speaker at the 2006 International Congress of Mathematicians, the Shapley Lecturer at the 2008 World Congress of the Game Theory Society, and a Guggenheim Fellow in 2017. He has written or edited ten books and monographs, including Twenty Lectures on Algorithmic Game Theory (2016), Beyond the Worst-Case Analysis of Algorithms (2020), and the Algorithms Illuminated book series (2017-2020).

berkeley_tripods21.pdf

## Title: Equilibrium Computation and the Foundations of Deep Learning

Abstract: Deep Learning has recently yielded important advances in single-agent learning challenges, much of that progress being fueled by the empirical success of gradient descent and its variants in computing local optima of non-convex optimization problems. In multi-agent learning applications, the role of single-objective optimization is played by equilibrium computation, yet our understanding of its complexity in settings that are relevant for Deep Learning remains sparse. In this talk we focus on min-max optimization of nonconvex-nonconcave objectives, which has found applications in GANs, and other adversarial learning problems. Here, not only are

there no known gradient-descent based methods converging to even local and approximate min-max equilibria, but the computational complexity of identifying them remains poorly understood. We show that finding approximate local min-max equilibria of Lipschitz and smooth objectives requires a number of queries to the function and its gradient that is exponential in the relevant parameters, in sharp contrast to the polynomial number of queries required to find approximate local minima of non convex objectives. Our oracle lower bound is a byproduct of a complexity-theoretic result showing that finding approximate local min-max equilibria is computationally equivalent to finding Brouwer fixed points, and Nash equilibria in non zero-sum games, and thus PPAD-complete.

Minimal complexity theory knowledge will be assumed in the talk. Joint work with Stratis Skoulakis and Manolis Zampetakis

Bio: Constantinos Daskalakis is a Professor of Computer Science at MIT, working on Computation Theory and its interface with Game Theory, Economics, Probability Theory, Machine Learning and Statistics. His work has resolved long-standing problems about the computational complexity of Nash equilibrium, and multi-item auctions, and now focuses on high-dimensional statistics and learning from biased, dependent, and strategic data. He has been honored with the Nevanlinna Prize by the International Mathematical Union as well as other awards including the ACM Doctoral Dissertation Award, the Kalai Prize from the Game Theory Society, the Sloan fellowship in Computer Science, the SIAM outstanding paper prize, the Simons investigator award, the ACM Grace Murray Hopper award, and the Bodossaki foundation distinguished young scientists award.

2021 - Costis.pptx

## Title: When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?

Abstract: Modern machine learning models are complex, and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning.

Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of image- and text-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.

Joint work with Gavin Brown, Mark Bun, Vitaly Feldman, and Kunal Talwar.

Bio: Adam Smith is a professor of computer science at Boston University. He obtained his Ph.D. from MIT in 2004, and was a faculty member at Penn State from 2007 to 2017. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He received a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2009; Test of Time awards in 2016 (TCC) and 2019 (Eurocrypt); and the 2017 Gödel Prize.

2020-12-04-memorization-FODSI-virtual.pdf

## Title: General lower bounds for estimation under information constraints

Abstract: We present very general lower bounds for parametric estimation when only limited information per sample is allowed. These limitations can arise, for example, in form of communication constraints, privacy constraints, or linear measurements. Our lower bounds hold for discrete distributions with large alphabet as well as continuous distributions with high-dimensional parameters, apply for any information constraint, and are valid for any $\ell_p$ loss function. Our bounds recover both strong data processing inequality based bounds and Cramér-Rao based bound as special cases.

This talk is based on joint work with Jayadev Acharya and Clément Canonne.

Bio: Himanshu Tyagi received the B.Tech. degree in electrical engineering and the M.Tech. degree in communication and information technology, both from the Indian Institute of Technology, Delhi, India in 2007. He received the Ph.D. degree from the University of Maryland, College Park in 2013. From 2013 to 2014, he was a postdoctoral researcher at the Information Theory and Applications (ITA) Center, University of California, San Diego. Since January 2015, he has been a faculty member at the Department of Electrical Communication Engineering, Indian Institute of Science in Bangalore. His research interests broadly lie in information theory and its application in cryptography, statistics, machine learning, and computer science. Also, he is interested in communication and automation for city-scale systems.

HimanshuTalks.pdf

## Title: You Only Speak Once -- Secure MPC with Stateless Ephemeral Roles

Abstract: The inherent difficulty of maintaining stateful environments over long periods of time gave rise to the paradigm of serverless computing, where mostly-stateless components are deployed on demand to handle computation tasks, and are teared down once their task is complete. Serverless architecture could offer the added benefit of improved resistance to targeted denial-of-service attacks. Realizing such protection, requires that the protocol only uses stateless parties. Perhaps the most famous example of this style of protocols is the Nakamoto consensus protocol used in Bitcoin. We refer to this stateless property as the You-Only-Speak-Once (YOSO) property, and initiate the formal study of it within a new YOSO model. Our model is centered around the notion of roles, which are stateless parties that can only send a single message. Furthermore, we describe several techniques for achieving YOSO MPC; both computational and information theoretic.

The talk will be self contained.

Based on joint works with: Fabrice Benhamouda, Craig Gentry, Sergey Gorbunov, Shai Halevi, Hugo Krawczyk, Chengyu Lin, Bernardo Magri, Jesper Nielsen, Leo Reyzin, Sophia Yakoubov.

Bio: Tal Rabin is a researcher whose general area focuses on cryptography and, more specifically, on secure multiparty computation, threshold cryptography, and proactive security. Her works have been instrumental in forming these areas. She is a professor at the University of Pennsylvania, Computer Science Dept and a consultant at the Algorand Foundation.

Prior to joining UPenn she has been the head of research and the Algorand Foundation and prior to that she had been at IBM Research for 23 years as a Distinguished Research Staff Member and the manager of the Cryptographic Research Group. She has a PhD from the Hebrew University.

Rabin is an ACM Fellow, an IACR (International Association of Cryptologic Research) Fellow and member of the American Academy of Arts and Sciences. She is the 2019 recipient of the RSA Award for Excellence in the Field of Mathematics. She was named by Forbes in 2018 as one of the Top 50 Women in Tech in the world. In 2014 Tal won the Anita Borg Women of Vision Award winner for Innovation and was ranked by Business Insider as the #4 on the 22 Most Powerful Women Engineers. Tal has served as the Program and General Chair of the leading cryptography conferences and is an editor of the Journal of Cryptology. She has initiated and organizes the Women in Theory Workshop, a biennial event for graduate students in Theory of Computer Science. She has served as a member of the SIGACT Executive Board and a council member of the Computing Community Consortium.

## Title: Approximating Edit Distance in Near-Linear Time

Abstract: Abstract: Edit distance is a classic measure of similarity between strings, with applications ranging from computational biology to coding. Computing edit distance is also a classic dynamic programming problem, with a quadratic run-time solution, often taught in the "Intro to Algorithms" classes. Improving this runtime has been a decades-old challenge, now ruled likely-impossible using tools from the modern area of fine-grained complexity.

We show how to approximate the edit distance between two strings in near-linear time, up to a constant factor. Our result completes a research direction set forth in the breakthrough paper of [Chakraborty, Das, Goldenberg, Koucky, Saks; FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time.

Joint work with Negev Shekel Nosatzki, available at https://arxiv.org/abs/2005.07678

Bio: Alexandr Andoni is an associate professor at the Columbia University and the co-chair of the Foundations Center of the Columbia's Data Science Institute. He graduated from MIT in 2009, with a PhD thesis on Nearest Neighbor Search in high-dimensional spaces. Following graduation, he was a postdoctoral researcher at the Center for Computational Intractability, hosted by Princeton, NYU, and IAS. Alexandr then joined Microsoft Research Silicon Valley, where he was a full-time researcher until 2014. Afterwards, Alexandr was a visiting scientist at the Simons Institute for the Theory of Computing at UC Berkeley until joining Columbia in 2015.

Alexandr is a theoretical computer scientist with a general research focus on advancing algorithmic foundations of massive data. His concrete interests revolve around sublinear algorithms (streaming and property testing), high-dimensional computational geometry, metric embeddings, theoretical machine learning and the connections among these areas.

edit_const_ds.pdf

## Title: Veridical Data Science

Abstract: Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework is comprised of both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. The PCS workflow uses predictability as a reality check and considers the importance of computation in data collection/storage and algorithm design. It augments predictability and computability with an overarching stability principle for the data science life cycle. Stability expands on statistical uncertainty considerations to assess how human judgment calls impact data results through data and model/algorithm perturbations. We develop inference procedures that build on PCS, namely PCS perturbation intervals and PCS hypothesis testing, to investigate the stability of data results relative to problem formulation, data cleaning, modeling decisions, and interpretations.

Moreover, we propose PCS documentation based on R Markdown or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis.

The PCS framework will be illustrated through our DeepTune approach to model and characterize neurons in the difficult visual cortex area V4.

Bio: Bin Yu is The Class of 1936 Second Chair in the College of Letters and Science, and Chancellor's Distinguished Professor, Departments of Statistics and of Electrical Engineering & Computer Sciences, University of California at Berkeley and a former chair of Statistics at UC Berkeley.

She heads the Yu Group, which consists of 15-20 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. Together with her group, her work has leveraged new computational developments to solve important scientific problems by combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics, remote sensing, and precision medicine. She and her group also develop relevant theory to provide insight and guide practice.

She is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006, and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of IMS (Institute of Mathematical Statistics) in 2013-2014 and the Rietz Lecturer of IMS in 2016. She received the E. L. Scott Award from COPSS (Committee of Presidents of Statistical Societies) in 2018.

## Title: User-Friendly Submodular Maximization

Abstract: Submodular functions model the intuitive notion of diminishing returns. Due to their far-reaching applications, they have been rediscovered in many fields such as information theory, operations research, statistical physics, economics, and machine learning. They also enjoy computational tractability as they can be minimized exactly or maximized approximately.

The goal of this talk is simple. We see how a little bit of randomness, a little bit of greediness, and the right combination can lead to pretty good methods for offline, streaming, and distributed solutions. I do not assume any background on submodularity and try to explain all the required details during the talk.

Bio: Amin Karbasi is currently an assistant professor of Electrical Engineering, Computer Science, and Statistics at Yale University. He is also a senior visiting scientist at Google NY. He has been the recipient of the National Science Foundation (NSF) Career Award 2019, Office of Naval Research (ONR) Young Investigator Award 2019, Air Force Office of Scientific Research (AFOSR) Young Investigator Award 2018, DARPA Young Faculty Award 2016, National Academy of Engineering Grainger Award 2017, Amazon Research Award 2018, Google Faculty Research Award 2016, Microsoft Azure Research Award 2016, Simons Research Fellowship 2017, and ETH Research Fellowship 2013. His work has also been recognized with a number of paper awards, including Medical Image Computing and Computer Assisted Interventions Conference (MICCAI) 2017, International Conference on Artificial Intelligence and Statistics (AISTAT) 2015, IEEE ComSoc Data Storage 2013, International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2011, ACM SIGMETRICS 2010, and IEEE International Symposium on Information Theory (ISIT) 2010 (runner-up). His Ph.D. thesis received the Patrick Denantes Memorial Prize 2013 from the School of Computer and Communication Sciences at EPFL, Switzerland.

user-friendly submodular max.pdf

## Title: The power of asking more informative questions about the data

Abstract: Many supervised learning algorithms (such as deep learning) need a large collection of labelled data points in order to perform well. However, what is easy to get are large amounts of unlabelled data. Labeling data is an expensive procedure, as it usually needs to be done manually, often by a domain expert. Active learning provides a mechanism to bridge this gap. Active learning algorithms are given a large collection of unlabelled data points. They need to smartly choose a few data points to query their label. The goal is then to automatically infer the labels of many other data points.

In this talk, we will explore the option of giving active learning algorithms additional power, by allowing them to have richer interaction with the data. We will see how allowing for even simple types of queries, such as comparing two data points, can exponentially improve the number of queries needed in various settings. Along the way, we will see interesting connections to both geometry and combinatorics, and a surprising application to fine grained complexity.

Based on joint works with Daniel Kane, Shay Moran and Jiapeng Zhang.

Bio: Shachar Lovett is an Associate Professor at the UC San Diego Computer Science department. He obtained his PhD from the Weizmann Institute in 2010. He is interested in the role that structure and randomness play in computation and mathematics, in particular in computational complexity, optimization, machine learning and coding theory; as well as pseudo-randomness, explicit constructions and additive combinatorics. He is a recipient of an NSF Career award and a Sloan fellowship.

data-sci-lovett.pptx

## Title: Towards Model Agnostic Robustness

Abstract: It is now common practice to try and solve machine learning problems by starting with a complex existing model or architecture, and fine-tuning/adapting it to the task at hand. However, outliers, errors or even just sloppiness in training data often lead to drastic drops in performance.

We investigate a simple generic approach to correct for this, motivated by a classic statistical idea: trimmed loss. This advocates jointly (a) selecting which training samples to ignore, and (b) fitting a model on the remaining samples. As such this is computationally infeasible even for linear regression. We propose and study the natural iterative variant that alternates between these two steps (a) and (b) - each of which individually can be easily accomplished in pretty much any statistical setting. We also study the batch-SGD variant of this idea. We demonstrate both theoretically (for generalized linear models) and empirically (for vision and NLP neural network models) that this effectively recovers accuracy in the presence of bad training data.

This work is joint with Yanyao Shen and Vatsal Shah and appears in NeurIPS 2019, ICML 2019 and AISTATS 2020.

Bio: Sujay Sanghavi is an Associate Professor at the University of Texas, Austin. He received a PhD in ECE and an MS in Math and ECE from the University of Illinois, and was a postdoc at LIDS in MIT. Sujay’s research focus is rigorous methodological innovation in machine learning, using ideas from optimization, statistics and graph theory. He has received early career awards from the NSF and DoD, and currently leads the NSF TRIPODS Institute on the Foundations of Data Science at UT.

Sujay is also interested in learning from and applying his ideas in industry. He has been a Visiting Scientist at Google Research, a senior quant at Engineers Gate and is currently a Principal Scientist and Amazon Scholar at Amazon.

SujayUMass2020.pdf

## Title: Fairness and Bias in Algorithmic Decision-Making

Abstract: As data science has broadened its scope in recent years, a number of domains have applied computational methods for classification and prediction to evaluate individuals in high-stakes settings. These developments have led to an active line of recent discussion in the public sphere about the consequences of algorithmic prediction for notions of fairness and equity. In part, this discussion has involved a basic tension between competing notions of what it means for such classifications to be fair to different groups. We consider several of the key fairness conditions that lie at the heart of these debates, and in particular how these properties operate when the goal is to rank-order a set of applicants by some criterion of interest, and then to select the top-ranking applicants.

The talk will be based on joint work with Sendhil Mullainathan and Manish Raghavan.

data-sci-kleinberg.pdf