Past Talks - Fall 2020

Dec 04, 2020: Adam Smith (Boston University):
"When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?"

Abstract: Modern machine learning models are complex, and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this work, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning.

Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of image- and text-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.

Joint work with Gavin Brown, Mark Bun, Vitaly Feldman, and Kunal Talwar.
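
To give the flavor of such a statement (the notation here is mine and only schematic; see the paper for the precise theorems): for the problems constructed in this work, any training algorithm $A$ mapping a sample $S = (Z_1, \ldots, Z_n)$ to a model $M = A(S)$ that achieves sufficiently high accuracy must satisfy, for some subset $S'$ of $\Omega(n)$ of the training examples,

$$ I\bigl(A(S);\, S\bigr) \;\geq\; (1 - o(1))\, H(S'), $$

i.e., the model carries essentially all the information in those examples, even though $H(S')$ far exceeds what the prediction task itself requires.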

Bio: Adam Smith is a professor of computer science at Boston University. He obtained his Ph.D. from MIT in 2004, and was a faculty member at Penn State from 2007 to 2017. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He received a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2009; Test of Time awards in 2016 (TCC) and 2019 (Eurocrypt); and the 2017 Gödel Prize.

2020-12-04-memorization-FODSI-virtual.pdf

Nov 20, 2020: Himanshu Tyagi (Indian Institute of Science):
"General lower bounds for estimation under information constraints"

Abstract: We present very general lower bounds for parametric estimation when only limited information per sample is allowed. These limitations can arise, for example, in the form of communication constraints, privacy constraints, or linear measurements. Our lower bounds hold for discrete distributions with large alphabets as well as continuous distributions with high-dimensional parameters, apply for any information constraint, and are valid for any $\ell_p$ loss function. Our bounds recover both strong data processing inequality-based bounds and Cramér–Rao-based bounds as special cases.

This talk is based on joint work with Jayadev Acharya and Clément Canonne.
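
One concrete instance of this framework, stated here as a hedged illustration rather than verbatim from the talk: learning a discrete distribution $p$ over $k$ symbols to total variation distance $\epsilon$ requires $n = \Theta(k/\epsilon^2)$ samples without constraints, but when each sample must be compressed to $\ell < \log k$ bits, the known tight bound becomes

$$ n = \Theta\!\left( \frac{k^2}{2^{\ell} \epsilon^2} \right), $$

and under $\rho$-local differential privacy (for small $\rho$) the factor $2^{\ell}$ is replaced by $\rho^2$. Lower bounds of exactly this shape fall out as special cases of the general machinery.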

Bio: Himanshu Tyagi received the B.Tech. degree in electrical engineering and the M.Tech. degree in communication and information technology, both from the Indian Institute of Technology, Delhi, India in 2007. He received the Ph.D. degree from the University of Maryland, College Park in 2013. From 2013 to 2014, he was a postdoctoral researcher at the Information Theory and Applications (ITA) Center, University of California, San Diego. Since January 2015, he has been a faculty member at the Department of Electrical Communication Engineering, Indian Institute of Science in Bangalore. His research interests broadly lie in information theory and its applications in cryptography, statistics, machine learning, and computer science. He is also interested in communication and automation for city-scale systems.

HimanshuTalks.pdf

Nov 09, 2020: Tal Rabin (University of Pennsylvania):
"You Only Speak Once -- Secure MPC with Stateless Ephemeral Roles"

Abstract: The inherent difficulty of maintaining stateful environments over long periods of time gave rise to the paradigm of serverless computing, where mostly-stateless components are deployed on demand to handle computation tasks, and are torn down once their task is complete. Serverless architecture could offer the added benefit of improved resistance to targeted denial-of-service attacks. Realizing such protection requires that the protocol use only stateless parties. Perhaps the most famous example of this style of protocol is the Nakamoto consensus protocol used in Bitcoin. We refer to this stateless property as the You-Only-Speak-Once (YOSO) property, and initiate the formal study of it within a new YOSO model. Our model is centered around the notion of roles, which are stateless parties that can only send a single message. Furthermore, we describe several techniques for achieving YOSO MPC, both computational and information-theoretic.

The talk will be self-contained.

Based on joint works with: Fabrice Benhamouda, Craig Gentry, Sergey Gorbunov, Shai Halevi, Hugo Krawczyk, Chengyu Lin, Bernardo Magri, Jesper Nielsen, Leo Reyzin, Sophia Yakoubov.
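
To make the "roles" idea concrete, here is a toy Python sketch (my own illustration with invented parameters, not one of the protocols from the talk) in which an old committee re-shares a Shamir-shared secret to a fresh committee. Each old role sends a single batch of messages and then keeps no state, which is the YOSO property in miniature:

import random

P = 2**61 - 1   # prime modulus for a toy finite field (M61)
T = 2           # polynomials of degree T => reconstruction threshold T+1

def share(secret, xs):
    # Shamir-share `secret` at points `xs` using a random degree-T polynomial.
    coeffs = [secret] + [random.randrange(P) for _ in range(T)]
    return {x: sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in xs}

def lagrange_at_zero(xs):
    # Lagrange coefficients for interpolating a degree-T polynomial at 0.
    coeffs = {}
    for i in xs:
        num, den = 1, 1
        for j in xs:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        coeffs[i] = num * pow(den, P - 2, P) % P   # Fermat inverse mod P
    return coeffs

secret = 42
old_pts, new_pts = [1, 2, 3], [4, 5, 6]
old_shares = share(secret, old_pts)

# Each old role speaks once: it re-shares its own share to the new
# committee and then holds no further state.
msgs = {i: share(old_shares[i], new_pts) for i in old_pts}

# New roles locally combine the one-shot messages with Lagrange weights.
lam = lagrange_at_zero(old_pts)
new_shares = {j: sum(lam[i] * msgs[i][j] for i in old_pts) % P
              for j in new_pts}

# Sanity check: the new committee can still reconstruct the secret.
mu = lagrange_at_zero(new_pts)
assert sum(mu[j] * new_shares[j] for j in new_pts) % P == secret

In a real YOSO protocol the messages would additionally be encrypted to committees whose identities stay hidden until they speak; the sketch only shows the one-message-per-role communication pattern.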

Bio: Tal Rabin is a researcher whose work focuses on cryptography and, more specifically, on secure multiparty computation, threshold cryptography, and proactive security. Her works have been instrumental in forming these areas. She is a professor in the Department of Computer Science at the University of Pennsylvania and a consultant at the Algorand Foundation.

Prior to joining UPenn, she was Head of Research at the Algorand Foundation, and before that she spent 23 years at IBM Research as a Distinguished Research Staff Member and the manager of the Cryptographic Research Group. She holds a PhD from the Hebrew University.

Rabin is an ACM Fellow, an IACR (International Association for Cryptologic Research) Fellow, and a member of the American Academy of Arts and Sciences. She is the 2019 recipient of the RSA Award for Excellence in the Field of Mathematics. She was named by Forbes in 2018 as one of the Top 50 Women in Tech in the world. In 2014 she won the Anita Borg Women of Vision Award for Innovation and was ranked #4 on Business Insider's list of the 22 Most Powerful Women Engineers. Tal has served as the Program and General Chair of the leading cryptography conferences and is an editor of the Journal of Cryptology. She initiated and organizes the Women in Theory Workshop, a biennial event for graduate students in Theory of Computer Science. She has served as a member of the SIGACT Executive Board and a council member of the Computing Community Consortium.


Oct 09, 2020: Alex Andoni (Columbia University):
"Approximating Edit Distance in Near-Linear Time"

Abstract: Edit distance is a classic measure of similarity between strings, with applications ranging from computational biology to coding. Computing edit distance is also a classic dynamic programming problem, with a quadratic-time solution often taught in "Intro to Algorithms" classes. Improving this runtime has been a decades-old challenge, now shown to be likely impossible using tools from the modern area of fine-grained complexity.

We show how to approximate the edit distance between two strings in near-linear time, up to a constant factor. Our result completes a research direction set forth in the breakthrough paper of [Chakraborty, Das, Goldenberg, Koucky, Saks; FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time.

Joint work with Negev Shekel Nosatzki, available at https://arxiv.org/abs/2005.07678
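
For reference, the quadratic-time baseline the abstract alludes to is the textbook dynamic program below (a standard Python sketch of the classic algorithm, not the near-linear-time approximation from the talk):

def edit_distance(a: str, b: str) -> int:
    # Classic O(|a| * |b|) dynamic program for edit (Levenshtein) distance.
    m, n = len(a), len(b)
    prev = list(range(n + 1))        # distances from the empty prefix of a
    for i in range(1, m + 1):
        curr = [i] + [0] * n         # deleting all of a[:i] costs i
        for j in range(1, n + 1):
            curr[j] = min(prev[j] + 1,        # delete a[i-1]
                          curr[j - 1] + 1,    # insert b[j-1]
                          prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute or match
        prev = curr
    return prev[n]

assert edit_distance("kitten", "sitting") == 3

The fine-grained hardness results mentioned above show that an exact, truly subquadratic algorithm would refute the Strong Exponential Time Hypothesis; the talk's contribution is that a constant-factor approximation sidesteps this barrier in near-linear time.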

Bio: Alexandr Andoni is an associate professor at Columbia University and the co-chair of the Foundations Center of Columbia's Data Science Institute. He graduated from MIT in 2009, with a PhD thesis on nearest neighbor search in high-dimensional spaces. Following graduation, he was a postdoctoral researcher at the Center for Computational Intractability, hosted by Princeton, NYU, and IAS. Alexandr then joined Microsoft Research Silicon Valley, where he was a full-time researcher until 2014. Afterwards, Alexandr was a visiting scientist at the Simons Institute for the Theory of Computing at UC Berkeley until joining Columbia in 2015.

Alexandr is a theoretical computer scientist with a general research focus on advancing algorithmic foundations of massive data. His concrete interests revolve around sublinear algorithms (streaming and property testing), high-dimensional computational geometry, metric embeddings, theoretical machine learning and the connections among these areas.

edit_const_ds.pdf

Sep 11, 2020: Bin Yu (UC Berkeley):
"Veridical Data Science"

Abstract: Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework comprises both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. The PCS workflow uses predictability as a reality check and considers the importance of computation in data collection/storage and algorithm design. It augments predictability and computability with an overarching stability principle for the data science life cycle. Stability expands on statistical uncertainty considerations to assess how human judgment calls impact data results through data and model/algorithm perturbations. We develop inference procedures that build on PCS, namely PCS perturbation intervals and PCS hypothesis testing, to investigate the stability of data results relative to problem formulation, data cleaning, modeling decisions, and interpretations.

Moreover, we propose PCS documentation based on R Markdown or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis.

The PCS framework will be illustrated through our DeepTune approach to modeling and characterizing neurons in the difficult visual cortex area V4.
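
To illustrate the perturbation-interval idea in the abstract, here is a minimal Python sketch (my own toy rendering of the concept, with invented data and parameters; it is not the authors' implementation): the same prediction is recomputed under data perturbations (bootstrap resamples) and model/algorithm perturbations (two ridge penalties), and the spread of results is reported as the interval.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)
x_new = rng.normal(size=5)          # query point whose prediction we probe

def ridge_fit(X, y, lam):
    # Ridge regression via the regularized normal equations.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

preds = []
for lam in (0.1, 10.0):             # model/algorithm perturbation
    for _ in range(100):            # data perturbation (bootstrap resample)
        idx = rng.integers(0, len(y), size=len(y))
        preds.append(x_new @ ridge_fit(X[idx], y[idx], lam))

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"PCS-style perturbation interval: [{lo:.2f}, {hi:.2f}]")

The interval reflects how much the prediction moves under reasonable alternative judgment calls, not just sampling noise, which is the sense in which stability expands on classical statistical uncertainty.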

Bio: Bin Yu is the Class of 1936 Second Chair in the College of Letters and Science and Chancellor's Distinguished Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California, Berkeley, and a former chair of Statistics at UC Berkeley.

She heads the Yu Group, which consists of 15-20 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. Together with her group, she has leveraged new computational developments to solve important scientific problems by combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics, remote sensing, and precision medicine. She and her group also develop relevant theory to provide insight and guide practice.

She is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006, and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of IMS (Institute of Mathematical Statistics) in 2013-2014 and the Rietz Lecturer of IMS in 2016. She received the E. L. Scott Award from COPSS (Committee of Presidents of Statistical Societies) in 2018.