Machine Learning Seminar Series

Dec. 2nd

Speaker: Yangyang Xu

  • Rensselaer Polytechnic Institute

  • Bio : Yangyang Xu is a tenure-track assistant professor in the Department of Mathematical Sciences at Rensselaer Polytechnic Institute. He received his B.S. in Computational Mathematics from Nanjing University in 2007, his M.S. in Operations Research from the Chinese Academy of Sciences in 2010, and his Ph.D. from the Department of Computational and Applied Mathematics at Rice University in 2014. His research interests are mainly in optimization theory and methods and their applications in machine learning, statistics, and signal processing. His research has been supported by the NSF and IBM. He was awarded a gold medal at the 2017 International Consortium of Chinese Mathematicians (ICCM).

Talk information

  • Title: First-order methods for nonlinear-constrained optimization

  • Time: Thursday, Dec. 2nd, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

First-order methods (FOMs) have recently been applied and analyzed for solving problems with complicated functional constraints. Existing works show that FOMs for functionally constrained problems have lower-order convergence rates than those for unconstrained problems. In particular, an FOM can achieve linear convergence on a smooth strongly convex problem, but only sublinear convergence on a constrained problem if projection onto the constraint set is prohibited. In this talk, I will first give a lower-bound result for FOMs solving affinely constrained problems. Then I will show that the slower convergence is caused by the large number of functional constraints, not by the constraints themselves. When there are only O(1) functional constraints, I will show that an FOM can achieve almost the same convergence rate as for an unconstrained problem, even without projection onto the feasible set. Finally, I will give an adaptive primal-dual method for problems with many constraints. Experimental results on quadratically constrained quadratic programs will be shown to demonstrate the theory.
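
To make the setting concrete, here is a minimal Python sketch of a basic primal-dual gradient method for a strongly convex objective with a single functional constraint. This illustrates the problem class only, not the adaptive primal-dual method from the talk; the toy instance, step sizes, and iteration count are invented for the example.

```python
import numpy as np

# Hypothetical toy instance: minimize a strongly convex quadratic subject to
# one quadratic constraint:  min 0.5*||x - c||^2  s.t.  ||x||^2 - 1 <= 0.
rng = np.random.default_rng(0)
c = rng.normal(size=5)

def f_grad(x):            # gradient of the objective 0.5*||x - c||^2
    return x - c

def g(x):                 # the single functional constraint, g(x) <= 0
    return x @ x - 1.0

def g_grad(x):
    return 2.0 * x

# Basic primal-dual gradient method on the Lagrangian L(x, y) = f(x) + y*g(x):
# a gradient descent step in x, a projected gradient ascent step in y >= 0.
x, y = np.zeros(5), 0.0
eta, rho = 0.05, 0.05
for _ in range(2000):
    x = x - eta * (f_grad(x) + y * g_grad(x))   # primal descent step
    y = max(0.0, y + rho * g(x))                # dual ascent step, kept >= 0

print("g(x) at the last iterate:", g(x))        # ~0 if the constraint is active
```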

Nov. 18th

Speaker: Steven L. Brunton

  • Mechanical Engineering at the University of Washington

  • Bio : Steven L. Brunton (https://www.me.washington.edu/facultyfinder/steve-brunton) is a Professor of Mechanical Engineering at the University of Washington. He is also an Adjunct Professor of Applied Mathematics and Computer Science, and a Data Science Fellow at the eScience Institute. Steve received his B.S. in mathematics from Caltech in 2006 and his Ph.D. in mechanical and aerospace engineering from Princeton in 2012. His research combines machine learning with dynamical systems to model and control systems in fluid dynamics, biolocomotion, optics, energy systems, and manufacturing. He is a co-author of three textbooks, and has received the University of Washington College of Engineering junior faculty and teaching awards, the Army and Air Force Young Investigator Program (YIP) awards, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

Talk information

  • Title: Machine Learning for Sparse Nonlinear Modeling and Control

  • Time: Thursday, Nov. 18th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

This work describes how machine learning may be used to develop accurate and efficient nonlinear dynamical systems models for complex natural and engineered systems. We explore the sparse identification of nonlinear dynamics (SINDy) algorithm, which identifies a minimal dynamical system model that balances model complexity with accuracy, avoiding overfitting. This approach tends to promote models that are interpretable and generalizable, capturing the essential “physics” of the system. We also discuss the importance of learning effective coordinate systems in which the dynamics may be expected to be sparse. This sparse modeling approach will be demonstrated on a range of challenging modeling problems in fluid dynamics, and we will discuss how to incorporate these models into existing model-based control efforts.
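
As a concrete illustration of the SINDy idea, the following Python sketch runs sequentially thresholded least squares over a small monomial library on a toy damped oscillator. The system, library, and threshold are illustrative assumptions rather than the talk's examples; a full-featured implementation is available in the open-source PySINDy package.

```python
import numpy as np

# Toy data: a trajectory of dx/dt = -y, dy/dt = x - 0.1*y (damped oscillator),
# sampled finely enough that finite differences approximate the derivatives.
dt, T = 0.01, 20.0
t = np.arange(0.0, T, dt)
X = np.zeros((len(t), 2))
X[0] = [1.0, 0.0]
for k in range(len(t) - 1):
    x, y = X[k]
    X[k + 1] = X[k] + dt * np.array([-y, x - 0.1 * y])

dXdt = np.gradient(X, dt, axis=0)          # numerical time derivatives

# Library Theta(X) of candidate terms; here, monomials up to degree 2.
x, y = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
names = ["1", "x", "y", "x^2", "xy", "y^2"]

# Sequentially thresholded least squares: alternate a least-squares fit
# with zeroing out coefficients below a sparsity threshold.
thresh = 0.05
Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
for _ in range(10):
    Xi[np.abs(Xi) < thresh] = 0.0
    for j in range(dXdt.shape[1]):
        big = np.abs(Xi[:, j]) >= thresh
        if big.any():
            Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]

for j, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{Xi[i, j]:+.2f} {names[i]}" for i in range(len(names)) if Xi[i, j] != 0]
    print(lhs, "=", " ".join(terms))       # recovers the sparse dynamics
```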

Nov. 11th

Speaker: Johannes O. Royset

  • Operations Research at the Naval Postgraduate School

  • Bio : Dr. Johannes O. Royset is Professor of Operations Research at the Naval Postgraduate School. Dr. Royset's research focuses on formulating and solving stochastic and deterministic optimization problems arising in data analytics, sensor management, and reliability engineering. He was awarded a National Research Council postdoctoral fellowship in 2003, a Young Investigator Award from the Air Force Office of Scientific Research in 2007, and the Barchi Prize as well as the MOR Journal Award from the Military Operations Research Society in 2009. He received the Carl E. and Jessie W. Menneken Faculty Award for Excellence in Scientific Research in 2010 and the Goodeve Medal from the Operational Research Society in 2019. Dr. Royset was a plenary speaker at the International Conference on Stochastic Programming in 2016 and at the SIAM Conference on Uncertainty Quantification in 2018. He has a Doctor of Philosophy degree from the University of California at Berkeley (2002). Dr. Royset has been an associate or guest editor of SIAM Journal on Optimization, Operations Research, Mathematical Programming, Journal of Optimization Theory and Applications, Naval Research Logistics, Journal of Convex Analysis, Set-Valued and Variational Analysis, and Computational Optimization and Applications. He is the author of about 100 papers and two books.

Talk information

  • Title: Diametrical Risk Minimization: Theory and Computations

  • Time: Thursday, Nov. 11th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

The theoretical and empirical performance of Empirical Risk Minimization (ERM) often suffers when loss functions are poorly behaved with large Lipschitz moduli and spurious sharp minimizers. We propose and analyze a counterpart to ERM called Diametrical Risk Minimization (DRM), which accounts for worst-case empirical risks within neighborhoods in parameter space. DRM has generalization bounds that are independent of Lipschitz moduli for convex as well as nonconvex problems and it can be implemented using a practical algorithm based on stochastic gradient descent. Numerical results illustrate the ability of DRM to find quality solutions with low generalization error in sharp empirical risk landscapes from benchmark neural network classification problems with corrupted labels.
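
The core DRM idea can be sketched in a few lines: replace the empirical risk at w with the worst empirical risk over a small ball around w in parameter space. The toy logistic-regression problem below, and the use of random sampling to approximate the inner maximization, are simplifying assumptions for illustration; the paper's practical algorithm is SGD-based.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (a hypothetical stand-in for the benchmarks).
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
ylab = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(float)

def risk(w):
    """Empirical logistic risk, computed in a numerically stable form."""
    z = X @ w
    return np.mean(np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0) - ylab * z)

def risk_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - ylab) / len(ylab)

# DRM-style step: estimate the worst-case risk over a radius-gamma ball
# around w by sampling random perturbations, then descend from the worst
# perturbed point (a crude approximation of the inner maximization).
w, gamma, lr = np.zeros(10), 0.1, 0.5
for _ in range(300):
    pert = rng.normal(size=(8, 10))
    pert = gamma * pert / np.linalg.norm(pert, axis=1, keepdims=True)
    worst = max(pert, key=lambda u: risk(w + u))   # approximate inner max
    w -= lr * risk_grad(w + worst)                 # descend at the worst point

print("diametrical risk estimate:", risk(w))
```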

Nov. 4th

Speaker: Junzhou Huang

  • Department of Computer Science and Engineering, University of Texas at Arlington

  • Bio : Dr. Junzhou Huang is a professor in the Department of Computer Science and Engineering at the University of Texas at Arlington. He received his Ph.D. degree in Computer Science at Rutgers, The State University of New Jersey. His major research interests include machine learning, computer vision, computational pathology, computational drug discovery, and clinical science. He was selected as one of the 10 emerging leaders in multimedia and signal processing by the IBM T.J. Watson Research Center in 2010. His work won the MICCAI Young Scientist Award 2010, the FIMH Best Paper Award 2011, the STMI Best Paper Award 2012, the MICCAI Best Student Paper Award 2015, first place in the Tool Presence Detection Challenge at M2CAI 2016, sixth place in the 3D Structure Prediction Challenge and first place in the Contact and Distance Prediction Challenge at CASP14 in 2020, and the Google TensorFlow Model Garden Award 2021. He received the NSF CAREER Award in 2016.

Talk information

  • Title: Deep Graph Learning for Drug Property Prediction

  • Time: Thursday, Nov. 4th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

Graphs are powerful mathematical structures for describing relations or interactions among objects in different fields, such as biology, social science, and economics. Recent technological innovations are enabling scientists to capture enormous amounts of graph-structured data at increasing speed and scale. Thus, a compelling need exists to develop novel learning tools to foster and fuel the next generation of scientific discovery in graph-data research. However, the unprecedented scale and complexity of these data pose major computational challenges. There is a critical need for large-scale learning strategies with theoretical guarantees to bridge the gap and facilitate knowledge discovery from complex graph data. This talk will introduce our recent work on developing novel deep graph learning methods to efficiently and effectively process atom-graph data for predicting the chemical or biological properties of drug molecules.
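
For readers unfamiliar with graph learning on molecules, here is a minimal, hypothetical sketch of one message-passing step on an atom graph followed by a pooled property prediction. The atom types, bonds, and weights are made up for the example; models in this line of work use many layers, edge features, and learned readouts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy molecule: 5 atoms, features are one-hot atom types
# (C, N, O), and bonds are given as an undirected edge list.
atom_types = [0, 0, 1, 2, 0]                       # C C N O C
H = np.eye(3)[atom_types]                          # node features, shape (5, 3)
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A += np.eye(5)                                     # self-loops

# One round of mean-aggregation message passing, a linear + ReLU update,
# then sum-pooling into a molecule-level embedding for a property head.
deg = A.sum(axis=1, keepdims=True)
W1 = rng.normal(size=(3, 8)) / np.sqrt(3)
W2 = rng.normal(size=(8, 1)) / np.sqrt(8)

H1 = np.maximum((A @ H / deg) @ W1, 0.0)           # message-passing layer
mol = H1.sum(axis=0)                               # readout: sum over atoms
prediction = mol @ W2                              # scalar property prediction
print("predicted property (untrained):", prediction)
```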

Oct. 28th

Speaker: Irfan Bulu

  • UnitedHealth Group

  • Bio : I received a PhD in Physics from Bilkent University in 2007. The focus of my PhD work was on novel structures such as photonic crystals, plasmonic devices, and metamaterials for controlling the flow of light. After completing my PhD, I joined Prof. Marko Loncar’s lab at Harvard University as a postdoc. There, I tackled problems and challenges in communication security [1] and communication bandwidth [2] using diamond nano-photonic structures. In 2013, I moved to industrial research at Schlumberger, the largest oil field services company, which started an exciting journey for me in taking innovations from the lab to products in the hands of customers. For example, our team invented a new nuclear magnetic resonance logging tool [3], which improved logging speed by an order of magnitude, thereby addressing an important challenge for our customers in adopting nuclear magnetic resonance measurements. This work also led me to a career in machine learning, as both the design of the instrument and the interpretation of various oil field measurements benefited from advances in deep learning. I joined UnitedHealth Group in 2018, where I research machine learning algorithms for healthcare applications.

[1]"Enhanced single-photon emission from a diamond–silver aperture," Nature Photonics, 2011.

[2] "Diamond nonlinear photonics," Nature Photonics, 2014.

[3]"NMR well logging instrument and method with synthetic apertures". Patent 10444397, 2019.


Talk information

  • Title: How to improve healthcare AI? Incorporating multimodal data and domain knowledge

  • Time: Thursday, Oct. 28th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

Healthcare data is special. Its complex nature is a double-edged sword, possessing great potential but also presenting many difficulties to overcome. For example, administrative claims data, in contrast to the common data types (text, vision, audio) where AI has made eye-popping advances, is multi-modal (consisting of distinct data types, including medical claims, pharmacy claims, and lab results), asynchronous (medication histories and diagnosis histories need not be aligned in time), and irregularly sampled (we only collect data when an individual interacts with the system). Along with such rich and complex data, there is a great deal of domain knowledge, in various forms, in the healthcare field. In this talk, I will present our work on deep learning architectures for incorporating multimodal data and domain knowledge into models.
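
One simple way to handle such multi-modal, asynchronous, irregularly sampled event streams, sketched below with invented codes and dimensions, is to merge all events into a single time-ordered sequence and give each token a code embedding, a modality embedding, and a continuous-time encoding. This is a generic construction for illustration, not the specific architecture from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical event streams for one member: (time in days, code id) pairs.
# The streams are asynchronous and irregularly sampled, so we merge them into
# one time-ordered sequence and tag each event with its modality.
medical  = [(3.0, 10), (40.0, 11)]          # e.g., diagnosis codes
pharmacy = [(5.0, 7), (5.5, 7), (90.0, 8)]  # e.g., drug codes
labs     = [(12.0, 2)]                      # e.g., lab result codes

events = sorted([(t, c, 0) for t, c in medical] +
                [(t, c, 1) for t, c in pharmacy] +
                [(t, c, 2) for t, c in labs])

d = 16
code_emb = rng.normal(size=(32, d))         # shared code embedding table
mod_emb = rng.normal(size=(3, d))           # one embedding per modality

def time_encoding(t, d=d):
    """Sinusoidal encoding of continuous event time (handles irregular sampling)."""
    freqs = 1.0 / (10000 ** (np.arange(d // 2) * 2.0 / d))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

# Each token = code embedding + modality embedding + continuous-time encoding;
# the resulting sequence can feed any sequence model (e.g., a Transformer).
tokens = np.stack([code_emb[c] + mod_emb[m] + time_encoding(t)
                   for t, c, m in events])
print(tokens.shape)   # (num_events, d)
```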

Oct. 21st

Speaker: Song Mei

  • Department of Statistics, UC Berkeley

  • Bio : Song Mei is an Assistant Professor in statistics at UC Berkeley. His research is motivated by data science and lies at the intersection of statistics, machine learning, information theory, and computer science. His work often builds on insights originating from the statistical physics literature. His recent research interests include the theory of deep learning, high-dimensional geometry, approximate Bayesian inference, and applied random matrix theory.

Talk information

  • Title: The efficiency of kernel methods on structured datasets

  • Time: Thursday, Oct. 21st, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Inspired by the proposal of neural network (NN) tangent kernels, a recent line of research aims to design kernels with better generalization performance on standard datasets. Indeed, a few recent works showed that certain kernel machines perform as well as NNs on certain datasets, despite theoretical results implying separations between the two in specific cases. Furthermore, it was shown that the induced kernels of convolutional neural networks perform much better than earlier handcrafted kernels. These empirical results pose a theoretical challenge to understanding the performance gaps between kernel machines and NNs in different scenarios.


In this talk, we show that data structures play an essential role in inducing these performance gaps. We consider a few natural data structures and study their effects on the performance of these learning methods. Based on a fine-grained high-dimensional asymptotic framework for analyzing random features models and kernel machines, we show the following: 1) If the feature vectors are nearly isotropic, kernel methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation; 2) If the feature vectors display the same low-dimensional structure as the target function (the spiked covariates model), the curse of dimensionality becomes milder, and the performance gap between kernel methods and NNs becomes smaller; 3) On datasets that display some invariance structure (e.g., image datasets), there is a quantitative performance gain from using invariant kernels (e.g., convolutional kernels) over inner product kernels. Beyond explaining the performance gaps, these theoretical results also provide intuition for designing kernel methods with better performance.
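
Point 3) can be illustrated directly: an invariant kernel can be built from any base kernel by averaging over a group of transformations, so the kernel cannot distinguish a signal from its transformed copies. The sketch below uses cyclic shifts as a stand-in for image translations; the base kernel and dimension are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def k_inner(x, z):
    """A simple inner product kernel: k(x, z) = (1 + <x, z>/d)^3."""
    return (1.0 + x @ z / len(x)) ** 3

def k_invariant(x, z):
    """Invariant kernel: average the base kernel over all cyclic shifts of z,
    so k(x, z) is unchanged when either input is cyclically translated."""
    return np.mean([k_inner(x, np.roll(z, s)) for s in range(len(z))])

x = rng.normal(size=8)
z = rng.normal(size=8)
print(k_inner(x, z), k_inner(x, np.roll(z, 3)))          # differ in general
print(k_invariant(x, z), k_invariant(x, np.roll(z, 3)))  # identical
```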

Oct. 14th

Speaker: Uday V. Shanbhag

  • Department of Industrial and Manufacturing Engineering, Pennsylvania State University

  • Bio : Uday V. Shanbhag has held the Gary and Sheila Bello Chaired Professorship in Industrial & Manufacturing Engineering at Penn State University (PSU) since Nov. 2017 and has been at PSU since Fall 2012, prior to which he was at the University of Illinois at Urbana-Champaign (2006–2012, first as an assistant and then as a tenured associate professor). His interests lie in the analysis and solution of optimization problems, variational inequality problems, and noncooperative games complicated by nonsmoothness and uncertainty. He holds undergraduate and Master’s degrees from IIT, Mumbai (1993) and MIT, Cambridge (1998), respectively, and a Ph.D. in management science and engineering (Operations Research) from Stanford University (2006).

Talk information

  • Title: Probability Maximization via Minkowski Functionals: Convex Representations and Tractable Resolution

  • Time: Thursday, Oct. 14th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Oct. 7th

Speaker: Eric Vanden-Eijnden

  • Courant Institute, New York University

  • Bio : Eric Vanden-Eijnden is a Professor of Mathematics at the Courant Institute of Mathematical Sciences, New York University. His research focuses on the mathematical and computational aspects of statistical mechanics, with applications to complex dynamical systems arising in molecular dynamics, materials science, atmosphere-ocean science, fluid dynamics, and neural networks. He is also interested in the mathematical foundations of machine learning (ML) and the applications of ML in scientific computing. He is known for the development and analysis of multiscale numerical methods for systems whose dynamics span a wide range of spatio-temporal scales. He is the winner of the Germund Dahlquist Prize and the J.D. Crawford Prize, and a recipient of the Vannevar Bush Faculty Fellowship.

Talk information

  • Title: Machine Learning and Scientific Computing

  • Time: Thursday, Oct. 7th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

The recent success of machine learning suggests that neural networks may be capable of approximating high-dimensional functions with controllably small errors. As a result, they could outperform the standard function-interpolation methods that have been the workhorses of current numerical methods. This feat offers exciting prospects for scientific computing, as it may allow us to solve problems in high dimension once thought intractable. At the same time, looking at the tools of machine learning through the lens of applied mathematics and numerical analysis can give new insights as to why and when neural networks can beat the curse of dimensionality. I will briefly discuss these issues and present some applications related to solving PDEs in high dimension and sampling high-dimensional probability distributions.
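
As a small worked example of using a neural network to solve a PDE, the sketch below fits a tiny network to the 1D Poisson problem u''(x) = -pi^2 sin(pi x) with zero boundary conditions by minimizing the squared PDE residual at collocation points. The architecture, the finite-difference residual, and the crude zeroth-order optimizer are simplifications chosen to keep the sketch self-contained; a real implementation would use automatic differentiation (e.g., JAX or PyTorch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network u(x; theta) = tanh(x*W1 + b1) @ W2, multiplied by x*(1-x)
# so the boundary conditions u(0) = u(1) = 0 hold by construction.
theta = 0.1 * rng.normal(size=96)                   # flat parameters: W1, b1, W2

def u(xs, th):
    W1, b1, W2 = th[:32], th[32:64], th[64:]
    return (np.tanh(np.outer(xs, W1) + b1) @ W2) * xs * (1 - xs)

def loss(th, xs, h=1e-3):
    # Squared PDE residual u'' + pi^2 sin(pi x), via central finite differences.
    upp = (u(xs + h, th) - 2 * u(xs, th) + u(xs - h, th)) / h**2
    return np.mean((upp + np.pi**2 * np.sin(np.pi * xs)) ** 2)

xs = np.linspace(0.05, 0.95, 64)                    # collocation points
lr, eps = 1e-3, 1e-5
for _ in range(500):                                # crude descent, illustration only
    g = np.zeros_like(theta)
    for i in range(len(theta)):                     # finite-difference gradient
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e, xs) - loss(theta - e, xs)) / (2 * eps)
    theta -= lr * g

print("residual loss:", loss(theta, xs))
print("max error vs sin(pi x):", np.abs(u(xs, theta) - np.sin(np.pi * xs)).max())
```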

Sep. 30th

Speaker: Salman Avestimehr

  • USC

  • Bio : Salman Avestimehr is a Dean's Professor, the inaugural director of the USC-Amazon Center on Secure and Trusted Machine Learning (Trusted AI), and the director of the Information Theory and Machine Learning (vITAL) research lab in the Electrical and Computer Engineering Department of the University of Southern California. He is also an Amazon Scholar at Alexa AI. He received his Ph.D. in 2008 and his M.S. in 2005 in Electrical Engineering and Computer Science, both from the University of California, Berkeley. Prior to that, he obtained his B.S. in Electrical Engineering from Sharif University of Technology in 2003. His research interests include information theory, large-scale distributed computing and machine learning, secure and private computing/learning, and federated learning.

Dr. Avestimehr has received a number of awards for his research, including the James L. Massey Research & Teaching Award from the IEEE Information Theory Society, an Information Theory Society and Communication Society Joint Paper Award, a Presidential Early Career Award for Scientists and Engineers (PECASE) from the White House (President Obama), a Young Investigator Program (YIP) award from the U.S. Air Force Office of Scientific Research, a National Science Foundation CAREER award, a USC Mentoring Award, the David J. Sakrison Memorial Prize, and several best paper awards at conferences. He has been an Associate Editor for the IEEE Transactions on Information Theory and a General Co-Chair of the 2020 International Symposium on Information Theory (ISIT). He is a Fellow of the IEEE.

Talk information

  • Title: Secure Model Aggregation in Federated Learning

  • Time: Thursday, Sep. 30th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

Federated learning (FL) has emerged as a promising approach for distributed machine learning over edge devices, in order to strengthen data privacy, reduce data migration costs, and break regulatory restrictions. A key component of FL is "secure model aggregation", which aims at protecting the privacy of each user’s individual model while allowing their global aggregation. This problem can be viewed as privacy-preserving multi-party computation, but with two interesting twists: (1) some users may drop out during the protocol (due to poor connectivity, low battery, unavailability, etc.); (2) there is potential for multi-round privacy leakage, even if each round is perfectly secure. In this talk, I will first provide a brief overview of FL, then discuss several recent results on secure model aggregation, and finally end the talk by highlighting a few open problems in the area.
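
The basic mechanism behind secure aggregation can be shown in a few lines: pairwise random masks that cancel in the sum, so the server only sees the aggregate model. The sketch below deliberately omits the two twists mentioned above; practical protocols derive the masks from pairwise key agreement and recover dropped users' masks via secret sharing.

```python
import numpy as np

# Minimal sketch of additive-mask secure aggregation (no dropout handling):
# each pair of users (i, j) shares a random mask; user i adds it and user j
# subtracts it, so all masks cancel in the sum and the server learns only
# the aggregate of the models, never an individual update.
rng = np.random.default_rng(0)
n, d = 4, 6
models = rng.normal(size=(n, d))                     # each user's local model

pair_masks = {(i, j): rng.normal(size=d)
              for i in range(n) for j in range(i + 1, n)}

masked = models.copy()
for (i, j), m in pair_masks.items():
    masked[i] += m                                   # lower index adds the mask
    masked[j] -= m                                   # higher index subtracts it

server_sum = masked.sum(axis=0)                      # what the server computes
print(np.allclose(server_sum, models.sum(axis=0)))   # True: the masks cancel
```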

Sep. 23rd

Speaker: Mikhail Belkin

  • Halicioğlu Data Science Institute, UCSD

  • Bio : Mikhail Belkin received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in the theory and applications of machine learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization, and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral analysis to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence has necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings is the "double descent" risk curve, which extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. Mikhail Belkin is a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, and the SIAM Journal on Mathematics of Data Science.

Talk information

  • Title: The Polyak-Lojasiewicz condition as a framework for over-parameterized optimization and its application to deep learning

  • Time: Thursday, Sep. 23rd, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk I will discuss some general mathematical principles that allow for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. I will argue that optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter space, allowing for efficient optimization by gradient descent or SGD. I will connect the PL condition of these systems to the condition number associated with the tangent kernel and show how a non-linear theory for these systems parallels classical analyses of over-parameterized linear equations. As a separate related development, I will discuss a perspective on the remarkable recently discovered phenomenon of transition to linearity (constancy of the NTK) in certain classes of large neural networks. I will show how this transition to linearity results from the scaling of the Hessian with the size of the network, controlled by certain functional norms. Combining these ideas, I will show how the transition to linearity can be used to demonstrate the PL condition and convergence for a general class of wide neural networks. Finally, I will comment on systems that are "almost" over-parameterized, which appear to be common in practice.
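
For reference, here is the PL condition and the one-line reason it yields linear convergence, written out under the standard smoothness assumption (the notation is generic, not specific to the talk):

```latex
% The PL condition: a loss L with minimum value L* is mu-PL on a region if
\[
  \tfrac{1}{2}\,\bigl\lVert \nabla \mathcal{L}(w) \bigr\rVert^2
  \;\ge\; \mu \,\bigl( \mathcal{L}(w) - \mathcal{L}^{*} \bigr)
  \qquad \text{for all } w \text{ in the region.}
\]
% If L is also beta-smooth, gradient descent with step size 1/beta satisfies
\[
  \mathcal{L}(w_{t+1}) - \mathcal{L}^{*}
  \;\le\; \Bigl( 1 - \tfrac{\mu}{\beta} \Bigr)
          \bigl( \mathcal{L}(w_{t}) - \mathcal{L}^{*} \bigr),
\]
% i.e., the suboptimality contracts geometrically at every step,
% even though L need not be convex anywhere.
```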

Sep. 16th

Speaker: Holger Roth

  • Senior Applied Research Scientist, NVIDIA

  • Bio : Holger Roth is a Senior Applied Research Scientist at NVIDIA focusing on deep learning for medical imaging. He has been working closely with clinicians and academics over the past several years to develop deep learning based medical image computing and computer-aided detection models for radiological applications. He is an Associate Editor for IEEE Transactions on Medical Imaging and holds a Ph.D. from University College London, UK. In 2018, he was awarded the MICCAI Young Scientist Publication Impact Award.

Talk information

  • Title: Tackling the Challenges of Next-generation Healthcare: NVIDIA’s Applied Research in Medical Imaging

  • Time: Thursday, Sep. 16th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Recent advances in computer vision and artificial intelligence have caused a paradigm shift in medical image computing and radiological image analysis. Deep learning has been widely applied to many radiological applications, replacing or working together with conventional methods. The ability to learn directly from data is promising for many imaging tasks. Some key factors and current challenges preventing the widespread adoption of machine learning techniques in the clinic are algorithmic considerations, computational power, and, most critically, high-quality data for training.


NVIDIA wants to provide solutions to make the widespread adoption of deep learning and artificial intelligence easier in the real world. This talk will highlight NVIDIA’s efforts in the healthcare sector and medical imaging research, for example, around federated learning and COVID-19 image analysis, and introduce platforms & hardware considerations for modern machine learning at scale.

Sep. 9th

Speaker: Tuo Zhao

  • The H. Milton Stewart School of Industrial and Systems Engineering (ISyE), Georgia Institute of Technology

  • Bio : Tuo Zhao is an assistant professor at Georgia Tech. He received his Ph.D. degree in Computer Science at Johns Hopkins University. His research mainly focuses on developing methodologies, algorithms, and theories for machine learning, especially deep learning. He is also actively working on neural language models and open-source machine learning software for scientific data analysis. He has received several awards, including first place in the INDI ADHD-200 global competition, the ASA Best Student Paper Award on Statistical Computing, the INFORMS Best Paper Award on Data Mining, and a Google Faculty Research Award.

Talk information

  • Title: On Fine-Tuning of Pretrained Language Models under Limited Supervision: A Machine Learning Perspective

  • Time: Thursday, Sep. 9th, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join)

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. When we only have limited supervision for the downstream tasks, however, the extremely high complexity of pre-trained models means that aggressive fine-tuning often causes the model to overfit the training data of the downstream task and fail to generalize to unseen data.

To address this concern, we propose a new approach for fine-tuning pretrained models to attain better generalization performance. Our proposed approach adopts three important ingredients: (1) smoothness-inducing adversarial regularization, which effectively controls the complexity of the massive model; (2) Bregman proximal point optimization, which is an instance of trust-region algorithms and can prevent aggressive updating; and (3) differentiable programming, which can mitigate the undesired bias induced by conventional adversarial training algorithms. Our experiments show that the proposed approach significantly outperforms existing methods on multiple NLP tasks. In addition, our theoretical analysis provides new insights into adversarial training for improving generalization.
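
A minimal sketch of how ingredients (1) and (2) combine into a fine-tuning objective is given below on a toy logistic model. It uses a single random input perturbation in place of projected gradient ascent, and a squared Euclidean proximal term as the simplest Bregman divergence; both are simplifications of the method described in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a fine-tuning step: a logistic "model" on features X.
X = rng.normal(size=(64, 10))
y = (rng.random(64) < 0.5).astype(float)
w_prev = 0.1 * rng.normal(size=10)        # weights at the previous iterate
w = w_prev + 0.05 * rng.normal(size=10)   # current candidate update

def probs(X, w):
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def task_loss(w):
    p = probs(X, w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# (1) Smoothness-inducing adversarial regularizer: perturb the *inputs*
# within an eps-ball and penalize the change in predictions (a symmetric KL
# in the method; a squared difference here for brevity).
eps = 0.1
delta = rng.normal(size=X.shape)
delta = eps * delta / np.linalg.norm(delta, axis=1, keepdims=True)
smooth_reg = np.mean((probs(X + delta, w) - probs(X, w)) ** 2)

# (2) Bregman proximal point term: keep the update close to the previous
# iterate (squared Euclidean distance as the simplest Bregman divergence).
mu = 1.0
prox = mu * np.sum((w - w_prev) ** 2)

total = task_loss(w) + smooth_reg + prox
print("regularized fine-tuning objective:", total)
```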

Sep. 2nd

Speaker: Simon Batzner

  • Applied Mathematics, Harvard University

  • Bio : I'm a Mathematician and Machine Learning Researcher at Harvard. Previously, I worked on Machine Learning at MIT, wrote software on a NASA mission, and spent some time at McKinsey. I enjoy working with ambitious people who want to change the world.

Talk information

  • Title: E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials

  • Time: Thursday, Sep. 2nd, 2021, 12:00–1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Representations of atomistic systems for machine learning must transform predictably under the geometric transformations of 3D space, in particular rotation, translation, mirrors, and permutation of atoms of the same species. These constraints are typically satisfied by means of atomistic representations that depend on scalar distances and angles, leaving the representation invariant under the above transformations. Invariance, however, limits expressivity and can lead to an incompleteness of representations. In order to overcome this shortcoming, we recently introduced Neural Equivariant Interatomic Potentials (NequIP) [1], a Graph Neural Network approach for learning interatomic potentials that uses an E(3)-equivariant representation of atomic environments. While most current Graph Neural Network interatomic potentials use invariant convolutions over scalar features, NequIP instead employs equivariant convolutions over geometric tensors (scalars, vectors, ...), providing a more information-rich message passing scheme. In my talk, I will first motivate the choice of an equivariant representation for atomistic systems and demonstrate how it allows for the design of interatomic potentials of previously unattainable accuracy. I will discuss applications to a diverse set of molecules and materials, including small organic molecules, water in different phases, a catalytic surface reaction, proteins, glass formation of a lithium phosphate, and Li diffusion in a superionic conductor. I will then show that NequIP can predict structural and kinetic properties from molecular dynamics simulations in excellent agreement with ab-initio simulations. The talk will then discuss the observation of a remarkable sample efficiency in equivariant interatomic potentials, which outperform existing neural network potentials with up to 1000x fewer training data and rival or even surpass the sample efficiency of kernel methods. Finally, I will discuss potential reasons for the high sample efficiency of equivariant interatomic potentials.
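
To illustrate what equivariance (as opposed to invariance) buys, here is a small self-contained check, not NequIP itself: a vector-valued feature of a toy atomic environment that rotates exactly as the input rotates. The positions, radial weighting, and rotation are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy atomic environment: neighbor positions relative to an atom.
pos = rng.normal(size=(6, 3))

def vector_feature(pos):
    """A simple l=1 (vector) feature: a radially weighted sum of unit bond
    vectors. Unlike a scalar distance/angle feature, it is *equivariant*:
    rotating the input rotates the output in the same way."""
    r = np.linalg.norm(pos, axis=1, keepdims=True)
    return (np.exp(-r) * pos / r).sum(axis=0)

def rot_z(theta):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = rot_z(0.7)
f_of_rotated = vector_feature(pos @ R.T)     # rotate atoms, then featurize
rotated_f = R @ vector_feature(pos)          # featurize, then rotate the output
print(np.allclose(f_of_rotated, rotated_f))  # True: equivariance holds
```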