Sensor Data Modeling through Textual Descriptions for Human Activity Recognition (and beyond) in Smart Homes (04/11/2025)
Presenter: Sourish Gunesh Dhekane
Human activity recognition (HAR) using ambient sensors in smart homes has numerous applications for human healthcare and wellness. However, building general-purpose HAR models that can be deployed to new smart home environments requires a significant amount of annotated sensor data and training overhead. Most smart homes vary significantly in their layouts, i.e., floor plans and the specifics of embedded sensors, resulting in low generalizability of HAR models trained for specific homes. In this talk, I will present one of our recent works, where we addressed this limitation by introducing a novel, layout-agnostic modeling approach for HAR systems in smart homes that utilizes the transferable representational capacity of natural language descriptions of raw sensor data. To this end, we generated Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions and provide cues for underlying activities to the activity recognition models. By leveraging textual embeddings rather than raw sensor data, we created activity recognition systems that can predict standard activities across homes without (re-)training or adaptation to target homes. Our TDOST approach, initially developed for HAR, can also be extended for other applications like routine assessment and sub-activity classification.
Generative Models for Protein Sequence/Structure Design (04/04/2025)
Presenter: Tony Tu
Recent advances in generative modeling have opened new frontiers in protein design, enabling the creation of novel protein sequences and structures with atomic-level resolution. In this talk, I will present an overview of emerging generative frameworks across the protein design landscape, focusing on three key models: RFdiffusion, NOS, and Chroma. RFdiffusion represents a powerful approach for backbone generation, using a structure-conditioned diffusion process to produce novel protein folds or scaffold functional motifs. NOS introduces a discrete diffusion model that operates directly in sequence space, offering an effective and flexible strategy for generating amino acid sequences without requiring structural input. Finally, Chroma integrates both modalities to perform sequence-structure co-design, unifying sequence and geometry generation within a single framework. Together, these models illustrate the growing potential of generative methods to transform how we design proteins, from scaffolding catalytic sites to generating entirely new folds. I’ll discuss the underlying methodologies, present case studies and results, and highlight how these tools are reshaping the future of computational protein engineering.
Hierarchical and Hardware-Aware Optimization for AI Model Efficiency: From Bits to Modules to Models (03/28/2025)
Presenter: Yonggan Fu
Despite the remarkable advancements of AI foundation models, such as large language models (LLMs), in numerous tasks and applications, deploying these powerful models on everyday devices remains challenging due to their growing computational and memory demands. This challenge hinders the realization of immersive and interactive user experiences that require real-time AI processing on resource-constrained devices.
Our research aims to bridge this gap by performing hierarchical and hardware-aware optimization of AI models, maximizing accuracy-efficiency trade-offs to enable ubiquitous edge intelligence. Specifically, our research addresses redundancy at the bit, module, and model levels and leverages hardware characteristics to achieve real-device speed-ups. The proposed techniques include cyclic precision training (ICLR'21 Spotlight) for efficient and accurate bit-level quantization, AmoebaLLM (NeurIPS'24) for delivering real-hardware-efficient LLMs through module-level optimization, and a new language model architecture, Hymba (ICLR'25 Spotlight), for efficient language processing. These techniques collectively enable real-time execution of complex AI models on everyday devices, advancing the development of efficient AI solutions for ubiquitous edge intelligence.
Towards Multi-Domain Generalization for Subband Audio Source Separation (02/28/2025)
Presenter: Karn Watcharasupat
Audio source separation is the task of extracting one or more constituent components, or composites thereof, from their mixture. Creatively produced audio signals, such as music and cinematic audio, present a unique challenge for source separation algorithms due to the sheer diversity of potential sound sources within a particular mixture.
However, most state-of-the-art deep learning systems for source separation have often been either a collection of single-source separators or a tightly coupled system that cannot be easily adapted to support additional or unseen sound sources. In this webinar, we will present our series of works on psychoacoustically-motivated subband source separation for music and cinematic audio, working towards a more flexible, extensible, and controllable source separation system that can still maintain the high fidelity requirements demanded by creative audio practices.
Universal Parameter-Free Methods for Convex, Nonconvex, and Stochastic Optimization (02/07/2025)
Presenter: Tianjiao Li
First-order methods are widely used to tackle data science and machine learning problems with complex structures, such as nonconvexity, nonsmoothness, and stochasticity. However, in many real-world scenarios, the problem structure and parameters can be unknown or ambiguous, creating challenges for algorithm design and stepsize selections.
In this talk, I will present novel parameter-free methods that are universal in solving different classes of optimization problems without requiring prior knowledge of the problem parameters or resorting to any line search or backtracking procedures. In the first part of the talk, we focus on convex optimization and propose a uniformly optimal method for smooth, weakly smooth, and nonsmooth problems. In the second part of the talk, we consider smooth but possibly nonconvex optimization, and propose a novel parameter-free projected gradient method with the best-known unified complexity for convex and nonconvex problems. We then generalize the method to the stochastic setting, achieving universal complexity bounds that are nearly optimal for both convex and nonconvex problems. The advantages of the proposed methods are demonstrated by encouraging numerical results.
Reading with Intent (11/08/2024)
Presenter: Benjamin Reichman
Retrieval augmented generation (RAG) systems extend the knowledge of language models by integrating external information sources. In academic research it is common to train and assess RAG systems using Wikipedia as the retrieval corpus. However, in industry such systems are also exposed to the open internet. Wikipedia is written in a neutral, matter-of-fact tone whereas the open internet is written with many different tones and emotions. Understanding communication on the internet requires an understanding of both the surface-level textual content and the connotative intent of the writing. This presentation focuses on datasets and methods to overcome the challenge of reading text that can come in different tones, with a focus on reading sarcastic text.
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups (11/01/2024)
Presenter: Yuchen Zhu
The generative modeling of data on manifolds is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called 'trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the position variable between the data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because momentum lives in a space that changes with the position. However, our trivialization technique creates a new momentum variable that stays in a simple fixed vector space. This design, together with a manifold-preserving integrator, simplifies implementation and avoids inaccuracies created by approximations such as projections to tangent space and manifold, which were typically used in prior work, hence facilitating generation with high fidelity and efficiency.
How (Not) to Train a Neural Network (10/11/2024)
Presenter: Nitya Mani (Jane Street)
Machine learning and model-driven research in an applied setting often goes quite differently than in the classroom or a purely research environment. Inspired by a variety of real life examples from finance and beyond, we'll discuss some ways deep learning models can go off the rails in surprising ways.
Leveraging Lexical Knowledge for Generalizable Language Style Understanding with LLMs (10/04/2024)
Presenter: Ruohao Guo
Language style is often used by writers to convey their intentions, identities, and mastery of language. However, current large language models struggle to capture some language styles without fine-tuning. To address this challenge, we investigate whether LLMs can be meta-trained based on representative lexicons to recognize new styles that they have not been fine-tuned on. Experiments on 13 established style classification tasks, as well as 63 novel tasks generated using LLMs, demonstrate that meta-training with style lexicons consistently improves zero-shot transfer across styles.
Task Shift: Classification to Regression via Benign Overfitting (09/20/2024)
Presenter: Tyler LaBonte
Modern machine learning methods have demonstrated remarkable capability to generalize under task shift, where the model is used for a different task than it was trained on without any modification to the weights. For example, large language models can perform retrieval and reasoning tasks in-context having only been trained on a fill-in-the-blank classification task. We investigate this phenomenon in an overparameterized linear regression setting where the task shifts from classification during training to regression during evaluation. We prove upper and lower bounds on task shift error based on a surprising connection to benign overfitting, and we propose simple postprocessing methods which theoretically and empirically guarantee generalization.
Subspace Tracking for Radar Data (09/06/2024)
Presenter: Alex Saad-Falcon
The recent advent of large, element-level digital antenna arrays has necessitated the use of compression. The simplest kind of compression is linear, where the data is projected onto a (possibly random) subspace. Given non-stationary radar environments, the optimal linear subspace must be tracked online from compressive measurements. In this talk, we show some basic subspace tracking algorithms applied to streaming radar data and demonstrate significant reduction in data throughput via online compression and tracking. We also briefly discuss a Kalman-like smoothing using a simple dynamical model for subspace motion.
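As a rough illustration of online subspace tracking, the sketch below applies an Oja-style update with QR re-orthonormalization to streaming vectors. This is a textbook tracker swapped in for illustration, not the algorithms from the talk, and it assumes fully observed samples rather than compressive measurements. The dimensions, step size, noise level, and function names are all hypothetical.

```python
import numpy as np

def oja_update(U, x, step):
    """One Oja-style update of an orthonormal basis U (n x k) from a sample x (n,)."""
    w = U.T @ x                      # coefficients of x in the current subspace
    U = U + step * np.outer(x, w)    # pull the basis toward the new sample
    Q, _ = np.linalg.qr(U)           # re-orthonormalize the columns
    return Q

def subspace_error(U, V):
    """Frobenius distance between the projectors onto span(U) and span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T)

rng = np.random.default_rng(0)
n, k = 20, 2                                             # hypothetical sizes
V_true, _ = np.linalg.qr(rng.standard_normal((n, k)))    # subspace generating the data
U, _ = np.linalg.qr(rng.standard_normal((n, k)))         # random initial estimate
initial_error = subspace_error(U, V_true)

for _ in range(2000):
    # Streaming sample: a point in the true subspace plus small ambient noise.
    x = V_true @ rng.standard_normal(k) + 0.01 * rng.standard_normal(n)
    U = oja_update(U, x, step=0.05)

final_error = subspace_error(U, V_true)
```

Once the basis has converged, each sample could be stored as the k coefficients `U.T @ x` instead of the full n-vector, which is the sense in which tracking enables compression.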
Deep Learning Re-imagined: the Future is Hopfield (11/10/2022)
Presenter: Ben Hoover
Once a pillar of AI research, the biologically-inspired Hopfield Network has fallen in popularity compared to modern Deep Learning alternatives. We have therefore lost many desirable guarantees that come with understanding deep learning as Hopfield Networks or “associative memories”: energy-based dynamical systems that converge to a fixed point. In this talk, we propose a universal abstraction that allows us to build Hierarchical Hopfield Networks using familiar Deep Learning operations (e.g., convolutions, attention, normalizations). These novel architectures, which we call “Hierarchical Associative Memories” (HAMs), preserve the biological plausibility of the original Hopfield Network while incorporating the expressibility and power of modern architectures. We also release an accompanying ML framework written in JAX to help assemble HAMs using energy-based building blocks. We believe that our software lays the groundwork for a new class of AI frameworks built around meaningful energies and resulting in well-behaved and interpretable dynamical systems.
Towards Self-supervision by Learning the Data Manifold (04/22/2022)
Presenter: Kion Fallah
Self-supervised learning has recently made tremendous leaps in training transferable representations when labeled data is limited. Additionally, the manifold hypothesis suggests that high-dimensional, real-world data lies on or near a low-dimensional manifold, with different classes separated by low probability regions. In this talk, we discuss steps towards combining these ideas by learning the data manifold to perform self-supervision. First, we will briefly discuss transport operators, which model the non-linear path between two points with sparse inference from a dictionary of manifold operators. Then, we will spend the majority of the talk introducing a variational approach to quickly perform sparse inference. Finally, we will provide early approaches in incorporating this model into self-supervised learners.
Frameworks for Optimization on Smooth Manifolds (04/08/2022)
Presenter: Brighton Ancelin
Manifold structures are ubiquitous in machine learning problems. If a problem involves affine spaces, level sets, linear subspaces, orthonormal bases, etc. one can likely find a manifold structure interpretation. However, as a generalization of Euclidean space, viewing problems through the lens of manifold structures demands careful consideration and a few departures from the Euclidean optimization techniques to which we are accustomed. This talk aims to provide a high-level introduction to the fundamentals of optimization on smooth manifolds for an audience with a general background in optimization and linear algebra. Principles will be paired with simple examples to demonstrate use, and domains of applicability will be discussed.
Spectral Signed Graph Neural Networks (03/18/2022)
Presenter: Rahul Singh
Graph convolutional networks (GCNs) and their variants are designed for unsigned graphs containing only positive links. Many existing GCNs have been derived from the spectral domain analysis of signals lying over (unsigned) graphs, and in each convolution layer they perform low-pass filtering of the input features followed by a learnable linear transformation. Their extension to signed graphs with positive as well as negative links poses multiple issues, including computational irregularities and ambiguous frequency interpretation, making the design of computationally efficient low-pass filters challenging. In this talk, I will discuss how to address these issues via spectral analysis of signed graphs and present our proposed signed GNN architectures.
Integrating Knowledge into Learning: A Neural to Symbolic Perspective (03/04/2022)
Presenter: Karan Samel
Knowledge graphs have been leveraged to explicitly retrieve factual information and integrated to support general queries (e.g., googling facts). Recent efforts have tried to integrate these explicit knowledge bases within large language models and have been shown to improve performance in domain-specific NLP tasks. In this talk I will present knowledge integration from two perspectives. In the first perspective we look at the knowledge in its more symbolic form, by representing the knowledge explicitly and learning the low-level knowledge entities that have to be parsed from raw data. The second perspective explores fusing knowledge within language models to provide a neural representation, with a focus on integrating noisy knowledge. We analyze which settings benefit from models within this neuro-symbolic spectrum.
Provable Lifelong Learning of Representations (02/18/2022)
Presenter: Xinyuan Cao
In lifelong learning, the tasks to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. In this talk, I will discuss recent work on a provably efficient lifelong learning algorithm. Considering the setting where all tasks share a small number of common features, our algorithm’s sample complexity improves significantly on existing bounds and matches a lower bound for lifelong learning algorithms.
Towards large scale optimal transports (02/04/2022)
Presenter: Jiaojiao Fan
In optimal transport (OT) problems one aims to find a transport plan to move mass from a source distribution to a target distribution with minimum transport cost. When the cost is the square of Euclidean distance, the OT plan is associated with a unique optimal map, which is the gradient of a convex function. In recent years, a trend has been to parameterize this map as a neural network (e.g., the gradient of an input convex neural network) to solve large-scale OT problems. This leads to many algorithms that can be applied to high-dimensional problems such as image generative modeling and Bayesian inference. In this talk, I will discuss several algorithms we developed, including Scalable computation of Wasserstein barycenter and Variational Wasserstein gradient flows. The first algorithm seeks the "average" of multiple distributions and can be used to aggregate the posterior samples of Bayesian inference. The second algorithm solves optimization problems over probability space by utilizing the JKO scheme and can be used to model the gradient flow in image pixel space.
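To make the "gradient of a convex function" statement concrete, here is a minimal one-dimensional sketch. In 1D with squared-distance cost and equally weighted samples, the optimal map is monotone, so it can be computed by matching sorted source samples to sorted target samples. This is a classical special case, not the neural-network algorithms from the talk; the distributions and the function name `ot_map_1d` are assumptions chosen for illustration.

```python
import numpy as np

def ot_map_1d(source, target):
    """Optimal squared-cost matching of equally sized 1D samples via sorting."""
    order_s = np.argsort(source)
    order_t = np.argsort(target)
    mapped = np.empty_like(source)
    mapped[order_s] = target[order_t]   # i-th smallest source -> i-th smallest target
    return mapped

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=500)   # hypothetical source samples
target = rng.normal(3.0, 0.5, size=500)   # hypothetical target samples

mapped = ot_map_1d(source, target)
cost = np.mean((source - mapped) ** 2)    # empirical estimate of the squared W2 distance
```

Because the resulting map is nondecreasing in the source value, it is the derivative of a convex function, which is the 1D version of the structure that input convex neural networks are used to parameterize in higher dimensions.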
A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation Algorithms (11/12/2021)
Presenter: Zaiwei Chen
This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first reformulating the RL algorithms as Markovian Stochastic Approximation (SA) algorithms to solve fixed-point equations. We then develop a Lyapunov analysis and derive mean-square error bounds on the convergence of the Markovian SA. Based on this result, we establish finite-sample mean-square convergence bounds for asynchronous RL algorithms such as Q-learning, n-step TD, TD(λ), and off-policy TD algorithms including V-trace. As a by-product, by analyzing the convergence bounds of n-step TD and TD(λ), we provide theoretical insights into the bias-variance trade-off, i.e., efficiency of bootstrapping in RL. This was first posed as an open problem in (Sutton 1999).
Optimal Control Theoretic Neural Optimizer (10/29/2021)
Presenter: Guan-Horng Liu
Designing an effective optimization process for Deep Neural Networks stands as one of the most crucial problems in modern machine learning. Although the community has developed recipes related to the organization of practical DNN training, the overall optimization procedure lacks solid theoretical understanding and remains fairly ad hoc. In this talk, we will present a new paradigm of Deep Learning optimization grounded in Optimal Control Theory (i.e. Model-based RL). We first show that most DNN training methods (e.g. SGD, Adam) belong to a subclass of Optimal Control algorithms (ICLR 2021 spotlight). This connection facilitates robust game-theoretic training (ICML 2021 oral) and achieves superior convergence with architecture co-optimization (NeurIPS 2021 spotlight). Finally, we will discuss our latest finding on unifying deep generative models and optimal transport models with Stochastic Optimal Control methodology. These results strengthen the Optimal Control perspective as a principled tool for analyzing optimization in deep learning.
On the intersection of network neuroscience and deep learning (10/15/2021)
Presenter: Shreyas Patil
What we know from neuroscience (“connectomics”) is that the brain is, overall, a very sparse network with relatively small locally dense clusters of neurons. These topological properties are crucial for the brain’s ability to perform efficiently, robustly, and to process information in a hierarchically modular manner. On the other hand, the artificial neural networks we use today are very dense, or even fully connected, at least between successive layers. Additionally, it is well known that deep neural networks are highly over-parameterized: pruning studies have shown that it is often possible to eliminate 90% of the connections (weights) without significant loss in performance. Pruning, however, is typically performed after the dense network has been trained, which only improves the run-time efficiency of the inference process. The previous points suggest that we need methods to design sparse neural networks, without any training, that can perform almost as well as the corresponding dense networks after training. This talk will first provide some background on the pruning literature, covering pruning both after and before training. Then, we will present a recently proposed (ICML 2021) method called PHEW (Paths with Higher Edge Weights), which creates sparse neural networks before training that can learn fast and generalize well. Additionally, PHEW does not require access to any data, as it only depends on the initial weights and the topology of the given network architecture.
Learning the Data Manifold for Identity-Preserving Transformations (09/17/2021)
Presenter: Kion Fallah
Many machine learning techniques incorporate identity-preserving transformations into their models to generalize their performance to previously unseen data. These transformations are typically selected from a set of functions that are known to maintain the identity of an input when applied (e.g., cropping, flipping, and rotation). However, there are many natural variations that cannot be labeled for supervision or defined through examination of the data. As suggested by the manifold hypothesis, many of these natural variations live on or near a low-dimensional, nonlinear manifold. In this talk we will discuss our recent work in manifold learning that allows us to train Lie group operators to learn natural variations in a dataset without transformation labels.
Computer Vision in Multi-Angle Polarimetric Imagery (09/03/2021)
Presenter: Sean Foley
There are many fundamental differences between imagery in standard CV datasets and climate satellite imagery. These differences pose problems for standard approaches. I'll explain the properties of multi-angle, polarimetric sensors, introduce a dataset I developed containing such data, discuss some preliminary experimental results, and comment on potential future approaches.
Dynamic Prescriptive Analytics for Logistics Service Providers (04/23/2021)
Presenter: Jana Boerger
With the strong growth of e-commerce and the increase in online shoppers, not only on Amazon but also through small Shopify shops and grocery delivery apps such as Instacart, comes a demand for more efficient logistics infrastructure to support this development. Third party logistics service providers (3PLs) and other logistics infrastructure operators have been struggling to keep up with the new demands. They cite capacity management and resource management as their main pain points in current logistics businesses. These 3PLs need to improve their offerings on multiple levels. I will first introduce a framework for logistics providers to address capacity management and then explore three different levels of the supply chain to apply the framework. First, we will study the operational level of warehouses and consider picking and slotting processes. Then, we analyze demand and supply decision making for a single warehouse of a cold chain logistics provider. Finally, we will move towards the network level and address the decision of deploying clients across a logistics network.
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm (03/26/2021)
Presenter: Sajad Khodadadian
In this talk, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we show that the algorithm converges to a global optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of stepsizes. In order to overcome the issue of large variance due to Importance Sampling, we propose the $Q$-trace algorithm for the critic, which is inspired by the V-trace algorithm. This enables us to explicitly control the bias and variance, and characterizes the trade-off between them. As an advantage of off-policy sampling, a major feature of our result is that we do not need any additional assumptions, beyond the ergodicity of the Markov chain induced by the behavior policy.
PHEW! Constructing Sparse Networks that Learn Fast and Generalize Well Without Training Data (03/12/2021)
Presenter: Shreyas Patil
Pruning methods are going through a “renaissance” in the recent literature. Methods that sparsify a network at initialization, in particular, are important in practice because they greatly improve the efficiency of both learning and inference. Our work is based on a recently proposed decomposition of the Neural Tangent Kernel (NTK) that has decoupled the dynamics of the training process into a data-dependent component and an architecture-dependent kernel – the latter referred to as Path Kernel. That work has shown how to design sparse neural networks for faster convergence, without any training data, using the Synflow-L2 algorithm. Here, we first show that even though Synflow-L2 is optimal in terms of convergence, for a given network density, it results in sub-networks with “bottleneck” (narrow) layers – leading to poor performance as compared to other data-agnostic methods that use the same number of parameters. Then, we propose a new method to construct sparse networks, without any training data, referred to as Paths with Higher-Edge Weights (PHEW). PHEW is a probabilistic network formation method based on biased random walks that only depends on the initial weights. It has a similar path kernel trace as Synflow-L2 but it generates much wider layers, resulting in better performance. We empirically evaluate the effectiveness of PHEW against pruning-before-training methods and show PHEW achieves significant improvements over the state-of-the-art methods in a wide range of network densities.
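The idea of forming a sparse network from weight-biased random walks can be sketched as follows. This is a simplified illustration under stated assumptions, not the PHEW algorithm as published: it walks only forward from input units, picks the next unit with probability proportional to the magnitude of the initial weight, and keeps every traversed edge. The layer widths, the number of walks, and the function name `random_walk_masks` are hypothetical.

```python
import numpy as np

def random_walk_masks(weights, num_walks, rng):
    """Build boolean keep-masks from input-to-output walks biased by |weight|.

    weights[l] has shape (fan_in, fan_out) for layer l of a plain MLP.
    """
    masks = [np.zeros_like(W, dtype=bool) for W in weights]
    n_inputs = weights[0].shape[0]
    for walk in range(num_walks):
        unit = walk % n_inputs                      # start units in round-robin order
        for layer, W in enumerate(weights):
            probs = np.abs(W[unit])
            probs = probs / probs.sum()             # bias toward larger initial weights
            nxt = rng.choice(W.shape[1], p=probs)   # sample the next unit
            masks[layer][unit, nxt] = True          # keep the traversed edge
            unit = nxt
    return masks

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 16, 4]                        # hypothetical MLP widths
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

masks = random_walk_masks(weights, num_walks=40, rng=rng)
density = sum(m.sum() for m in masks) / sum(m.size for m in masks)
```

Only the initial weights and the layer topology are used, so no training data is needed; the number of walks controls the resulting density, and every kept edge lies on a complete input-to-output path.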
Optimal Transport for Aligning Generative Model Latent Spaces (02/26/2021)
Presenter: Kion Fallah
There are many cases where training separate generative models on data is necessary. Some applications in multi-modal datasets, domain adaptation, privacy, and security all warrant separate networks. In these cases, it is often desirable to find correspondences between latent points for some other downstream task, such as classification or image-to-image translation. After a quick primer on computational optimal transport, in this talk we discuss how to apply recent developments for aligning two latent spaces of generative models. Specifically, we aim to find a unitary transformation that minimizes a hierarchical Wasserstein metric for alignment. This metric utilizes geometric insights from clusters inherent in the latent spaces to find a global alignment.
Density of States Estimation for Out-of-Distribution Detection (02/12/2021)
Presenter: Cusuh Ham
Perhaps surprisingly, recent studies have shown probabilistic model likelihoods have poor specificity for out-of-distribution (OOD) detection and often assign higher likelihoods to OOD data than in-distribution data. To ameliorate this issue we propose DoSE, the density of states estimator. Drawing on the statistical physics notion of "density of states," the DoSE decision rule avoids direct comparison of model probabilities, and instead utilizes the "probability of the model probability," or indeed the frequency of any reasonable statistic. The frequency is calculated using nonparametric density estimators (e.g., KDE and one-class SVM) which measure the typicality of various model statistics given the training data and from which we can flag test points with low typicality as anomalous. Unlike many other methods, DoSE requires neither labeled data nor OOD examples. DoSE is modular and can be trivially applied to any existing, trained model. We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors on previously established "hard" benchmarks.
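The "probability of the model probability" idea can be illustrated with a single statistic and a hand-written Gaussian KDE. This is a minimal sketch, not the DoSE implementation: the log-likelihood values are synthetic stand-ins for the output of a trained model, and the bandwidth, the 5% threshold, and the function name `kde_log_density` are assumptions.

```python
import numpy as np

def kde_log_density(train_stats, query, bandwidth):
    """Log of a Gaussian kernel density estimate fitted to 1D train_stats."""
    z = (query[:, None] - train_stats[None, :]) / bandwidth
    log_k = -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(bandwidth)
    peak = log_k.max(axis=1)                        # stabilize the log-mean-exp
    return peak + np.log(np.mean(np.exp(log_k - peak[:, None]), axis=1))

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-example model log-likelihoods.
train_stats = rng.normal(-100.0, 5.0, size=1000)    # training data
in_stats = rng.normal(-100.0, 5.0, size=200)        # in-distribution test data
ood_stats = rng.normal(-60.0, 5.0, size=200)        # OOD data with HIGHER likelihood

bandwidth = 2.0
# Flag anything less typical than the 5th percentile of training typicality.
threshold = np.quantile(kde_log_density(train_stats, train_stats, bandwidth), 0.05)

in_scores = kde_log_density(train_stats, in_stats, bandwidth)
ood_scores = kde_log_density(train_stats, ood_stats, bandwidth)
flagged_in = in_scores < threshold
flagged_ood = ood_scores < threshold
```

Comparing likelihoods directly would rank the OOD points as more "normal" here, whereas scoring how typical each likelihood value is under the training distribution flags them as anomalous.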
A large-scale neural network training framework for generalized estimation of single-trial population dynamics (10/30/2020)
Presenter: Andrew Sedler
Large-scale recordings of neural activity are becoming ubiquitous, providing new opportunities to study network-level dynamics in diverse brain areas and during increasingly complex, natural behaviors. However, the sheer volume of data and its dynamical complexity are critical barriers to uncovering and interpreting these dynamics. Deep learning methods are a particularly promising approach due to their ability to uncover meaningful relationships from large, complex, and noisy datasets. One such method, latent factor analysis via dynamical systems (LFADS), uses recurrent neural networks to infer latent dynamics from high-D neural spiking data. When applied to motor cortical (M1) activity during stereotyped behaviors, LFADS substantially improved the ability to uncover dynamics and their relation to subjects’ behaviors on a moment-by-moment, millisecond timescale. However, applying LFADS to less-structured behaviors, or in brain areas that are not predominantly driven by intrinsic dynamics, is far more challenging. This is because LFADS, like many deep learning methods, requires careful hand-tuning of complex model hyperparameters (HPs) to achieve good performance. Here we demonstrate AutoLFADS, a large-scale, automated model tuning framework that can characterize dynamics in diverse brain areas without regard to behavior. AutoLFADS uses distributed computing to train dozens of models simultaneously while using evolutionary algorithms to optimally tune HPs in a completely unsupervised way. AutoLFADS required 10-fold less data to uncover dynamics from macaque M1/PMd, with better generalization to unseen behavioral conditions than previous LFADS models. We then tested data from the somatosensory and dorsomedial frontal cortices, areas with very different dynamics from M1/PMd. 
AutoLFADS produced precise estimates of population dynamics without any prior knowledge of the areas’ dynamics, tasks, or subjects’ behaviors, outperforming any individually-trained LFADS model obtained through random HP searches.
Convergence rates of asynchronous stochastic approximation algorithms with applications in RL (10/02/2020)
Presenter: Zaiwei Chen
Stochastic Approximation (SA) is a popular approach for solving fixed point equations where the information is corrupted by noise. In this talk, we consider an SA involving a contraction mapping with respect to an arbitrary norm, and show its finite-sample error bounds while using different step sizes. The idea is to construct a smooth Lyapunov function using the Generalized Moreau Envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function. Our result is applicable in Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning, and recover the existing state-of-the-art result on Q-learning. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state-space.
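For readers unfamiliar with the setup, a generic SA iteration for a noisy fixed-point problem can be sketched as below. The affine map, the i.i.d. Gaussian noise, the step-size schedule, and the function name `run_sa` are assumptions made for illustration; the sketch shows only the iteration itself, not the Lyapunov analysis or the bounds from the talk. The map `F(x) = b + gamma * P @ x` with a row-stochastic `P` is a contraction in the sup norm, in the same way as a policy-evaluation Bellman operator.

```python
import numpy as np

def run_sa(F, x0, num_steps, noise_std, rng):
    """Iterate x <- x + a_k * (F(x) + noise - x) with diminishing steps a_k."""
    x = np.array(x0, dtype=float)
    for k in range(num_steps):
        step = 2.0 / (k + 2)                                  # assumed step-size schedule
        noisy_F = F(x) + noise_std * rng.standard_normal(x.shape)
        x = x + step * (noisy_F - x)
    return x

rng = np.random.default_rng(0)
gamma = 0.5                                                   # contraction factor
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)                             # row-stochastic matrix
b = rng.standard_normal(4)

def F(x):
    return b + gamma * (P @ x)                                # sup-norm contraction

x_star = np.linalg.solve(np.eye(4) - gamma * P, b)            # exact fixed point
x_hat = run_sa(F, np.zeros(4), num_steps=20000, noise_std=0.5, rng=rng)
sup_error = np.max(np.abs(x_hat - x_star))
```

Trying other schedules, such as a constant step or `1 / (k + 1) ** 0.75`, is a simple way to see how the step size trades convergence speed against the residual noise level.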
Forecasting Global Geomagnetic Conditions using RNNs (09/18/2020)
Presenter: Charles Topliff
Historically, the solar wind and interplanetary magnetic field (IMF) measurements gathered by the ACE and WIND satellites have driven the study and prediction of geomagnetic activity indices. Results have demonstrated the ability to forecast these indices with high correlation a few hours in advance. In this work, we seek to simultaneously forecast four proxies for magnetic activity around Earth. Specifically, we employ Long Short-Term Memory neural networks to forecast the Auroral Electrojet indices (AE, AU, and AL), as well as the Disturbance Storm-Time index, from several hours to several days in advance. We also investigate the inclusion of solar wind forecasts such as the WSA-Enlil Solar Wind Forecast in order to extend the effective lead time of these geomagnetic index forecasts.
Automatic Differentiation Variational Inference with Mixtures (02/28/2020)
Presenter: Cusuh Ham
Automatic Differentiation Variational Inference (ADVI) is a useful tool for efficiently learning probabilistic models in machine learning. Generally, approximate posteriors learned by ADVI are forced to be unimodal in order to facilitate use of the reparameterization trick. In this work, we show how stratified sampling may be used to enable mixture distributions as the approximate posterior, and derive a new lower bound on the evidence analogous to the importance weighted autoencoder (IWAE). We show that this "SIWAE" is a tighter bound than both IWAE and the traditional ELBO, both of which are special instances of this bound. We verify empirically that the traditional ELBO objective disfavors the presence of multimodal posterior distributions and may therefore not be able to fully capture structure in the latent space. Our experiments show that using the SIWAE objective allows the encoder to learn more complex distributions which regularly contain multimodality, resulting in higher accuracy and better calibration in the presence of incomplete, limited, or corrupted data.
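To make the relationship between the three bounds concrete, here is one way they can be written. This is a reconstruction from the abstract's description, not a quotation from the paper, and the notation is assumed: $q(z \mid x) = \sum_{k=1}^{K} \alpha_k\, q_k(z \mid x)$ is the mixture posterior with weights $\alpha_k$, $p(x, z)$ is the joint model, and $T$ is the number of samples drawn per component.

```latex
\begin{align}
\mathrm{ELBO} &= \mathbb{E}_{z \sim q(z \mid x)}
  \left[ \log \frac{p(x, z)}{q(z \mid x)} \right], \\
\mathrm{IWAE}_T &= \mathbb{E}_{z_1, \dots, z_T \sim q(z \mid x)}
  \left[ \log \frac{1}{T} \sum_{t=1}^{T} \frac{p(x, z_t)}{q(z_t \mid x)} \right], \\
\mathrm{SIWAE}_T &= \mathbb{E}_{z_{k,t} \sim q_k(z \mid x)}
  \left[ \log \frac{1}{T} \sum_{t=1}^{T} \sum_{k=1}^{K}
  \alpha_k \, \frac{p(x, z_{k,t})}{q(z_{k,t} \mid x)} \right].
\end{align}
```

Under this notation, setting $K = 1$ in the stratified bound gives $\mathrm{IWAE}_T$, and additionally setting $T = 1$ gives the ELBO, which matches the abstract's statement that both are special instances.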
Continual Learning with Limited Supervision (02/28/2020)
Presenter: James Smith
Recent progress towards lifelong learning agents includes learning with limited labeled data (i.e. semi-supervised learning (SSL)) and incrementally learning classes of objects without forgetting previous classes (i.e. incremental continual learning (CL)). Yet, little effort has been made to design and evaluate models which reconcile the relationship between leveraging large amounts of unlabeled data and learning to recognize new object classes in an incremental manner. A learning system which could handle a non-stationary data distribution, while also leveraging cheap, primarily unlabeled data, could have a tremendous impact for scenarios involving fast, on-device learning. In this talk, I will survey the field of CL (including my past and current work) with an emphasis on limited supervision and its broader impact.
Spatiotemporal soft attention with pose and object interaction for video-based action classification (02/14/2020)
Presenter: Dan Scarafoni
Activity recognition systems frequently need to differentiate between fine-grained actions that differ only subtly, which remains a substantial challenge even for state-of-the-art models. We introduce a new approach, the Three-Dimensional Spatiotemporal Attention Mechanisms (3DSAM), for simultaneously capturing both the subtle action differences and global context of fine-grained activities. In doing so, we provide the basis for data-driven enrichment of activity recognition models, which we demonstrate in this paper. By deploying a localized soft attention mechanism on automatically derived pose and object interaction information, our approach is able to effectively capture those subtle action differences that render fine-grained activity recognition challenging. We combine this localized information with global spatiotemporal representations as captured by 3D convolutional neural networks (CNNs), arguably the state of the art in coarse-grained activity recognition. We evaluate our approach on two challenging datasets for fine-grained activity recognition: the IKEA Furniture Assembly Dataset and the MSR-DailyActivities3D Dataset. We demonstrate that our approach improves upon state-of-the-art activity recognition approaches in both domains in a statistically significant manner.
Discussion on Modern Learning (01/31/2020)
Presenter: Nathan Somavarapu
Before the “Deep Learning Revolution”, much of the applied learning work could be theoretically justified from a learning theory point of view. The expressivity of Artificial Neural Networks (ANNs) brings a number of benefits, but also a number of theoretical disadvantages. From a theoretical standpoint, there is no clear reason why ANNs should generalize, yet in practice they still seem to. In this discussion, I will give an overview of the issues that ANNs introduce to the learning-theoretic story and go into a specific paper, “Uniform convergence may be unable to explain generalization in deep learning”.
ML in Gaming (01/31/2020)
Presenter: Sean Foley
Twenty-three years ago, a computer beat the world champion at chess for the first time. The world watched Deep Blue’s narrow victory over Kasparov, and many believed that a new era had begun: one in which computers would dominate humans in all strategy games. Yet it wasn’t until almost two decades later that a computer finally beat the world Go champion. The deep learning revolution has allowed us to tackle much more difficult games: real-time strategy games, which have unimaginably large state spaces and imperfect information. In this ML@GT seminar, we’ll briefly discuss the history of game-playing agents, examine recent research in the area, and speculate on the future of ML for gaming.
Faculty Lightning Talks! (12/06/2019)
For the last seminar of this semester we’re excited to announce faculty lightning talks and we want to invite the general ML@GT community. We will have five faculty presenting lightning talks of ten minutes each:
Jake Abernethy (CS)
Lelia Glass (Modern Languages)
Judy Hoffman (IC)
Mohit Singh (ISyE)
Diyi Yang (IC)
Differentiable Rendering (11/25/2019)
Presenter: Amit Raj (IC)
Rendering has been used to generate 2D data from 3D scenes by simulating the physical process of image formation, which is not traditionally differentiable. Inverting the rendering step by differentiating through it helps in reasoning about and learning 3D information, such as object geometry, scene geometry, and lighting, from “in-the-wild” images. This is useful in a number of applications, such as AR/VR and content authoring for 3D artists. Additionally, it can be used to augment datasets to aid a downstream learning system. In my talk, I will briefly discuss the landscape of literature in differentiable rendering and touch upon my work on its use in texture synthesis and geometry understanding.
Compressed Sensing of Multiband Signals on the Continuum (11/08/2019)
Presenter: Cole DeLude (ECE)
Compressed sensing has been celebrated for its ability to reconstruct signals from what would classically be considered an inconceivably small number of observations. The caveat is that this reconstruction hinges on the signal of interest being sparse or approximately sparse in some dictionary. This talk will discuss how to perform compressed sensing when the signal is supported on a finite union of frequency intervals (i.e., multiband). Namely, we will discuss Discrete Prolate Spheroidal Sequences, sparse dictionary construction, and reconstruction algorithms.
The Evolution of GANs (10/25/2019)
Presenter: Jihui Jin (ECE)
Generative Adversarial Networks, or GANs, have achieved many new state-of-the-art results since their introduction in 2014. The original GAN framework introduces a second network to compete against the first, leading to an adversarial training setting. This formulation has since evolved, allowing for more powerful combinations and interactions between different networks. This talk will be a high-level overview of major concepts and changes throughout the "history" of GANs.
Applications of Learning Counterfactual Decision-Making Models (10/11/2019)
Presenter: Hang Wu (BME)
Counterfactual inference studies 'what would have happened if we had taken another option in the past'. It can help provide reliable decision support by predicting the outcomes of alternative actions. To apply it to real-world scenarios, however, we face challenges such as the unavailability of ground truth, safety concerns, and biases in training data.
In this talk, I’ll present some of my recent work on applying counterfactual inference to address some of these challenges. We will show 1) how we can build more robust clinical decision support systems and 2) how we can mitigate the filter bubble issue resulting from the biased feedback loops of recommendation systems.
Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification (09/27/2019)
Presenter: Harsh Shrivastava (CSE)
We propose a new approach, called cooperative neural networks (CoNN), which uses a set of cooperatively trained neural networks to capture latent representations that exploit prior given independence structure. The model is more flexible than traditional graphical models based on exponential family distributions, but incorporates more domain specific prior structure than traditional deep networks or variational autoencoders. The framework is very general and can be used to exploit the independence structure of any graphical model. We illustrate the technique by showing that we can transfer the independence structure of the popular Latent Dirichlet Allocation (LDA) model to a cooperative neural network, CoNN-sLDA. Empirical evaluation of CoNN-sLDA on supervised text classification tasks demonstrates that the theoretical advantages of prior independence structure can be realized in practice - we demonstrate a 23% reduction in error on the challenging MultiSent data set compared to state-of-the-art.
Locally accelerated conditional gradients (09/13/2019)
Presenter: Alejandro Carderera (ISyE)
Conditional gradient methods form a class of projection-free first-order algorithms for solving smooth convex optimization problems. Apart from eschewing projections, these methods are attractive because of their simplicity, numerical performance, and the sparsity of the solutions outputted. However, they do not achieve optimal convergence rates for smooth convex and strongly convex functions. We present the Locally Accelerated Conditional Gradients algorithm that relaxes the projection-freeness requirement to only require projection onto (typically low-dimensional) simplices and mixes accelerated steps with conditional gradient steps to achieve local acceleration. We derive asymptotically optimal convergence rates for this algorithm. Our experimental results demonstrate the practicality of our approach; in particular, the speedup is achieved both in wall-clock time and per-iteration progress compared to standard conditional gradient methods and a Catalyst-accelerated Away-Step Frank-Wolfe algorithm.
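For readers unfamiliar with the baseline, the following is a minimal sketch of the standard conditional gradient (Frank-Wolfe) method over the probability simplex. It is not the Locally Accelerated Conditional Gradients algorithm from the talk. The function name `frank_wolfe_simplex`, the quadratic objective in the usage note, and the open-loop step size 2/(k+2) are assumptions made for illustration. The sketch shows why the method is projection-free: each step only needs the vertex that minimizes a linear function.

```python
def frank_wolfe_simplex(grad, n, iters=2000):
    """Standard Frank-Wolfe over the probability simplex in R^n.

    grad: function returning the gradient (a list of n floats) at x.
    The linear minimization oracle over the simplex is the vertex e_i
    with the smallest gradient coordinate, so no projection is needed.
    """
    x = [1.0 / n] * n  # start at the barycenter
    for k in range(iters):
        g = grad(x)
        i = min(range(n), key=lambda j: g[j])  # best vertex e_i
        gamma = 2.0 / (k + 2.0)                # open-loop step size
        # Convex combination x <- (1 - gamma) * x + gamma * e_i
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x
```

As a usage example, minimizing 0.5 * ||x - b||^2 with a hypothetical target `b = [0.2, 0.5, 0.3]` inside the simplex can be done with `frank_wolfe_simplex(lambda x: [xi - bi for xi, bi in zip(x, b)], 3)`; the iterates approach `b` at the sublinear O(1/k) rate that the abstract contrasts with accelerated methods.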
Curriculum Learning for Bipedal Locomotion, and Other Stories (08/30/2019)
Presenter: Nathan Hatch
This talk is an overview of my past and current research projects at Georgia Tech. First: a theoretical paper on a gradient-based algorithm for meta-learning. Second: an attempt to control bipedal robots walking over rough terrain. The approach uses a combination of hand-engineered controllers, linear regression, and curriculum learning. Third: an introduction to the AutoRally platform for high-speed autonomous off-road driving.