Please join us for the inaugural NJ Workshop on Machine Learning and Information Theory, to be held in the Central King Building (CKB) Room 303 on May 1, 2026 at the New Jersey Institute of Technology (NJIT). This one-day event will bring together researchers from Princeton, Rutgers and NJIT to identify emerging challenges and create a strong regional community in machine learning and information theory. The event features a blend of invited talks and a dynamic poster symposium. We strongly encourage students and postdocs to present posters of their work to disseminate new ideas in an informal setting.
All attendees: Registration is free! For planning purposes, please register for the workshop here by April 15: (registration link)
Transportation logistics:
Arriving by train: From Newark Penn Station (NJ Transit), you can take the Newark Light Rail to the Warren Street / NJIT station, which is right next to the campus. From Newark Broad Street Station (NJ Transit), campus is walkable within 15 minutes.
Arriving by car: Parking will be available at NJIT's Summit Street garage (154 Summit St., Newark, NJ 07102). To access the garage, first turn onto Warren Street, and then turn onto Colden Street and follow signs to NJIT parking.
Abstract: Statisticians often work in settings with limited labeled data and abundant unlabeled data. During training, they may even have access to privileged information (some labeled, some not) that will not be available once the model is deployed. When can incorporating this privileged information improve performance? A standard two-stage approach trains a rich-view model on the privileged information, uses it to generate pseudo-labels on unlabeled data, and then trains a deployment model on the combined set of true and pseudo-labels. When the privileged information provides only a weak or noisy signal, this pipeline propagates errors from the rich-view model into the deployment model and can perform worse than ignoring the privileged information entirely. We propose a coupled training framework that jointly learns the rich-view and deployment models through an alternating procedure. Each update to the deployment model calibrates the next round of pseudo-labels, and those refined pseudo-labels in turn guide the deployment model, allowing it to benefit from privileged information adaptively.
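The standard two-stage baseline this abstract contrasts against can be sketched in a few lines. The synthetic data, the least-squares stand-in for both models, and all variable names below are illustrative assumptions, not the speaker's method.

```python
# Minimal numpy sketch of the standard two-stage pseudo-labeling pipeline
# (data, classifiers, and names are illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
n_lab, n_unlab, d = 50, 500, 5
w_true = rng.normal(size=d)

X_lab = rng.normal(size=(n_lab, d))          # deployment view, labeled
y_lab = np.sign(X_lab @ w_true)              # labels in {-1, +1}
X_unlab = rng.normal(size=(n_unlab, d))      # deployment view, unlabeled

# Privileged (rich) view: the deployment features plus an extra signal
# column that will not exist at test time.
Z_lab = np.hstack([X_lab, (X_lab @ w_true)[:, None]])
Z_unlab = np.hstack([X_unlab, (X_unlab @ w_true)[:, None]])

# Stage 1: fit the rich-view model on labeled privileged data, then
# pseudo-label the unlabeled pool.
w_rich, *_ = np.linalg.lstsq(Z_lab, y_lab, rcond=None)
pseudo = np.sign(Z_unlab @ w_rich)

# Stage 2: fit the deployment model (x only) on true + pseudo labels.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
w_dep, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

X_test = rng.normal(size=(200, d))
acc = np.mean(np.sign(X_test @ w_dep) == np.sign(X_test @ w_true))
```

Here the privileged column is strongly informative, so pseudo-labels are accurate; the talk's point is what happens when it is weak or noisy, in which case Stage 2 inherits Stage 1's errors.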
Abstract: Solving inverse problems, central to signal processing and machine learning, depends critically on the choice of prior. Over the past few decades, there has been a gradual shift toward increasingly complex priors. This talk presents data compression as a principled framework for constructing optimal priors that approach fundamental limits in recovery. Compression-based methods have already led to effective solutions across tasks such as compressive sensing and coherent imaging, offering both strong theoretical guarantees and practical algorithms. However, their broader applicability has been limited by the lack of adaptive and expressive compression codes beyond well-studied domains such as images and video. Deep learning–based compression, often referred to as neural compression, addresses this limitation by enabling flexible, data-driven priors. This perspective leads to recovery algorithms that are both theoretically grounded and empirically competitive. As a case study, we discuss Zero-Shot Neural Compression Denoising (ZS-NCD), which uses untrained neural compression models to denoise a single noisy observation, achieving state-of-the-art performance. Overall, this work highlights neural compression as a bridge between information theory and modern learning, providing a foundation for efficient and principled inference across a broad range of inverse problems.
Abstract: Imagine a network with malicious Pac-Man nodes that silently “eat” random walks passing through them. This can cripple random-walk-based decentralized learning by driving the walk population to extinction, completely halting learning progress. We present self-creating random walks, a simple, fully decentralized mechanism that regenerates walks using only local timing information. Our approach prevents extinction, maintains a stable walk population, and preserves learning without centralized coordination. We provide theoretical guarantees on stability and convergence under attack, and show empirical robustness and fast recovery.
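A toy simulation conveys the attack and the timer-based regeneration idea. The ring topology, the timeout rule, and all constants below are assumptions for illustration only, not the paper's algorithm.

```python
# Toy simulation of Pac-Man nodes absorbing random walks, with walks
# regenerated from local timers (all parameters are illustrative).
import random

random.seed(0)
n = 30
pacman = {5, 17}                 # malicious nodes that silently absorb walks
walks = [0]                      # current positions of live walks
last_visit = {v: 0 for v in range(n)}
timeout = 40                     # a node spawns a walk if unvisited this long

population = []
for t in range(1, 500):
    survivors = []
    for pos in walks:
        nxt = random.choice([(pos - 1) % n, (pos + 1) % n])  # ring-graph step
        if nxt in pacman:
            continue             # walk is eaten
        last_visit[nxt] = t
        survivors.append(nxt)
    walks = survivors
    # Self-creation: any honest node whose local timer expired spawns a walk,
    # using only information available at that node.
    for v in range(n):
        if v not in pacman and t - last_visit[v] > timeout:
            last_visit[v] = t
            walks.append(v)
    population.append(len(walks))
```

Without the regeneration step, every walk is eventually absorbed and the population hits zero permanently; with it, the population recovers whenever nodes go unvisited for too long.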
Abstract: We present an information-theoretic analysis of remote document retrieval over unreliable channels. By adaptively encoding query features according to semantic importance, we characterize retrieval error via multivariate Gaussian convergence and derive computable upper bounds. As a practical countermeasure, we propose semantically adaptive repetition, where tokens are repeated proportionally to their semantic importance, and show that this scheme substantially enhances retrieval reliability for both TF-IDF and embedding-based pipelines evaluated on Google NQ. We further illustrate the broader impact of semantic fragility through video transmission experiments, in which a video description is transmitted over a token erasure channel and the video is reconstructed via a text-to-video generator. While the system shows surprising robustness against i.i.d. token erasures due to the strong generative prior of the decoder, targeted erasure of semantically important tokens leads to measurable degradation in reconstruction quality. We show that semantically adaptive repetition can mitigate such degradation.
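The repetition scheme named in the abstract can be illustrated with a small Monte Carlo sketch. The importance weights, token budget, and erasure probability below are placeholder assumptions (a TF-IDF-style score would supply the weights in practice), not the authors' experimental setup.

```python
# Sketch of semantically adaptive repetition over an i.i.d. token erasure
# channel: important tokens get more repeats than filler tokens.
import random

random.seed(1)

def transmit(tokens, repeats, erase_p=0.5):
    """Send each token repeats[i] times over an i.i.d. erasure channel;
    a token is recovered if at least one copy survives."""
    out = []
    for tok, r in zip(tokens, repeats):
        if any(random.random() > erase_p for _ in range(r)):
            out.append(tok)
    return out

tokens = ["the", "capital", "of", "france", "is", "paris"]
importance = [0.1, 0.9, 0.1, 1.0, 0.1, 1.0]   # assumed semantic weights
budget = 12                                    # total transmissions allowed

# Uniform repetition: spend the budget equally across tokens.
uniform = [budget // len(tokens)] * len(tokens)

# Adaptive repetition: repeats proportional to importance, at least one
# per token (rounding may slightly exceed the nominal budget).
total = sum(importance)
adaptive = [max(1, round(budget * s / total)) for s in importance]

# Count trials in which all semantically key tokens survive.
key = {"capital", "france", "paris"}
trials = 2000
hits_u = sum(key <= set(transmit(tokens, uniform)) for _ in range(trials))
hits_a = sum(key <= set(transmit(tokens, adaptive)) for _ in range(trials))
```

Under this allocation the key tokens survive far more often than under uniform repetition, mirroring the abstract's claim that protecting semantically important tokens is what matters for downstream retrieval or reconstruction.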
Abstract: While artificial neural networks are undeniably excellent information processing systems, their standard formulation as deterministic point-to-point mappings obscures what they actually do with information. By introducing a probabilistic representation space -- no more complex than that of a variational autoencoder (VAE) -- we can control, quantify and characterize all information passing through the space. In this talk, we'll insert multiple such spaces at key locations in a model to construct communication networks whose optimization reveals where information resides in the data, shapes the nature of information processing, and provides interpretable points for intervention. This perspective turns information flow into a concrete object of analysis and design.
A Theoretical Analysis of Mamba's Training Dynamics -- Mugunthan Shandirasegaran (NJIT)
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution -- Haixu Liao (NJIT)
Multi-Stream Change-Point Detection -- Fegor Uwuseba (NJIT)
Age-Stability Trade-off in Remote Monitoring Systems -- Nitya Sathyavageeswaran (Rutgers)
From Simulation to Reality: Safe and Robust Decision Making in Uncertain Environments -- Sourav Ganguly (NJIT)
The Safety Knight: Fixed-Penalty Constraint Optimization for Safer Language Models -- Kartik Pandit (NJIT)
Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks -- (Henry) Su Wang (Princeton)
Interpreting Large Language Model Decisions under Information Constraints: A Rational Inattention Perspective -- Yuan Zhao (NJIT)
Theoretical and Algorithmic Analysis of Ill-Posed Inverse Problems: A Case Study in Compressive Coherent Imaging -- Xi Chen (Rutgers)
Low Separation Rank Ridge Regression for Matrix Covariates -- Lakshitha Ramanayake (Rutgers)
Robust Peak-Cost-Constrained Reinforcement Learning -- Shilpa Mukhopadhyay (NJIT)
Bayesian Despeckling of Structured Sources -- Ali Zafari (Rutgers)
Perfect Privacy and Strong Stationary Times for Markovian Sources -- Zonghong Liu (Rutgers)
Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks -- Rohit Bhagat (Rutgers)
Digital Forgery Detection, Localization, and Explanation with Multimodal AI -- Mehry Rezaei (NJIT)
Graph Property Inference Attacks on Erdős-Rényi and Stochastic Block Model Graphs via Shadow Graph Training -- Eralp Erol (NJIT)
Measuring Training Variability from Stochastic Optimization Using Robust Nonparametric Testing -- Anand Sarwate (Rutgers)
Learning to Help in Hybrid AI System -- Yu Wu (Rutgers)
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference -- Amirarsalan Moatazedian (NJIT)
A Low-Complexity Speech Codec Using Parametric Dithering for ASR -- Ellison Murray (Rutgers)
Differentially-Private Decentralized Learning in Heterogeneous Multicast Networks -- Amir Ziaeddini (NJIT)
Decentralized Adaptive Optimization with Kalman Filtering under Local Differential Privacy -- Yauhen Yakimenka (NJIT)
Organizing committee: Arnob Ghosh (NJIT), Ani Sridhar (NJIT), Anand Sarwate (Rutgers)
We are grateful for valuable support from the following sponsors.