Can we build neural architectures that go beyond Transformers by leveraging principles from dynamical systems?
We introduce a novel approach to sequence modeling that draws inspiration from the emerging paradigm of online control to achieve efficient long-range memory, fast inference, and provable robustness.
At the core of this approach is a new method for learning linear dynamical systems through spectral filtering. This method eliminates the need for learned convolutional filters, remains invariant to system dimensionality, and offers strong theoretical guarantees, all while achieving state-of-the-art performance on long-range sequence tasks. Our research spans efficient generation and inference (the FutureFill algorithm), length generalization, and advances in spectral filtering.
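To make the core idea concrete, here is a minimal sketch of spectral filtering in NumPy: the filters are the top eigenvectors of a fixed Hankel matrix that depends only on the horizon, and features are causal convolutions of the input with those filters. This is an illustrative simplification, not the implementation from the papers or the linked repositories: the horizon T, filter count K, and the projection tensor M are placeholder choices, and the autoregressive terms and eigenvalue scaling used in the full method are omitted.

```python
import numpy as np

def spectral_filters(T: int, K: int) -> np.ndarray:
    """Top-K eigenvectors of the fixed T x T Hankel matrix with entries
    2 / ((i + j)^3 - (i + j)) (1-indexed). The filters are data-independent:
    they depend only on the horizon T, not on the unknown system."""
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]
    Z = 2.0 / (s ** 3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)      # ascending eigenvalues
    return eigvecs[:, -K:][:, ::-1]           # (T, K), largest eigenvalue first

def spectral_features(u: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Causal convolution of the input sequence u (T, d_in) with each filter
    phi[:, k], giving features X of shape (T, K, d_in)."""
    T, d_in = u.shape
    K = phi.shape[1]
    X = np.zeros((T, K, d_in))
    for t in range(T):
        window = u[t::-1]                     # u[t], u[t-1], ..., u[0]
        X[t] = phi[: t + 1].T @ window        # (K, d_in)
    return X

# Hypothetical usage: 10-dim inputs over 256 steps, 24 spectral filters.
T, K, d_in, d_out = 256, 24, 10, 5
u = np.random.randn(T, d_in)
phi = spectral_filters(T, K)
X = spectral_features(u, phi)                 # (T, K, d_in)
M = np.random.randn(d_out, K, d_in) * 0.01    # learned in practice, random here
y_hat = np.einsum('okd,tkd->to', M, X)        # predictions, (T, d_out)
```

The key design point is that only the linear map M is learned; the filters themselves are fixed, which is why the method has no learned convolutional filters and does not depend on the hidden dimension of the underlying system.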
Universal Sequence Prediction - Annie Marsden, Elad Hazan
Provable Distillation for Linear Dynamical Systems - Devan Shah, Shlomo Fortgang, Sofiia Druchyna, Elad Hazan
Provable Length Generalization in Sequence Prediction via Spectral Filtering - Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan
FutureFill: Fast Generation from Convolutional Sequence Models - Naman Agarwal, Xinyi Chen, Evan Dogariu, Vlad Feinberg, Daniel Suo, Peter Bartlett, Elad Hazan
Flash STU: Fast Spectral Transform Units - Y. Isabel Liu, Windsor Nguyen, Yagiz Devre, Evan Dogariu, Anirudha Majumdar, Elad Hazan
Spectral State Space Models - Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan
Learning Linear Dynamical Systems via Spectral Filtering - Elad Hazan, Karan Singh, Cyril Zhang
Spectral Filtering for General Linear Dynamical Systems - Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang
Introduction to Online Control (chapter 13) - Elad Hazan, Karan Singh
PyTorch code for Flash STU: https://github.com/windsornguyen/flash-stu/
JAX code for the Spectral SSM (non-hybrid, outdated): https://github.com/google-deepmind/spectral_ssm