Accepted Papers
Spotlights
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Nicolaus Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli
Probing the Decision Boundaries of In-context Learning in Large Language Models
Siyan Zhao, Tung Nguyen, Aditya Grover
Towards a theory of learning dynamics in deep state space models
Jakub Smekal, Jimmy T.H. Smith, Michael Kleinman, Dan Biderman, Scott Linderman
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
On Feature Learning in Structured State Space Models
Leena Chennuru Vankadara, Jin Xu, Moritz Haas, Volkan Cevher
Posters
Randomized Signatures for processing long-range Sequences on Graphs
Lukas Gruber, Bernhard Schäfl, Johannes Brandstetter, Sepp Hochreiter
Viewing Attention as a Recurrent Neural Network
Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori
DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data
Munib Mesinovic, Soheila Molaei, Peter Watkinson, Tingting Zhu
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou
Delay Embedding Theory of Neural Sequence Models
Mitchell Ostrow, Adam Joseph Eisen, Ila R Fiete
xLSTM: Extended Long Short-Term Memory
Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
Multi-Task Instruction Training of Text Diffusion Models
Changyou Chen, Gargi Balasubramaniam, Rui Meng, Han Zhao, Bunyamin Sisman, Qingjun Cui
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty
Philipp Becker, Niklas Freymuth, Gerhard Neumann
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur
RotRNN: Modelling Long Sequences with Rotations
Rares Dolga, Kai Biegun, Harry Jake Cunningham, David Barber
Akhil Arora, Lars Henning Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West
Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures
Fernando Moreno-Pino, Alvaro Arroyo, Harrison Waldon, Xiaowen Dong, Alvaro Cartea
Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter
Orthogonal residual connections for long-term memory retention in recurrent neural networks
Andrea Ceni, Claudio Gallicchio
LongSSM: On the Length Extension of State-space Models in Language Modelling
Shida Wang
Reservoir Structured State Space Models
Giuseppe Lombardi, Claudio Gallicchio, Andrea Ceni
When can transformers compositionally generalize in-context?
Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes Von Oswald, Razvan Pascanu, Guillaume Lajoie, Joao Sacramento
Latte: Latent Attention for Linear Time Transformers
Rares Dolga, Marius Cobzarenco, Ahmed H. Shahin, David Barber
Recurrent Action Transformer with Memory
Egor Cherepanov, Aleksei Staroverov, Dmitry Yudin, Alexey Kovalev, Aleksandr Panov
Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs
Claudio Gallicchio, Andrea Ceni
The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff
Fusheng Liu, Qianxiao Li
State soup: in-context skill learning, retrieval and mixing
Maciej Pióro, Maciej Wolczyk, Razvan Pascanu, Johannes Von Oswald, Joao Sacramento
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information
Fedor Sergeev, Paola Malsot, Gunnar Ratsch, Vincent Fortuin
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Harry Jake Cunningham, Giorgio Giannone, Mingtian Zhang, Marc Peter Deisenroth
Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe
Albert Q. Jiang, Alicja Ziarko, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis
Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre
Q-S5: Towards Quantized State Space Models
Steven Abreu, Jens Egholm Pedersen, Kade Heckel, Alessandro Pierro
Pretrained Hybrids with MAD Skills
Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala
Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak
State Space Models for Brain Computer Interfaces?
Pablo Soëtard, Miran Özdogan, Oiwi Parker Jones
SeRpEnt: Selective Resampling for Expressive State Space Models
Stefano Rando, Luca Romani, Matteo Migliarini, Denis A Gudovskiy, Luca Franco, Luca Rigazio, Fabio Galasso
Needle in the Haystack for Memory Based Large Language Models
Elliot Nelson, Soham Dan, Georgios Kollias, Payel Das, Subhajit Chaudhury
Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi
MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis
Vishrut Thoutam, Dina Ellsworth
An All-MLP Sequence Modeling Architecture That Excels at Copying
Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong
ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers
Ding Zhu, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili
FutureTST: When Transformers Meet Future Exogenous Drivers
Kshitij Tayal, Arvind Renganathan, Vipin Kumar, Dan Lu
QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices
Ali Behrouz, Michele Santacatterina, Ramin Zabih
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
Federico Arangath Joseph, Noah Liniger, Kilian Konstantin Haefeli, Caglar Gulcehre
On the Power of Convolution-Augmented Transformer
Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Ali Behrouz, Michele Santacatterina, Ramin Zabih
Length independent generalization bounds for deep SSM architectures
Dániel Rácz, Mihaly Petreczky, Balint Daroczy
On the Bottleneck of State Space Models: Locality and Oversmoothing
Pragya Srivastava, Peihao Wang, Ruisi Cai, Jiajun Zhu, Pan Li, Zhangyang Wang
Parallelizing Autoregressive Generation with Variational State-Space Models
Gaspard Lambrechts, Yann Claes, Pierre Geurts, Damien Ernst
Associative Recurrent Memory Transformer
Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev
Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation
Vidhi Lalchand, David Lines, Neil D Lawrence
Enhancing Sequence Modeling with Multi-Resolution State Space Models
Mahdi Karami, Ali Behrouz