Accepted Papers

Spotlights

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto


BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Nicolaus Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli


Probing the Decision Boundaries of In-context Learning in Large Language Models

Siyan Zhao, Tung Nguyen, Aditya Grover


Towards a theory of learning dynamics in deep state space models

Jakub Smekal, Jimmy T.H. Smith, Michael Kleinman, Dan Biderman, Scott Linderman


Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi


On Feature Learning in Structured State Space Models

Leena Chennuru Vankadara, Jin Xu, Moritz Haas, Volkan Cevher

Posters

Randomized Signatures for processing long-range Sequences on Graphs

Lukas Gruber, Bernhard Schäfl, Johannes Brandstetter, Sepp Hochreiter


Viewing Attention as a Recurrent Neural Network

Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori


DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data

Munib Mesinovic, Soheila Molaei, Peter Watkinson, Tingting Zhu


EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou


Delay Embedding Theory of Neural Sequence Models

Mitchell Ostrow, Adam Joseph Eisen, Ila R Fiete


xLSTM: Extended Long Short-Term Memory

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter


Multi-Task Instruction Training of Text Diffusion Models

Changyou Chen, Gargi Balasubramaniam, Rui Meng, Han Zhao, Bunyamin Sisman, Qingjun Cui


KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Philipp Becker, Niklas Freymuth, Gerhard Neumann


MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur


RotRNN: Modelling Long Sequences with Rotations

Rares Dolga, Kai Biegun, Harry Jake Cunningham, David Barber


Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

Akhil Arora, Lars Henning Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West


Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures

Fernando Moreno-Pino, Alvaro Arroyo, Harrison Waldon, Xiaowen Dong, Alvaro Cartea


Enhancing Transformer RNNs with Multiple Temporal Perspectives

Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu


Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter


Orthogonal residual connections for long-term memory retention in recurrent neural networks

Andrea Ceni, Claudio Gallicchio


LongSSM: On the Length Extension of State-space Models in Language Modelling

Shida Wang


Reservoir Structured State Space Models

Giuseppe Lombardi, Claudio Gallicchio, Andrea Ceni


When can transformers compositionally generalize in-context?

Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes Von Oswald, Razvan Pascanu, Guillaume Lajoie, Joao Sacramento


Latte: Latent Attention for Linear Time Transformers

Rares Dolga, Marius Cobzarenco, Ahmed H. Shahin, David Barber


Recurrent Action Transformer with Memory

Egor Cherepanov, Aleksei Staroverov, Dmitry Yudin, Alexey Kovalev, Aleksandr Panov


Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs

Claudio Gallicchio, Andrea Ceni


The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff

Fusheng Liu, Qianxiao Li


State soup: in-context skill learning, retrieval and mixing

Maciej Pióro, Maciej Wolczyk, Razvan Pascanu, Johannes Von Oswald, Joao Sacramento


Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information

Fedor Sergeev, Paola Malsot, Gunnar Rätsch, Vincent Fortuin


Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling

Harry Jake Cunningham, Giorgio Giannone, Mingtian Zhang, Marc Peter Deisenroth


OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang


Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

Albert Q. Jiang, Alicja Ziarko, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś


Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre


Q-S5: Towards Quantized State Space Models

Steven Abreu, Jens Egholm Pedersen, Kade Heckel, Alessandro Pierro


Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala


Selective Attention: Enhancing Transformer through Principled Context Control

Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak


State Space Models for Brain Computer Interfaces?

Pablo Soëtard, Miran Özdogan, Oiwi Parker Jones


SeRpEnt: Selective Resampling for Expressive State Space Models

Stefano Rando, Luca Romani, Matteo Migliarini, Denis A Gudovskiy, Luca Franco, Luca Rigazio, Fabio Galasso


Needle in the Haystack for Memory Based Large Language Models

Elliot Nelson, Soham Dan, Georgios Kollias, Payel Das, Subhajit Chaudhury


Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi


MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis

Vishrut Thoutam, Dina Ellsworth


An All-MLP Sequence Modeling Architecture That Excels at Copying

Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner


Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong


ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers

Ding Zhu, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili


FutureTST: When Transformers Meet Future Exogenous Drivers

Kshitij Tayal, Arvind Renganathan, Vipin Kumar, Dan Lu


QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices

Ali Behrouz, Michele Santacatterina, Ramin Zabih


HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context

Federico Arangath Joseph, Noah Liniger, Kilian Konstantin Haefeli, Caglar Gulcehre


On the Power of Convolution-Augmented Transformer

Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak


Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Ali Behrouz, Michele Santacatterina, Ramin Zabih


Length independent generalization bounds for deep SSM architectures

Dániel Rácz, Mihaly Petreczky, Balint Daroczy


On the Bottleneck of State Space Models: Locality and Oversmoothing

Pragya Srivastava, Peihao Wang, Ruisi Cai, Jiajun Zhu, Pan Li, Zhangyang Wang


Parallelizing Autoregressive Generation with Variational State-Space Models

Gaspard Lambrechts, Yann Claes, Pierre Geurts, Damien Ernst


Associative Recurrent Memory Transformer

Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev


Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation

Vidhi Lalchand, David Lines, Neil D Lawrence


Enhancing Sequence Modeling with Multi-Resolution State Space Models

Mahdi Karami, Ali Behrouz