Accepted Papers

Spotlights

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Nicolaus Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli

Probing the Decision Boundaries of In-context Learning in Large Language Models

Siyan Zhao, Tung Nguyen, Aditya Grover

Towards a theory of learning dynamics in deep state space models

Jakub Smekal, Jimmy T.H. Smith, Michael Kleinman, Dan Biderman, Scott Linderman

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra, Martin Jaggi

On Feature Learning in Structured State Space Models

Leena Chennuru Vankadara, Jin Xu, Moritz Haas, Volkan Cevher

Posters

Randomized Signatures for processing long-range Sequences on Graphs

Lukas Gruber, Bernhard Schäfl, Johannes Brandstetter, Sepp Hochreiter

Viewing Attention as a Recurrent Neural Network

Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori

DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data

Munib Mesinovic, Soheila Molaei, Peter Watkinson, Tingting Zhu

EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou

Delay Embedding Theory of Neural Sequence Models

Mitchell Ostrow, Adam Joseph Eisen, Ila R Fiete

xLSTM: Extended Long Short-Term Memory

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

Multi-Task Instruction Training of Text Diffusion Models

Changyou Chen, Gargi Balasubramaniam, Rui Meng, Han Zhao, Bunyamin Sisman, Qingjun Cui

KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Philipp Becker, Niklas Freymuth, Gerhard Neumann

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur

RotRNN: Modelling Long Sequences with Rotations

Rares Dolga, Kai Biegun, Harry Jake Cunningham, David Barber

Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

Akhil Arora, Lars Henning Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West

Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures

Fernando Moreno-Pino, Alvaro Arroyo, Harrison Waldon, Xiaowen Dong, Alvaro Cartea

Enhancing Transformer RNNs with Multiple Temporal Perspectives

Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu

Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter

Orthogonal residual connections for long-term memory retention in recurrent neural networks

Andrea Ceni, Claudio Gallicchio

LongSSM: On the Length Extension of State-space Models in Language Modelling

Shida Wang

Reservoir Structured State Space Models

Giuseppe Lombardi, Claudio Gallicchio, Andrea Ceni

When can transformers compositionally generalize in-context?

Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes Von Oswald, Razvan Pascanu, Guillaume Lajoie, Joao Sacramento

Latte: Latent Attention for Linear Time Transformers

Rares Dolga, Marius Cobzarenco, Ahmed H. Shahin, David Barber

Recurrent Action Transformer with Memory

Egor Cherepanov, Aleksei Staroverov, Dmitry Yudin, Alexey Kovalev, Aleksandr Panov

Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs

Claudio Gallicchio, Andrea Ceni

The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff

Fusheng Liu, Qianxiao Li

State soup: in-context skill learning, retrieval and mixing

Maciej Pióro, Maciej Wolczyk, Razvan Pascanu, Johannes Von Oswald, Joao Sacramento

Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information

Fedor Sergeev, Paola Malsot, Gunnar Ratsch, Vincent Fortuin

Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling

Harry Jake Cunningham, Giorgio Giannone, Mingtian Zhang, Marc Peter Deisenroth

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

Albert Q. Jiang, Alicja Ziarko, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś

Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

Q-S5: Towards Quantized State Space Models

Steven Abreu, Jens Egholm Pedersen, Kade Heckel, Alessandro Pierro

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala

Selective Attention: Enhancing Transformer through Principled Context Control

Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak

State Space Models for Brain Computer Interfaces?

Pablo Soëtard, Miran Özdogan, Oiwi Parker Jones

SeRpEnt: Selective Resampling for Expressive State Space Models

Stefano Rando, Luca Romani, Matteo Migliarini, Denis A Gudovskiy, Luca Franco, Luca Rigazio, Fabio Galasso

Needle in the Haystack for Memory Based Large Language Models

Elliot Nelson, Soham Dan, Georgios Kollias, Payel Das, Subhajit Chaudhury

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis

Vishrut Thoutam, Dina Ellsworth

An All-MLP Sequence Modeling Architecture That Excels at Copying

Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong

ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers

Ding Zhu, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili

FutureTST: When Transformers Meet Future Exogenous Drivers

Kshitij Tayal, Arvind Renganathan, Vipin Kumar, Dan Lu

QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices

Ali Behrouz, Michele Santacatterina, Ramin Zabih

HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context

Federico Arangath Joseph, Noah Liniger, Kilian Konstantin Haefeli, Caglar Gulcehre

On the Power of Convolution-Augmented Transformer

Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Ali Behrouz, Michele Santacatterina, Ramin Zabih

Length independent generalization bounds for deep SSM architectures

Dániel Rácz, Mihaly Petreczky, Balint Daroczy

On the Bottleneck of State Space Models: Locality and Oversmoothing

Pragya Srivastava, Peihao Wang, Ruisi Cai, Jiajun Zhu, Pan Li, Zhangyang Wang

Parallelizing Autoregressive Generation with Variational State-Space Models

Gaspard Lambrechts, Yann Claes, Pierre Geurts, Damien Ernst

Associative Recurrent Memory Transformer

Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation

Vidhi Lalchand, David Lines, Neil D Lawrence

Enhancing Sequence Modeling with Multi-Resolution State Space Models

Mahdi Karami, Ali Behrouz
