Naoki Nishikawa, Rei Higuchi, Taiji Suzuki
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Advances in Neural Information Processing Systems 38 (NeurIPS 2025)
arXiv:2507.03340
Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa, Kazusato Oko, Shoichiro Yamaguchi, Sosuke Kobayashi, Seiya Tokui, Kohei Hayashi, Daisuke Okanohara, Taiji Suzuki
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Second Conference on Language Modeling (COLM 2025)
arXiv:2504.17562
Naoki Nishikawa, Yujin Song, Kazusato Oko, Denny Wu, Taiji Suzuki
Nonlinear transformers can perform inference-time feature learning
Forty-second International Conference on Machine Learning (ICML 2025)
Ryotaro Kawata, Kohsei Matsutani, Yuri Kinoshita, Naoki Nishikawa, Taiji Suzuki
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
Forty-second International Conference on Machine Learning (ICML 2025)
arXiv:2506.01656
Naoki Nishikawa, Taiji Suzuki
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
The Thirteenth International Conference on Learning Representations (ICLR 2025)
arXiv:2405.19036
Naoki Nishikawa, Yuichi Ike, Kenji Yamanishi
Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
arXiv:2307.09259
Naoki Nishikawa, Taiji Suzuki, Atsushi Nitanda, Denny Wu
Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
Advances in Neural Information Processing Systems 35 (NeurIPS 2022)