Publications

[2026][pdf] Patrick Queiroz Da Silva, Sanchaita Hazra, Doeun Lee, Sachin Kumar, Bodhisattwa Prasad Majumder, "Process-Oriented Evaluation of AI-Assisted Scientific Writing", 2026 Conference on Language Modeling (COLM 2026). To appear.
[2026][pdf] Doeun Lee, Muge Zhang, Yi Yu, Ashish Manne, Stephen Koesters, Frank Wen, Brady Buchanan, Lynda Villagomez, Oluwatoba Moninuola, James Lim, Kathryn Tobin, Andrew Srisuwananukorn, Ping Zhang, Sachin Kumar, "When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering", preprint.
[2026][pdf] Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng, "HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models", preprint.
[2026][pdf] Karan Singh, Michael Yu, Varun Gangal, Zhuofu Tao, Sachin Kumar, Emmy Liu, Steven Y. Feng, "To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining", preprint.
[2026][pdf] Tarun Kathuria, Sachin Kumar, "Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion", 2026 Conference of the Association for Computational Linguistics (ACL 2026 Findings).
[2026][pdf][website][code][data] Min Jang, Orevaoghene Ahia, Nazif Tamer, Sachin Kumar, Yulia Tsvetkov, Noah A. Smith, "BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning", preprint.
[2026][pdf][demo][code] Jiwoo Park, Ruoqi Liu, Avani Jagdale, Andrew Srisuwananukorn, Jing Zhao, Lang Li, Ping Zhang, Sachin Kumar, "ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access", The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026 System Demonstration).
[2026][pdf][code] Sanchaita Hazra*, Doeun Lee*, Bodhisattwa Prasad Majumder, Sachin Kumar, "Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing", 2026 ACM Conference on Intelligent User Interfaces (IUI 2026). *=equal contribution.
[2026][pdf] Emmy Liu, Varun Gangal, Chelsea Zou, Xiaoqi Huang, Michael Yu, Alex Chang, Zhuofu Tao, Sachin Kumar, Steven Y. Feng, "A Unified Definition of Hallucination, Or: It's the World Model, Stupid", International Conference on Machine Learning (ICML 2026) Position Paper Track.
[2025][pdf][code] Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar, "Reasoning Up the Instruction Ladder for Controllable Language Models", 2026 Conference of the Association for Computational Linguistics (ACL 2026 Findings), also presented at Foundations of Reasoning in Language Models (FoRLM) Workshop at NeurIPS 2025.
[2025][pdf][code][demo] Hanane Nour Moussa*, Patrick Queiroz Da Silva*, Daniel Adu-Ampratwum, Alyson East, Zitong Lu, Nikki Puccetti, Mingyi Xue, Huan Sun, Bodhisattwa Prasad Majumder, Sachin Kumar, "ScholarEval: Research Idea Evaluation Grounded in Literature", preprint. *=equal contribution.
[2025][pdf] Alex Gulko*, Yusen Peng*, Sachin Kumar, "CE-Bench: Towards a Reliable Contrastive Evaluation Benchmark of Interpretability of Sparse Autoencoders", BlackboxNLP at EMNLP 2025. *=equal contribution.
[2025][pdf][code] Abraham Toluwase Owodunni, Sachin Kumar, "Continually Adding New Languages to Multilingual Language Models", Transactions on Machine Learning Research (TMLR); also presented at Multilingual and Equitable Language Technologies (MELT) Workshop @COLM 2025.
[2025][pdf][code][data] Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, Siddhant Arora, Shuyue Stella Li, Vishal Puttagunta, Mofetoluwa Adeyemi, Charishma Buchireddy, Ben Walls, Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar, "BLAB: Brutally Long Audio Bench", preprint.
[2025][pdf] Zhouhang Xie, Junda Wu, Yiran Shen, Yu Xia, Xintong Li, Aaron Chang, Ryan Rossi, Sachin Kumar, Bodhisattwa Prasad Majumder, Jingbo Shang, Prithviraj Ammanabrolu, Julian McAuley, "A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models", Conference on Language Modeling (COLM) 2025.
[2025][pdf][code] Abraham Toluwase Owodunni, Orevaoghene Ahia, Sachin Kumar, "FLEXITOKENS: Flexible Tokenization for Evolving Language Models", 2026 Conference of the Association for Computational Linguistics (ACL 2026 Findings), also presented at Tokenization Workshop @ICML 2025.
[2025][pdf] Carolina Hatanpää, Noah A. Smith, Sachin Kumar, "On Distributional Robustness of In-Context Learning for Text Classification", Second Workshop on Test-Time Adaptation: Putting Updates to the Test! @ICML 2025.
[2025][pdf][code] Patrick Queiroz Da Silva, Hari Sethuraman, Dheeraj Rajagopal, Hannaneh Hajishirzi, Sachin Kumar, "Steering off Course: Reliability Challenges in Steering Language Models", 2025 Conference of the Association for Computational Linguistics (ACL 2025). Oral (top 8%), Panel (top 0.8%).
[2025][pdf][code][demo] Jaesung Tae*, Hamish Ivison*, Sachin Kumar, Arman Cohan, "TESS 2: A Large-Scale Generalist Diffusion Language Model", 2025 Conference of the Association for Computational Linguistics (ACL 2025) Oral (top 8%). *=equal contribution.
[2025][pdf][code][data][blog] Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi, "Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback", 2025 Conference of the Association for Computational Linguistics (ACL 2025).
[2025][pdf][code and data] Harsh Kohli, Sachin Kumar, Huan Sun, "GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models", 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).
[2025][pdf][code][data] Sachin Kumar*, Chan Young Park*, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi, "ComPO: Community Preferences for Language Model Personalization", 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025). *=equal contribution.
[2025][pdf][code][leaderboard] Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, "RewardBench: Evaluating Reward Models for Language Modeling", 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) Findings.
[2024][pdf][code][data][blog] Faeze Brahman*, Sachin Kumar*, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi, "The Art of Saying No: Contextual Noncompliance in Language Models", Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024: Datasets and Benchmarks.
[2024][pdf][code] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith, "MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization", Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024.
[2024][pdf][code] Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri, "WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models", Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024.

[2024][pdf][code] Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo, "Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research", 2024 Conference of the Association for Computational Linguistics (ACL 2024). Best Resource Paper Award.
[2024][pdf][code] YuHan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov, "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization", 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
[2024][pdf][code] Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad, "SSD-2: Scaling and Inference-time Fusion of Diffusion Language Models", 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
[2024][pdf][code] Sachin Kumar, Chan Young Park, Yulia Tsvetkov, "Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions", International Conference on Learning Representations (ICLR 2024).
[2023][pdf][code] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov, "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models", 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
[2023][pdf][code] Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi and Yulia Tsvetkov, “Minding Language Models’ Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker”, 2023 Conference of the Association for Computational Linguistics (ACL 2023). Outstanding Paper Award
[2023] [pdf] [code] Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov, "On the Blind Spots of Model-Based Evaluation Metrics for Text Generation", 2023 Conference of the Association for Computational Linguistics (ACL 2023).
[2023] [pdf] [code][demo] Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, "SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control", 2023 Conference of the Association for Computational Linguistics (ACL 2023).
[2023][pdf] Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad, "Assessing Language Model Deployment with Risk Cards", preprint.
[2023] [pdf] Sachin Kumar*, Vidhisha Balachandran*, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov, "Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey", 2023 Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023).
[2022] [pdf] [code] Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov and Yejin Choi, “Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation”, 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
[2022] [pdf][code] Sachin Kumar, Biswajit Paria, Yulia Tsvetkov, “Gradient-based Constrained Sampling from Language Models”, 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
[2021] [pdf][code] Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov. Controlled Text Generation as Continuous Optimization with Multiple Constraints. Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) 2021.
[2021] [pdf] [code] Monisha Jegadeesan, Sachin Kumar, John Wieting, Yulia Tsvetkov. Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs. Multilingual Representation Learning Workshop at EMNLP 2021.
[2021] [pdf] [code] Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov. Machine Translation into Low-Resource Language Varieties. In the proceedings of the 2021 Conference of the Association for Computational Linguistics (ACL).
[2021] [pdf] Lidia Kidane, Sachin Kumar, Yulia Tsvetkov. An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation. The 2nd AfricaNLP Workshop at EACL 2021.
[2020] [pdf] Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov, A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards. The 4th Workshop on Neural Generation and Translation (ACL) 2020
[2019] [pdf] Gayatri Bhat, Sachin Kumar, Yulia Tsvetkov, A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation, The 3rd Workshop on Neural Generation and Translation (EMNLP) 2019
[2019] [pdf][code] Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov, Topics to Avoid: Demoting Latent Confounds in Text Classification, 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).
[2018] [pdf] [code] Sachin Kumar & Yulia Tsvetkov, Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs, 7th International Conference on Learning Representations (ICLR) 2019.
[2018] [pdf] Shreshtha Mundra*, Sachin Kumar*, Manjira Sinha, Sandya Mannarswamy, Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion, In Proceedings of the International Conference on Information and Knowledge Management (CIKM) 2018.
[2018] Sachin Kumar, Yulia Tsvetkov, Machine Translation with Continuous Outputs, ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models.
[2017] [pdf] Sachin Kumar, Soumen Chakrabarti, Shourya Roy. Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) 2017.
[2014] [pdf] Sachin Kumar, Vikas C. Raykar, and Priyanka Agrawal. Decisions under drift: Adapting binary decision thresholds to drifts in test distribution. In Proceedings of the 6th IBM Collaborative Academia Research Exchange Conference. ACM, New York, NY, USA, Article 17, 4 pages. DOI=http://dx.doi.org/10.1145/2662117.2662134
