Publications
[2024][pdf] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith, "MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization", preprint.
[2024][pdf][code][data][blog] Faeze Brahman*, Sachin Kumar*, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi, "The Art of Saying No: Contextual Noncompliance in Language Models", preprint.
[2024][pdf][code] Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri, "WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models", preprint.
[2024][pdf][code] Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, "RewardBench: Evaluating Reward Models for Language Modeling", preprint.
[2024][pdf][code] Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo, "Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research", 2024 Conference of the Association for Computational Linguistics (ACL 2024).
[2024][pdf][code] YuHan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov, "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization", 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
[2024][pdf][code] Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad, "SSD-2: Scaling and Inference-time Fusion of Diffusion Language Models", 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
[2024][pdf][code] Sachin Kumar, Chan Young Park, Yulia Tsvetkov, "Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions", International Conference on Learning Representations (ICLR 2024).
[2023][pdf][code] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov, "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models", 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
[2023][pdf][code] Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi and Yulia Tsvetkov, “Minding Language Models’ (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker”, 2023 Conference of the Association for Computational Linguistics (ACL 2023).
[2023] [pdf] [code] Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov, "On the Blind Spots of Model-Based Evaluation Metrics for Text Generation", 2023 Conference of the Association for Computational Linguistics (ACL 2023).
[2023] [pdf] [code][demo] Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, "SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control", 2023 Conference of the Association for Computational Linguistics (ACL 2023).
[2023][pdf] Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad, "Assessing Language Model Deployment with Risk Cards", preprint.
[2023] [pdf] Sachin Kumar*, Vidhisha Balachandran*, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov, "Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey", 2023 Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023).
[2022] [pdf] [code] Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov and Yejin Choi, “Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation”, 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
[2022] [pdf][code] Sachin Kumar, Biswajit Paria, Yulia Tsvetkov, “Gradient-based Constrained Sampling from Language Models”, 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
[2021] [pdf][code] Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov. Controlled Text Generation as Continuous Optimization with Multiple Constraints. Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) 2021.
[2021] [pdf] [code] Monisha Jegadeesan, Sachin Kumar, John Wieting, Yulia Tsvetkov. Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs. Multilingual Representation Learning Workshop at EMNLP 2021.
[2021] [pdf] [code] Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov. Machine Translation into Low-Resource Language Varieties. In Proceedings of the 2021 Conference of the Association for Computational Linguistics (ACL).
[2021] [pdf] Lidia Kidane, Sachin Kumar, Yulia Tsvetkov. An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation. The 2nd AfricaNLP Workshop at EACL 2021.
[2020] [pdf] Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov, A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards. The 4th Workshop on Neural Generation and Translation (ACL) 2020.
[2019] [pdf] Gayatri Bhat, Sachin Kumar, Yulia Tsvetkov, A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation, The 3rd Workshop on Neural Generation and Translation (EMNLP) 2019.
[2019] [pdf][code] Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov, Topics to Avoid: Demoting Latent Confounds in Text Classification, 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).
[2018] [pdf] [code] Sachin Kumar & Yulia Tsvetkov, Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs, 7th International Conference on Learning Representations (ICLR) 2019.
[2018] [pdf] Shreshtha Mundra*, Sachin Kumar*, Manjira Sinha, Sandya Mannarswamy, Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion, In Proceedings of the International Conference on Information and Knowledge Management (CIKM) 2018.
[2018] Sachin Kumar, Yulia Tsvetkov, Machine Translation with Continuous Outputs, ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models.
[2017] [pdf] Sachin Kumar, Soumen Chakrabarti, Shourya Roy. Earth Mover's Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) 2017.
[2014] [pdf] Sachin Kumar, Vikas C. Raykar, and Priyanka Agrawal. Decisions under drift: Adapting binary decision thresholds to drifts in test distribution. In Proceedings of the 6th IBM Collaborative Academia Research Exchange Conference. ACM, New York, NY, USA, Article 17, 4 pages. DOI=http://dx.doi.org/10.1145/2662117.2662134