Welcome
I am a joint J.D.-Ph.D. student at Stanford University, co-advised by Professor Dan Jurafsky and Professor James Zou in the CS Department and Professor Daniel E. Ho at Stanford Law School. I am also pursuing a Ph.D. Minor in the Philosophy, Language, and the Arts program and working as a Graduate Research Fellow at the Regulation, Evaluation, and Governance Lab (RegLab). Previously, I completed my undergraduate studies at Harvard College, where I earned a joint degree in Mathematics and Computer Science with a secondary in Folklore & Mythology.
ML & NLP Research
I am broadly interested in natural language processing (NLP) and machine learning (ML).
My current research in NLP covers the following topics:
Understanding the limitations and capabilities of large language models.
Developing more efficient and interpretable methods for natural-language generation.
Investigating the strengths and weaknesses of prompting methods.
Using formal language theory to analyze and understand neural models for language.
Applying NLP tools to study large-scale and well-structured data such as US patent applications.
Implementing and evaluating ML solutions for societal benefit and policy impact.
Developing efficient algorithms for string-to-string problems.
Selected Publications
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, James Zou
arXiv 2024
[paper] / [website]
AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County
Faiz Surani*, Mirac Suzgun*, Vyoma Roman, Christopher D. Manning, Peter Henderson, Daniel E. Ho
arXiv 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
arXiv 2024
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho
Journal of Legal Analysis
[paper] / [GitHub] / [Bloomberg Law News] / [Stanford HAI blogpost]
string2string: A Modern Python Library for String-to-String Algorithms
Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
ACL 2024 (Systems Demo Track)
[paper] / [GitHub] / [documentation] / [pip install string2string]
Assessing the Potential of GPT-4 to Perpetuate Racial and Gender Biases in Health Care: A Model Evaluation Study
Travis Zack, Eric Lehman, Mirac Suzgun, Jorge A. Rodriguez, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, David W. Bates, Raja-Elie E. Abdulnour, Atul J. Butte, Emily Alsentzer
The Lancet Digital Health
[paper] / [pubmed] / [STAT News] / [GitHub]
Do Language Models Know When They're Hallucinating References?
Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai
EACL 2024
[paper] / [GitHub]
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer, Mirac Suzgun, Eline Visser, Dan Jurafsky, Luke Melas-Kyriazi
ICLR 2024 (Oral Presentation)
[paper] / [GitHub] / [Gemini Blogpost] / [Jeff Dean's Twitter post]
Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
JMLR 2024
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei
ICLR 2023
The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications
Mirac Suzgun, Luke Melas-Kyriazi, Suproteem K. Sarkar, Scott Duke Kominers, Stuart M. Shieber
NeurIPS Datasets and Benchmarks 2023 (Spotlight)
Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
Google Research's BIG-Bench Effort
TMLR 2023
[Contributed to the Dyck Languages and Dynamic Counting tasks]
Formal Language Theory as a Framework for Understanding the Limitations of Recurrent Neural Networks
Mirac Suzgun
Undergraduate Thesis (advised by Stuart M. Shieber & Peter B. Kronheimer). Awarded the Thomas T. Hoopes Prize.
Available upon request.