How LLMs Distort Our Written Language
Marwa Abdulhai, Isadora White, Yanming Wan, Ibrahim Qureshi,
Joel Z. Leibo, Max Kleiman-Weiner, Natasha Jaques
Executive Summary
LLMs are used by over a billion people globally, and the most frequent use case is to assist with writing. LLMs can provide a huge efficiency boost, but are they actually writing what we want?
Many users recognize the "feel" of LLM prose, but few people realize the extent to which LLMs distort the meaning of writing. We find this across three datasets: a human user study, a dataset of human argumentative essays, and reviews from a top machine learning conference.
Main Findings
LLMs change the conclusions of writing, altering both the stance and the type of argument
Human users exhibit a paradox of preferences: they are satisfied with the output while reporting a statistically significant loss of voice and creativity
LLMs introduce larger semantic shifts than human edits do, even when prompted only to introduce grammar edits
These shifts apply even to our institutions: LLM-written reviews gave significantly different reasons for acceptance and rejection at the International Conference on Learning Representations (ICLR 2026).
Why should we care?
As LLMs are integrated into society, these subtle changes in meaning could fundamentally alter politics, culture, science, and even the way we communicate with our friends and family. Our study focuses on argumentative writing, but our findings may generalize to many other forms of writing and communication as well.
When LLMs revise human writing, they induce large homogenizing changes very unlike how people would have edited the same essay.
The upper-left panel shows our counterfactual analysis of human-written essays from ArgRewrite-v2 alongside their human edits; the remaining panels all show LLM edits of the same original essays.
We project these edits into the MiniLM-L6 embedding space and show that semantics change much more dramatically under LLM edits than under human edits.
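As a concrete sketch, the semantic shift of an edit can be quantified as the cosine distance between sentence embeddings of the original and edited text. The snippet below uses random vectors as stand-in embeddings; in the real analysis they would come from a MiniLM-L6 sentence encoder (e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(text)`), and the perturbation magnitudes here are purely illustrative.

```python
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity; larger values mean a bigger semantic shift."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 384-dimensional embeddings (MiniLM-L6 outputs 384-d vectors).
rng = np.random.default_rng(0)
original = rng.normal(size=384)
human_edit = original + 0.1 * rng.normal(size=384)  # small perturbation: light human edit
llm_edit = original + 1.0 * rng.normal(size=384)    # large perturbation: heavy LLM rewrite

# The LLM-style edit lands farther from the original in embedding space.
assert cosine_distance(original, human_edit) < cosine_distance(original, llm_edit)
```

In the study, the same distance is computed between each original essay and its human or LLM revision; the figure above plots the resulting shifts.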
Above is an intuitive example of how writing with an LLM alters the conclusions and removes the human's voice in essays from the ArgRewrite-v2 dataset.
We illustrate this more extensively in our results below.
Methodology & Datasets
We study how LLMs distort meaning in our written language in three datasets:
Human User Study: To understand how humans use LLMs while writing, we conducted a user study in which 55 participants could use an LLM and 45 had no access to one. Since many participants with access chose to abstain from LLM use during their session, we condition our results on this choice and split participants into two groups: LLM-Influenced, for those who did not use the LLM or used it only for information seeking, and LLM, for those who used it extensively. We assigned these groups a priori, based on transcripts, final essays, and self-reported usage scores, before running our analyses.
ArgRewrite-v2: Using a dataset of 86 human-written essays collected in 2021 — before the widespread release of LLMs — we prompted three production LLMs (gpt-5-mini, gemini-2.5-flash, claude-haiku) to edit essays across five revision types: general revision, minimal edits, grammar edits, completion, and expansion. We compare LLM-generated drafts to human-written revisions along dimensions of semantics, lexical usage, part-of-speech distributions, emotional tone, and stylistic features.
ICLR 2026 Review Analysis: We analyze 18k peer reviews from ICLR 2026, selecting papers with one entirely human-written review and one entirely LLM-generated review. We use an LLM-as-a-Judge classifier to identify the strengths and weaknesses cited in each review and compare scores assigned by humans vs. LLMs.
Heavy LLM users report that their essays do not reflect their own voice.
This presents a paradox of preferences: users report satisfaction even though the loss of voice and creativity is apparent.
RLHF optimizes for stated preferences, but that is not sufficient to preserve creativity and semantics.
LLMs distort writing by shifting essays in a common semantic direction.
Essays written by humans in the control group are widely spread out throughout the embedding space, occupying a broad region that reflects the diversity of individual perspectives, writing styles, and argumentation.
Essays written by LLMs form a tight cluster in a region that no human-written essay occupies: LLM revisions produce large semantic shifts, strongly aligned in a common direction, toward a part of the space no human editor reached.
This provides clear evidence that LLMs are shifting semantics in a way human editors do not.
LLMs Alter the Conclusions of Written Language.
LLM users wrote essays that were significantly more neutral and avoided taking a definitive stance on the question "Does money lead to happiness?".
This represents a fundamental shift in argument stance.
LLMs make substantially larger lexical changes than humans.
LLM edits substantially alter the words used in an essay compared to human edits.
The unique lexical fingerprint of each writer is overwritten by the LLM's preferred vocabulary.
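One simple way to see this lexical overwriting is to compare vocabulary overlap between the original essay and each revision, e.g. with the Jaccard index over word sets. The sentences below are invented for illustration, not drawn from the dataset; the study's actual lexical metrics may differ.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two texts (1.0 = identical vocabulary)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Toy illustration: a light human edit vs. a wholesale LLM rephrasing.
original = "money can buy comfort but my grandmother taught me happiness is free"
human_ed = "money can buy comfort but my grandmother taught me that happiness is free"
llm_ed = "wealth may purchase comfort yet genuine happiness remains fundamentally priceless"

# The human edit preserves far more of the writer's vocabulary.
assert jaccard(original, human_ed) > jaccard(original, llm_ed)
```

A high Jaccard score means the writer's word choices survive the edit; the LLM rewrite above shares only "comfort" and "happiness" with the original.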
LLMs Systematically Restructure Grammar Toward a Less Personal, Formal Style.
LLMs adopt a more formal style of writing, increasing the use of nouns and adjectives, and decreasing the use of pronouns, signifying a removal of first-person, experience-based argumentation toward impersonal language.
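A crude proxy for this shift away from first-person argumentation is the rate of first-person pronouns per token. The study uses full part-of-speech distributions (which a tagger such as spaCy would provide); the word-list sketch below and its example sentences are simplifications invented for illustration.

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def first_person_rate(text: str) -> float:
    """Fraction of tokens that are first-person pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in FIRST_PERSON for t in tokens) / max(len(tokens), 1)

# Invented example sentences, not drawn from the study data.
human = "I believe money helped my family, and we were happier for it."
llm = "Financial resources correlate with reported well-being across populations."

# The impersonal, noun-heavy style drops first-person pronouns entirely.
assert first_person_rate(human) > first_person_rate(llm)
```

The same comparison over full POS distributions (noun, adjective, and pronoun frequencies) underlies the finding above.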
The Use of LLMs for Writing Increases Emotional Language.
We see a substantial increase in emotional language across both negative and positive emotions when comparing LLM edits to human edits.
Surprisingly, this occurs even when the LLM is prompted to make only minimal changes, and even when given expert feedback.
The Use of LLMs for Writing Increases Analytical, Logical, and Statistical Language.
Left: LIWC analysis of ArgRewrite-v2 edits shows that LLM edits increase the degree of formal, logical, and hierarchical thinking patterns.
Right: In the user study, we find that people are more likely to use arguments related to personal experience, while LLM-written essays are more likely to use statistical and logical arguments.
LLM-influenced essays also cite expert opinions, something that human-written essays rarely do.
LLMs distort decisions affecting scientific institutions.
When LLMs are employed in the scientific review process, they assign scores 10% higher than humans. Humans are 32% more likely to cite clarity as a strength, 58% more likely to cite clarity as a weakness, and 32% more likely to comment on the relevance of the research, while LLMs are 136% more likely to comment on reproducibility and 84% more likely to comment on scalability, in both strengths and weaknesses.
Differences in evaluation criteria between human and LLM reviews may impact decisions made about what scientific work is valid and incentivized.
Conclusions
These results present a troubling picture of AI subtly distorting our written language, and with it, our cultural institutions.
AI-generated content has infiltrated parliamentary speeches, song lyrics, movie scripts, spoken language, and even messages we send to our coworkers and loved ones. What kind of content is prioritized?
Even though people who rely heavily on AI recognize that it diminishes their voice and creativity, they are nevertheless equally satisfied with the results.
The ease of use, combined with the potential to accelerate individual careers, is likely to continue to incentivize people to produce AI-generated text, and even to attempt to pass it off as their own in professional contexts, as the ICLR data shows.
@misc{abdulhai2026llmdistort,
  title={How LLMs Distort Our Written Language},
  author={Abdulhai, Marwa and White, Isadora and Wan, Yanming and Qureshi, Ibrahim and Leibo, Joel Z. and Kleiman-Weiner, Max and Jaques, Natasha},
  year={2026}
}