It is critical to the future of higher education, to the value of the certification, and to the success of our students and graduates that we do not ignore potential integrity violations, because such violations circumvent learning and impede the honest and fair evaluation of learning outcomes.
Having said that, it is quite difficult to identify when GenAI has been misused in a way that undermines academic integrity and harms learning.
This page outlines why we should be cautious in identifying misuse, and then what might count as sufficient evidence to suspect misuse.
When we say that GenAI has been "misused", we mean that it was used when it was prohibited, in ways that were prohibited, or in ways that undermined honesty and fairness.
So, for example, a student uses ChatGPT 4o to analyze their collected data and produce graphs even though GenAI use was prohibited.
Or, a student who was allowed to use GenAI to edit their assignment instead uses it to generate fake data rather than collecting real data for their lab.
Or, a student who was allowed to use GenAI to understand course concepts submits a paper that discusses readings that don't actually exist.
The Academic Integrity Office at UC San Diego has met with over 200 students alleged to have violated academic integrity by misusing GenAI. In 69% of those cases, the student admitted responsibility for using GenAI inappropriately or unethically; 9% failed to respond (and so are presumed, per Policy, to have accepted responsibility); and 22% contested the allegation. Of those contested cases (expressed as a share of all cases): 8% were withdrawn by the instructor, 5% were held responsible by the AI Review Board, and 2% were not held responsible by the AI Review Board.
Interestingly, then, the percentage of UC San Diego cases confirmed as GenAI misuse (74%: the 69% who admitted responsibility plus the 5% held responsible by the AI Review Board) is quite close to what has been found in one research study, which showed that psychology instructors correctly identify papers as AI- versus human-generated about 70% of the time.
So, what might this evidence of GenAI use look like?
Language Patterns or Irregularities
repetitive phrases or words
odd or inconsistent use of language
a more sophisticated vocabulary, grammar, or structure than is typical for that student or for that course
Sources and Citations
GenAI tools confabulate, so they will make up research studies, books, authors, and titles
GenAI tools also have a hard time sticking to the instructions, so they might use real sources, but not those associated with the course
AI-produced papers often lack in-text citations, or any citations/references at all
Factual Errors
GenAI tools are just prediction machines that make choices based on patterns, so they can't tell right from wrong, or fact from fiction
Blandness
the content is vague or generic, often not connected to course readings, discussions or context
Off-Topic
the submission doesn't address the prompt
AI Disclaimers
the student's submission includes phrases typical of GenAI-generated output when it is "helping" to address an assignment prompt
these might look like:
"but you should run this experiment yourself"
"I am just an AI so I don't have feelings"
"if you were to do this paper, I suggest that you..."
GenAI Output Similarities
the student's submission is the only one in the class that matches the GenAI output, and the submission is unusual in its formatting, errors, sources, etc.
Institutionally Approved AI Detector Finding
AI Detector tools should only be used if vetted and approved by your institution
AI Detectors should not be used on submissions of fewer than 350 words (Turnitin, for example, won't run its detector on such short submissions because doing so decreases accuracy)
Even then, take their "findings" with a large grain of salt (see the next section)
You may find this checklist of typical evidence helpful when reviewing student work and deciding whether you have sufficient evidence to report.
NOTE: none of these indicators (including so-called "AI Detectors") constitutes, by itself, sufficient evidence of GenAI misuse. In fact, you have likely noticed that any one of these indicators could also be found in an honestly completed "bad" assignment (see below).
First, because higher education relies heavily on products as artifacts of learning, the signs of AI misuse are often the same as the signs of an honestly completed, but poorly executed, assessment. So how is an instructor to know whether the "product" is bad because the student violated academic integrity by misusing GenAI, or because the student simply didn't do the readings, attend the lectures, or spend the thinking and doing time required to produce a "good" product?
Second, while many humans claim that they can differentiate between AI- and human-generated products, that is not always true. Two independent studies by UC San Diego researchers examined this phenomenon. In one study, the researchers found that psychology instructors have about a 70% chance of correctly identifying whether ChatGPT or a student wrote a psychology essay, and those who expressed confidence did not have a better success rate. Another set of UC San Diego researchers studied online communications and found that humans who could not interrogate the source of the text were less accurate at identifying AI-generated content than humans who could; but even those who could interrogate the source achieved only a 64.8% accuracy rate.
Third, AI tools are also not very good at distinguishing AI-generated from human-written text. In the Rathi et al. study, AI tools were less accurate than the humans, especially when they were analyzing text that they themselves had produced! And in the Waltzer et al. study, ChatGPT performed worse than the instructors, with a 63% accuracy rate. In studies that compare the efficacy of different "AI detectors", the success rates vary from as low as 35% to a high of just under 80%, with both false positives (identifying human-written text as AI-generated) and false negatives (identifying AI-generated text as human-written). Now, 80% is pretty good; that's the Turnitin product, and it also has very low false positive rates. However, that is a paid service that many instructors might not have access to (depending on their institution).
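To see why even a "very low" false positive rate warrants caution, here is a minimal sketch of the underlying base-rate arithmetic. All of the numbers in it (the class size, the share of submissions actually involving misuse, the detector's sensitivity, and its false positive rate) are hypothetical assumptions chosen for illustration, not figures from the studies or products above:

```python
# A minimal sketch of base-rate arithmetic for AI detectors.
# Every number below is a hypothetical assumption for illustration,
# not a measured rate from any study or product.

class_size = 300            # assumed course enrollment
misuse_share = 0.10         # assume 10% of submissions actually misused GenAI
sensitivity = 0.80          # assume the detector catches 80% of AI-written text
false_positive_rate = 0.01  # assume it flags 1% of honest submissions

misused = class_size * misuse_share          # 30 papers
honest = class_size - misused                # 270 papers

true_flags = misused * sensitivity           # 24 correctly flagged
false_flags = honest * false_positive_rate   # 2.7 honest papers flagged anyway

precision = true_flags / (true_flags + false_flags)
print(f"Total flags: {true_flags + false_flags:.1f} "
      f"({false_flags:.1f} of them honest work)")
print(f"Chance a flagged paper actually involved misuse: {precision:.0%}")
```

Even under these assumptions, which are generous to the detector, a few honest submissions get flagged, and the flagged pool contains proportionally more honest work as actual misuse becomes rarer. That is one more reason to treat a detector flag as a prompt for further inquiry, not as proof.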
If you are at least 50% sure that GenAI was misused on an assessment (the "more likely than not" standard), then you can proceed to the Addressing the Suspected Misuse section of this site, which recommends talking with the student as a next step.
However, if you are really not sure and what the student submitted would fail regardless, then perhaps just grade it accordingly. If GenAI was misused, at least the student isn't gaining an unfair academic advantage in such a case.