Can you trust AI detectors?

The rise of AI-generated content has led to the development of AI detection tools designed to distinguish between human and machine-authored text. But as a quick look at student forums will show, the reliability of these AI checkers has become the subject of intense debate.

Accuracy and limitations

Studies indicate that AI detectors often struggle with accuracy. For instance, research published in the International Journal for Educational Integrity evaluated 14 detection tools, including GPTZero and Turnitin, which is widely used by universities to check student work, and found that none of them were particularly accurate.

AI detectors can also produce false positives, flagging human-written content as AI-generated. This is particularly concerning in academic settings, where accusations of AI misuse can have serious consequences.

Savvy users who are familiar with these tools and their shortcomings will treat a detector's output as just one data point and view it with the skepticism it deserves. For users less familiar with how AI detectors operate, however, the way results are presented (often as a percentage) creates a false impression of precision, accuracy, and scientific reliability.

Evading detection

As tools aimed at detecting AI-generated content have proliferated, they've given rise to other tools designed to evade detection. Paraphrasing and adversarial rewriting tools are two techniques that can reduce the effectiveness of AI detectors. One study involving paraphrased AI-generated abstracts showed that detection accuracy dropped significantly after the text was reprocessed by paraphrasers designed to bypass AI detection. However, paraphrasing text in this manner often produces garbled output far removed from anything a student or academic would want to submit for assessment or peer review.

Bias and ethical considerations

For us, the final nail in the AI detector coffin is their apparent bias against non-native English speakers: some detection tools disproportionately flag text written by individuals whose first language is not English. This raises profound ethical concerns about deploying these tools on a group of learners and academics who are already at a comparative disadvantage to their native-speaking peers. We know of at least one instance where a client chose to submit their original, human-written but ungrammatical text to avoid accusations of AI use, after an entirely human edit raised their AI score to nearly 80%.

Recommendations

Given these challenges, we strongly discourage reliance on AI detection tools: the technology simply isn't ready for serious deployment in high-stakes contexts. While it's possible to combine AI detectors with human oversight to better assess authenticity, the baseline assumption has to be that such checkers are seriously flawed. And no one with a vested interest in selling subscriptions to AI checkers or paraphrasers is going to be entirely honest about just how inadequate they truly are.
