Can Turnitin detect GPT-4 specifically?

Turnitin's marketing explicitly names GPT-4 and ChatGPT among the models the report is trained against. Detection is statistical, not based on model fingerprints: the report doesn't tag your text 'GPT-4'; it returns a probability that the document was machine-generated.

Will Turnitin flag my paper if I used ChatGPT only for outlining?

Probably not on its own. Turnitin reads finished prose, not your process. An outline that you then expanded by hand looks like your writing, because it is. The risk shows up when AI-generated sentences make it into the final draft untouched.

What if I rewrite ChatGPT output by hand?

Hand-rewriting changes the variables the detector reads — sentence length, word predictability, structural variation. A genuine line-by-line rewrite usually drops the AI score substantially. A synonym swap usually doesn't.

Can I see my own Turnitin AI score before submitting?

Generally no. At most institutions, the AI writing report is visible only to instructors and admins. The closest workaround is a Turnitin-parity report: a four-detector PDF you can generate yourself before submitting.

Does Turnitin Detect ChatGPT? (2026 Update)

Q: Does Turnitin detect ChatGPT?

Yes, in the common case. Turnitin's AI writing report is trained on output from GPT-4, Claude, Gemini, and other major LLMs per Turnitin's own product page. Paste raw ChatGPT output into a 1,000-word essay and the report will usually return a high AI percentage. Edited or mixed content is far less consistent.

The short version is yes: paste raw ChatGPT output into a paper and Turnitin’s AI writing report will usually catch it. The longer version is the one that matters: what changes when you edit, when you mix AI prose with your own, and what a “detected” verdict actually means inside an instructor’s grading view.

This post is part of our Turnitin cluster. For the broader picture of how the detector works, start with the Turnitin AI detection guide. For the accuracy debate, see the accuracy post.

What Turnitin says it catches

Turnitin’s product pages describe its AI writing detector as trained against the major LLM families: ChatGPT (GPT-3.5 and GPT-4), Claude, Gemini, and other large models. The marketing figure Turnitin quotes is 98% AI-detection accuracy with under 1% false positives on its internal test set. It has not published the test set methodology, sample composition, or per-model breakdown.

Weber-Wulff et al. 2023 (Int J Educ Integr 19:26) benchmarked 14 detection tools and found none reached the accuracy needed to be considered reliable in academic integrity workflows — most tools either over-flagged human writing or missed machine-paraphrased AI text.

For students, the practical takeaways from Turnitin’s own materials are:

They explicitly claim ChatGPT (all major versions) is covered.
They report a single document-level percentage, not a per-model verdict.
They scope the claim to long-form prose; short responses are explicitly listed as less reliable in their support documentation.

Anything beyond that (per-model bypass rates, exact thresholds, false-positive rates by demographic) comes from third-party classroom audits, not Turnitin.

StealthZero numbers for Turnitin workflows

Free tier handles 600 rephrase requests per month with a 20-per-day cap. Sentrio v2 enforces a 100-word minimum for accurate scoring. Multi-detector Proof Reports bundle four detectors — Turnitin, GPTZero, Winston, and CopyLeaks — for $2.80 per single report or $22.40 for a 10-pack.

Free plan: 600 requests/month, 20/day hard cap, unlimited words per request
Starter ($9.99/mo): 1,500 combined Sentinel/F.R.I.D.A.Y requests, 50/day cap, 1 AI Report credit/month
Pro ($19.99/mo): 3,000 advanced requests, 100/day cap, 2 AI Reports/month, unlimited detector scans
Premium ($29.99/mo): unlimited access to all models, 3 AI Reports/month
Proof Report bundle: Turnitin + GPTZero + Winston + CopyLeaks in one PDF
Liang et al. 2023 (arXiv:2304.02819) found false-positive rates above 60% for ESL writers across multiple GPT detectors, which is relevant context for any Turnitin appeal
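
The prices and caps above imply a couple of figures the list doesn't state outright. A quick check, using only the numbers quoted in this section (the variable names are illustrative, not part of any StealthZero API):

```python
# Figures quoted above, kept in cents to avoid floating-point rounding.
SINGLE_REPORT_CENTS = 280       # $2.80 per single Proof Report
TEN_PACK_CENTS = 2240           # $22.40 for a 10-pack
FREE_MONTHLY_REQUESTS = 600     # free-tier monthly quota
FREE_DAILY_CAP = 20             # free-tier daily cap

# Effective price of one report when bought as part of the 10-pack.
per_report_in_pack = TEN_PACK_CENTS // 10

# Percentage saved per report compared with buying singles.
discount_pct = 100 * (SINGLE_REPORT_CENTS - per_report_in_pack) // SINGLE_REPORT_CENTS

# Number of days at the daily cap needed to use the whole monthly quota.
days_to_use_quota = FREE_MONTHLY_REQUESTS // FREE_DAILY_CAP

print(f"10-pack price per report: ${per_report_in_pack / 100:.2f}")
print(f"Discount vs single report: {discount_pct}%")
print(f"Days at the daily cap to use the monthly quota: {days_to_use_quota}")
```

So the 10-pack works out to $2.24 per report, a 20% discount, and the free tier's 600-request quota is only reachable by hitting the 20-per-day cap on 30 separate days.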

Why ChatGPT prose stands out

ChatGPT is a transformer trained to predict the next token. That training objective bakes in three statistical habits that AI detectors look for, regardless of which detector you use.

Low perplexity. Each word is the likely word given the words around it. The model is paid, metaphorically, to be predictable.
Low burstiness. Sentence lengths cluster. Complexity holds steady across paragraphs. Real human writing has more rhythmic variation.
Stylistic uniformity. ChatGPT doesn’t get tired, doesn’t lose interest in paragraph six, doesn’t switch tone halfway through. Student writing does.

A Turnitin AI score is essentially a weighted measure of these three signals across your document. ChatGPT output, left unedited, exhibits all three.
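
Turnitin does not publish how it computes these signals, so the following is only a toy illustration of what "burstiness" and "perplexity" mean, not a reconstruction of any detector. The helper names `burstiness` and `unigram_perplexity` are made up for this sketch, and real detectors score text with large language models rather than the simple word-frequency model assumed here.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def _words(text: str) -> list[str]:
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_lengths(text: str) -> list[int]:
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (std dev / mean).

    0.0 means every sentence has the same word count; larger values
    mean more variation in rhythm.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    Lower values mean the words in `text` are more predictable given
    the word frequencies observed in `reference`.
    """
    ref_counts = Counter(_words(reference))
    vocab_size = len(ref_counts) + 1          # one extra slot for unseen words
    total = sum(ref_counts.values())
    words = _words(text)
    if not words:
        return float("inf")
    log_prob = sum(
        math.log((ref_counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


uniform = "The cat sat down. The dog ran off. The bird flew up."
varied = "Stop. The dog ran across the wide field before anyone could react. Why?"
print(burstiness(uniform), burstiness(varied))
```

The uniform sample has three four-word sentences, so its burstiness is exactly zero, while the varied sample mixes one-word and eleven-word sentences and scores higher. That is the direction of the signal described above, at a much cruder resolution than a production detector.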

The Turnitin AI detection guide walks through these signals in more depth.

What changes when you edit ChatGPT output

The detector reads finished prose, not your editing history. What changes between “paste and submit” and “edit then submit” is the statistical shape of the text, and small edits change very little.

A rough mental model for how prose-level changes affect the score:

Edit level	What you actually did	Typical effect on the score
Synonym swap	Replace a few words with thesaurus picks	Negligible. Sentence cadence, length, and structure all preserved.
Sentence-level paraphrase	Rewrite each sentence with the same meaning	Modest. The model still produced the underlying structure.
Section rewrite	Rewrite a whole paragraph from your understanding of the points	Substantial. You’re introducing your own cadence.
Outline-only assist	AI gave the structure, you wrote the prose	Usually invisible to the detector: the prose is yours.
Humanizer rewrite	Tool rewrites sentence-by-sentence targeting perplexity/burstiness	Substantial. This is what the category exists to do.

Sadasivan et al. 2023 (arXiv:2303.11156) showed that even the strongest AI text detectors degrade toward random-chance accuracy under light paraphrasing attacks, suggesting a theoretical ceiling on reliable detection of high-quality AI text.

The trap students fall into is the first two rows. Synonym swaps and light paraphrases feel like meaningful edits because they take effort. They aren’t, statistically. The detector doesn’t care that you replaced “demonstrate” with “show”; it cares that the surrounding sentence still has ChatGPT’s cadence.

“But what about GPT-4 / GPT-4o / Claude Opus?”

Students often ask whether newer models bypass detection. Turnitin’s marketing pages explicitly name GPT-4 in the list of models the report is trained against. They do not name GPT-4o, Claude 3 Opus, or the most recent model releases, but Turnitin updates the detector on a rolling basis, and its public materials say it is re-trained against new LLM versions as they appear.

A safer frame than “GPT-4 vs GPT-4o detection rates” is this: all transformer-based LLMs share the same statistical fingerprints. A new model from any major lab is still optimising to produce the most likely next token. Until the underlying training objective changes, the fingerprints are there to find.

Where newer models can shift the picture is when you prompt them carefully (e.g. “vary sentence lengths, occasionally start with a conjunction”), or when their output happens to fall closer to a competent human writer’s style. Neither of those is a reliable bypass; they’re luck.

What a “detected” verdict looks like for the instructor

A high AI percentage on the writing report doesn’t automatically drop you into an academic-integrity hearing. The instructor sees the report inside Turnitin’s instructor view, with:

A single document-level percentage.
Highlighted sentences the model marked as AI-likely.
A standard disclaimer that the score is probabilistic.

What happens next is up to the instructor and the institution. The brackets most departments work with in practice:

AI score	What an instructor usually does
0–19%	Doesn’t open the AI report. Grades the paper.
20–39%	May read the report. May email the student.
40–59%	Reads the report carefully. Often asks for a chat or to see drafts.
60–100%	Treated as evidence of substantial AI involvement. May trigger formal review.

These are practice norms, not Turnitin’s published policy. Every institution differs.
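
For readers who think in code, the brackets above amount to a simple threshold lookup. This is a sketch of the informal norms described in the table, with paraphrased labels; the function name `typical_response` is hypothetical and nothing here reflects a Turnitin feature or policy.

```python
# Informal practice norms paraphrased from the table above; NOT Turnitin policy.
# Each entry is (inclusive upper bound of the bracket, typical instructor response).
BRACKETS = [
    (19, "grades the paper without opening the AI report"),
    (39, "may read the report and may email the student"),
    (59, "reads the report carefully and often asks to talk or see drafts"),
    (100, "treats the score as evidence of substantial AI use; formal review possible"),
]


def typical_response(ai_score: int) -> str:
    """Return the bracket description for a whole-number AI score from 0 to 100."""
    if not 0 <= ai_score <= 100:
        raise ValueError("ai_score must be between 0 and 100")
    for upper_bound, response in BRACKETS:
        if ai_score <= upper_bound:
            return response
    raise AssertionError("unreachable: brackets cover 0 to 100")


for score in (12, 20, 45, 80):
    print(score, "->", typical_response(score))
```

The hard edges are an artifact of writing it as a lookup. In practice an instructor looking at 39% and one looking at 40% are unlikely to behave differently, and every institution sets its own expectations.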

How to check before you submit

Most institutions do not let students run Turnitin’s AI writing report on their own paper. The detector is gated to instructors. The realistic pre-submission options:

Use a Turnitin-parity report. A StealthZero AI Report bundles four detectors (Turnitin, GPTZero, Winston, CopyLeaks) into a single PDF that you can open before the institutional submission. Add-ons start at $2.80 for a single report; included credits ship with Starter / Pro / Premium plans.
Use a strong proxy detector. The free StealthZero AI Detector runs the E.D.I.T.H engine, calibrated against real-world Turnitin scores. Sentrio v2 (four modes: Standard, Aggressive, Multilingual, Scholar) is stricter than E.D.I.T.H and useful as a second check.
Use GPTZero or Winston directly. Both publish accuracy claims on their homepages. GPTZero claims 99% accuracy with a 10,000 words/month free tier; Winston claims 99.98% accuracy with a 2,000-credit 14-day free trial. Neither is Turnitin: they have different training data and different score scales, but both are useful as a second opinion. See our Turnitin vs GPTZero comparison for the side-by-side.

A pattern that works in practice: run the detector first, fix what lights up, then generate a Turnitin-parity report only if you want the screenshot-ready PDF.

If you used ChatGPT and want to keep the work

Most students using AI on assignments don’t want to bin the work; they want it to read like their writing. The workflow that actually targets the detector’s signals:

Generate or paste the ChatGPT draft you want to keep.
Run it through a rewriter, not a paraphraser. StealthZero’s Humanizer ships five models: Origin (free unlimited), Sentinel-Lite, Sentinel-Max, F.R.I.D.A.Y, and Jarvis (Homer / Cohera / Max sub-models). Cohera is the strongest tier; per the operator’s internal testing, it achieves 100% bypass on the supported detectors.
Lock your citations, quotes, numbers, and key terms. The humanizer’s locked-phrase feature pins these so the rewrite doesn’t accidentally rephrase a Vancouver-style citation into nonsense.
Verify with E.D.I.T.H or Sentrio v2 in the same window. Or export a four-detector Turnitin-parity report if you want a portable PDF.
Read the output before you submit. A humanizer is a rewrite tool, not a publish-tool. You’re still the author.

For the broader humanizer landscape, see What is an AI humanizer and Humanize AI text for Turnitin.

What doesn’t work

A few persistent myths worth shooting down explicitly:

Adding typos. The detector isn’t penalizing polish; it reads statistical patterns. A handful of typos barely moves perplexity or burstiness.
Cyrillic / homoglyph swaps. Turnitin’s Flags Insight Panel explicitly looks for character substitution and shows it to the instructor as a red flag.
White-on-white text. Same story: Turnitin’s Feedback Studio detects hidden text and surfaces it.
Submitting as a PDF or image. Turnitin extracts text regardless of format. Image-based submissions are flagged as suspicious.
Asking ChatGPT to “write like a human.” Prompt-engineering can shift output slightly, but it doesn’t override the underlying training objective. The fingerprints are still there.
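
To see why homoglyph swaps are easy to spot, here is a minimal sketch of a mixed-script check. It is not Turnitin's implementation, which is not public; the function names are hypothetical, and approximating a character's script from the first word of its Unicode name is a simplification chosen to keep the example inside Python's standard library.

```python
import re
import unicodedata


def letter_script(ch: str) -> str:
    """Approximate a letter's script by the first word of its Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def mixed_script_words(text: str) -> list[str]:
    """Return the words whose letters come from more than one script."""
    flagged = []
    for word in re.findall(r"\w+", text):
        scripts = {letter_script(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            flagged.append(word)
    return flagged


# \u0430 is CYRILLIC SMALL LETTER A, visually identical to the Latin "a".
sample = "This p\u0430per looks normal."
print(mixed_script_words(sample))
```

The doctored word looks identical on screen, but it is the only word in the sentence that mixes Latin and Cyrillic letters, so a few lines of code are enough to single it out.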

Honest answer to the title

Does Turnitin detect ChatGPT? In its untouched form, yes, almost always. In its lightly-edited form, usually. In its substantially rewritten or properly humanized form, frequently not, because the prose’s statistical signature is no longer ChatGPT’s.

That gap is the entire reason the humanizer category exists, and the reason a real-world workflow looks more like “AI for draft, human for cadence” than “AI for everything, hope for the best.”

Turnitin AI detection guide: the long-form explainer on perplexity, burstiness, and how the report is built
Turnitin AI detection accuracy: Turnitin’s published figures vs independent audits
Turnitin vs GPTZero: institutional detector vs the consumer detector students reach for first
Free Turnitin check options: what counts as a real pre-submission proxy

References

Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). “GPT detectors are biased against non-native English writers.” arXiv:2304.02819. https://arxiv.org/abs/2304.02819
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W., & Feizi, S. (2023). “Can AI-Generated Text Be Reliably Detected?” arXiv:2303.11156. https://arxiv.org/abs/2303.11156
Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., et al. (2023). “Testing of detection tools for AI-generated text.” International Journal for Educational Integrity, 19(1). https://doi.org/10.1007/s40979-023-00146-z

Product

StealthZero AI Humanizer: five-model rewriter with locked-phrase preservation
StealthZero AI Detector: free E.D.I.T.H scans, four-mode Sentrio v2 on paid plans
Pricing: plans from $9.99/mo; Turnitin-parity reports from $2.80 each