Our Testing Methodology
Every claim we make is backed by systematic testing. No cherry-picked results, no marketing metrics. Here's exactly how we measure what we measure.
Our Testing Pipeline
We run this exact sequence every Monday morning. When the bypass rate on any detector changes by more than 2 percentage points, we publish updated results within 48 hours.
Generate Test Documents
We generate 50 documents from each of 5 AI models (including ChatGPT-4, Claude 3 Opus, and Gemini Pro), for 250 documents in total. Each document is 400–600 words on a randomly selected academic or professional topic. Documents are generated with default settings, with no special prompting to make them more or less detectable.
Baseline Detection Scan
Every document is submitted to all 5 detection systems before humanization: Turnitin (institutional API), GPTZero, Originality.ai, Copyleaks, and ZeroGPT. We record the exact AI probability score for each document from each detector.
Humanization Pass
All 250 documents go through TextHumanizer in each of the three tone modes (Scholarly, Creative, Casual). This produces 750 humanized documents per week, in addition to the 250 baseline documents.
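The weekly volumes above follow from simple multiplication. A minimal sketch of that arithmetic, assuming only the counts stated in the steps so far (the constant and function names are illustrative and do not describe the production pipeline):

```python
# Illustrative arithmetic only. Names are hypothetical; counts come from the text.
DOCS_PER_MODEL = 50
NUM_MODELS = 5
NUM_TONES = 3       # Scholarly, Creative, Casual
NUM_DETECTORS = 5   # Turnitin, GPTZero, Originality.ai, Copyleaks, ZeroGPT


def weekly_volumes() -> dict:
    """Return the document and scan counts implied by the stated test sizes."""
    baseline_docs = DOCS_PER_MODEL * NUM_MODELS
    return {
        "baseline_docs": baseline_docs,
        # One baseline scan per document per detector.
        "baseline_scans": baseline_docs * NUM_DETECTORS,
        # Each baseline document is humanized once per tone mode.
        "humanized_docs": baseline_docs * NUM_TONES,
    }


print(weekly_volumes())
# {'baseline_docs': 250, 'baseline_scans': 1250, 'humanized_docs': 750}
```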
Post-Humanization Scan
Every humanized document is re-submitted to all 5 detectors. We record the new AI probability score. A document is counted as "passed" when the AI probability score falls below 25% (or equivalent) on a given detector.
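The pass rule above can be sketched as a small function. This is an illustration under stated assumptions: scores are percentages from 0 to 100, "below 25%" is treated as strictly less than 25, and the function name and sample scores are made up for the example.

```python
from typing import Sequence

PASS_THRESHOLD = 25.0  # percent; a score strictly below this counts as "passed"


def bypass_rate(scores: Sequence[float], threshold: float = PASS_THRESHOLD) -> float:
    """Return the percentage of documents whose AI probability is below threshold."""
    if not scores:
        raise ValueError("no scores provided")
    passed = sum(1 for score in scores if score < threshold)
    return 100.0 * passed / len(scores)


# Made-up scores for a single detector, for illustration only.
example_scores = [3.0, 12.5, 24.9, 25.0, 61.0]
print(bypass_rate(example_scores))  # 60.0 (three of five scores are below 25)
```

A score of exactly 25.0 is not counted as passed here, which matches the "falls below 25%" wording. Detectors that report something other than a percentage would need their own equivalent cutoff.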
Meaning Preservation Test
A stratified random sample of 30 humanized documents per week is reviewed by three trained human annotators. Annotators score each document on a 10-point scale across five dimensions: factual accuracy, argument preservation, citation integrity, tone appropriateness, and readability. Documents scoring 8.5+ on average are counted as "meaning preserved."
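One way to read the scoring rule above is sketched below. It assumes an unweighted mean across the five dimensions for each annotator, followed by an unweighted mean across annotators. The text does not specify the weighting, so treat that as an assumption; the dictionary keys and sample scores are hypothetical.

```python
from statistics import mean

DIMENSIONS = (
    "factual_accuracy",
    "argument_preservation",
    "citation_integrity",
    "tone_appropriateness",
    "readability",
)
PRESERVED_THRESHOLD = 8.5  # "8.5+ on average" on the 10-point scale


def document_score(annotations: list) -> float:
    """Average each annotator's five dimension scores, then average the annotators."""
    per_annotator = [mean(a[d] for d in DIMENSIONS) for a in annotations]
    return mean(per_annotator)


def is_meaning_preserved(annotations: list) -> bool:
    """Apply the 8.5+ cutoff to the averaged document score."""
    return document_score(annotations) >= PRESERVED_THRESHOLD


# Made-up scores from three annotators, for illustration only.
sample = [
    {d: 9 for d in DIMENSIONS},
    {d: 8 for d in DIMENSIONS},
    {d: 9 for d in DIMENSIONS},
]
print(is_meaning_preserved(sample))  # True (average is about 8.67)
```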
Publish & Update
Results are added to our tracking database. If the bypass rate on any detector drops below 95%, we trigger an algorithm review. We publish updated figures on this page within 48 hours of any change exceeding 2 percentage points.
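The two triggers in this step can be expressed as simple comparisons. The sketch below is illustrative: the function names, detector labels, and rates are hypothetical, and it assumes "drops below 95%" and "exceeding 2 percentage points" are both strict inequalities.

```python
REVIEW_FLOOR = 95.0   # percent; a bypass rate below this triggers an algorithm review
PUBLISH_DELTA = 2.0   # percentage points; a larger change triggers a page update


def detectors_needing_review(current: dict) -> list:
    """Detectors whose current bypass rate has dropped below the review floor."""
    return sorted(name for name, rate in current.items() if rate < REVIEW_FLOOR)


def detectors_needing_publish(previous: dict, current: dict) -> list:
    """Detectors whose bypass rate moved by more than PUBLISH_DELTA since last week."""
    return sorted(
        name
        for name, rate in current.items()
        if name in previous and abs(rate - previous[name]) > PUBLISH_DELTA
    )


# Made-up weekly rates for two placeholder detectors.
last_week = {"DetectorA": 98.0, "DetectorB": 96.0}
this_week = {"DetectorA": 97.0, "DetectorB": 93.5}

print(detectors_needing_review(this_week))              # ['DetectorB']
print(detectors_needing_publish(last_week, this_week))  # ['DetectorB']
```

Under these assumptions, a rate of exactly 95% does not trigger a review, and a change of exactly 2 points does not trigger a page update.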
How Each Detector Works
Not all detectors are equal. Here's what we know about each system, how we test against it, and our current results.
| Detector | Primary Method | False Positive Rate | Our Bypass Rate | Update Frequency |
|---|---|---|---|---|
| Turnitin | Perplexity + sentence-level ML | ~3–8% (higher for non-native speakers) | 98% | Major: quarterly |
| GPTZero | Perplexity + burstiness analysis | ~4–6% | 97% | Continuous |
| Originality.ai | Custom LLM classifier | ~3–5% | 96% | Monthly |
| Copyleaks | Semantic pattern matching | ~2–4% | 99% | Bi-monthly |
| ZeroGPT | Statistical distribution analysis | ~5–9% | 95% | Weekly |
False positive rates sourced from published research and independent audits. Our bypass rates reflect the current week's testing. Last updated: April 7, 2026.
Current Bypass Rates
Results from the week of April 7, 2026. 250 GPT-4 documents tested against all 5 detectors after semantic humanization.
How We Measure Meaning Preservation
Bypass rate is half the story. An output that passes every detector but scrambles the original argument is useless. Our meaning preservation methodology uses trained human annotators, not automated scoring.
Each sampled document is reviewed by three annotators independently. Annotators never see the original AI-generated input — they score based on whether the humanized output makes coherent, accurate claims. Scores are averaged; documents with high annotator disagreement (score variance > 2 points) are escalated to a senior reviewer.
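The escalation rule can be sketched as follows. The phrase "score variance > 2 points" is interpreted here as the gap between the highest and lowest annotator score, which is an assumption. If statistical variance is intended, `statistics.variance` would replace the range calculation. The function name and sample scores are hypothetical.

```python
ESCALATION_SPREAD = 2.0  # points on the 10-point scale


def needs_senior_review(annotator_scores: list) -> bool:
    """Flag a document when annotators disagree by more than the allowed spread."""
    if len(annotator_scores) < 2:
        raise ValueError("need at least two annotator scores to measure disagreement")
    spread = max(annotator_scores) - min(annotator_scores)
    return spread > ESCALATION_SPREAD


# Made-up overall scores from three annotators.
print(needs_senior_review([9.0, 8.5, 8.0]))  # False (spread is 1.0)
print(needs_senior_review([9.5, 9.0, 6.5]))  # True (spread is 3.0)
```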
Independence & Editorial Standards
Testing Team Separation
Our testing team operates separately from marketing. Bypass rate figures are calculated by automated testing infrastructure. Humans don't select which test results to publish.
Unfavorable Results Published
When a detection system update reduces our bypass rate, we disclose the drop immediately and publish the updated figures before our algorithm response catches up.
Weekly Update Cycle
Every Monday we run the full 250-document test suite. Results on this page are never more than 7 days old. The date of the last update is shown at the top of this page.
Verification is better than trust. Try the tool and run the output through your own detectors.
Frequently Asked Questions
How do you decide when to update your algorithm?
We run automated tests every Monday against all 5 detectors. If any detector's bypass rate drops below 95%, we immediately begin analyzing what changed in the detection system and adjust our semantic restructuring approach accordingly. Major detector updates (like Turnitin's quarterly releases) always trigger a review. We publish updated figures within 48 hours whenever a bypass rate shifts by more than 2 percentage points.
What does "meaning preservation" actually measure?
Meaning preservation is the degree to which humanized output maintains the original argument, facts, and citations from the AI-generated input. We measure this through human annotation across five dimensions: factual accuracy, argument preservation, citation integrity, tone appropriateness, and readability. Three independent annotators score each document; scores averaging 8.5+ count as "preserved." This guards against output that passes detection at the cost of scrambling your actual message.
Why do different detectors require different approaches?
Each detector uses fundamentally different methods: Turnitin uses perplexity and sentence-level machine learning, GPTZero analyzes perplexity and burstiness, Originality.ai trains custom LLM classifiers, Copyleaks uses semantic pattern matching, and ZeroGPT looks at statistical distributions. Our semantic restructuring approach works across all of them because it reorders ideas and varies sentence construction at the meaning level rather than just swapping words.
How often do detectors update their models, and how does that affect you?
Detector update frequency varies widely. Turnitin releases major updates quarterly, Copyleaks updates bi-monthly, Originality.ai monthly, and ZeroGPT weekly, while GPTZero updates continuously. This is why we test weekly: we catch detector changes within days rather than weeks. When a detector updates and our bypass rate drops, we immediately disclose the drop and begin refining our algorithm. Weekly testing means you always know our real, current performance.
Can I verify your testing results myself?
Absolutely. Our methodology is fully transparent and designed for independent verification. You can take any humanized document from our tool and submit it to Turnitin, GPTZero, or any other detector to confirm our claimed bypass rates. We encourage this verification rather than asking you to trust our numbers. The only limitation is that institutional APIs (like Turnitin) may require institutional access, but public detectors are fully available for testing.