How AI-Generated Text Leaves Technical Footprints in Your CMS

More and more of what we read online today is written by AI. That’s not inherently a bad thing. I rely heavily on AI myself when crafting content—especially since English isn’t my first language. Before using AI, my writing was often tangled in complex sentences, grammar slips, and awkward phrasing. I think in layers, and it used to show—sometimes too much.

Now, AI helps me express ideas more clearly. It doesn’t replace my voice; it supports it. It makes my thoughts easier to consume—for me and for the people I write for. That’s a win.

But not everyone uses AI this way. Some treat it as a shortcut. Instead of collaborating with it, they churn out content at scale—often unedited, often impersonal, and clearly disconnected from their own thinking. It’s content for content’s sake. The goal? Game Google rankings, attract traffic, and monetize clicks—whether through ads, affiliates, or product sales.

Readers are starting to notice. That uncanny “this sounds like ChatGPT” feeling? It’s real. I sense it instinctively now—like spotting a formula hidden in plain sight.

That’s led to a rise in AI detection tools—platforms that claim they can spot whether text was written by ChatGPT, Gemini, DeepSeek, Mistral, Baidu Ernie, and others. I’ve tested quite a few. Some work surprisingly well. Others completely miss the mark.

But this article isn’t about linguistic fingerprints or writing style. It’s about something else—the technical footprints AI-generated content can leave behind when it’s dropped straight into a CMS. In many cases, the clues aren’t in the words. They’re in the code.

🧱 It’s Not Just Text — It’s Code

When you copy and paste AI-generated content into your CMS, you’re not just moving words. You’re moving style—and with style comes markup. And that means HTML.

Most people don’t think twice about this. They generate content in a chat window, paste it into a WYSIWYG editor, hit “publish,” and move on. But behind the scenes, some AI chat interfaces render their output in very specific HTML structures, and a rich-text paste carries that markup along. Paragraph tags, span styles, inline CSS: it’s all baked in.

Some of these patterns are subtle. Others are surprisingly easy to spot, especially if you know what to look for. And while Google may not officially penalize AI-generated content, it does analyze page structure, and unusual markup could plausibly serve as a signal.

In other words: your CMS might be leaking the fact that your content is AI-generated—whether you realize it or not.

🔍 HTML Markup Patterns by AI Model

DeepSeek’s Markup Structure in CMS

[Screenshot: DeepSeek AI-generated HTML markup showing inline styles and CMS formatting patterns]

DeepSeek wraps each paragraph in a <p>-tag with a class named “ds-markdown-paragraph”—a clear indicator of its origin, with “ds” likely standing for DeepSeek. The structure is minimal and organized, making it one of the cleaner outputs among current AI models.
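To check a page for this pattern, something like the following could work. It is a minimal sketch using only Python's standard library; the function name `count_deepseek_paragraphs` and the sample HTML are my own illustrations, written by hand to mirror the pattern described above rather than captured from a real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts <p> tags whose class list contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.total_paragraphs = 0
        self.matching_paragraphs = 0

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        self.total_paragraphs += 1
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.matching_paragraphs += 1


def count_deepseek_paragraphs(html):
    """Return (matching paragraphs, all paragraphs) for the given HTML."""
    counter = ClassCounter("ds-markdown-paragraph")
    counter.feed(html)
    return counter.matching_paragraphs, counter.total_paragraphs


# Hypothetical sample, not a real DeepSeek capture.
sample = (
    '<p class="ds-markdown-paragraph">First paragraph.</p>'
    '<p class="ds-markdown-paragraph">Second paragraph.</p>'
    "<p>Hand-written paragraph.</p>"
)
print(count_deepseek_paragraphs(sample))  # (2, 3)
```

Comparing matches against the total paragraph count helps separate a single pasted snippet from a page that was pasted wholesale.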

ChatGPT’s Source Code Footprint

[Screenshot: ChatGPT HTML output example with span tags and embedded formatting in CMS content]

ChatGPT-generated content typically uses standard <p>-tags for paragraphs, but it goes further. Elements used for formatting, such as <strong>, also carry custom attributes like data-start and data-end. These attributes hold plain integers that look like positional offsets, most plausibly character positions in the underlying Markdown source rather than layout coordinates, and are presumably used for internal rendering. While they don’t appear to be used by any major CMS, they’re clear signs the content originated from a generative AI system.
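Here is a rough sketch of how the data-start and data-end attributes described above could be collected from a page. The helper name `find_offset_attributes` and the sample markup are hypothetical; the sample is hand-written, and the integer values in it are made up for illustration.

```python
from html.parser import HTMLParser


class OffsetAttributeFinder(HTMLParser):
    """Collects (tag, data-start, data-end) for elements carrying both attributes."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        start = attr_map.get("data-start")
        end = attr_map.get("data-end")
        # Only count elements where both values are plain non-negative integers.
        if start and end and start.isdigit() and end.isdigit():
            self.hits.append((tag, int(start), int(end)))


def find_offset_attributes(html):
    finder = OffsetAttributeFinder()
    finder.feed(html)
    return finder.hits


# Hypothetical sample; the numbers are illustrative only.
sample = (
    '<p data-start="0" data-end="48">Plain text with '
    '<strong data-start="16" data-end="28">bold text</strong>.</p>'
)
print(find_offset_attributes(sample))
# [('p', 0, 48), ('strong', 16, 28)]
```

Requiring both attributes with integer values keeps the check from firing on unrelated data-start attributes that some sliders or date pickers use.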

Doubao’s Text Styling & HTML Output

[Screenshot: Example of HTML output from Doubao AI content pasted into CMS – quite nested and deep for only little content]

Doubao’s HTML output is notably complex. Instead of simple paragraph tags, it wraps text inside a series of nested <div>-elements – sometimes 18 or more levels deep – each loaded with multiple classes such as “auto-hide-last-sibling-br”, “paragraph-fz9qvc”, “relative”, and “children-wrapper”. This heavy structure even includes spans for spacing and a data-testid="doc-card" attribute on some containers. Interestingly, list items aren’t enclosed in proper list tags, which can complicate rendering and editing in your CMS.
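Deep nesting is easy to measure. The sketch below tracks how deep <div> elements are stacked and whether a data-testid="doc-card" attribute appears anywhere. The name `measure_div_nesting` and the sample are assumptions of mine; the sample is generated by hand to imitate the structure described above, not copied from Doubao.

```python
from html.parser import HTMLParser


class DivDepthMeter(HTMLParser):
    """Tracks the maximum <div> nesting depth and a doc-card marker."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.has_doc_card = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        if dict(attrs).get("data-testid") == "doc-card":
            self.has_doc_card = True

    def handle_endtag(self, tag):
        # Guard against stray closing tags in sloppy markup.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def measure_div_nesting(html):
    """Return (maximum div depth, whether a doc-card marker was seen)."""
    meter = DivDepthMeter()
    meter.feed(html)
    return meter.max_depth, meter.has_doc_card


# Hypothetical sample: 18 nested divs around one short piece of text.
sample = (
    '<div data-testid="doc-card">'
    + '<div class="relative">' * 16
    + '<div class="paragraph-fz9qvc">Short text</div>'
    + "</div>" * 17
)
print(measure_div_nesting(sample))  # (18, True)
```

Depth alone is not proof of anything, since page builders and themes also nest heavily. It is more telling when the depth sits inside the article body rather than in the surrounding template.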

Gemini’s Underlying HTML Code Signature

[Screenshot: Gemini AI text and its HTML markup in a content editor]

Gemini’s output is quite clean and similar to ChatGPT’s, using <p>, <hr>, and headline tags to structure content. What stands out is the consistent use of a “data-sourcepos” attribute, with values like “1:1-1:445”. This attribute appears to mark the start and end positions within the source text—likely indicating line and character ranges—which could help trace the exact location of each content block in the original input.
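If the values really are line and character ranges, which is my reading and not something Google documents here, they can be pulled apart with a small regular expression. In this sketch the function name `extract_sourcepos` is hypothetical, and the second value in the sample is invented to show a second block.

```python
import re

# Matches values shaped like "1:1-1:445" (assumed: line:column-line:column).
SOURCEPOS_PATTERN = re.compile(r'data-sourcepos="(\d+):(\d+)-(\d+):(\d+)"')


def extract_sourcepos(html):
    """Return (start_line, start_col, end_line, end_col) tuples in document order."""
    return [
        tuple(int(part) for part in match.groups())
        for match in SOURCEPOS_PATTERN.finditer(html)
    ]


# Hypothetical sample; only "1:1-1:445" comes from the markup I observed.
sample = '<p data-sourcepos="1:1-1:445">Intro</p><hr data-sourcepos="3:1-3:3">'
print(extract_sourcepos(sample))
# [(1, 1, 1, 445), (3, 1, 3, 3)]
```

A regular expression is enough here because the attribute value has a rigid shape. For anything that depends on tag structure, a real HTML parser is the safer choice.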

Mistral’s HTML Markup Traces

[Screenshot: Example of HTML patterns from Mistral AI-pasted content]

Mistral’s generated content is remarkably clean and minimalistic. It primarily uses simple <p>-tags, each carrying a dir="auto" attribute, which tells the browser to determine the text direction from the content. This subtle marker, combined with the streamlined structure, makes Mistral’s HTML footprint easy to spot and very lightweight compared to other AI-generated content.
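Because dir="auto" is a standard HTML attribute that other editors also emit, a single hit means little. A more cautious check is to look at how many paragraphs carry it. The sketch below does that; `dir_auto_share` is a name I made up, and the sample is hand-written.

```python
from html.parser import HTMLParser


class DirAutoCounter(HTMLParser):
    """Counts <p> tags and how many of them carry dir="auto"."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.dir_auto = 0

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        self.paragraphs += 1
        if (dict(attrs).get("dir") or "").lower() == "auto":
            self.dir_auto += 1


def dir_auto_share(html):
    """Return the fraction of <p> tags with dir="auto" (0.0 if there are none)."""
    counter = DirAutoCounter()
    counter.feed(html)
    if counter.paragraphs == 0:
        return 0.0
    return counter.dir_auto / counter.paragraphs


# Hypothetical sample: three pasted paragraphs and one typed by hand.
sample = '<p dir="auto">A</p><p dir="auto">B</p><p dir="auto">C</p><p>D</p>'
print(dir_auto_share(sample))  # 0.75
```

A share close to 1.0 across a long article is a stronger hint than a stray attribute, but it remains a hint and nothing more.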

Other AI Tools: No Clear HTML Footprints Detected

I also tested several other AI content generators—like Monica, Baidu’s AI, and a few more—but didn’t find any distinctive HTML markers or coding patterns in their output. Their formatting appears cleaner or more neutral, making it much harder to detect AI origins through the HTML source alone.

⚠️ A Quick Disclaimer on CMS Behavior

Before you draw conclusions, here’s one important caveat:

Not all CMS platforms preserve the technical footprints left by AI-generated content.

In fact, when I copied content from ChatGPT into our WordPress instance and published it, none of the extra tags or attributes appeared. WordPress seemed to clean things up automatically behind the scenes.

But in a different case – working on a Shopify blog for a client – those same hidden HTML markers remained intact and visible in the source code. That’s actually how I discovered this in the first place.

So:

  • 🔍 Your CMS may handle AI markup differently.
  • 🧪 I encourage you to inspect your own process.
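If an inspection does turn up these markers, one option is to strip them before publishing. The following is only a sketch under my own assumptions: the attribute and class lists cover the markers named in this post, the class name `FootprintStripper` and function `strip_footprints` are hypothetical, and I deliberately leave dir="auto" and Doubao's utility classes alone because they could be legitimate in your theme.

```python
from html import escape
from html.parser import HTMLParser

# Markers named in this post; extend or shrink these sets for your own setup.
FOOTPRINT_ATTRS = {"data-start", "data-end", "data-sourcepos", "data-testid"}
FOOTPRINT_CLASSES = {"ds-markdown-paragraph"}


class FootprintStripper(HTMLParser):
    """Re-emits HTML while dropping the attributes and classes listed above."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if name in FOOTPRINT_ATTRS:
                continue
            if name == "class":
                kept = [c for c in (value or "").split() if c not in FOOTPRINT_CLASSES]
                if not kept:
                    continue  # drop an empty class attribute entirely
                value = " ".join(kept)
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_footprints(html):
    stripper = FootprintStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)


# Hypothetical sample combining the markers described in this post.
sample = (
    '<p class="ds-markdown-paragraph">One</p>'
    '<p data-start="0" data-end="20">Two &amp; '
    '<strong data-start="6" data-end="12">three</strong></p>'
    '<p data-sourcepos="1:1-1:445" class="lead ds-markdown-paragraph">Four</p>'
)
print(strip_footprints(sample))
# <p>One</p><p>Two &amp; <strong>three</strong></p><p class="lead">Four</p>
```

This only rewrites start tags and passes text through, so it will not flatten deep <div> nesting. For that, pasting as plain text is probably the simpler fix.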

This post isn’t about naming and shaming any platform or tool. Instead, it’s a heads-up—especially for fellow SEOs, content strategists, and technical marketers—about a subtle but increasingly relevant layer of content hygiene.

Marcus Pentzek, International and China SEO Expert
About the author: Marcus Pentzek, Director of SEO, Jademond Digital - Marcus Pentzek has been shaping the SEO landscape since 2008, beginning as a consultant in Germany and later pioneering SEO strategies at Searchmetrics GmbH. His deep dive into the Chinese market began in 2012 while directing marketing at Yoybuy Ltd in Beijing, where he gained firsthand experience in e-commerce SEO in China. Since 2022, he has led SEO at Jademond Digital, focusing on innovative, data-driven methods tailored for Chinese audiences. His blog posts merge over a decade of global SEO expertise with practical insights into the Chinese digital environment.