AI Content & Technical SEO

How AI-Generated Content Leaves HTML Footprints in Your CMS: ChatGPT, Gemini, DeepSeek, Doubao & Mistral

Copy-pasting from ChatGPT, Gemini or DeepSeek into WordPress or Shopify? Your source code may be leaking it — here's what data-start, data-sourcepos, ds-markdown-paragraph and other AI HTML signatures look like


When you paste AI-generated content directly into a CMS, you're not just pasting words — you're pasting HTML. And that HTML often contains model-specific fingerprints invisible to readers but readable in source code. Marcus Pentzek, Director of SEO at Jademond Digital, identified distinct markup signatures across five major AI models: ChatGPT embeds data-start and data-end attributes on formatted elements; Gemini uses data-sourcepos with line/character range values; DeepSeek wraps paragraphs in class="ds-markdown-paragraph"; Doubao generates 18+ levels of nested divs with classes like auto-hide-last-sibling-br; Mistral adds dir="auto" to every paragraph tag. Whether Google penalizes these is unconfirmed — but on platforms like Shopify, these markers survive publishing intact and are fully visible in page source.

🧱 It’s Not Just Text — It’s Code

When you copy and paste AI-generated content into your CMS, you’re not just moving words. You’re moving style—and with style comes markup. And that means HTML.

Most people don’t think twice about this. They generate content in a chat window, paste it into a WYSIWYG editor, hit “publish,” and move on. But behind the scenes, some AIs wrap their output in very specific HTML structures. Paragraph tags, span styles, inline CSS—it’s all baked in.

Some of these patterns are subtle. Others are surprisingly easy to spot, especially if you know what to look for. Google does not officially penalize AI-generated content, and whether it reads anything into these markers is unconfirmed. Still, it does analyze page structure, so unusual markup could plausibly act as a red flag.

In other words: your CMS might be leaking the fact that your content is AI-generated—whether you realize it or not.

🔍 HTML Markup Patterns by AI Model

DeepSeek’s Markup Structure in CMS

Screenshot: DeepSeek AI-generated HTML markup with identifiable tags, as it appears in the CMS

DeepSeek wraps each paragraph in a <p> tag with the class “ds-markdown-paragraph”, a clear indicator of its origin (“ds” likely stands for DeepSeek). The structure is minimal and organized, making it one of the cleaner outputs among current AI models.

ChatGPT’s Source Code Footprint

Screenshot: ChatGPT HTML output example with span tags and embedded formatting in CMS content

ChatGPT-generated content typically uses standard <p> tags for paragraphs, but it goes further. Formatting elements such as <strong> also carry custom attributes like data-start and data-end. These attributes hold plain integers that seem to record positions, possibly for internal rendering or layout purposes. No major CMS appears to use them, but they are a clear sign that the content originated from a generative AI system.
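To check a pasted fragment for these attributes, a few lines of Python are enough. The following is a minimal sketch using only the standard library's html.parser; the class name OffsetAttrFinder and the sample fragment (including its integer values) are made up for illustration and are not taken from real ChatGPT output.

```python
from html.parser import HTMLParser


class OffsetAttrFinder(HTMLParser):
    """Collect every tag that carries a data-start or data-end attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (tag, data-start value, data-end value)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-start" in attr_map or "data-end" in attr_map:
            self.hits.append(
                (tag, attr_map.get("data-start"), attr_map.get("data-end"))
            )


# Hypothetical pasted fragment; the integers are invented for this example.
SAMPLE = (
    '<p data-start="0" data-end="58">Plain text with '
    '<strong data-start="16" data-end="28">bold</strong> words.</p>'
)

finder = OffsetAttrFinder()
finder.feed(SAMPLE)
for tag, start, end in finder.hits:
    print(f"<{tag}> data-start={start} data-end={end}")
```

A parser is used instead of a plain text search so that the words "data-start" appearing in visible body copy (as they do in this article) are not counted as hits.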

Doubao’s Text Styling & HTML Output

Screenshot: Example of HTML output from Doubao AI content pasted into CMS, deeply nested for very little content

Doubao’s HTML output is notably complex. Instead of simple paragraph tags, it wraps text inside a series of nested <div> elements, sometimes 18 or more levels deep, each loaded with multiple classes such as “auto-hide-last-sibling-br”, “paragraph-fz9qvc”, “relative”, and “children-wrapper”. This heavy structure even includes spans for spacing and a data-testid="doc-card" attribute on some containers. Interestingly, list elements aren’t properly enclosed in tags, which can complicate rendering and editing in your CMS.
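Nesting depth is easy to measure rather than eyeball. The sketch below counts how deep <div> elements are nested in a fragment, again with the standard library only. The names DivDepthMeter and max_div_depth are hypothetical, and the sample input is synthetic, not real Doubao output.

```python
from html.parser import HTMLParser


class DivDepthMeter(HTMLParser):
    """Track the deepest level of <div> nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in sloppy markup.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def max_div_depth(html: str) -> int:
    meter = DivDepthMeter()
    meter.feed(html)
    meter.close()
    return meter.max_depth


# Synthetic sample: 18 nested divs wrapped around one word.
nested = "<div>" * 18 + "Hello" + "</div>" * 18
print(max_div_depth(nested))
print(max_div_depth("<p>Hello</p>"))
```

Deep nesting alone is not proof of anything, since many themes and page builders also produce heavy <div> structures. It is more telling when the depth appears inside the body of a single article paragraph.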

Gemini’s Underlying HTML Code Signature

Screenshot: Gemini AI text and its HTML markup in a content editor

Gemini’s output is quite clean and similar to ChatGPT’s, using <p>, <hr>, and heading tags to structure content. What stands out is the consistent use of a data-sourcepos attribute, with values like "1:1-1:445". This attribute appears to mark the start and end positions within the source text, likely as line and character ranges, which could help trace each content block back to its exact location in the original input.
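If the value really is a pair of line and character positions, it can be split apart mechanically. The sketch below assumes the layout start_line:start_col-end_line:end_col, which is this article's reading of the attribute and not a documented Gemini format. The names SourcePos and parse_sourcepos are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SourcePos(NamedTuple):
    start_line: int
    start_col: int
    end_line: int
    end_col: int


# Assumed layout: "<line>:<col>-<line>:<col>", e.g. "1:1-1:445".
_SOURCEPOS = re.compile(r"^(\d+):(\d+)-(\d+):(\d+)$")


def parse_sourcepos(value: str) -> Optional[SourcePos]:
    """Return the four integers in a data-sourcepos value, or None if it does not match."""
    match = _SOURCEPOS.match(value.strip())
    if match is None:
        return None
    return SourcePos(*(int(group) for group in match.groups()))


pos = parse_sourcepos("1:1-1:445")
print(pos)
```

Under this reading, the example value from the screenshot describes a block that starts at line 1, character 1 and ends at line 1, character 445, which is a single long paragraph.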

Mistral’s HTML Markup Traces

Screenshot: Example of HTML patterns from Mistral AI-pasted content

Mistral’s generated content is remarkably clean and minimalistic. It primarily uses simple <p> tags, each carrying a dir="auto" attribute. dir is a standard HTML attribute, and the value auto asks the browser to work out the text direction from the content itself. This subtle marker, combined with the streamlined structure, leaves Mistral with a very lightweight HTML footprint compared to other AI-generated content, yet one that is still easy to spot.

Other AI Tools: No Clear HTML Footprints Detected

I also tested several other AI content generators—like Monica, Baidu’s AI, and a few more—but didn’t find any distinctive HTML markers or coding patterns in their output. Their formatting appears cleaner or more neutral, making it much harder to detect AI origins through the HTML source alone.

⚠️ A Quick Disclaimer on CMS Behavior

Before you draw conclusions, here’s one important caveat:

Not all CMS platforms preserve the technical footprints left by AI-generated content.

In fact, when I copied content from ChatGPT into our WordPress instance and published it, none of the extra tags or attributes appeared. WordPress seemed to clean things up automatically behind the scenes.

But in a different case – working on a Shopify blog for a client – those same hidden HTML markers remained intact and visible in the source code. That’s actually how I discovered this in the first place.

So:

  • 🔍 Your CMS may handle AI markup differently.
  • 🧪 I encourage you to inspect your own process.
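One way to inspect your own process is to save the source of a published page and scan it for the five signatures described above. The following is a minimal sketch using Python's standard library. The names FootprintScanner and scan, the dictionary keys, and the sample page are all hypothetical, and the checks only cover the markers listed in this post.

```python
from collections import Counter
from html.parser import HTMLParser


class FootprintScanner(HTMLParser):
    """Count occurrences of the AI markup signatures described in this post."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, hence the "or ''".
        classes = (attr_map.get("class") or "").split()
        if "data-start" in attr_map or "data-end" in attr_map:
            self.counts["chatgpt"] += 1
        if "data-sourcepos" in attr_map:
            self.counts["gemini"] += 1
        if "ds-markdown-paragraph" in classes:
            self.counts["deepseek"] += 1
        if "auto-hide-last-sibling-br" in classes:
            self.counts["doubao"] += 1
        # Weak signal: dir="auto" is standard HTML and can have other origins.
        if tag == "p" and attr_map.get("dir") == "auto":
            self.counts["mistral"] += 1


def scan(html: str) -> dict:
    scanner = FootprintScanner()
    scanner.feed(html)
    scanner.close()
    return dict(scanner.counts)


# Hypothetical page fragment mixing three of the signatures.
page = (
    '<p class="ds-markdown-paragraph">A</p>'
    '<p dir="auto">B</p>'
    '<h2 data-sourcepos="3:1-3:20">C</h2>'
)
print(scan(page))
```

Running the same scan on the editor's draft HTML and on the published page shows whether your CMS strips the markers (as WordPress did in my test) or keeps them (as Shopify did).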

This post isn’t about naming and shaming any platform or tool. Instead, it’s a heads-up—especially for fellow SEOs, content strategists, and technical marketers—about a subtle but increasingly relevant layer of content hygiene.