# Why CMS and CDN Pipelines Break AI Image Provenance
WordPress gets blamed most often, but it is only the most visible example. The real problem is broader: modern web publishing almost never delivers the untouched original file to the public.
## The short version
If an image passes through a CMS, CDN, image optimizer, messenger, or social platform, it will often be resized, re-encoded, or converted before the public sees it. That can destroy embedded provenance signals such as C2PA, even when the original asset was marked correctly.
## The market keeps talking about the wrong file
Most provenance discussions implicitly assume a single file: you create it, you sign it, you publish it, someone else downloads the same file. That is rarely how the web works.
In practice, publishers work with at least two assets:

1. **The original publisher asset**: the file stored in the DAM, CMS, or Media Library. This is the easiest file to mark correctly.
2. **The delivered rendition**: the actual file the browser, app, or platform serves after resizing, format conversion, cropping, optimization, and caching.
Compliance and verification questions tend to concern the second file. Engineering teams often focus on the first.
## Why WordPress is only the visible tip of the iceberg
WordPress makes the problem easy to observe because it creates thumbnails, `srcset` variants, and plugin-specific renditions right in front of you. But the same pattern appears almost everywhere modern images are delivered.
- **WordPress:** A marked original lands in the Media Library, then WordPress generates thumbnails, `srcset` variants, WebP conversions, or theme-specific crops. The browser may receive a derivative, not the marked original.
- **Ecommerce platforms:** Product images are frequently resized and reformatted for collection pages, PDP galleries, and mobile breakpoints. The file delivered to shoppers is often a new rendition.
- **Image CDNs:** Services such as Cloudflare, Bunny, Imgix, Cloudinary, and similar services can compress, resize, sharpen, strip metadata, and convert to WebP or AVIF on the fly.
- **Social platforms:** Instagram, WhatsApp, X, LinkedIn, and others re-encode aggressively. Even when a file started with intact embedded provenance, the shared version often loses it.
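The quickest way to see whether a delivery path served the original asset or a rendition is to compare cryptographic digests of the stored file and the bytes a client actually received. A minimal Python sketch (the helper names are illustrative, not part of any platform's API):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte buffer as hex."""
    return hashlib.sha256(data).hexdigest()

def is_same_asset(original: bytes, delivered: bytes) -> bool:
    """True only if the delivered bytes are bit-identical to the original.

    Any resize, recompression, or metadata strip changes the digest,
    even when the two images look identical to a human.
    """
    return sha256_hex(original) == sha256_hex(delivered)
```

Running this against a Media Library file and the file a browser downloaded is a one-line smoke test for whether your stack delivers originals at all.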
## Concrete example: WordPress
A newsroom uploads a marked PNG into WordPress. The original in the Media Library is valid. Then a theme requests `medium_large`, a performance plugin generates WebP, and Cloudflare serves a compressed edge version. A reader right-clicks the image and downloads what the browser received.
Typical web path:

- Media Library original → marked
- WordPress rendition → resized
- CDN edge version → recompressed / converted
- Downloaded browser file → often not identical to the original
If that downloaded file no longer contains intact C2PA, the problem is not necessarily that the publisher never marked the original. The problem may be that the delivery stack replaced the original with a new rendition.
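One cheap diagnostic for that downloaded file is to scan it for the byte markers the C2PA container format uses (JUMBF boxes typed `jumb`, manifest labels containing `c2pa`). This is only a heuristic sketch for spotting what a pipeline stripped, not a validator; real verification should go through `c2patool` or a C2PA SDK:

```python
def looks_like_it_has_c2pa(data: bytes) -> bool:
    """Rough heuristic: does the file still carry C2PA/JUMBF byte markers?

    This does NOT validate signatures or manifests. It only answers the
    narrower question this article cares about: did the delivery stack
    keep the embedded container at all, or strip it during re-encoding?
    """
    return b"jumb" in data or b"c2pa" in data
```

If the original returns `True` and the downloaded rendition returns `False`, the delivery stack, not the publisher, removed the provenance data.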
## Concrete example: CDN image optimization
CDNs are designed to transform images. That is their job. They create smaller files, reduce bandwidth, and improve Core Web Vitals. But every such transformation is potentially hostile to embedded provenance data.
Common CDN operations include:
- metadata stripping
- recompression
- format conversion to WebP or AVIF
- resizing for breakpoints
- smart cropping or sharpening
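Format conversion in particular is easy to detect from the delivered bytes alone, because each container starts with distinctive magic bytes. A small sketch covering the formats named above:

```python
def sniff_image_format(data: bytes) -> str:
    """Classify an image by its magic bytes.

    Enough to spot silent CDN conversion: if you uploaded a PNG and
    the edge serves bytes that sniff as "webp" or "avif", the CDN
    re-encoded the file (and likely dropped embedded metadata).
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[4:8] == b"ftyp" and data[8:12] in (b"avif", b"avis"):
        return "avif"
    return "unknown"
```

Note that CDNs often key the served format on the request's `Accept` header, so the same URL can sniff differently for different clients.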
Each of those can weaken or destroy embedded provenance, especially C2PA.
## What this means for the EU AI Act
Article 50 of the EU AI Act is about transparency of AI-generated or AI-manipulated content, not about whether you once had a clean master file on your server. The conservative operational interpretation is therefore simple: the content people actually receive should remain transparently identifiable.
That is why "we marked the original" is useful evidence, but not the ideal end state if the public receives a different transformed rendition.
## Why MarkMyAI uses four layers
No single layer solves the full publishing problem.
| Layer | Job in real-world pipelines |
|---|---|
| C2PA | Best standards-based provenance layer while the file remains intact |
| Invisible watermark | Helps survive many real transformations after metadata is gone |
| Audit trail | Provides a retrievable publisher proof record even when the file changed |
| Blockchain anchor | Adds independent long-term integrity and timestamp evidence |
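The fallback logic the table implies can be sketched as an ordered chain: try the strongest layer first, and fall through to the next when a transformation has destroyed it. The function and the layer predicates below are hypothetical illustrations, not the MarkMyAI API:

```python
from typing import Callable, Optional

# Each check pairs a layer name with a predicate over the delivered bytes.
LayerCheck = tuple[str, Callable[[bytes], bool]]

def verify_with_fallback(data: bytes, checks: list[LayerCheck]) -> Optional[str]:
    """Return the name of the first provenance layer that still verifies.

    Checks are ordered strongest-first: embedded C2PA, then invisible
    watermark, then an external audit-trail or anchor lookup. Returns
    None when every layer fails for the delivered rendition.
    """
    for name, check in checks:
        if check(data):
            return name
    return None
```

In practice the later predicates would call out to a watermark decoder or an audit-trail service; the point of the structure is that a destroyed C2PA container no longer means a dead end.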
## The uncomfortable but useful conclusion
The problem is not "WordPress is broken." The problem is that the modern web optimizes image delivery by transforming files, while provenance standards are easiest to preserve when files stay unchanged.
This gap exists across publishers, ecommerce, DAMs, CDNs, and social distribution. That is exactly why provenance products need both embedded proof and recovery-capable proof paths.