When Copies Aren't Perfect: A Visual Warning About AI LLM Training
We tend to trust that digital copies are exact replicas of the original. But what happens when we copy a copy, then copy that copy again?
Using my cartoon alter ego Alvin, I want to demonstrate how the JPEG image format's "lossy" compression degrades an image with each generation of copying. While JPEG saves disk space and speeds up downloads, it achieves this by discarding data each time you save.
Starting with a clear image of Alvin shouting "gibberish" at a screen, created in CorelDRAW, I repeatedly saved and resaved it as a JPEG (sketched in code after the list below):
- At 80% quality (the common standard), the first 5 generations showed minor deterioration at high-contrast edges.
- Switching to 70% quality (used by many social media platforms), problems became obvious by generation 10.
- At 50% quality, by generation 20 the image was almost unrecognisable. Even the misspelt word "gibberish" Alvin was shouting became illegible.
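If you want to reproduce the experiment yourself, here is a minimal Python sketch using the Pillow imaging library. The source filename, output naming and generation count are my assumptions for illustration, not the exact files or settings used above:

```python
# Minimal sketch of generational JPEG loss using Pillow (pip install pillow).
# "alvin.png" is a hypothetical lossless export from CorelDRAW; use any image.
from PIL import Image

SOURCE = "alvin.png"   # assumed starting image (lossless)
QUALITY = 50           # try 80, 70 and 50 to mirror the three runs above
GENERATIONS = 20

image = Image.open(SOURCE).convert("RGB")   # JPEG has no alpha channel
for generation in range(1, GENERATIONS + 1):
    path = f"alvin_q{QUALITY}_gen{generation:02d}.jpg"
    image.save(path, quality=QUALITY)       # each lossy save discards data
    image = Image.open(path)                # reload the degraded copy
print(f"Wrote {GENERATIONS} generations at quality {QUALITY}")
```

Open the first and last generations side by side and the blocking artefacts around the high-contrast edges should be hard to miss.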
Why This Matters
This isn't just about image quality. It's a powerful analogy for what's happening with large language models trained on "synthetic data": a dodgy term used by LLM enthusiasts for AI-generated content fed back into AI systems.
Just as each JPEG generation compounds tiny adjustments until the image becomes gibberish, AI systems trained on their own output accumulate biases and inaccuracies. The feedback loop doesn't make things more accurate; it amplifies what's wrong.
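You don't need an LLM to see the effect. Here is a toy Python sketch, purely an analogy of my own: each generation fits a simple Gaussian model to samples drawn from the previous generation's model. With no fresh real data, the estimates drift further and further from the original distribution:

```python
# Toy analogy for training on synthetic data (not a real LLM): each
# generation fits a Gaussian to samples drawn from the previous model,
# so estimation error compounds with nothing to pull it back.
import random
import statistics

mean, stdev = 0.0, 1.0   # generation 0: the "real" data
SAMPLES = 20             # small samples make the drift easier to see

for generation in range(1, 21):
    synthetic = [random.gauss(mean, stdev) for _ in range(SAMPLES)]
    mean = statistics.fmean(synthetic)     # refit on synthetic data only
    stdev = statistics.stdev(synthetic)
    print(f"gen {generation:2d}: mean={mean:+.3f} stdev={stdev:.3f}")
```

Run it a few times: the mean wanders and the spread decays, just as the JPEG artefacts crept in generation by generation.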
When we assume digital processes are perfectly reliable, we miss how errors compound through iteration. Each cycle reinterprets the previous one, carrying forward and magnifying small mistakes, biases and fabrications. Eventually, we're left with output that bears little resemblance to the original truth.
The lesson? Whether it's image compression or AI training, recursive copying without fresh input leads to degradation. Garbage in, garbage out; feed the garbage back in, and the output only gets worse.