Tuesday, July 02, 2024

Protecting Creative Work in the Age of AI Scraping

As creatives in the digital age, we're facing a new challenge: how to protect our work from indiscriminate scraping by AI companies. While tools like Creative Commons licensing have been a go-to solution, their effectiveness against AI data collection is questionable.

Creative Commons: A False Sense of Security?

I've long relied on Creative Commons to share my work while maintaining some control. My license specifies attribution, non-commercial use, and (previously) share-alike terms. However, I'm beginning to question whether this offers real protection against AI scraping.

The Reality of AI Data Collection

Many companies, frequently operating behind research organizations, are scraping vast amounts of online data to train AI models. That collection routinely ignores licensing terms and happens without proper attribution or curation.


Changing Tactics

In response, I've updated my blog's license from "share-alike" to "no derivatives," hoping to prevent AI from copying my style. However, whether training a model on a work even counts as creating a derivative remains legally unsettled, especially in Europe.

New Technological Defences

A promising development is a new class of tools that embed subtle alterations in image files. These changes are imperceptible to humans but can interfere with how AI models learn from the images, in some cases "poisoning" the training dataset. Glaze and Nightshade are two such tools: Glaze aims to stop models from mimicking an artist's style, while Nightshade goes further and actively corrupts training data. Both are still in development and can be resource-intensive to run.
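
To make the idea concrete, here is a minimal Python sketch of an imperceptible pixel-level change. It only demonstrates the "invisible to humans" part: it adds low-amplitude random noise to an image, whereas Glaze and Nightshade compute carefully optimised adversarial perturbations targeted at the models themselves. The function name and the file names (artwork.png, artwork_protected.png) are purely illustrative.

    # Toy illustration only: random noise will NOT actually disrupt AI training.
    # Real tools (Glaze, Nightshade) optimise adversarial perturbations against
    # model feature extractors; this just shows how small a "hidden" change can be.
    import numpy as np
    from PIL import Image

    def add_imperceptible_noise(in_path: str, out_path: str, strength: float = 2.0) -> None:
        """Add low-amplitude random noise (about +/- `strength` on a 0-255 scale)."""
        img = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.float32)
        noise = np.random.uniform(-strength, strength, size=img.shape)
        perturbed = np.clip(img + noise, 0, 255).astype(np.uint8)
        Image.fromarray(perturbed).save(out_path, format="PNG")

    add_imperceptible_noise("artwork.png", "artwork_protected.png")

The point of the sketch is simply that shifting each pixel by a couple of levels out of 255 is essentially invisible to the eye; the real tools exploit that same headroom, but fill it with changes chosen to confuse a model rather than with random noise.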

The Path Forward

Despite these efforts, I'm still unsure how to share my work confidently with people who behave ethically while keeping it safe from misuse. As creatives, we need to stay informed about these issues and keep looking for effective ways to protect our work in the AI era.

What are your thoughts on protecting creative work in the age of AI? Have you found any effective strategies?


I prepared this blog post with some good advice, and a little rewording, from Claude.AI.