Monday, July 08, 2024

Have You Been Scraped? Uncovering AI's Training Data

In the age of generative AI art and large language models, one question concerns every creative artist (I am loath to call them "content creators", but social media does):

Has our work been used to train AI without our knowledge or consent? A new tool offers some answers and a way to take action.


The website haveibeentrained.com lets users search vast public research datasets used for AI training, such as LAION-5B, using text prompts. Curious about my own digital footprint, I decided to give it a try.

https://haveibeentrained.com/


Searching my name yielded numerous images from other Norm Hansons, but among them was a familiar face: my own. A self-portrait rock painting I'd posted long ago as a profile picture on Artists at Large had made its way into the dataset. While I was not overly distressed by this single instance, it did give me pause.

More concerning was the discovery that my charcoal sketch of Sir John Monash, created for an exhibition in 2018, had been scraped from my website. This unauthorized use of my work felt like a violation of my artistic rights.


Fortunately, the website offers a small measure of control. For individual images, you can tick a box that adds the image to a "Do Not Train" register, signaling to participating groups that you don't want your work included in future neural network training sets. For broader protection, you can register entire domains.

It's worth noting that these actions are somewhat akin to closing the stable door after the horse has bolted. The data has already been used in training existing models. However, it's currently our best option for protecting our work moving forward.

This situation highlights a critical need for transparency and ethical behavior from those creating large language models, whether for legitimate research, commercial interests, or other purposes. As AI continues to evolve, so too must our understanding of its implications for creative rights and data privacy.

Have you checked if your work has been used in AI training datasets? Share your experiences and thoughts in the comments below.
