Monday, March 03, 2025

Tools to Disrupt the Unethical Scraping of Art: My Thoughts on Glaze and Nightshade

For my experiments with anti-scraping tools, I chose a personally meaningful AI-generated image from 2018. Created using an early version of the Deep Dream Generator (a web tool built on Google's DeepDream technique) in style-transfer mode, the image merges a photograph of my right eye (taken during a rejection episode of my then three-decade-old corneal graft) with a cloud formation.

The creation process used the cloud image as the style reference, applying that "style" to a photograph of the right side of my face, originally intended as a profile picture. While I was pleased with the full-sized result, the cloud effect became less visible at smaller dimensions, leaving what looked simply like an unusual eye image. Though I ultimately didn't use it for its original purpose, this unique composition proved ideal for testing anti-scraping technologies.

This image represents an interesting circular journey: AI-generated art now being protected from newer generative AI systems, which makes it particularly appropriate for this blog post.

Two emerging technologies from researchers at the University of Chicago offer some hope: Glaze and Nightshade. Both tools have somewhat technical interfaces and require significant processing time on standard computers, but they serve different protective functions:

Glaze embeds misleading AI-training signals in your images, making it difficult for generative AI to accurately reproduce your artistic style when prompted with your name. My test on the eye/cloud composite image (see above) took approximately three hours on a standard i5 computer. The resulting image looks essentially unchanged to human eyes but carries subtle adversarial perturbations intended to confuse machine-learning systems. While this doesn't prevent scraping, it reduces the likelihood of your style being accurately replicated, which makes "glazing" suited to any artist seeking to protect their unique look.
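
To make that idea concrete, here is a minimal sketch of the general technique Glaze builds on, adversarial style cloaking: nudge an image's machine-readable features toward a decoy style while capping how much any pixel may change. The file paths, the VGG16 stand-in feature extractor, and all the numbers below are my assumptions for illustration; this is not Glaze's actual implementation.

```python
# A minimal, illustrative sketch of style cloaking, NOT the actual Glaze
# algorithm: pull an image's feature-space representation toward a decoy
# "style" while keeping pixel changes too small for humans to notice.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# A frozen feature extractor standing in for whatever a generative model uses.
features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval().to(device)
for p in features.parameters():
    p.requires_grad_(False)

def load(path):
    img = Image.open(path).convert("RGB").resize((224, 224))
    return TF.to_tensor(img).unsqueeze(0).to(device)

art = load("my_artwork.jpg")      # image to protect (hypothetical path)
decoy = load("decoy_style.jpg")   # image in an unrelated style (hypothetical path)

target = features(decoy).detach()
delta = torch.zeros_like(art, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)
budget = 0.03                     # max per-pixel change, keeps the edit subtle

for step in range(200):
    opt.zero_grad()
    cloaked = (art + delta).clamp(0, 1)
    # Drag the cloaked image's features toward the decoy style's features.
    loss = torch.nn.functional.mse_loss(features(cloaked), target)
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-budget, budget)  # keep the perturbation imperceptible

TF.to_pil_image((art + delta).clamp(0, 1).squeeze(0).cpu()).save("my_artwork-cloaked.jpg")
```

Within a roughly three-percent per-pixel budget the change is hard for a human to see, while a model reading the image's features is pulled toward the decoy style.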

Nightshade takes a different approach, "poisoning" the associations between specific text prompts and your work. It processes images faster (about 30 minutes in my test) and produces images that look normal to humans but contain perturbations that could corrupt AI training datasets. The theory is that widespread adoption might eventually degrade the reliability of models trained on scraped content, though this remains speculative. I'm currently testing Nightshade on some of my online photos to gauge any noticeable effects or responses.
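
Nightshade's real perturbations are crafted so that a model training on the image learns the wrong visual concept for a prompt. The toy below is not Nightshade, just an illustration of why even a modest fraction of poisoned samples matters: a "model" that learns each concept as the mean feature vector of its training images drifts measurably when some samples are mislabelled. All vectors and counts are invented:

```python
# Toy illustration of prompt poisoning, not Nightshade itself: a few samples
# labelled "dog" but carrying cat-like features drag the learned "dog"
# concept toward "cat".
import numpy as np

rng = np.random.default_rng(0)
true_dog = np.array([1.0, 0.0])   # stand-in feature centres (hypothetical)
true_cat = np.array([0.0, 1.0])

clean_dogs = true_dog + 0.1 * rng.standard_normal((100, 2))
poisoned = true_cat + 0.1 * rng.standard_normal((20, 2))  # cat-like, captioned "dog"

learned_clean = clean_dogs.mean(axis=0)
learned_poisoned = np.vstack([clean_dogs, poisoned]).mean(axis=0)

print("clean 'dog' concept:   ", learned_clean.round(2))
print("poisoned 'dog' concept:", learned_poisoned.round(2))
# With ~17% poisoned samples, the concept drifts noticeably toward "cat".
```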

Both tools add identifiable suffixes to output filenames, which scrapers might eventually learn to recognize, ironically helping them avoid these protected images. That may in part be what artists want: internet giants not using or freely redistributing our work. Encouraging them to contact us, seek permission, and pay for use would of course be better. Additionally, social media platforms typically strip metadata and rename files on upload, potentially limiting the usefulness of this signal.
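
A filename-based filter like that would be trivial for a scraper to implement, and just as trivially defeated by a platform renaming files. A hypothetical sketch (the suffix strings are my assumption, not the tools' documented naming schemes):

```python
# Hypothetical sketch of a scraper skipping images whose filenames carry a
# protection suffix. The marker strings below are assumptions for illustration.
from pathlib import Path

PROTECTION_MARKERS = ("-glazed", "-nightshade")  # assumed markers

def should_skip(filename: str) -> bool:
    stem = Path(filename).stem.lower()
    return any(marker in stem for marker in PROTECTION_MARKERS)

for name in ["portrait.jpg", "portrait-glazed.jpg", "IMG_2044-nightshade.png"]:
    print(name, "->", "skip" if should_skip(name) else "scrape")
```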

For maximum protection, artists can apply both techniques sequentially, though the practical impact of this combined approach is still unclear. 

While these technologies represent steps toward better intellectual property protection, they're early solutions in what will likely be an ongoing technological and legal conversation about artists' rights in the AI era.


Saturday, March 01, 2025

Tools to Disrupt Unethical Scraping of Art: My Experience with Pixsy

Several years ago, I beta-tested through Flickr an application now known as Pixsy.com, which scans the internet for unauthorized uses of protected images. I continue to use the free version, which offers limited scanning capability, though paid tiers provide expanded monitoring and response options.

The system lets users review potential matches, verify ownership, and decide whether to ignore a usage, issue a takedown notice, or pursue other legal remedies. Since most of my Flickr images are posted under Creative Commons licenses permitting non-commercial reuse with attribution, I typically check whether users have followed these terms.
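
Pixsy doesn't publish how its matching works, but services like it commonly rely on perceptual hashing: compact fingerprints that survive resizing and recompression. A minimal average-hash sketch (the file paths are hypothetical, and this is an assumed technique, not Pixsy's documented method):

```python
# Minimal average-hash sketch: near-identical images produce hashes that
# differ in only a few bits, even after resizing or recompression.
from PIL import Image

def average_hash(path, size=8):
    # Downscale to an 8x8 grayscale grid and threshold each pixel at the mean.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

original = average_hash("my_photo.jpg")
candidate = average_hash("found_on_some_site.jpg")
# A small Hamming distance suggests the same underlying image.
print("distance:", hamming(original, candidate), "of", len(original), "bits")
```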

While I've received occasional permission requests (which I usually approve for non-commercial users), I've only needed to issue one takedown notice when a local government publication used my work without acknowledgement. Though the digital version was removed, I suspect printed copies had already been distributed. Given the circumstances and potential legal costs, I chose not to pursue further action.

My recent monthly summary showed a significant increase in matches from social media platforms, particularly:
- Google, even though I no longer actively post to Google Photos (my content there should be private apart from occasional Blogger posts)
- TikTok, via a "TikTok scraper" application (I have no TikTok account and have never uploaded there)
- Instagram (while my content is viewable there, it shouldn't be downloadable)

This unexpected proliferation of my images across platforms I believed were private, or that I don't use at all, highlights the ongoing challenges of protecting digital creative work. Monitoring tools like Pixsy are a reasonable start. Still, there is clearly more to resolve to ensure creatives don't have their work plundered, without permission or compensation, to turn a profit for unscrupulous others.