Monday, November 23, 2009

A bit of a clean up

I recently bought a Western Digital My Book 1 terabyte USB drive. I like it: it was easy to install, and I only had to reformat it to NTFS (it ships formatted as FAT32, presumably for maximum compatibility). I then decided to move my photo archive to it. A terabyte is a lot of space; my collection of 125,820 photos took up only 207 GB. The trouble was that I already knew my archive had been polluted with more than a few duplicates, and finding them by hand in such a large collection was not a task I was looking forward to.

Finding Duplicates with Picasa

Moving the files took a while. Afterwards I pointed Picasa at the new folders on the My Book drive, and it also took a while to “index” my photos. I thought I’d try out Picasa’s duplicate finder (it’s under Tools\Experimental\Show Duplicate Files). It told me I had duplicates in 683 folders. The bad news was that unless the duplicates were in the same folder or in adjacent folders (very few were), you don’t get to see them. So I would suggest that anyone still wanting to use this feature does so with extreme caution (after running a backup). Time for a rethink.

Duplicate Cleaner

Next I looked on the net for a specialist duplicate file finder; there are lots. I tried a few, but the one I liked is a freeware utility called Duplicate Cleaner by Digital Volcano. It also does a good job with music collections. Basically it does what the name suggests: it runs some fairly deep searches, with intelligent ways to filter and direct them. Duplicates (two or more copies of the same file) are grouped together, and the tab displaying those groups has a useful image preview. What makes it more useful than the others I reviewed is its ability to match on content, using an MD5 hash (a kind of file fingerprint), instead of just on file names.

The bit I liked best was that you can choose to have the duplicates deleted to the Recycle Bin, moved to an archive elsewhere, or replaced with hard links (to the original). You can even save (and reload) the duplicates list in .csv format.
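I don’t know exactly how Duplicate Cleaner works internally, but the general idea of matching on content rather than names is easy to sketch. The Python below is a minimal illustration, assuming you want to group files whose bytes are identical whatever they are called: it first groups files by size (files of different sizes can’t be identical), then hashes only the remaining candidates with MD5 from the standard `hashlib` module, and can write the groups out as a CSV. The names `find_duplicates`, `md5_of_file` and `save_groups_csv` are hypothetical, not part of Duplicate Cleaner.

```python
import csv
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups; each group lists paths with identical content."""
    # Pass 1: group regular files by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, md5_of_file(path))].append(path)

    # Keep only fingerprints seen more than once; sort paths for stable output.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def save_groups_csv(groups, csv_path):
    """Write one row per duplicate file: its group number and its path."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["group", "path"])
        for i, group in enumerate(groups, start=1):
            for path in group:
                writer.writerow([i, path])
```

You would call it as `find_duplicates(r"E:\Photos")`, where the drive letter and folder are just examples. Grouping by size first means most photos are never hashed at all. MD5 is fine for spotting accidental copies, but a cautious tool might also compare the bytes of each group before deleting anything.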
I now know that I had 934 duplicate photos. I first moved them to a new location, backed them up to DVD, and then deleted them. Daunting task over!
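Moving the duplicates out of the way before deleting anything is a safe habit, and it can be sketched in a few lines of Python. This is a minimal sketch, assuming the duplicates have already been grouped as lists of paths (for example by an MD5-based scan). It keeps the first path in each group and moves the others into an archive folder, recreating their folder structure relative to the photo root so you can still see where each copy came from. `move_duplicates` and its parameters are hypothetical names.

```python
import os
import shutil


def move_duplicates(groups, root, archive_dir):
    """Keep the first path in each group; move the rest under archive_dir.

    Each moved file keeps its path relative to root, so a copy from
    root/2008/img.jpg lands at archive_dir/2008/img.jpg.
    Returns a list of (source, destination) pairs.
    """
    moved = []
    for group in groups:
        extras = group[1:]  # the first file in each group stays where it is
        for src in extras:
            rel = os.path.relpath(src, root)
            dest = os.path.join(archive_dir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)
            moved.append((src, dest))
    return moved
```

Which copy to keep is a policy choice; here it is simply the first in each group. The archive folder should sit outside the photo root so a later scan doesn’t find the moved copies again. Once the archive is backed up, it can be deleted in one go with `shutil.rmtree(archive_dir)`. Replacing each extra copy with a hard link via `os.link` instead would mirror the hard-link option described above.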
