Thursday, April 06, 2017

Making A Hash Of Checksums

I come from a very deep computer background (I first used computers in 1969) and have managed several minicomputers (DG, PDP & VAX) and networks of PCs, so I am well grounded in the need for both backup (keep a safe copy) and archive (long-term storage). If you read a lot of net advice you might think they are the same thing and just call the whole process backup. Backup is important, a lot of people still don’t do it adequately, and I have written a fair bit about it, but archiving seems to have escaped my posting attention (other than my The Importance of Cataloging your Digital Photo Archive post). I need to rectify that, and I will begin with a bit more detail on how I seek to identify corrupt files and avoid storing them.

MD5 checksums (aka hash values)

Utility in Total Commander that creates MD5 Hash Table

With the recent death of two Western Digital 2GB external drives, I have become a bit paranoid about the potential for file corruption as a drive slowly fails (one drive just dropped dead, the other slowly started to show problems). Because most of the files are in a binary format, such corruption can easily go unnoticed and the corrupt file could be diligently backed up. There are solutions, like having mirrored drives (extra hardware, software and expense) or regularly reviewing the files (a challenge with a lot of photos). I have found a much simpler approach, which is to use checksums created from each file. If the file is corrupted, the checksum will change, so each time a file is moved or the disk is rotated the checksums can be checked.

Example of the MD5 Hash Values

I chose MD5 because it is a public format, there are a lot of utilities that create MD5 checksums, and they are widely used to detect duplicate files (particularly photo, video and music files). It would be nice if photo software undertook this task automatically, but alas, while I can see similar numbers in Picasa and Lightroom, they are not true MD5 hash values; they just look like them. My only conclusion is that they are proprietary formats, which creates a risk: if the company disappears (Google has already washed its hands of Picasa), so does their long-term suitability as checksums.

Creating the checksums can take some time. I am using Total Commander, which creates the checksums for an entire subdirectory (folder) and writes a single .md5 file containing them. This is an ASCII file, so it can easily be read, or a particular hash value can be cut and pasted into a different utility to verify it. The .md5 files are small and take up negligible space.

Example of MD5 checking process

Verifying the checksums also takes a little time, because the checking software must read through each file.
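For readers without Total Commander, the per-folder checksum table described above can be approximated with a few lines of Python. This is only a minimal sketch, not what Total Commander does internally: the function names and the default file name `checksums.md5` are my own inventions, and it assumes the common md5sum-style layout of one `hash *relative/path` line per file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_table(folder, table_name="checksums.md5"):
    """Hash every file under folder and write one 'hash *path' line per file.

    table_name is a hypothetical default, not a Total Commander convention.
    """
    folder = Path(folder)
    table_path = folder / table_name
    lines = []
    for path in sorted(folder.rglob("*")):
        # Skip directories and the table itself.
        if path.is_file() and path != table_path:
            relative = path.relative_to(folder).as_posix()
            lines.append(f"{md5_of_file(path)} *{relative}")
    table_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return table_path
```

Calling `write_md5_table("D:/Photos/2017")` (a made-up path) would leave a small text file in that folder. Reading in chunks keeps memory use low even for large RAW or video files.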
Good utilities will be able to read the .md5 file and report missing files or errors (i.e. corrupted files). Total Commander has a second utility to do this job. So the final big question is: “how do I know the photos are OK, not corrupted, when I make the MD5 hash value?” The simple answer is that I can’t be sure, so I also have to at least look at the files, using Picasa, the default Windows photo viewers, Lightroom or Corel AfterShot Pro (which actually seems the fastest option, particularly with RAW files). So setting up the checksums for a proper archive does take some time, but I can have more confidence in that archive as it is passed around different media and locations.
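The checking step can be sketched in Python in the same spirit. Again this is an illustration under assumptions rather than a description of Total Commander's utility: `verify_md5_table` is a hypothetical name, and the parser assumes each line is a hash, a space, then either `*` or a second space, then a path relative to the folder holding the .md5 file.

```python
import hashlib
from pathlib import Path


def verify_md5_table(table_path):
    """Return (missing, corrupted) lists of file names from an .md5 table."""
    table_path = Path(table_path)
    missing, corrupted = [], []
    for line in table_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, _, rest = line.partition(" ")
        # Drop the '*' (binary) or ' ' (text) marker in front of the name.
        name = rest[1:] if rest[:1] in ("*", " ") else rest
        target = table_path.parent / name
        if not target.is_file():
            missing.append(name)
            continue
        digest = hashlib.md5()
        with open(target, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            corrupted.append(name)
    return missing, corrupted
```

An empty pair of lists means every listed file is present and unchanged since the table was made. It says nothing about whether the file was already damaged at that point, which is why the visual check is still needed.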
