Wednesday, December 30, 2015

The Importance of a Catalogue to your Digital Photo Archive

A shoe box or album full of photos is easy to flick through, and you can quickly see both its condition and content. Digital files, however, have limited visibility, especially a collection of digital photos. The photo file will be in binary, and unless you have specific software to decode it, it will just be symbols and numbers at best. One exception is the JPEG format, which is so widely used that most software, browsers, and computer and smartphone apps will recognise and decently render it. Other formats, in particular the camera RAW formats, will always require special software, and given that these formats are proprietary and at risk of becoming obsolete, that software is not guaranteed to be available on different hardware or in the future. Further, it is unlikely that scanning the file names from the camera (like IMG_0007.CR2) will be very meaningful. You need a reader that understands the format of that data and can render it as an image. A decent catalogue (or index) of your photos needs to let you view your photos easily, but it also benefits from some organization that makes the images quicker to find.

So how do you figure out what is there? There are two common solutions:

  1. The first is thumbnails, normally postage-stamp-size, low-resolution versions that give you a peek at what is in the photo.
  2. The second is to use metadata (data about the context of the photo itself).

There are lots of decent (and free or inexpensive) thumbnail programs, like Picasa, XnView and Photo Mechanic. The default folder view in Windows even lets you select a small, medium or large icon view, and those icons are thumbnails. Some can handle RAW formats but others may not (or rely on updating a codec for that format). Other software, like AfterShot and OnOne 10, has a browse mode so you don’t have to “import” photos first; you just get presented with a grid of thumbnails. These are fast and a joy to use compared with opening up directories of images in, say, Lightroom. Having a good thumbnail facility that is fast and easy to use is the first additional need for a good archive.

Metadata (data about the photo and its contents) is a little more complex, but it should not be. There are two standards, the EXIF and IPTC metadata formats. The EXIF data is mainly about the camera and image and has been well addressed by camera and phone manufacturers (although they may not record all the possible metadata). This EXIF metadata is included in the standard JPEG data format, so the metadata travels with the image, embedded in it. The IPTC Interchange Model is a way to record information about who created the photo and how it is licensed for distribution and publication (it was set up for the exchange of information and images between newspapers and is not a big deal in a personal photo archive). This information can be, but seldom is, embedded in JPEG files. Most software that works with photos will let you display both forms of this metadata. If you are having trouble reading the metadata, Phil Harvey has written a wonderful open source program called ExifTool, which can interact with metadata in many formats (e.g. embedded in the JPEG or stored in an .xmp sidecar file).
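As a rough illustration of reading both kinds of metadata in one pass, here is a minimal Python sketch that calls ExifTool and sorts the tags by group. It assumes ExifTool is installed and on the PATH. The function names (`exiftool_command`, `split_by_group`, `read_metadata`) are made up for this example. It relies on ExifTool's `-json` option for JSON output and `-G` to prefix each tag with its group name, such as `EXIF:` or `IPTC:`.

```python
import json
import subprocess


def exiftool_command(photo_path: str) -> list[str]:
    """Build the ExifTool call: -json gives JSON output, -G adds group names."""
    return ["exiftool", "-json", "-G", photo_path]


def split_by_group(exiftool_json: str) -> dict[str, dict[str, object]]:
    """Sort the tags of the first file in ExifTool's JSON output by group."""
    records = json.loads(exiftool_json)
    grouped: dict[str, dict[str, object]] = {}
    for key, value in records[0].items():
        group, _, tag = key.partition(":")
        if not tag:
            # Keys such as "SourceFile" carry no group prefix.
            group, tag = "", key
        grouped.setdefault(group, {})[tag] = value
    return grouped


def read_metadata(photo_path: str) -> dict[str, dict[str, object]]:
    """Run ExifTool on one photo (assumes the exiftool program is installed)."""
    result = subprocess.run(
        exiftool_command(photo_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return split_by_group(result.stdout)
```

Calling `read_metadata("IMG_0007.jpg")` (a hypothetical file) would return a dictionary whose `"EXIF"` entry holds the camera data and whose `"IPTC"` entry, if present at all, holds the creator and licensing fields.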

Whilst you can see a scratch or water stain on an old photo, it is trickier to judge damage to (corruption of) a digital file, because in its raw binary digital format it is not easily comprehended. Even worse, even a little damage may render the digital image unusable. There are techniques to check that digital files have not been corrupted, most notably checksums and hash functions (they are not hashtags). Of these, MD5 is the most used for photos and is usually the technique of choice for detecting duplicates. Whilst I can see some software uses checksums (see picasa.ini at right), they do not appear to be MD5 numbers. Adobe uses two different IDs, which look a little like MD5 hash values. The first is a unique image ID created when the image is scanned by Lightroom (or other Adobe utilities), and the second is an instance ID, updated whenever the file is accessed and parametrically adjusted (i.e. the .XMP file and/or catalogue are altered). I cannot find any definitive documentation on them, so I’m just going to assume they are yet another proprietary secret, and thus I doubt I will research them any further.
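To show how a hash value helps with duplicates, here is a minimal Python sketch using the standard `hashlib` module. The function names `file_md5` and `find_duplicates` are hypothetical, and this is only one simple way of doing it, not how any particular photo program works. Files with identical contents give the same MD5 value whatever they are called, so grouping by hash finds the copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group every file under `root` by MD5 and keep groups of two or more."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[file_md5(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

The same property works in reverse for damage: if even one byte of a photo changes, its MD5 value no longer matches the one recorded earlier.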

Fortunately there are lots of well tried and tested MD5 utilities and programs, many free to download. What I discovered is that the best way to create these hash tables is not one for each file but one file for each sub-directory. I already had a program called Total Commander (an alternative to Windows Explorer) which does this. Further, you just need to click on that file and Total Commander will scan the files mentioned and report back whether the hash values match. Yes, I did a test truncating a photo, and the check reported a hash value difference. There are a number of such utilities, so I assume you will have no problems finding a similar one.
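For anyone without such a utility, the one-hash-file-per-folder idea can be sketched in a few lines of Python (3.9 or later). This is an illustration under my own assumptions, not a description of how Total Commander does it. The file name `checksums.md5` and the function names are made up, and the `hash *filename` line layout simply follows the common md5sum convention.

```python
import hashlib
from pathlib import Path

CHECKSUM_NAME = "checksums.md5"  # hypothetical name, one such file per folder


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(folder: Path) -> Path:
    """Write one checksum file covering every file directly in `folder`."""
    lines = []
    for item in sorted(folder.iterdir()):
        if item.is_file() and item.name != CHECKSUM_NAME:
            lines.append(f"{md5_of(item)} *{item.name}")
    target = folder / CHECKSUM_NAME
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


def verify_checksums(folder: Path) -> list[str]:
    """Return the names of files that are missing or no longer match."""
    problems = []
    text = (folder / CHECKSUM_NAME).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        recorded, name = line.split(" *", 1)
        path = folder / name
        if not path.is_file() or md5_of(path) != recorded:
            problems.append(name)
    return problems
```

Run `write_checksums` once when a folder goes into the archive, then `verify_checksums` at each later check. An empty list means every file still matches its recorded hash, and a truncated photo like the one in my test would show up by name.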

I have been using this approach for the past 6 months. It is very simple, and I just create the hash table for each directory when I do my monthly backup checks. I have been reprocessing some older directories as I move to a hard-disk-based archive. Running the hash function identified 2 photos that had read problems; luckily I had another copy of those, which was fine. I am now slowly adding hash tables to all my archived photo folders.

I’m now even more certain you need to keep a checksum or hash value, as well as easily accessed thumbnails and standard metadata utilities, in addition to your archived files. Copies of the applications used to create these views and information are an important part of your catalogue and should be stored with your archived photos.

Monday, December 28, 2015

Changing the way I backup my archive

This is something that keeps bugging me about ALL photo management systems. There is plenty of discussion about backups but scant regard for archives. A backup is pretty easy to understand and do: it is a second copy. The concept for computer files (or any digital information) is that if something goes wrong, I can just go to the backup and recover what was lost or destroyed. This is easy enough to understand for one file, or a day’s or month’s photos, but gets a bit trickier as information mounts over a year or more. Do you keep backing up everything or start a new series of backups? For those with decent DSLR cameras with high megapixel counts, that means buying a new and bigger external USB drive or more racks on a RAID system or Drobo. There are good strategies to safely tame this plethora of images, but I need to get back to what makes a true archive different.

An archive is also likely to be a copy, but it has two key differences from a backup. The first and most important is that it must provide easy access to find the photo (or digital files). This first requirement means there must be a catalogue or index of how best to find the photo. This requires that the storage be organized in such a way that the important characteristics can be recorded where necessary, and that the storage location is recorded as well. The second difference has to do with the physical format of the storage media, where it is stored and how likely that media is to be readable in the future. Is the catalogue always available, or is it also buried in the data?

Today I will talk a little about this second issue and keep the idea of an index or catalogue for a number of future posts.

I have for several years maintained 3 archives. I have one live on an external hard disk with 2GB capacity, and two CD/DVD collections stored at two different locations (one well off site). Managing and searching them is starting to become a daunting task (i.e. locating the right disk and then looking through it takes time, despite the disks being logically labelled and well stored in suitable storage boxes). Further, I’m getting close to that 2GB of photos now. So I have opted to put at least one of the stored archives on hard disk as well (or more precisely, a couple of hard disks). I have a strong desire not to keep buying bigger and newer gear, so I purchased a small USB hard drive dock (mine is the Shintaro), which can read both 2.5” and 3.5” SATA HDD drives and also has an eSATA cable. So all I had to do was take the hard drives out of older computers (which I always do before I scrap them), and now I have extra external storage. They are of course slower than a new drive, and some of my older drives are smaller (512MB). Importantly, however, I am not letting my personal and business data get out into the wild (it is well overwritten by photos), and I am also getting a more easily stored medium (a 2.5” 1 TB drive holding a decade of photos takes up less room than the 4 or 5 DVDs that hold roughly a month of my shooting). I did, however, require two old drives to back up my current archive, but I have several, so I might end up using this set this year and cycling over to another pair next year.

Trying to contemplate the best way to do incremental (monthly) backups to a remote disk has left me sticking with the DVDs as the best option for the time being.

Saturday, December 26, 2015

Searching for a suitable Archive Media

The biggest problem with setting up a long-term digital archive relates to what physical storage media you plan to use. The digital world is littered with various storage media that have very short operational lives and/or have fallen from common usage. I have used punched paper tape, punch cards, reel-to-reel magnetic tape, various tape cartridges, 8” floppy disks (yes, they were once that big, but they could hold around 1.3 MB), 5¼” floppy disks (only 360 KB), 3½” rigid disks (720 KB but later 1.4 MB), Zip disks (a glorious 100 MB), CDs (compact discs with a capacity of around 700 MB), on to DVDs (which generally hold 4.7 GB) and Blu-ray (which can hold up to 25 GB). Unfortunately the unspoken secret here is that as the capacity, and specifically the information density, has increased, the shelf life has tended to decrease. The punched paper tape from the late 60s is good to go (if only I had a reader) if it hasn’t been torn, and so are the punch cards from the early 70s, but everything else has probably reached its best-by date, and there is a reasonable chance you will already experience corruption or errors reading older media. Even CDs and DVDs, once thought invincible, start to have troubles after only a few years, even fewer if they are not handled carefully.

What about solid-state memory (like SD memory cards, USB keys or the newer SSD drives)? Unfortunately they have limited life spans as well (more to do with the number of write cycles), though in normal usage they may outlast the next form of storage, the HDD. Conventional hard drives (whether built into your computer or the external USB style) also have some telling untold secrets (see the Backblaze study, which suggests Seagate drives are more likely to fail than Western Digital ones). I have had the opposite experience: I have 4 dead Western Digital My Book USB drives but 3 healthy Seagate drives, one of which is considerably older. Further, only 1 of my 7 Toshiba hard disks (some in laptops, others portable backup style) has ever failed, and it failed under warranty and was replaced. The caveat here is that your spinning-disk hard drive will probably last a day or two longer than the guarantee.

So what about cloud storage, that’s forever, isn’t it? Well, if you look at the terms of service (TOS) offered by most cloud service providers, there is generally no mention of loss of or damage to stored data (I guess they assume you must already have a copy as well, i.e. from their point of view the data is backed up). Only Amazon seems to have addressed this, and I hope companies like Google could be relied upon. However, they are companies, and they are unlikely to last forever. So cloud backup is where we are at now, not cloud archiving yet. So where does that leave us? We will need to plan to regularly move any digital archive onto new media as technology changes and popular taste determines.

Clearly an Archive of Digital Photos must not be hardware/media specific.

I trust all this scares you into making a backup of your digital photos right now. Stop reading and go and do it!

Thursday, December 24, 2015

Two months of distress and disappointment


Some may be wondering why my blogging and other social net contributions have fallen away so sharply. The answer is simple: upgrade distress. Whilst I had experimented with upgrading to Windows 10 months ago (and given up), I felt brave enough back in early November to give it another go on my main office machines. I also felt it was time to upgrade to OnOne 10. Also forced upon me were several updates to Google Photos (on my Android phone). Well, it did sort of work; the Windows updates left many programs having to be reinstalled or registered, and I’m still trying to sort out the various drivers I need to update (some have been updated but still misbehave, e.g. Wacom tablets). And don’t get me started about sharing printers (that at least I have resolved).


BUT WHY? I am a person who has used computers continuously since 1969 (yes, that’s 46 years!), so you might expect I would find updating some software easy. I didn’t! I do appreciate Windows 10 may be lighter and tighter, but without good documentation (online user forums and FAQs are not a substitute for real information about drivers and settings) I have floundered. I wonder how many people have just given up.

OnOne’s technical support has been helpful, despite there no longer being phone or direct email support. I won’t mention Microsoft or Google and various hardware companies. It’s time to lift your game, everyone.

My real disappointment is with two applications: Microsoft’s Live Writer and Google Photos.

Windows Live Writer (which I use mainly for this and other blogs) seemed to work, but I gave up trying to use it under Windows 10 (despite all the user advice on how to reload it under Windows 10). It just never would reconnect to my blog. Looks like Microsoft was just abandoning all the Live Essentials stuff, damn.
I’ve downloaded the open source version called Open Live Writer, and it seemed to work (it does have a few restrictions). The joy didn't last long: more uninformative error messages.
The other big disappointment is Google Photos. Its Android app has disgraced itself to the point of relegation after reloading and updating itself so many times and filling up my phone’s available space with I know not what. The share memories option sounds good for sharing family and private events, but in reality it is yet another attempt to vacuum in all your smartphone photos.

PhotoFriday :: A Little Sweeter

Whilst much contemporary installation art is straight-out boring, occasionally some art can delight, like this Christmas Feast sculpted entirely in sugar, on display in the foyer of Melbourne's Windsor Hotel until January 5th. Well done, Hotham Street Ladies.


For PhotoFriday’s topic Small Scale

Saturday, December 12, 2015

PhotoFriday :: The Milk Run

The TSS Earnslaw, the original milk run.

For PhotoFriday’s topic Transportation