Media backups and workflow

I have a lot of photos and video — the result of several years of enthusiastic DSLR use.

A few months ago I purchased a new NAS box and assembled a new backup scheme. This is a brief overview of how it works and why I made some of the decisions.

My media tree’s “source of truth” is my desktop machine’s data drive — currently a RAID 0 (striped) array of two 2TB disks. I download photos using Downloader Pro into a directory structure like this:

H:\photos\YYYY\YYYY-MM-DD-name\filename.CR2
H:\videos\YYYY\YYYY-MM-DD-name\filename.MTS/AVI

using the download path template:

Directory: h:\{default,{E9},photos}\{Y}\{Y}-{m}-{D}-{J}
Filename: {T8}{o}

The first bit is what automatically splits photos into one tree and videos into another tree. The rest uses the year, month, day, original filename and type, and abbreviated camera model to assemble a file path like:

H:\photos\2011\2011-01-28-macro\5D2_MG_7004.CR2

Here “5D2” is the abbreviated name I’ve given to my DSLR. Downloader Pro prompts for this when it sees files from a camera it hasn’t seen before (it reads the camera model from the EXIF data). “macro” is the “job code”, which I enter for each download; it is just a name to give me some idea of what’s in the directory. Usually it’s the name of where or what I was shooting on that outing.
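As a rough illustration of how those pieces combine, here is a small Python sketch that builds the same kind of path. It is not Downloader Pro's own logic: the function name media_path, the VIDEO_EXTENSIONS set, and the idea of choosing the tree by file extension are all assumptions made for the example, as is treating the final name as the camera abbreviation followed by the original filename.

```python
from datetime import date
from pathlib import PureWindowsPath

# Assumption: these extensions identify video clips; everything else is a photo.
VIDEO_EXTENSIONS = {".mts", ".avi"}


def media_path(root, shot_on, job_code, camera, original_name):
    """Build root/tree/YYYY/YYYY-MM-DD-job/cameraOriginalName as a Windows path."""
    suffix = PureWindowsPath(original_name).suffix.lower()
    tree = "videos" if suffix in VIDEO_EXTENSIONS else "photos"
    year = f"{shot_on.year:04d}"
    # date.isoformat() gives YYYY-MM-DD, which matches the directory naming.
    folder = f"{shot_on.isoformat()}-{job_code}"
    return PureWindowsPath(root, tree, year, folder, camera + original_name)


# Hypothetical inputs chosen to reproduce the example path above.
print(media_path("H:\\", date(2011, 1, 28), "macro", "5D2", "_MG_7004.CR2"))
```

Because the date sits at the front of each directory name, a plain alphabetical sort of the tree is also a chronological sort, which is part of what keeps it navigable without any management tool.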

I use Downloader Pro because it gave me the flexibility to download into a directory structure that is reasonably navigable independent of any management tools.

I use FileBack PC to back up both of these trees and the rest of the interesting files on the desktop to a Synology DS-1511+ array, which can tolerate one disk failure without losing data and has ample room for growth. The NAS in turn backs itself up to an external 3TB USB disk every morning. Periodically I plug one of a rotating set of drives into a machine or the NAS, sync everything to that disk, and take it off-site.

Currently the photo+video tree totals about 1TB and all of the data together is around 1.5TB, which means it still fits comfortably onto inexpensive 2TB USB disks for the off-site rotation.

Some data I keep on the NAS but don’t back up: things that are easy to replace and merely convenient to have locally, such as ISOs for Ubuntu v.current.

I’ve gone through a few backup targets since I started downloading into this tree and using FileBack PC to back things up. In that time I’ve had one catastrophic disk failure and restored successfully from the then-current backup target (an NSLU2 and USB disk).

I periodically (once a month or so; I need to automate this) run md5sum on all of the files on the NAS, the source-of-truth disk, and the USB backup target, and verify that the things I expect to match across those (…all of the data files…) do in fact still match. I haven’t yet caught a discrepancy this way (and don’t really expect to, given the total size of my data and the probability of uncorrected, uncaught disk errors), but I have detected developing disk failures by exercising all the bytes and hearing or seeing the attempts to correct bad sectors. I replace disks if they show any signs of failing, or after a couple of years.
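One way this check could be automated is sketched below in Python. It is only an illustration of the idea, not the procedure I actually run: the function names (file_md5, tree_checksums, compare_trees) and the shape of the report are invented for the example, and it assumes both copies are reachable as ordinary directories.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash one file in chunks so large video files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each file's path relative to root (with / separators) to its MD5."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_md5(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(source, copy):
    """Report files missing from the copy, extra in it, or with different content."""
    source_sums = tree_checksums(source)
    copy_sums = tree_checksums(copy)
    shared = source_sums.keys() & copy_sums.keys()
    return {
        "missing": sorted(source_sums.keys() - copy_sums.keys()),
        "extra": sorted(copy_sums.keys() - source_sums.keys()),
        "mismatched": sorted(k for k in shared if source_sums[k] != copy_sums[k]),
    }
```

Calling compare_trees with the source-of-truth tree and one backup copy returns three lists; all three being empty means the copy matches. Reading every byte to compute the hashes is also what exercises the disks, so a scheduled run would keep that side benefit.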

I have a single Lightroom 3 catalog into which I’ve imported all of the photos (~42000 as of this writing) and (for many) assigned keywords and ratings so I can search for images.

The usual workflow now is to download the new batch of photos, import them into Lightroom, and make a pass or two over them, giving them ratings from 1 to 5 stars. From there, depending on the sort of outing it was, I share the “selected” photos directly from Lightroom. I use Jeffrey Friedl’s plugins to export to the places I share photos.

Sometimes I run my phone’s GPS during an outing so I can geotag the photos, although I haven’t yet done much to take advantage of this data.

I’ve been revamping my keyword tree now that I have some more experience using Lightroom and may make a future post about that. The summary version is to group keywords into a tree, like “animal > dog > breed > particular-dog (when applicable)” or “location > CA > San Francisco > Golden Gate Park”, and arrange it so that when I export a photo all the higher-level keywords get exported along with the details. This is just to speed keyword entry and ensure I’m using consistent names. Some keywords are not exported; I use them only to search within Lightroom.
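The effect of the tree on export can be sketched in a few lines of Python. This is a standalone illustration, not how Lightroom implements it: the PARENT mapping, the NOT_EXPORTED set, the “to-review” keyword, and both function names are hypothetical, and the sketch assumes the tree contains no cycles.

```python
# Hypothetical child -> parent mapping for a small part of a keyword tree.
PARENT = {
    "dog": "animal",
    "Golden Gate Park": "San Francisco",
    "San Francisco": "CA",
    "CA": "location",
}

# Hypothetical keyword used only for searching, never written on export.
NOT_EXPORTED = {"to-review"}


def with_ancestors(keyword, parent=PARENT):
    """Return the keyword preceded by all of its ancestors, most general first."""
    chain = [keyword]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain[::-1]


def exported_keywords(assigned, parent=PARENT, not_exported=NOT_EXPORTED):
    """Expand each assigned keyword to its full chain, skipping private ones."""
    result = []
    for keyword in assigned:
        for name in with_ancestors(keyword, parent):
            if name not in not_exported and name not in result:
                result.append(name)
    return result


print(exported_keywords(["Golden Gate Park", "dog", "to-review"]))
```

Only the most specific keyword has to be typed for each photo; the broader ones follow from the tree, which is where both the faster entry and the consistent naming come from.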

I don’t yet have a very good story for managing video clips or projects.