Reading CR2 and CRW metadata

I was curious about some statistics on the photos I’ve been shooting over the past few years, and finally got around to writing a script to gather the stats I want. I knew virtually all of the digital camera photos in my tree came from one Canon camera or another, so wherever a format was vendor-specific (raw files, EXIF MakerNotes) I only decoded Canon’s version.

I started with the EXIF reading support in Singleshot and added decoding of some of Canon’s MakerNotes to get the File Number and, if available, camera serial number (DSLRs have that, PowerShots don’t).
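As a rough illustration of the file number step, here is a minimal sketch of turning the packed integer into the familiar directory-and-image form. It assumes the value stores the directory number in the leading decimal digits and the image number in the last four (so 1181861 would be 118-1861); some bodies may pack it differently, and the function names are made up for this example rather than taken from Singleshot.

```python
def split_file_number(value):
    """Split a packed file number into (directory, image).

    Assumption: the last four decimal digits are the image number and
    whatever precedes them is the directory number.
    """
    directory, image = divmod(value, 10000)
    return directory, image


def format_file_number(value):
    """Render the packed value the way the camera names files, e.g. 118-1861."""
    directory, image = split_file_number(value)
    return f"{directory:03d}-{image:04d}"
```

Paired with the serial number (or the model name when there is no serial), this gives a key that identifies a shot regardless of where the file ended up on disk.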

This document described the CRW file format well enough that I could write code to read CRW metadata and present it in the same form Singleshot’s JpegHeader does (thus: CRWHeader).
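To give a feel for what reading a CRW involves, here is a minimal sketch of parsing the file header and the root directory. It is not the CRWHeader code; the layout is my understanding of the publicly documented format: a byte-order mark, a 32-bit header length, the signature "HEAPCCDR", and then a heap whose last four bytes point at a directory of 10-byte entries (tag, size, offset). The function name is hypothetical, and the sketch ignores nested sub-directories and entries whose values are stored inline in the directory.

```python
import struct


def read_ciff_root(data):
    """Return a list of (tag, size, absolute_offset) for the root directory."""
    order = data[0:2]
    if order == b"II":
        endian = "<"          # little-endian (Intel)
    elif order == b"MM":
        endian = ">"          # big-endian (Motorola)
    else:
        raise ValueError("not a CIFF file: bad byte-order mark")

    (header_len,) = struct.unpack_from(endian + "I", data, 2)
    if data[6:14] != b"HEAPCCDR":
        raise ValueError("not a CRW file: bad signature")

    heap_start = header_len
    # The last 4 bytes of the heap hold the directory offset,
    # relative to the start of the heap.
    (dir_offset,) = struct.unpack_from(endian + "I", data, len(data) - 4)
    dir_start = heap_start + dir_offset

    (count,) = struct.unpack_from(endian + "H", data, dir_start)
    entries = []
    for i in range(count):
        pos = dir_start + 2 + 10 * i
        tag, size, offset = struct.unpack_from(endian + "HII", data, pos)
        entries.append((tag, size, heap_start + offset))
    return entries
```

Each returned offset can be used to slice the value straight out of the file, for example data[offset:offset + size].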

After a quick hack to make the script write out a TSV with several columns, I could start gathering stats. The first finding: I had some duplicates in my tree.

I already knew I had some duplicate files in my photo tree from copying some raw photos into more than one place, but I was still surprised to discover that “some” was ‘2,231’. The script counted any repeated camera/file number combination as a dupe, so a lot of those turned out to be working or edited copies of the shots with the metadata intact rather than actual dupes. Still, I ended up deleting around 1000 duplicate files from directories with names like ‘stage’ and ‘stage2’; these files were elsewhere in my tree in better-named directories.
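The duplicate check amounts to grouping rows by camera and file number and keeping the groups with more than one path. A minimal sketch, assuming a TSV with a header row and columns named camera, file_number and path (those names are invented for this example, not the script’s actual columns):

```python
import csv
import io
from collections import defaultdict


def find_duplicates(tsv_text):
    """Map (camera, file_number) to its paths, for keys seen more than once."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["camera"], row["file_number"])
        groups[key].append(row["path"])
    # Keep only the keys that appear on more than one file.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

As noted above, a hit only means two files share the same camera and file number. An edited copy that kept its metadata shows up here too, so the results still need a look before anything gets deleted.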

I got out the Exif 2.2 spec (PDF) again to see what EXIF data looks like when it is embedded in a TIFF file. Happily, CR2 is enough like TIFF that the EXIF spec provided enough information to extract the data I wanted. I need to go back and refactor the EXIF reading code in Singleshot’s JPEG module so it can read EXIF out of TIFF/CR2 as well as JPEG.
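For the curious, here is a minimal sketch of that TIFF-style walk: read the byte order and the offset of the first IFD from the header, scan the 12-byte IFD entries for the Exif IFD pointer (tag 0x8769), then look up ISOSpeedRatings (tag 0x8827) in the Exif IFD. This is not Singleshot’s code and the function names are hypothetical; it assumes the ISO value is a single SHORT stored inline in the entry, and it does no bounds checking.

```python
import struct

EXIF_IFD_POINTER = 0x8769   # tag in IFD0 pointing at the Exif IFD
ISO_SPEED_RATINGS = 0x8827  # tag in the Exif IFD


def read_ifd(data, endian, offset):
    """Return {tag: (type, count, raw 4-byte value-or-offset field)}."""
    (count,) = struct.unpack_from(endian + "H", data, offset)
    entries = {}
    for i in range(count):
        pos = offset + 2 + 12 * i
        tag, typ, n = struct.unpack_from(endian + "HHI", data, pos)
        entries[tag] = (typ, n, data[pos + 8:pos + 12])
    return entries


def read_iso(data):
    """Return the ISO speed from a TIFF-like file, or None if absent."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("bad byte-order mark")
    magic, ifd0_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF header")

    ifd0 = read_ifd(data, endian, ifd0_offset)
    if EXIF_IFD_POINTER not in ifd0:
        return None
    (exif_offset,) = struct.unpack(endian + "I", ifd0[EXIF_IFD_POINTER][2])

    exif = read_ifd(data, endian, exif_offset)
    if ISO_SPEED_RATINGS not in exif:
        return None
    # Assumption: one SHORT, left-justified in the 4-byte value field.
    (iso,) = struct.unpack_from(endian + "H", exif[ISO_SPEED_RATINGS][2])
    return iso
```

The same read_ifd helper would also work on the EXIF block inside a JPEG, since that block starts with its own TIFF header.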

In the meantime, though, I could finally start gathering stats. I’m still not decoding some of the tags from the files properly so I can’t yet generate stats on shutter speeds and lenses, but I did learn my ISO usage by camera body looked like this:

Model                | 50  | 100  | 200  | 400  | 800 | 1600 | 3200 | Auto | Grand Total
Canon EOS D30        |     | 4298 | 410  | 439  | 369 | 137  |      |      | 5653
Canon EOS 10D        |     | 3782 | 157  | 236  | 71  | 1    |      |      | 4247
Canon EOS 20D        |     | 505  | 829  | 1916 | 372 | 1353 | 100  |      | 5075
Canon PowerShot S230 | 431 | 475  | 33   | 17   |     |      |      | 74   | 1030
Canon PowerShot SD500| 6   | 12   | 44   | 61   |     |      |      | 447  | 570
Grand Total          | 437 | 9072 | 1473 | 2669 | 812 | 1491 | 100  | 521  | 16575

The blank boxes don’t apply to a given camera (e.g. the D30 has no Auto setting).
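A table like this is just a count of (model, ISO) pairs pivoted into rows and columns. A minimal sketch, assuming each shot has already been reduced to a (model, iso) pair where iso is an integer or the string "Auto"; the function name and the input shape are invented for this example:

```python
from collections import Counter


def iso_pivot(shots):
    """Build (header, rows) for a model-by-ISO table with totals."""
    counts = Counter(shots)
    models = sorted({model for model, _ in counts})
    # Numeric ISOs first in ascending order, then string values like "Auto".
    isos = sorted({iso for _, iso in counts},
                  key=lambda v: (isinstance(v, str), v))

    rows = []
    for model in models:
        cells = [counts.get((model, iso), 0) for iso in isos]
        rows.append([model] + cells + [sum(cells)])

    totals = [sum(counts.get((m, iso), 0) for m in models) for iso in isos]
    rows.append(["Grand Total"] + totals + [sum(totals)])

    header = ["Model"] + [str(iso) for iso in isos] + ["Grand Total"]
    return header, rows
```

The sketch reports zero where a model never used a setting. Telling “never used” apart from “not available on this body” takes knowledge of the cameras that the counts alone don’t carry.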

I hadn’t realized I was doing this so systematically, but it looks like with the 20D I started regularly shooting at higher ISOs, only dropping to ISO 100 when it was particularly bright or I wanted a slower shutter speed. The cameras I’ve owned for less time have fewer shots (no surprise there).

In the next pass I want to figure out how many shots from each camera are part of a “motor drive” sequence, i.e. taken while holding the shutter button down. The SequenceNumber field should make this doable. I also want to gather stats on lenses, focal lengths, shutter speeds and apertures. Most of this data isn’t practically useful, but it’s still fun.
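A minimal sketch of how the motor drive count might work, under an assumption I haven’t verified against real files: that SequenceNumber is 0 for a single shot or the first frame of a burst, and counts up for each following frame. Given the sequence numbers for one camera in shooting order, it counts the shots that belong to a run of two or more frames. The function name is made up for this example.

```python
def count_burst_shots(sequence_numbers):
    """Count shots that are part of a multi-frame burst.

    Assumption: 0 marks a single shot or the first frame of a burst,
    and any nonzero value marks a later frame of the same burst.
    """
    in_bursts = 0
    run = 0  # length of the run currently being scanned
    for seq in sequence_numbers:
        if seq == 0:
            # A new run starts; the previous one counts only if it had 2+ frames.
            if run > 1:
                in_bursts += run
            run = 1
        else:
            run += 1
    if run > 1:
        in_bursts += run
    return in_bursts
```

If the field turns out to behave differently on some bodies, only the test for where a new run starts should need to change.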