Ken Fox's notes, photos, and miscellany

Ken's notes

Watching my Twitter Userstream

I wanted this, and I couldn't find a working example of connecting to a Twitter userstream using eventlet and python-oauth2.

So here one is: userstream.py. There's plenty of room for improvement, but I've been using something very close to this to play with ways to watch my userstream for the past several days.

New site

A new site?

For historical hosting-related reasons my personal sites were spread across two domains and three hostnames (www.xythian.com, photos.xythian.com, and notes.xythian.net). www.xythian.com was a purely static site generated ages ago by a bespoke template system written in Python. photos.xythian.com ran Singleshot 2 and had my public photos. notes.xythian.net was my WordPress-powered blog.

The new site consolidates the old static site, photos, and blog into one site (www.xythian.com), and I've set up permanent redirects on the old sites to map the old resources to their new homes.

Why?

I wanted to consolidate my hosting and simplify discovery for the content I publish.

Why host anything at all? I run my web presence on software I control rather than using application hosting services because I want to control it rather than be at the mercy of some company for my site.

What I publish has changed over the years -- a lot of the casual sharing is now done via social networking sites which means less of that goes into the blog or photo site.

Design of the new site

Audience & Contents

Judging by my access logs, people appear to have one of three intents when visiting my site:

  • Seek. Seeking a solution to a technical problem or pictures of something in particular (actual size is a common entry point).

  • Follow. People following the notes, photos, or both. These people are mostly using feed readers.

  • Browse. Usually these folks got the address from someone (usually me) and are coming to see what's here.

The new site should facilitate all of these uses. Canonical, stable URLs help get things indexed properly for Seekers. I plan to delegate site search to Google Custom Search once the site is indexed rather than running my own search. Most people came in via a major web search engine rather than using the old site's search.

The two major types of content are still in separate feeds (notes, photos) and they are redirected from the old homes of those feeds so followers should have a relatively transparent experience (although when the cut-over happens there may be some repeat posts).

Having a single new site in place of the multiple older sites should make discovery and navigation easier for browsers.

Devices and browsers

The new site uses (or attempts to use) HTML5 and CSS to make a single site that should lay out adequately on a variety of screen sizes. I tested on a collection of browsers I had handy and didn't worry about pixel-perfect rendering everywhere.

Navigation is suppressed for printing using CSS print media rules. The overwhelming majority of (non-crawler) traffic to my site uses modern browsers.

URIs

I updated a bunch of links from posts and pages -- the Internet seems to decay. A lot of links scattered in the last several years of notes posts were no longer valid. Some of them I could update and some I just removed. Dismayingly, some of these links were from the last few months. I'm afraid a lot of people have forgotten (or, more likely, never learned) the lessons of the past.

Of course, the consolidation of hostnames and domains that I wanted meant changing almost all of the URIs on my sites. I thought about the design of the new site's URIs and contents before I thought about the layout and other styling attributes.

I switched to /YYYY/MM/slug from /YYYY/MM/DD/slug for the notes posts (matching the usual format of URLs used by the photos site). Notes and photos are still kept in their own trees (/notes/ and /photos/) because, judging by access logs, the audiences are nearly disjoint. I dropped the trailing "/" from the canonical version of these URLs because they're not directories. All of the media and other static resources live under one of a few top-level directories so I can easily set the caching behavior on the lot of them.
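As a rough illustration of the kind of mapping a permanent redirect for the notes posts needs, here is a minimal Python sketch. The regular expression, the function name `new_note_path`, and the example slug are assumptions made for illustration; the real redirect rules may live in the web server configuration and look quite different.

```python
import re

# Assumed shape of an old notes URL path: /YYYY/MM/DD/slug with an optional
# trailing slash. This pattern is illustrative, not taken from the real site.
OLD_NOTE_PATH = re.compile(
    r"^/(?P<year>\d{4})/(?P<month>\d{2})/\d{2}/(?P<slug>[^/]+)/?$"
)


def new_note_path(old_path):
    """Map /YYYY/MM/DD/slug/ to /notes/YYYY/MM/slug, or None if it doesn't match."""
    match = OLD_NOTE_PATH.match(old_path)
    if match is None:
        return None
    # Drop the day and the trailing slash; keep the year, month, and slug.
    return "/notes/{year}/{month}/{slug}".format(**match.groupdict())


if __name__ == "__main__":
    print(new_note_path("/2011/01/28/example-post/"))
```

A path that does not look like an old notes post returns `None`, which leaves the caller free to fall through to other rules or a 404.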

The few "legacy" pages on the static site survived at nearly the same URLs, although a few of those are orphaned -- they're not linked from the new site, but search engines and other links can still direct traffic to them. These were mostly things I moved to the codebag github project, but I didn't want to replace them with a straight redirect since that would be too abrupt given the reorganization.

Comments

The new site has no support for comments. I am still considering if and how to handle comments. The options include using something like Disqus, rolling my own comment service, and not supporting comments directly on the site. Today most feedback about things I post comes via mail, IM, Facebook, or Twitter.

Combine this with the fact that the ratio of legitimate comments to spam on the WordPress blog was vanishingly low. The other sites didn't support comments -- although the photo site did for a while. Support for comments is a lot of surface area without much benefit, unfortunately.

I may post an email address more prominently to provide a channel for feedback directly on the site.

Media backups and workflow

I have a lot of photos and video -- the result of several years of enthusiastic DSLR use.

A few months ago I purchased a new NAS box and assembled a new backup scheme. This is a brief overview of how it works and some information about how I made some of the decisions.

My media tree's "source of truth" is my desktop machine's data drive -- currently a RAID 0 (striped) array of two 2TB disks. I download photos using Downloader Pro into a directory structure like this:

H:\photos\YYYY\YYYY-MM-DD-name\filename.CR2
H:\videos\YYYY\YYYY-MM-DD-name\filename.MTS/AVI

using the download path template:

Directory: h:\{default,{E9},photos}\{Y}\{Y}-{m}-{D}-{J}
Filename: {T8}{o}

The first bit is what automatically splits photos into one tree and videos into another tree. The rest uses the year, month, day, job code, abbreviated camera model, and original filename and type to assemble a file path like:

H:\photos\2011\2011-01-28-macro\5D2_MG_7004.CR2

Where "5D2" is the abbreviated name I've given to my DSLR. Downloader Pro prompts for this when it sees files from a camera that it's never seen before (it's reading the EXIF data). "macro" there is the "job code", which is something I enter for each download and is just a name to give me some idea what's in the directory. Usually it's the name of where or what I was shooting for that outing.

I use Downloader Pro because it gave me the flexibility I wanted to download into the directory structure that was reasonably navigable independent of any management tools.
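To make the layout concrete, here is a small Python sketch that assembles the same kind of path from its parts. It is not how Downloader Pro works internally; the function name `download_path` and the set of video extensions are assumptions based only on the examples above.

```python
from datetime import date
from pathlib import PureWindowsPath

# Assumption: only the two video extensions mentioned above go to the videos tree.
VIDEO_EXTENSIONS = {".mts", ".avi"}


def download_path(shot_on, job_code, camera, original_name, root="H:\\"):
    """Build root\\tree\\YYYY\\YYYY-MM-DD-jobcode\\<camera><original filename>."""
    suffix = PureWindowsPath(original_name).suffix.lower()
    tree = "videos" if suffix in VIDEO_EXTENSIONS else "photos"
    folder = "{:%Y-%m-%d}-{}".format(shot_on, job_code)
    # The abbreviated camera name is prefixed directly onto the original filename.
    return PureWindowsPath(root, tree, "{:%Y}".format(shot_on), folder,
                           camera + original_name)


if __name__ == "__main__":
    print(download_path(date(2011, 1, 28), "macro", "5D2", "_MG_7004.CR2"))
```

Because the date and job code are in the directory name, the tree sorts chronologically in any file browser without needing a catalog.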

I use FileBack PC to back up both of these trees and the rest of the interesting files on the desktop to a Synology DS-1511+ array which can deal with one disk failure without losing data and has ample room for growth. That in turn backs itself up to an external 3TB USB disk every morning. Periodically I plug one of a rotating set of drives into a machine or the NAS and sync everything to that disk and bring it off-site.

Currently the photo+video tree totals about 1TB and all of the data together is around 1.5TB which means it still comfortably fits onto inexpensive 2TB USB disks for the off-site rotation.

Some data I keep on the NAS but don't back up -- things that are easily replicated but are merely convenient to have locally such as ISOs for Ubuntu v.current.

I've gone through a few backup targets since I started downloading into this tree and using FileBack PC to back things up. In that time I've had one catastrophic disk failure and restored successfully from the then-current backup target (an NSLU2 and USB disk).

I periodically (once a month or so, I need to automate this) run an md5sum on all of the files on the NAS, source-of-truth disk, and USB backup target and verify that the things I expect to match across those (…all of the data files…) do in fact still match. I haven't yet caught a discrepancy here (and don't really expect to given the total size of my data and probability of uncorrected, uncaught disk errors) but I have detected developing disk failures by exercising all the bytes and hearing/seeing the attempts to correct bad sectors. I replace disks if they give any signs of failing or after a couple of years.
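One way the periodic check could be automated is sketched below in Python. This is an assumption about how such a script might look, not the procedure actually in use; the function names are hypothetical, and it only reports files missing from the backup or differing in content.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large raw or video files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root onto its MD5 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_md5(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(source, backup):
    """Return (files missing from backup, files whose contents differ)."""
    source_digests = tree_digests(source)
    backup_digests = tree_digests(backup)
    missing = sorted(set(source_digests) - set(backup_digests))
    mismatched = sorted(
        name for name, digest in source_digests.items()
        if name in backup_digests and backup_digests[name] != digest
    )
    return missing, mismatched
```

Reading every byte on both sides is the point: even when the digests match, the full read exercises the disks in the same way the manual md5sum run does.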

I have a single Lightroom 3 catalog into which I've imported all of the photos (~42000 as of this writing) and (for many) assigned keywords and ratings so I can search for images.

The usual workflow now is to download the new batch of photos, import them into Lightroom, and make a pass or two over them giving them ratings from 1 to 5 stars. From there, depending on the sort of outing it was, I share the "selected" photos directly from Lightroom. I use Jeffrey Friedl's plugins to export to the places I share photos.

Sometimes I run the phone's GPS during an outing so I can geotag the photos although I haven't yet done much to take advantage of this data.

I've been revamping my keyword tree now that I have some more experience using Lightroom and may make a future post about that. The summary version is "group keywords into a tree" like "animal > dog > breed > particular-dog (when applicable)" or "location > CA > San Francisco > Golden Gate Park" and arrange it so that when I export the photo all the higher level keywords get exported along with the details. This is just to speed keyword entry and ensure I'm using consistent names. Some keywords are not exported; I use those only to search within Lightroom.
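The idea of exporting the higher level keywords along with the detail can be illustrated with a few lines of Python. This is only a sketch of the concept, not Lightroom's implementation; the ">" separated string format and the function names are assumptions borrowed from the examples above.

```python
def expand_keyword(path, separator=">"):
    """Split 'a > b > c' into every keyword from the root down to the leaf."""
    return [part.strip() for part in path.split(separator) if part.strip()]


def export_keywords(paths):
    """Collect each keyword once, in first-seen order, across all keyword paths."""
    exported = []
    for path in paths:
        for keyword in expand_keyword(path):
            if keyword not in exported:
                exported.append(keyword)
    return exported


if __name__ == "__main__":
    print(export_keywords([
        "location > CA > San Francisco > Golden Gate Park",
        "animal > dog",
    ]))
```

Tagging a photo with only the leaf keyword is then enough for the broader terms to travel with it on export.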

I don't yet have a very good story for managing video clips or projects.

Mock service dependencies

Suppose you're building a service that depends on several other services to work. You write a bunch of code and carefully include error handling code and have a plan for what happens if each service your new service calls fails. Naturally, you want to test your code. These services are invoked over a network. Perhaps they're web services but they may use some other network protocol. Suppose further your code is nicely factored so there's a "client" class that presents the network service as a library API to the rest of the service.

There are a few approaches to ensure you have tests that exercise as much of your own service's code as possible.

All of these approaches share the idea that you want to have a mock service that you can instruct how to reply so each test can exercise an aspect of the "target" service's code.

One approach is to write "mock" services at the library level -- swap out the code that calls out to the network with tame code you tell what to do as part of the test set-up. This would involve preparing to return predetermined results for a given request (a proper response or an error of some kind).
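A minimal sketch of the library-level approach, using Python's standard `unittest.mock`. The `WeatherClient` class and `describe_weather` function are hypothetical stand-ins for a real client class and the service code that calls it.

```python
from unittest import mock


class WeatherClient:
    """Hypothetical client class that presents a network service as a library API."""

    def current_temperature(self, city):
        raise NotImplementedError("the real version would call out to the network")


def describe_weather(client, city):
    """Service code under test: it has to cope with the dependency failing."""
    try:
        temperature = client.current_temperature(city)
    except ConnectionError:
        return "weather unavailable"
    return "{}: {} degrees".format(city, temperature)


def check_library_level_mock():
    # The mock replaces the client wholesale; no network code runs at all.
    client = mock.create_autospec(WeatherClient, instance=True)

    client.current_temperature.return_value = 18  # a proper response
    ok = describe_weather(client, "San Francisco")

    client.current_temperature.side_effect = ConnectionError("service down")
    failed = describe_weather(client, "San Francisco")
    return ok, failed
```

This is the cheapest of the approaches to set up, but none of the real client's request building or response parsing is exercised.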

Another approach is to "mock" out at the "service transport" level -- keep the client implementation that thinks it's making, say, HTTP requests but swap out the HTTP client with a library that accepts HTTP requests at the API level and replies with appropriate HTTP responses.
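A sketch of the transport-level approach, assuming the client accepts its HTTP transport as a constructor argument. `FakeTransport`, `WeatherHttpClient`, and the example URL are all hypothetical names invented for this illustration.

```python
import json


class FakeTransport:
    """Stands in for an HTTP client library; replies from a table of canned responses."""

    def __init__(self):
        self.responses = {}
        self.requests = []

    def respond_to(self, url, status, body):
        self.responses[url] = (status, body)

    def get(self, url):
        self.requests.append(url)  # recorded so a test can inspect what was asked for
        return self.responses.get(url, (404, b""))


class WeatherHttpClient:
    """Real client logic: builds URLs, checks status codes, and parses the body."""

    BASE_URL = "http://weather.example/api"

    def __init__(self, transport):
        self.transport = transport

    def current_temperature(self, city):
        status, body = self.transport.get("{}/{}".format(self.BASE_URL, city))
        if status != 200:
            raise ConnectionError("unexpected status {}".format(status))
        return json.loads(body)["temperature"]
```

Here the client's URL construction and response parsing are under test, while the sockets and the HTTP library itself are still bypassed.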

A third approach is to "mock" the network service at the network level. This last approach involves writing an implementation of the target service that is complete enough to behave like it and that can be instructed to reply with a proper response, a malformed response of some kind, or an error after some specified time (as fast as possible, normally, except when exercising timeout handling code), or even to do things like close the network connection before responding, or simply never respond at all. This exercises the entire stack of client code and permits testing of both the service code and the network client code. It can be more work than the first two approaches, but it almost always provides the most confidence that as much of the service is exercised as possible. It also makes the test code more resilient to internal refactoring of the service, since more of it adheres to the externally visible API boundary (which presumably is more costly to change anyway) rather than introducing another surface acting as an API boundary. This mock service implementation can also be used with other client implementations, helping to reduce the amount of code involved.
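A minimal sketch of the network-level approach using only the Python standard library. The `MockService` class and `fetch` function are hypothetical; a real mock would implement more of the target service's protocol and more failure modes (malformed bodies, dropped connections) than the status and delay shown here.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockService:
    """A tiny HTTP server on localhost whose next reply is set by the test."""

    def __init__(self):
        self.status, self.body, self.delay = 200, b"{}", 0.0
        service = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                time.sleep(service.delay)  # lets a test exercise timeout handling
                self.send_response(service.status)
                self.send_header("Content-Length", str(len(service.body)))
                self.end_headers()
                self.wfile.write(service.body)

            def log_message(self, *args):
                pass  # keep test output quiet

        # Port 0 asks the operating system for any free port.
        self.server = HTTPServer(("127.0.0.1", 0), Handler)
        self.host, self.port = self.server.server_address[:2]
        self.thread = threading.Thread(target=self.server.serve_forever, daemon=True)

    def __enter__(self):
        self.thread.start()
        return self

    def __exit__(self, *exc_info):
        self.server.shutdown()
        self.server.server_close()


def fetch(host, port, timeout=1.0):
    """Client code under test: returns the body, or None on any failure."""
    connection = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        connection.request("GET", "/")
        response = connection.getresponse()
        if response.status != 200:
            return None
        return response.read()
    except (OSError, http.client.HTTPException):
        return None  # covers refused connections, resets, and timeouts
    finally:
        connection.close()
```

A test wraps its calls in `with MockService() as service:` and sets `service.status`, `service.body`, or `service.delay` before each call to `fetch(service.host, service.port)`. Because the client speaks real HTTP over a real socket, the same mock could also serve a client written in another language.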

Communicating in code

Code is communicating. Communicating with the computer to make it do something useful. Communicating with the future people that will read and maintain the code. The former doesn't care how clever you are. The latter may know where you live. The latter may be you.
