Is "Good enough" the enemy of "complete"?
I spent a bit more time on the RSS fetcher since my last post. Mostly I fixed some dup-detection issues and defined a more useful format for the messages on the message bus. I wouldn't really call it "complete", but it's been good enough that I took rawdog out of cron and made this one run every hour (up from every 4 hours). It's still missing the feed refresh scheduling, so it hits every feed every hour. The vast majority of the feeds I hit support conditional GET through one or both of ETag and If-Modified-Since, so my motivation to improve the scheduling is low. The new fetcher doesn't cause any noticeable impact on the machine at all, whereas rawdog was a really visible load/memory spike. It's still running out of cron rather than as the envisioned daemon.
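For anyone wondering why hitting every feed every hour is cheap, here is a rough sketch of a conditional GET, assuming Python's standard urllib. It is not the fetcher's actual code; the function names and the idea of passing in the validators saved from the previous fetch are my own illustration.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if etag:
        # The server compares this against the feed's current ETag.
        headers["If-None-Match"] = etag
    if last_modified:
        # The server compares this against the feed's Last-Modified time.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: nothing to download, keep the old validators.
            return None, etag, last_modified
        raise
```

When a feed hasn't changed, the server answers with a 304 and no body, so the hourly poll costs little more than a round trip.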
I also made an RSS->SMTP sender that picked up the NEW_ARTICLE messages off the Spread message bus and mailed articles to my account. That worked, but was a little slow (since the messages had to go through Speakeasy) and a lot of them got snagged by Speakeasy's SpamAssassin install. Rather than put my fetcher address on my Speakeasy whitelist, I resurrected the old RSS->IMAP code I wrote a while ago and refit it to run as a daemon listening to the Spread NEW_ARTICLE group. Now it drops new articles directly into the IMAP folder I set up months ago for this purpose.
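The IMAP side is simple enough to sketch. This is only an illustration using Python's standard imaplib and email modules, not the storer itself: the article fields, the rss@localhost sender address, and the function names are all assumptions, and the Spread listener that would feed it is left out.

```python
import imaplib
import time
from email.message import EmailMessage
from email.utils import formataddr


def article_to_message(feed_title, title, link, summary):
    """Wrap one article in an email message (field names are assumed)."""
    msg = EmailMessage()
    # Placeholder sender address; the feed title is used as the display name.
    msg["From"] = formataddr((feed_title, "rss@localhost"))
    msg["Subject"] = title
    msg.set_content(f"{summary}\n\n{link}\n")
    return msg


def store_article(conn, folder, msg):
    """Append msg to folder on an already logged-in imaplib connection."""
    # No flags are set, so the article shows up as unread.
    return conn.append(
        folder,
        None,
        imaplib.Time2Internaldate(time.time()),
        msg.as_bytes(),
    )
```

Since APPEND writes straight into the mailbox, nothing passes through SMTP or a spam filter on the way in. In use, conn would be something like an imaplib.IMAP4_SSL connection after login().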
It's a long way from my vision, but this setup is good enough for me to use. Perhaps too good -- the pain that caused me to write all this is gone, so I haven't made much further progress since writing the IMAP storer. It runs smoothly, hasn't caused any dupes or missed anything that I've detected in my cross-checks, and there have been no tracebacks for several days now.