Sorting topologically for friendlier configuration

I’ve been working on the bootstrap mechanism of service. One of the problems I’m addressing is the relative fragility of its configuration mechanism. Some services depend on other services (e.g. a “connection handler” service depends on both “authentication” and “database access” services). With the old configuration mechanism, the order these services were specified in the configuration file mattered. The bootstrap mechanism would always start them in the order specified, which meant several possible configuration files would result in a broken system since some services may end up started before the services they depend on. Since I’m making the configuration file less fragile, I’m making the new bootstrap system always start services after their dependencies. The dependency information is a directed acyclic graph — there can be no cycles of dependencies. It topologically sorts the services to find the start order.

A brief, contrived example implementation of this:

class Service:    depends = ()
class AService(Service):    depends = ()
class BService(Service):    depends = (AService,)
class CService(Service):    depends = (BService,)
class DService(Service):    depends = (AService,)
class EService(Service):    depends = (CService, DService)
class FService(Service):    depends = ()
def toposort(services):
    sorted = []
    def visit(service):
        if service in sorted:
            return
        for depend in service.depends:
            visit(depend)
        sorted.append(service)
    for service in services:
        visit(service)
    return sorted
def show(func, services):
    print func.__name__
    for service in func(services):
        print 'Start: ',service.__name__, ' depends on ',', '.join([s.__name__ for s in service.depends])
services = [FService, EService, DService, CService, AService]
show(toposort, services)
----
toposort
Start:  FService  depends on
Start:  AService  depends on
Start:  BService  depends on  AService
Start:  CService  depends on  BService
Start:  DService  depends on  AService
Start:  EService  depends on  CService, DService

A simple way to sort the nodes in a DAG is using a depth first traversal. In this example, if the graph is too deep we’ll blow the stack. In my install of Python, sys.getrecursionlimit limit is 1000. My own stack would pop long before Python’s if the depth of our dependency graph was the many hundreds it would take to blow the stack with the obvious recursive implementation.

But, I’ll go ahead and show a version that doesn’t use recursion anyway:

According to some testing, toposort2 is much slower than toposort:

Of course, a real version of this would also need to watch for cycles and report them in a way more friendly than blowing the stack (toposort) or running until it ran out of memory or was killed (toposort2).

toposort2 could be made to run faster, but I’ll save the “making Python code faster” effort for situations when it matters. If your algorithm is naturally expressed recursively it is best to express it recursively unless there’s really a compelling reason to do otherwise.