Variable Specificity And Complexity

Paul StovellJuly 27, 2015 • 4 min read

Octopus allows variables to be scoped to different environments, roles or machines. A simple example of where this is useful, is to have different values for a database connection string depending on whether you are deploying to a Test or Production environment.

Variables can be scoped to multiple values, so deciding which variable is sensible in a given scenario is a surprisingly tricky process. We call it variable specificity, and it works a little bit like CSS specificity.

For example, suppose we have this set of variables:

DatabaseName = DB01 (Environment: Staging, Production)
DatabaseName = DB02 (Environment: Staging)
DatabaseName = DB03 (Environment: Production)

In this case, if we deployed to Production, the correct value to use could either be DB01 or DB03, since both match. There are times when the results of variable scoping will be non-deterministic, and I think that’s pretty intuitive.

Each scoping level has a different priority:

a variable scoped to a machine is more specific than a variable scoped to a role
… which is more specific than a variable scoped to an environment
… which is more specific than a variable with no scope

This usually also makes intuitive sense - machines are more granular than environments, so that scope should take priority.

Calculating the specificity of a variable in Octopus looks like this:

public int Rank()
{
    var score = 0;
    score += Score(ScopeField.Private, 10000000);
    score += Score(ScopeField.User, 1000000);
    score += Score(ScopeField.Action, 100000);
    score += Score(ScopeField.Machine, 10000);
    score += Score(ScopeField.TargetRole, 1000);
    score += Score(ScopeField.Role, 100);
    score += Score(ScopeField.Environment, 10);
    score += Score(ScopeField.Project, 1);
    return score;
}

If multiple variables match a scenario, we simply take the top-ranking variable. Higher priority items increase ranking before lower priority items. For example, in Production I use one database, but my reporting service should always use a different database:

DatabaseName = DB01 (no scope)
DatabaseName = DB02 (Environment: Staging)
DatabaseName = DB03 (Environment: Production)
DatabaseName = DB04 (Role: Reporting)
DatabaseName = DB05 (Role: Reporting; Environment: Production)

This works - my Reporting app will get DB04 even in staging, since roles are more specific than environments. But in Production, I can force it by scoping it at two levels.

One area where this gets confusing is multiple matches involving roles. Take the first example, but instead of environments, let’s consider roles:

DatabaseName = DB01 (Role: WebServer, AppServer)
DatabaseName = DB02 (Role: WebServer)
DatabaseName = DB03 (Role: AppServer)

Now, I can’t deploy to multiple environments at once, so it was obvious that either value should apply in the first example. But I can have a machine that belongs to multiple roles. In my development environment, perhaps my budget is limited to a single machine, so it is both a WebServer and an AppServer.

In that example, which value should my WebServer+AppServer machine get? If I polled 100 people, I suspect an easy majority would expect DB01. In other words, when multiple values match, they’d expect an and relationship to apply and each match should contribute to the ranking.

But what would happen in this case?

DatabaseName = DB01 (Role: WebServer, AppServer)
DatabaseName = DB02 (Role: WebServer)
DatabaseName = DB03 (Role: AppServer; Environment: Production)

If I deployed to a WebServer+AppServer machine in Production, what value should I get? If I polled 100 people for this scenario, I’d say half would expect DB01, and half would expect DB03. There’s no intuitive way to reason about these, unlike the previous samples. If you’ve put hours of thought into this, it might make sense to you, but will it make any sense for the next person that has to maintain them?

Decisions like these are tricky. My hope is that by not trying to make specificity rules too clever, it will simply encourage people to find better ways to model the deployments (maybe the environment makes better sense than a role combination) than relying on a complex matrix that few will ever understand at a passing glance.