
Why Octopus doesn't use the .NET Service Bus (or polling)

An Octopus server deploys software to many deployment agents ("Tentacles"). Communication follows a standard client/server model, where the Tentacle listens on a TCP port (10933 by default) and the Octopus server connects to it.

Octopus Server -------> Tentacle

A common feature request is for this communication to be reversed, so:

Octopus Server <------- Tentacle

Some implementation options for this might be:

  • Having the Tentacles 'poll' the Octopus server for deployment commands
  • Using a technology like web sockets/SignalR
  • Using the .NET Service Bus
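
To make the first option concrete, here is a minimal in-memory sketch of what polling might look like. It is an illustration only, not how Octopus or Tentacle is implemented: `FakeOctopusServer`, `next_command` and `poll` are hypothetical names, and a real version would make a network request where this one makes a method call.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class FakeOctopusServer:
    """Hypothetical stand-in for the server: a queue of pending commands."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.requests_served = 0  # counts every poll, including empty ones

    def next_command(self) -> Optional[str]:
        self.requests_served += 1
        return self.pending.popleft() if self.pending else None


def poll(
    server: FakeOctopusServer,
    rounds: int,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> List[str]:
    """Tentacle side: ask the server for work once per round.

    `sleep` defaults to a no-op so the sketch runs instantly; pass
    time.sleep to wait for real between polls.
    """
    executed: List[str] = []
    for _ in range(rounds):
        command = server.next_command()
        if command is not None:
            executed.append(command)  # a real agent would run the deployment here
        sleep(interval_seconds)
    return executed


server = FakeOctopusServer()
server.pending.append("deploy MyApp 1.0")
executed = poll(server, rounds=10)
print(executed, server.requests_served)
```

In this run the Tentacle makes ten requests to pick up a single command. Every outbound call also has to be allowed by whoever controls the network on the Tentacle's side.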

As an example, consider a cloud startup that sells software to a bank. The Octopus server might be installed on Amazon EC2, while the Tentacle agents will be installed on the bank's production web servers, buried deep in a bank-managed data center.

The feature described above - having the bank's production web servers call the Amazon-hosted Octopus server - sounds like a great solution that avoids the need for the bank to open firewall rules. But what are the downsides?

Why Octopus works the way it does

The current communication architecture was chosen for a few reasons:

  1. "Servers" generally serve something - chances are there are already inbound connections allowed to the server, otherwise you wouldn't be deploying anything to it. In my experience, getting sysadmins to approve an outbound connection can be just as hard as getting them to approve an inbound connection, especially if the Octopus server is in a different security zone to the production web servers.
  2. I've found that most sysadmins actually prefer the listening model, since they can create rules to moderate the inbound connections. If the software tried to "work around" firewall rules, this could cause sysadmins not to trust it.
  3. Load/performance - having the Tentacle connect outwards means some kind of active connection, whether it's using sockets or polling. By having the Tentacle listen, we just have an open listen socket and a blocked thread waiting for I/O. While the overhead either way is pretty minimal, some people feel more comfortable with this approach if the Tentacle happens to be running on a production web server, for example.
  4. It is in line with how most ops tools work - WinRM, PSExec, SSH - all require the "target" server to be listening.
  5. It is simpler, and simpler is good.
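
The "open listen socket and a blocked thread" from point 3 can be sketched with plain TCP sockets. This is a rough illustration under my own assumptions, not the actual Tentacle protocol: the function names and the one-line text command are made up, and it binds to an OS-assigned port on localhost rather than 10933 so it can run anywhere.

```python
import socket
import threading
from typing import List, Tuple


def tentacle_listen(listener: socket.socket, received: List[str]) -> None:
    """Tentacle side: block in accept() until the server connects."""
    conn, _addr = listener.accept()  # the thread sleeps here; no polling loop
    with conn:
        received.append(conn.recv(1024).decode("utf-8"))
        conn.sendall(b"ok")


def server_deploy(host: str, port: int, command: str) -> str:
    """Octopus side: open the connection and push a command."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(command.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def demo() -> Tuple[List[str], str]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    received: List[str] = []
    worker = threading.Thread(target=tentacle_listen, args=(listener, received))
    worker.start()

    reply = server_deploy("127.0.0.1", port, "deploy MyApp 1.0")
    worker.join(timeout=5)
    listener.close()
    return received, reply


if __name__ == "__main__":
    print(demo())
```

Until the server connects, the listening side does nothing except wait in `accept()`. The only firewall rule needed is an inbound one for a single known port, which is exactly the thing a sysadmin can inspect and approve.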

Reversing this model could damage trust


The only reason to reverse the model that I can see is to "work around" firewall rules. But it's not actually about working around firewall rules (technically, these aren't hard to change); it's about working around the people who control the firewall rules.

If I were a sysadmin at the bank above, I think I'd prefer to know that there's software running on my production servers that needs a port open, and I had the chance to approve/deny it. If the software silently "worked around" my firewall rules, I'd be very upset. I'd rather someone took time to educate me on the purpose of the software (making application deployments simpler) and the security aspects of it so that I can make an informed choice. If Octopus made it easy to work around me, I'd be less likely to trust the product.

Now, it's unlikely that any bank sysadmin would agree to change firewall rules so that an Amazon-hosted web server could deploy software into the bank's production web servers. But if they wouldn't agree to it, isn't that enough reason to know that trying to work around them is a bad idea?

Are there valid scenarios?

Although it seems like the only reason for an alternative communication model is to bypass firewalls (or in other words, to bypass sysadmins), are there scenarios where it might be a good idea anyway?

One scenario where reversing the model might be a good idea is when deploying desktop software using Octopus, where the Tentacle is running on every client workstation. While it could possibly work, I think deploying software to client workstations is such a different problem domain from deploying software to servers (due to connectivity, the large number of machines, the need for clients to roll back individually, and so on) that Octopus isn't a good fit for this scenario in the first place. That's why Windows Update doesn't build on top of MS Deploy.

Another example might be where firewalls are managed by external providers: the sysadmins would be happy to change the rules, but getting them changed in a timely fashion might be hard. Again, this feels like a technical solution to a people problem, but sometimes that's just what it takes.

I'd love to hear any more scenarios that you might have encountered that make this model of communication necessary. For now, when I consider the work it would take to implement and the kind of distrust such a model could create in the product, I feel that adding a polling/service bus-based communication model isn't a high-priority feature.


Tagged with: Architecture