The DevOps engineer's handbook

Canary deployments: Pros, cons, and 5 critical best practices

What are canary deployments?

Canary deployments and canary releases are a pattern for rolling out releases to a subset of users or servers. The idea is to first deploy the change to a small subset of servers, test it, and then roll it out to the rest of the servers. The canary deployment serves as an early warning indicator that impacts fewer end users when something goes wrong. If the canary deployment fails, it only impacts users on the subset, while those on the rest of the servers aren’t affected.

The term “canary” originated in the mining industry. Canaries were once regularly used in coal mining as an early warning system. Toxic gases like carbon monoxide, methane, or carbon dioxide in the mine would kill the bird before affecting the miners. Signs of distress from the bird indicated to the miners that conditions were unsafe.

This is part of a series of articles about software deployment.

How canary deployments work

As an example, imagine you have an environment with 4 web servers. Rather than deploying to all deployment targets in the environment, a canary deployment would look like this:

  1. Deploy to one or more canary servers.
  2. Test, or wait until satisfied.
  3. Deploy to the remaining servers.

The test phase of the canary deployment can work in many ways. You could run some automated tests, perform manual testing yourself, or simply wait to see if end users encounter problems under production volumes. You can use all 3 approaches together. Depending on how you plan to test, you might decide to remove the canary server from the production load balancer and return it only when rolling out the change to the rest of the servers.
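The three steps above can be sketched as a small Python function. This is a minimal illustration rather than a real deployment tool: `deploy` and `is_healthy` are hypothetical callables you would supply, and the server names are made up for the example.

```python
from typing import Callable, Sequence


def canary_rollout(
    servers: Sequence[str],
    version: str,
    deploy: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    canary_count: int = 1,
) -> bool:
    """Deploy to the canary servers first, then to the rest if they pass.

    Returns True if the full rollout completed, False if it stopped
    after the canary stage.
    """
    canaries = list(servers[:canary_count])
    remaining = list(servers[canary_count:])

    # Step 1: deploy to one or more canary servers.
    for server in canaries:
        deploy(server, version)

    # Step 2: test. Here the "test" is one health check per canary; in
    # practice it could be automated tests, manual checks, or a wait.
    if not all(is_healthy(server) for server in canaries):
        return False  # Stop here; the remaining servers keep the old version.

    # Step 3: deploy to the remaining servers.
    for server in remaining:
        deploy(server, version)
    return True


if __name__ == "__main__":
    deployed = []
    completed = canary_rollout(
        servers=["web-1", "web-2", "web-3", "web-4"],
        version="2.0.0",
        deploy=lambda server, version: deployed.append(server),
        is_healthy=lambda server: True,
    )
    print(completed, deployed)
```

If the health check fails, the function returns before touching the remaining servers, which is the property that limits the impact of a bad release.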

Canary deployments are similar to using a staging environment. The difference is that staging environments are usually dedicated to the task; a staging web server doesn’t become a production server. By contrast, in a canary deployment, the canary server remains part of the production fleet when the deployment is complete. Canary deployments may be worth considering if you don’t have the resources for a dedicated staging environment.

New versions are rolled out to one server before proceeding to the others. The rollout can be paused if problems are detected.

Canary release versus canary deployment

While the terms “canary release” and “canary deployment” are often used interchangeably, there are subtle differences between them.

A canary release refers to the gradual rollout of new software features or updates to a small subset of users or servers, often in a controlled and monitored environment. The focus of a canary release is primarily on user feedback and the performance of the new feature. This lets you gather real-world usage data, identify potential issues, and make adjustments before a broader release.

A canary deployment encompasses the entire deployment process, including infrastructure changes, configuration updates, and the release of new features. It’s a broader concept that includes both the release phase and monitoring and validation phases. Canary deployments aim to minimize the risk associated with new releases by limiting exposure to potential problems. This makes sure that only a small portion of the user base is affected if an unexpected issue arises.

Because people often use “canary deployment” and “canary release” interchangeably, check whether the intention is to ensure the new software version works under production load and real-world usage, or to check the impact of changes on user behavior. Having a clear idea of what you’re validating is more important than using perfect terminology.

Canary versus blue/green deployment

Both canary deployments and blue/green deployments are strategies for minimizing risk during software releases, but they differ in their approaches.

Canary deployment: In a canary deployment, a new application version is gradually rolled out to a subset of users or servers. This limited exposure allows for real-time monitoring and feedback. If the canary version is stable and performs well, it is progressively rolled out to the rest of the servers. The main advantage is that it minimizes the impact of potential issues by limiting the number of users affected during the initial deployment phase.

Blue/green deployment: A blue/green deployment involves maintaining 2 identical production environments. You designate one as the “blue” environment and the other as the “green” environment. The current production version runs in the blue environment, while the new version gets deployed to the green environment. After the new version is fully tested and validated, you switch traffic from the blue environment to the green environment, effectively making the green environment the new production environment. This approach provides a quick rollback mechanism by redirecting traffic back to the blue environment if issues arise.

The key differences are the deployment strategy and risk management. Canary deployments introduce changes incrementally, reducing risk by limiting exposure. Blue/green deployments provide a more straightforward rollback path by maintaining parallel environments but need more infrastructure resources.
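One way to picture the difference is to compare how each strategy picks a target for a request. The sketch below is a simplified model with hypothetical function names: blue/green sends every request to whichever environment is live, while canary sends a fraction to the new version. Choosing per request at random is a simplification, as a real canary setup would usually keep each user on one version.

```python
import random


def blue_green_route(live_environment: str) -> str:
    """Blue/green: every request goes to the single live environment."""
    return live_environment


def canary_route(canary_fraction: float, rng: random.Random) -> str:
    """Canary: roughly `canary_fraction` of requests reach the new version."""
    return "canary" if rng.random() < canary_fraction else "stable"


if __name__ == "__main__":
    rng = random.Random(42)  # Seeded only so the demo is repeatable.

    # Blue/green cutover: traffic moves from blue to green in a single switch.
    live = "blue"
    live = "green"
    print(blue_green_route(live))

    # Canary: with a fraction of 0.1, about 10% of requests hit the canary.
    hits = sum(canary_route(0.1, rng) == "canary" for _ in range(10_000))
    print(f"{hits} of 10000 requests went to the canary")
```

Rolling back blue/green means setting the live environment back to blue. Rolling back a canary means setting the fraction to zero and reverting the canary servers.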

Pros and cons of canary deployments

Canary deployments offer several advantages that make them a popular choice for rolling out new features and updates. However, they also have drawbacks you should be aware of.

Here are some of the key benefits:

  1. Risk mitigation: By deploying changes to a small subset of servers initially, you can find issues and address them before the full rollout, minimizing the impact on users.
  2. Incremental rollouts: This approach allows gradual exposure to new features, which helps you effectively monitor performance and user feedback.
  3. Real-time feedback: Canary deployments provide immediate insights into the performance and stability of new releases under real-world conditions.
  4. Flexibility: You can adjust the deployment process based on performance metrics. This allows for a dynamic rollout that you can pause or roll back as needed.
  5. Cost-effectiveness: Unlike blue/green deployments, canary deployments don’t require a separate environment, making them more resource-efficient.

Canary deployments also have several important limitations:

  1. Complexity: Implementing canary deployments requires sophisticated traffic management, monitoring, and automated testing to identify and resolve issues quickly.
  2. Configuration management: Managing different configurations for canary and full deployments can be challenging, especially in complex environments.
  3. Limited scope: While useful for detecting issues, canary deployments may not catch all potential problems, particularly those that only manifest under full load.
  4. Rollback complexity: Rolling back changes can be more complex than in blue/green deployments, as it might involve reverting only a subset of servers.
  5. Resource allocation: Although more resource-efficient than blue/green deployments, canary deployments still require careful planning and resource allocation to manage the different stages of the rollout.

Best practices for implementing canary deployments

1. Automate the deployment process

Automating the deployment process is crucial for achieving consistency, efficiency, and reliability. Use deployment automation tools like Octopus, Argo, or Bamboo to manage the deployment pipeline. Automation reduces the risk of human error and ensures that every deployment follows the same steps, making it easier to replicate and troubleshoot issues. Automated processes can also handle complex tasks like rolling back changes, scaling resources, and integrating with Continuous Integration/Continuous Deployment (CI/CD) pipelines.

2. Implement robust monitoring and alerting

Effective monitoring and alerting systems are vital for tracking the performance and health of canary deployments. Implement comprehensive monitoring to collect data on CPU use, memory consumption, response times, error rates, and other key metrics. Without visibility into your application and user behavior, you can’t get the vital feedback you need to validate the new version.

Tools like Prometheus, Grafana, and New Relic can provide real-time insights and visualizations. Set up alerting mechanisms to notify relevant teams immediately when performance thresholds are breached or anomalies are detected.
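As a rough sketch of what a threshold check might look like, the Python below compares canary metrics against the baseline from the rest of the fleet. The `Metrics` class, the function name, and the default thresholds are assumptions made for illustration. In a real setup, the numbers would come from your monitoring system and the thresholds from your own success criteria.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero when a server has received no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_breaches(
    baseline: Metrics,
    canary: Metrics,
    max_error_rate_increase: float = 0.01,
    max_latency_ratio: float = 1.2,
) -> list[str]:
    """Return descriptions of breached thresholds; an empty list means healthy."""
    breaches = []
    if canary.error_rate > baseline.error_rate + max_error_rate_increase:
        breaches.append(
            f"error rate {canary.error_rate:.3f} exceeds baseline "
            f"{baseline.error_rate:.3f} by more than {max_error_rate_increase}"
        )
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        breaches.append(
            f"p95 latency {canary.p95_latency_ms:.0f} ms is more than "
            f"{max_latency_ratio}x the baseline {baseline.p95_latency_ms:.0f} ms"
        )
    return breaches


if __name__ == "__main__":
    baseline = Metrics(requests=10_000, errors=50, p95_latency_ms=200.0)
    canary = Metrics(requests=500, errors=25, p95_latency_ms=210.0)
    for breach in canary_breaches(baseline, canary):
        print("ALERT:", breach)
```

Comparing the canary against a live baseline, rather than a fixed number, helps separate problems caused by the new version from problems affecting the whole fleet.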

3. Gradual traffic shifting

Gradually shifting traffic to the canary deployment is a key strategy for minimizing risk. Start by directing a small percentage of user traffic to the canary servers and closely monitoring their performance. If the canary deployment meets the success criteria, gradually increase the traffic allocation. You should pin users to one version to prevent their requests from being served by 2 different application versions. This avoids requests failing due to breaking changes and the user interface changing from request to request.

This phased approach ensures you detect issues early and limits the impact on the overall user base. Tools like load balancers and traffic management platforms can help manage traffic distribution effectively.
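One common way to combine percentage-based shifting with user pinning is to hash a stable user identifier into a bucket. The sketch below assumes each request carries a user ID, and the function names and the choice of 100 buckets are illustrative. Load balancers and traffic management platforms offer their own mechanisms for this, such as sticky sessions.

```python
import hashlib


def bucket_for(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send the user to the canary if their bucket is below the percentage."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    for percent in (5, 25, 50, 100):
        on_canary = sum(route(user, percent) == "canary" for user in users)
        print(f"{percent}% target: {on_canary} of {len(users)} users on canary")
```

Because the bucket depends only on the user ID, the same user always gets the same answer for a given percentage. Raising the percentage only adds users to the canary, so nobody already on the new version gets moved back to the old one mid-rollout.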

4. Test in production-like environments

Ensure that the canary environment closely mirrors the production environment for accurate performance and compatibility assessments. This includes using similar hardware, software configurations, and network settings.

Testing in a production-like environment helps identify potential issues that might not be apparent in a more controlled or simplified testing environment. This practice ensures that the canary deployment accurately reflects real-world conditions.

5. Use feature flags

Feature flags provide extra control during the canary deployment process. By implementing feature flags, you can enable or disable new features without redeploying code. This gives you greater flexibility in managing feature rollouts and lets you respond quickly to issues.

Feature flags also enable A/B testing and gradual feature rollouts, making it easier to assess the impact of new features on user experience.
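A feature flag can be as simple as a lookup that the code consults at runtime. The sketch below uses a hypothetical in-memory `FeatureFlags` class, and the flag name `new_discount` and the 10% discount are made up for the example. Real flag systems usually read from a configuration service so flags can change without a restart.

```python
from typing import Dict, Iterable


class FeatureFlags:
    """A minimal in-memory flag store (illustrative only)."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code paths stay dark
        # until someone explicitly enables them.
        return self._flags.get(name, False)


def checkout_total(prices: Iterable[float], flags: FeatureFlags) -> float:
    """Compute an order total, applying the new discount only when flagged on."""
    total = sum(prices)
    if flags.is_enabled("new_discount"):
        total *= 0.9  # New behavior, shipped in the release but gated.
    return round(total, 2)


if __name__ == "__main__":
    flags = FeatureFlags()
    print(checkout_total([10.0, 20.0], flags))  # Flag off: old behavior.

    flags.set("new_discount", True)
    print(checkout_total([10.0, 20.0], flags))  # Flag on: new behavior.

    flags.set("new_discount", False)  # Turn the feature off without redeploying.
    print(checkout_total([10.0, 20.0], flags))
```

The new code is deployed with the canary but only runs when the flag is on, which separates deploying the code from releasing the feature.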

Canary deployments in Octopus

Octopus Deploy is the leading solution for deploying your software to multi-cloud, hybrid, and on-premises environments. There are 3 ways to implement canary deployments in Octopus. The easiest is to use the “Deploy to a subset of deployment targets” feature when deploying the release. This lets you limit which deployment targets to deploy to.

To do this, you deploy using just the canary servers, then after testing, you deploy again using the remaining servers. This approach works well if you have a small number of servers and don’t deploy to production too frequently.

The second approach is to build canary deployments into your deployment process, with steps like these:

  • Deploy the package to the canary server (one or more deployment targets may be associated with the canary target tag).
  • Have a manual intervention step to wait until you’re satisfied.
  • Deploy the package to the remaining deployment targets (the web-server target tag).

Note that the first 2 steps are configured to run only for production deployments. In our pre-production environments, we can just deploy to all targets immediately. If we were performing fully automated tests, we could use a PowerShell script step to invoke them rather than the manual intervention step.

A final variation is to set up a dedicated “Canary” environment to deploy to. The environment can contain a canary deployment target, with the same deployment target also belonging to the production environment.

