
Canary deployment in Kubernetes: The basics and a quick tutorial

What are canary deployments?

Canary deployment is a software release strategy that enables a phased rollout of new software versions. This approach reduces the risk associated with deploying a new version by initially releasing it to a small subset of users. If there are no critical issues, the deployment is gradually expanded to the entire user base. This allows for monitoring and testing the new version in a real-world environment before it affects all users.

By incrementally introducing changes, canary deployment ensures that potential issues can be detected and addressed early. It also allows teams to gather user feedback and performance metrics in an isolated yet realistic setting. This method helps maintain system stability and reliability while fostering continuous delivery and integration practices.

This is part of a series of articles about Kubernetes deployments.

Pros and cons of canary deployments in Kubernetes

Pros of canary deployments in Kubernetes include:

  • Reduced risk: Canary deployments limit the exposure of new changes to a small group of users. This means that if the new version has critical issues, their impact is confined and easier to manage. Kubernetes enables rollback of updates, so problematic changes can be promptly reversed.

  • Real-world testing: Using canary deployments provides insights into how the new version performs under real usage conditions. Unlike synthetic tests, canary deployments expose the software to actual user interactions, uncovering issues that only arise in real-world scenarios.

  • Performance monitoring: Kubernetes provides tools for tracking performance metrics, which help teams identify any deviations from the norm. These metrics include response times, error rates, and resource use.

Cons of canary deployments in Kubernetes include:

  • Increased complexity: Managing multiple versions of an application running simultaneously requires careful orchestration, monitoring, and routing of traffic. In Kubernetes, this often means configuring additional resources such as services, ingress controllers, or traffic management tools like Istio.

  • Longer feedback loops: Canary deployments tend to lengthen the time it takes to gather complete feedback. Since the new version is initially rolled out to a smaller group of users, it may take longer to detect issues that only emerge under higher traffic loads or more diverse usage patterns. This problem is exacerbated in low-traffic applications, to the point where canaries can become impractical for them.

  • Traffic splitting challenges:

    • Depending on the toolset used, directing a precise percentage of users to the canary version can be difficult to achieve.
    • Another challenge in B2B applications is to make sure all users from the same customer see the same version of the software.
    • Sticky sessions can be difficult to achieve, and clients might find themselves interacting with two differently versioned upstream servers that exhibit different behavior.
    • In Kubernetes environments, additional configuration is needed to route traffic appropriately, and misconfigurations can lead to uneven traffic distribution, causing the canary to be either overwhelmed or underutilized (see the sketch after this list).
  • Database challenges: Applying canary deployments to database changes can be difficult, because both application versions typically share the same database. It might be necessary to create a dependency, for example, updating the database first and rolling out microservices afterwards.
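
As mentioned in the traffic splitting bullet above, even plain Kubernetes, without a service mesh, can approximate a canary split by running two Deployments behind a single Service, with the split governed entirely by pod counts. The following is a minimal sketch; the names my-app, my-app-stable, and my-app-canary, and the image tags, are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  # The selector omits any version label, so it matches pods from
  # both Deployments and splits traffic roughly by pod count.
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9              # ~90% of traffic
  selector:
    matchLabels:
      app: my-app
      track: stable
  template:
    metadata:
      labels:
        app: my-app
        track: stable
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0    # hypothetical stable image
          ports:
            - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1              # ~10% of traffic
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
        - name: my-app
          image: my-app:1.1.0    # hypothetical canary image
          ports:
            - containerPort: 8080

This also illustrates the precision problem directly: the smallest possible canary share is one pod's worth of traffic, and the Service's load balancing only approximates the replica ratio.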

Related content: Read our guide to Kubernetes deployment tools

Use cases of canary deployment in Kubernetes

Configuration changes

Canary deployments are ideal for testing configuration changes. Even minor changes can have unforeseen impacts, and rolling them out incrementally reduces the chance of widespread disruptions. Kubernetes allows for easy updates and rollbacks of configuration files, ensuring that any negative effects are quickly contained.
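
For example, if a configuration change shipped through a Deployment turns out to be faulty, Kubernetes' built-in rollout history can revert it. The Deployment name my-app is hypothetical:

kubectl rollout history deployment/my-app
kubectl rollout undo deployment/my-app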

Security patches

Canary deployments are an effective way to apply security patches cautiously. Rolling out a security fix to a small group of users first helps ensure that the patch does not introduce new vulnerabilities or disrupt existing functionality. In Kubernetes, this approach allows teams to monitor the patched version for stability and security issues before proceeding with a full-scale rollout.

Third-party service updates

Updating third-party services can be risky due to potential compatibility issues. Canary deployments provide a safe way to integrate and test these updates without affecting the entire user base. With Kubernetes, teams can easily route traffic to the older version if the update causes problems.

Tools commonly used for canaries in Kubernetes

Below are several tools that can be used to perform canary deployments. However, it’s important to realize that canary deployments can’t exist in isolation. To manage the full deployment process for complex applications, DevOps teams need a full deployment automation tool, ideally as part of an end-to-end automated continuous delivery (CD) pipeline.

Having a full deployment automation platform makes it possible to automatically test for issues, react to discovered issues fast, and roll back changes. The canary tools listed below can make rollouts easier, by replacing scripting and adding automation, but they should ideally be incorporated into a full CD pipeline.

Argo Rollouts

Argo Rollouts is an open source, Kubernetes-native tool that supports advanced deployment strategies such as canary, blue-green, and progressive delivery. It provides fine control over the rollout process, allowing teams to incrementally increase traffic to the canary based on health checks and metrics. Argo Rollouts also offers integrations with monitoring systems, making it easier to observe the impact of new versions and decide whether to proceed with the rollout or revert to a previous version.

Flagger

Flagger, part of the Flux project, is a Kubernetes operator designed specifically for progressive delivery strategies, including canary deployments. It automates the process of promoting canary versions based on metrics and thresholds.

Flagger integrates with various monitoring systems like Prometheus, enabling automatic rollbacks if performance degradation or errors are detected. This reduces the manual overhead involved in monitoring the canary and ensures that only stable versions are gradually rolled out to more users.
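
To make this concrete, here is a minimal sketch of a Flagger Canary resource. The target Deployment name my-app is hypothetical, the numeric values are illustrative, and a supported mesh or ingress provider is assumed to already be installed:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app           # hypothetical Deployment to manage
  service:
    port: 80
    targetPort: 8080
  analysis:
    interval: 1m           # how often the canary is evaluated
    threshold: 5           # failed checks tolerated before rollback
    maxWeight: 50          # stop shifting traffic at 50%
    stepWeight: 10         # shift traffic in 10% increments
    metrics:
      - name: request-success-rate   # built-in, Prometheus-backed
        thresholdRange:
          min: 99                    # roll back below 99% success
        interval: 1m
      - name: request-duration       # built-in, Prometheus-backed
        thresholdRange:
          max: 500                   # roll back above 500 ms
        interval: 1m

With a resource like this in place, Flagger promotes or rolls back the canary automatically based on the metric checks, with no manual traffic adjustments.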

Istio

Istio is a popular service mesh that provides advanced traffic management features, including canary deployments. It allows teams to control how traffic is routed between different versions of an application by using fine-grained traffic policies.

Istio also enables features such as circuit breaking, fault injection, and retries, which can be crucial in managing canary releases. Additionally, Istio’s observability features help teams monitor the performance and health of both the canary and primary versions of the application in real-time.
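
As a sketch of how this looks, a weighted canary split in Istio pairs a DestinationRule, which defines version subsets, with a VirtualService that assigns traffic weights. The host my-app and the version labels are hypothetical:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:
    - name: stable
      labels:
        version: v1        # pods labeled with the stable version
    - name: canary
      labels:
        version: v2        # pods labeled with the canary version
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90       # 90% of traffic to the stable subset
        - destination:
            host: my-app
            subset: canary
          weight: 10       # 10% of traffic to the canary subset

Because the mesh splits traffic at the request level, the percentages are exact and independent of how many pods each version runs.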

Quick tutorial: A simple canary deployment with Argo Rollouts

The following instructions are adapted from the Argo Rollouts documentation.

To perform a basic canary deployment using Argo Rollouts, you first need to define a Rollout resource. This resource specifies the application to update, the number of replicas, and the strategy steps for controlling traffic flow.

Here’s a simple example:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15.4
          ports:
            - containerPort: 80
  minReadySeconds: 20
  revisionHistoryLimit: 5
  strategy:
    canary:
      maxSurge: '20%'
      maxUnavailable: 0
      steps:
        - setWeight: 20
        - pause:
            duration: 1h
        - setWeight: 30
        - pause: {}

In this configuration:

  • The rollout initially directs 20% of traffic to the new version and pauses for one hour.
  • It then increases traffic to 30% and pauses indefinitely until manually promoted.

The setWeight step controls the percentage of traffic directed to the canary version, while the pause step ensures that teams have time to observe metrics before progressing.

If no duration is specified for a pause step, the rollout will wait indefinitely. You can resume the rollout manually using:

kubectl argo rollouts promote <rollout-name>
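
While the rollout is paused, its progress can be observed with the same plugin (assuming the Argo Rollouts kubectl plugin is installed):

kubectl argo rollouts get rollout example-rollout --watch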

Argo Rollouts calculates replica scaling automatically based on the set weights. For example, if you have 10 replicas and set a 10% weight, the new version will initially be scaled to 1 pod. For finer control over traffic routing and scaling, especially in cases with fewer replicas, integrating traffic management solutions is recommended. This allows more accurate traffic distribution without relying solely on pod counts.
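
As a sketch of such an integration, the canary strategy can delegate traffic shifting to Istio by referencing stable and canary Services and a VirtualService route. The Service, VirtualService, and route names below are hypothetical:

strategy:
  canary:
    stableService: example-stable    # hypothetical Service for stable pods
    canaryService: example-canary    # hypothetical Service for canary pods
    trafficRouting:
      istio:
        virtualService:
          name: example-vsvc         # hypothetical VirtualService
          routes:
            - primary                # hypothetical HTTP route name
    steps:
      - setWeight: 20
      - pause: {}

With a traffic router in place, setWeight adjusts the mesh's routing weights directly, so the percentage of traffic reaching the canary is exact rather than an approximation derived from pod counts.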

Advanced example: Dynamic scaling in a canary deployment

Argo Rollouts also supports dynamic scaling of the canary version without directly tying replica counts to traffic percentages. This is useful for cases like testing the new version without exposing it to users, or gradually scaling up for internal verification.

Here’s an example using the setCanaryScale step:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-dynamic-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15.4
          ports:
            - containerPort: 80
  strategy:
    canary:
      steps:
        - setCanaryScale:
            replicas: 3
        - pause:
            duration: 10m
        - setWeight: 50
        - pause: {}

In this configuration:

  • The canary version is immediately scaled to 3 pods, regardless of the traffic percentage.
  • After a 10-minute pause to observe system behavior, 50% of user traffic is shifted to the canary.

By using setCanaryScale, teams can control scaling independently of traffic routing. This enables strategies like scaling up the canary early for background monitoring, without prematurely exposing users to the new version.

Important consideration: After setting an explicit scale with setCanaryScale, traffic percentages (setWeight) may not align with pod counts. To avoid uneven traffic distribution, you can reset this behavior by using:

- setCanaryScale:
    matchTrafficWeight: true

This ensures that future traffic weight settings automatically adjust the number of pods to match traffic expectations.

Canary deployments in Octopus

Octopus Deploy is the leading solution for deploying your software to multi-cloud, hybrid, and on-premises environments. There are 3 ways to implement canary deployments in Octopus. The easiest is to use the “Deploy to a subset of deployment targets” feature when deploying the release. This lets you limit which deployment targets to deploy to.

To do this, you deploy using just the canary servers, then after testing, you deploy again using the remaining servers. This approach works well if you have a small number of servers and don’t deploy to production too frequently.

A second approach is to build canary deployments into your deployment process.

  1. Deploy the package to the canary server (one or more deployment targets may be associated with the canary target tag).

  2. Have a manual intervention step to wait until you’re satisfied.

  3. Deploy the package to the remaining deployment targets (the web-server target tag).

Note that the first 2 steps are configured to run only for production deployments. In our pre-production environments, we can just deploy to all targets immediately. If we were performing fully automated tests, we could use a PowerShell script step to invoke them rather than the manual intervention step.

A final variation is to set up a dedicated “Canary” environment to deploy to. The environment can contain a canary deployment target, with the same deployment target also belonging to the production environment.

