The DevOps engineer's handbook

Measuring DevOps with the DORA metrics

DORA (DevOps Research and Assessment) is the team behind the Accelerate State of DevOps Report, a survey of over 32,000 professionals worldwide. Their research identifies the technical and cultural capabilities that drive performance for software teams and the whole organization.

DORA recommends an approach that brings together software delivery and operational performance. The four keys of software delivery performance measure the throughput and stability of changes to your application. A fifth measure of operational performance captures how well the application performs against its reliability targets.

There are few metric sets backed by multi-year research programs. DORA metrics are exceptional in the type and volume of evidence supporting them.

Software delivery and operational performance

DORA measures software delivery performance with throughput and stability metrics, often called “the four keys”:

  • Throughput measures the health of your deployment pipeline
    1. Deployment frequency
    2. Lead time for changes
  • Stability helps you understand your software quality
    1. Change failure rate
    2. Failed deployment recovery time

To extend this to operational performance, we add a fifth metric:

  • Operations
    1. Reliability

The additional reliability metric is important. Operational performance is what turns software delivery into outcomes for the organization. Applying all five DORA metrics makes you more likely to achieve your commercial and non-commercial goals.

Deployment frequency

Deployment frequency measures how often you deploy to Production or to end users. You can measure this with your deployment automation tool, which already records every deployment to Production.

Here are the 2023 performance levels for deployment frequency:

  • Low - between once a week and once a month
  • Medium - between once a week and once a month
  • High - between once a day and once a week
  • Elite - on-demand (many deploys per day)

The Low and Medium levels share the same throughput range in the 2023 report; what separates them is their stability metrics, covered below.

If you have manual tasks in your deployment pipeline, people will try to perform them less often. This reduces deployment frequency, increases batch size, and makes the manual tasks a bigger effort each time. Larger batches make deployments more difficult, which pushes frequency down further. The trend is towards large batches deployed rarely, with feedback that arrives far too late.

DevOps encourages you to reverse this negative effect. You can deploy more often by reducing batch size and automating the deployment pipeline. Feedback will arrive sooner, and you’ll be more likely to deliver valuable software.

A valuable way to use deployment frequency in your organization is to track the number of weekly deployments per developer. To do this, divide the number of deployments by the number of developers. Using per-developer numbers helps you see problems as you scale and manage expectations when a developer leaves.
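
For example, here’s a minimal Python sketch of that calculation; the deployment count and team size are illustrative:

```python
# A minimal sketch of the per-developer deployment frequency calculation.
# The figures below are illustrative, not benchmarks.

def weekly_deployments_per_developer(deployments_this_week: int, developers: int) -> float:
    """Divide the week's deployment count by the number of developers."""
    if developers == 0:
        raise ValueError("team size must be at least one developer")
    return deployments_this_week / developers

# Example: 12 deployments from a team of 4 developers is 3.0 per developer.
print(weekly_deployments_per_developer(12, 4))
```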

Lead time for changes

There are many definitions of lead times in software delivery and manufacturing, so it’s worth being specific about the DevOps definition.

Lead time for changes is the time it takes for a code change to reach the live environment. We measure this from code commit to Production deployment.

Figure: lead time for changes spans the deployment pipeline.

You can calculate the lead time for changes by pushing metadata to your deployment tool and using it to find the oldest commit in a deployment.

If you don’t push metadata from the build system into your deployment automation tool, you can use the time from package upload to deployment. This deployment lead time will measure the progress of a change through your deployment pipeline. You still need to track your build times and code review process separately.
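
As an illustration, here’s a hedged Python sketch of the commit-to-deployment measurement. It assumes your toolchain can export a deployment’s timestamp and the commit timestamps it contains; the data shown is made up:

```python
from datetime import datetime

# A sketch of lead time for changes: measure from the oldest commit in a
# deployment to the deployment itself. Timestamps here are illustrative.

def lead_time_for_changes(commit_times: list[datetime], deployed_at: datetime):
    """Return the elapsed time from the oldest commit to the deployment."""
    return deployed_at - min(commit_times)

commits = [datetime(2023, 5, 1, 9, 30), datetime(2023, 5, 2, 14, 0)]
print(lead_time_for_changes(commits, datetime(2023, 5, 3, 10, 0)))  # 2 days, 0:30:00
```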

Here are the 2023 performance levels for lead time:

  • Low - between one week and one month
  • Medium - between one week and one month
  • High - between one day and one week
  • Elite - less than a day

Teams with shorter lead times tend to fix faults quickly because a resolution in the code won’t get stuck in a long deployment pipeline. A well-oiled deployment pipeline removes the need to fast-track a fix, reducing the risk of knock-on problems due to skipping key steps.

Change failure rate

Your change failure rate is the percentage of changes resulting in a fault, incident, or rollback. To track it, you need a log of all changes and a record of which ones caused a Production issue.

Work-tracking tools usually have a feature to link a bug report to the original change. You can use these links to calculate your change failure rate. Otherwise, you can add a custom field to retrospectively mark a change as ‘failed’ and use it in reporting.
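
As a sketch, assuming you can export each change with a ‘failed’ flag like the custom field described above, the calculation looks like this:

```python
# A sketch of the change failure rate calculation. The 'failed' flag is an
# assumed export from your work-tracking tool; the change log is made up.

def change_failure_rate(changes: list[dict]) -> float:
    """Percentage of changes resulting in a fault, incident, or rollback."""
    if not changes:
        return 0.0
    failed = sum(1 for change in changes if change["failed"])
    return 100 * failed / len(changes)

changes = [
    {"id": 1, "failed": False},
    {"id": 2, "failed": True},
    {"id": 3, "failed": False},
    {"id": 4, "failed": False},
]
print(f"{change_failure_rate(changes):.0f}%")  # 1 failure in 4 changes = 25%
```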

Here are the 2023 performance levels for change failure rate:

  • Low - 64%
  • Medium - 15%
  • High - 10%
  • Elite - 5%

Your change failure rate is context-specific. If you’re in an early stage of product development, you can encourage risk-taking and experimentation by aiming for a higher change failure rate.

Where people depend on the software, you’ll want to achieve a lower change failure rate. Your deployment pipeline (and not policy constraints) should be the main mechanism for reducing failures caused by software changes.

Failed deployment recovery time

Adjusted in 2023, failed deployment recovery time is how long it takes to get back into a good state after a bad deployment. The deployment might have caused a fault, or the software version may contain a critical issue you must address.

This metric doesn’t track production incidents caused by network problems, hardware faults, or other unpredictable events like earthquakes.

You can collect failed deployment recovery times from your deployment automation tool. You can also create a work item type for failed deployments in your task-tracking tools.

In past years, the research used Mean Time To Recovery (MTTR). The move to failed deployment recovery time clears up much of the confusion around how to measure MTTR.

When you need a code change to resolve a fault, your lead time factors into the recovery time. A short lead time helps because it lets you deploy fixes without a special process to fast-track the change.

Traditionally, you measure operations on availability, which assumes you can prevent all failures. In DevOps, we accept there will always be failures outside our control, so the ability to spot an issue early and recover quickly is valuable.

When you measure recovery times for your team, you should plot all values on a scatter chart rather than aggregating to a mean or median value. Aggregation hides outliers that could spark a conversation that leads to improvement.

You can still use the mean time to compare performance to the industry performance clusters.
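
Here’s an illustrative Python sketch of both ideas, using matplotlib and made-up recovery times:

```python
import statistics
import matplotlib.pyplot as plt

# Plot every recovery time as a point so outliers stay visible; draw the
# mean only for comparison with the industry clusters. Data is made up.

recovery_hours = [0.5, 1.2, 0.8, 26.0, 0.4, 1.5, 0.9]

plt.scatter(range(len(recovery_hours)), recovery_hours)
plt.axhline(statistics.mean(recovery_hours), linestyle="--", label="mean")
plt.xlabel("Failed deployment")
plt.ylabel("Recovery time (hours)")
plt.legend()
plt.show()

# The 26-hour outlier is obvious on the chart, but invisible inside the
# mean of roughly 4.5 hours.
```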

Here are the 2023 performance levels for failed deployment recovery times:

  • Low - between one month and six months
  • Medium - between one day and one week
  • High - less than one day
  • Elite - less than one hour

Change lead times can impact recovery times, as a code change needs to move through your deployment pipeline before it can go live. If you hurry a change by skipping steps in the deployment pipeline, you increase the risk of unexpected side effects.

Reliability

Reliability measures how well teams meet or exceed their reliability targets. The State of DevOps Report research finds operational performance drives benefits across many outcomes.

The quality of your internal documentation is key to high performance against the reliability metric. Teams with high-quality documentation were more than twice as likely to meet or exceed their reliability targets. Documentation also improves performance against the other DORA metrics. You should measure reliability against the service level objectives of your software.

If you exceed your service level objectives by too much or for too long, other systems will start to depend on the higher service level you actually achieve. Rather than expecting downtime and handling it gracefully, other teams may assume your service will always be available. This causes problems when you experience an outage.

You can use short and deliberate outages to bring availability closer to the service level objective and test system resilience. This helps ensure other systems handle outages gracefully.
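
One common way to track this, sketched below under the assumption that you express your service level objective as an availability target, is an error budget: the downtime the objective allows before you break it. The figures are illustrative:

```python
# A sketch of tracking reliability against a service level objective
# using an error budget. The SLO target and downtime are illustrative.

def error_budget_remaining(slo_target: float, total_minutes: int,
                           downtime_minutes: float) -> float:
    """Return the fraction of the allowed downtime still unspent."""
    allowed_downtime = total_minutes * (1 - slo_target)
    return 1 - (downtime_minutes / allowed_downtime)

# A 99.9% SLO over a 30-day month allows about 43.2 minutes of downtime.
month_minutes = 30 * 24 * 60
print(f"{error_budget_remaining(0.999, month_minutes, downtime_minutes=10):.0%}")  # 77%
```

If the remaining budget sits near 100% for long stretches, you’re exceeding the objective, which is when the short, deliberate outages described above become worth considering.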

Software delivery performance clusters

Based on survey responses, DORA groups organizations into performance levels. Organizations in the higher-performing groups not only have better software delivery, they often achieve better outcomes at an organizational level. Each report groups that year’s survey respondents, so industry trends show up alongside changes in who responds.

2023 performance groups

| Cluster | Lead time | Deployment frequency | Change failure rate | Failed deployment recovery time |
|---------|-----------|----------------------|---------------------|---------------------------------|
| Low | 1 week - 1 month | Once a week - once a month | 64% | 1 month - 6 months |
| Medium | 1 week - 1 month | Once a week - once a month | 15% | 1 day - 1 week |
| High | 1 day - 1 week | Once a day - once a week | 10% | < 1 day |
| Elite | < 1 day | On demand | 5% | < 1 hour |
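
As a rough self-assessment against the table, here’s an illustrative Python sketch using lead time alone. Note that the report assigns clusters through statistical cluster analysis, not fixed thresholds:

```python
# An illustrative mapping from lead time to the 2023 clusters in the
# table above. Treat it as a rough guide, not the report's method.

def lead_time_cluster(lead_time_days: float) -> str:
    if lead_time_days < 1:
        return "Elite"
    if lead_time_days <= 7:
        return "High"
    if lead_time_days <= 30:
        # Low and Medium share this throughput range in 2023; their
        # stability metrics are what separate them.
        return "Medium or Low"
    return "Slower than the 2023 clusters"

print(lead_time_cluster(3))  # High
```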

Though the performance clusters from the annual report are useful for seeing how you compare to the industry, your goal isn’t elite performance for its own sake; your goal is effective software delivery. Look at what your cross-functional team aims to achieve and set an appropriate ambition for its performance.

Also, remember that although the metrics track software delivery performance, they reflect many roles, not just software engineers. Everyone involved in creating software plays a part in performance, including:

  • Business experts
  • Testers
  • Operations folks
  • Security team members

If you use the metrics to assess software developers, you’ll take your team back to skill-based silos where they optimize for their own outcomes. That may conflict with the organization’s goals.

DORA metric summary

The DORA metrics use system-level outcomes to measure software delivery and operational performance. How an organization performs against these measures predicts performance against its broader goals. The top performers outpace competitors in their industry.

By removing obstacles to the fast flow of changes to Production, you can:

  • Deliver value to customers
  • Experiment with features
  • Get feedback quickly

With the DORA software delivery and operations metrics in place, you can experiment with your deployment pipeline and answer these questions:

  • Does this help us deliver software sooner or more often?
  • Does this make our software more stable?

Often, improvements in software delivery performance increase speed and stability together. This is one of the key findings of the State of DevOps Report: teams that deliver software more frequently also create better quality software.

Top performers:

  • Change their applications sooner
  • Deploy more frequently
  • Have fewer failures
  • Recover quickly from faults

Counter-intuitively, it’s rare to find a trade-off between speed and stability. If you find a trade-off emerging, use the Continuous Delivery statements and the DevOps capabilities list in DORA’s structural equation model to check whether you’re missing a critical practice.

Continuous Delivery helps you achieve high performance. The relationship between speed and stability will help amplify your improvements.
