The DevOps engineer's handbook

Measuring DevOps with the DORA metrics

DORA (DevOps Research and Assessment) are the team behind the Accelerate State of DevOps Report, a survey of over 32,000 professionals worldwide. Their research links the technical and cultural capabilities driving performance for software teams and the whole organization.

DORA recommends an approach to measuring software delivery and operations using 5 metrics:

  • Throughput - measures the health of your deployment pipeline

    1. Deployment frequency (DF)
    2. Lead time for changes (LT)
  • Stability - helps you understand your software quality

    1. Change failure rate (CFR)
    2. Mean time to recovery (MTTR)
  • Operational performance

    1. Reliability

The throughput and stability metrics measure software delivery performance. Operational metrics let you measure software delivery and operational performance. The extra metric is important because operational performance unlocks the link between software delivery and organizational performance. Applying all 5 DORA metrics will make you more likely to achieve your commercial and non-commercial goals.

We explain all 5 metrics below.

Lead time for changes

There are many definitions of ‘lead times’ in software delivery and manufacturing, so it’s worth being specific about the DevOps definition. Lead time for changes is the time it takes for a code change to reach the live environment. We measure this from code commit to Production deployment.

Lead time for changes spans the deployment pipeline

You can calculate the lead time for changes by pushing metadata to your deployment tool and using it to find the oldest commit.

If you don’t push metadata from the build system into your deployment automation tool, you can use the time from package upload to deployment instead. This deployment lead time measures the progress of a change through your deployment pipeline, but you still need to track build time separately. Don’t let build time become a blind spot: the time taken by automated tests adds to build time and grows unnoticed if you don’t measure it.
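As a rough illustration, here’s a minimal sketch of the calculation, assuming each deployment record carries the timestamps of the commits it includes (pushed as metadata from the build system; the field names are hypothetical):

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records, each carrying the commit timestamps
# pushed from the build system as release metadata.
deployments = [
    {
        "deployed_at": datetime(2024, 3, 4, 15, 30),
        "commit_times": [datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 3, 14, 20)],
    },
    {
        "deployed_at": datetime(2024, 3, 6, 11, 0),
        "commit_times": [datetime(2024, 3, 5, 16, 45)],
    },
]

def lead_times_for_changes(deployments):
    """Lead time per deployment: Production deployment time minus the oldest commit."""
    return [d["deployed_at"] - min(d["commit_times"]) for d in deployments]

for lead_time in lead_times_for_changes(deployments):
    print("Lead time:", lead_time)

print("Median lead time:", median(lead_times_for_changes(deployments)))
```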

Here are the 2022 performance levels for lead time:

  • Low - between one month and six months
  • Medium - between one week and one month
  • High - between one day and one week

Teams with shorter lead times tend to fix faults quickly because a resolution in the code won’t get stuck in a long deployment pipeline. A well-oiled deployment pipeline reduces the need to fast-track a fix, reducing the risk of knock-on problems caused by skipping key steps.

Deployment frequency

Deployment frequency measures how often you deploy to Production or to end users. You can measure this with your deployment automation tool, which already records each deployment to Production.

Here are the 2022 performance levels for deployment frequency:

  • Low - between once per month and once every 6 months
  • Medium - between once per week and once per month
  • High - on-demand (many deploys per day)

If your deployment pipeline has manual tasks, people will try to perform them less often. This lowers deployment frequency and increases batch size, which in turn makes the manual tasks a bigger effort and deployments more difficult. The spiral trends towards larger batches deployed less often, with feedback arriving far too late.

DevOps encourages you to reverse the spiral. You can deploy more often by reducing batch size and automating the deployment pipeline. Feedback will arrive sooner, and you’ll be more likely to deliver valuable software.

A valuable way to use deployment frequency in your organization is to track the number of weekly deployments per developer. To do this, divide the number of deployments by the number of developers. Using per-developer numbers helps you see problems as you scale and manage expectations when a developer leaves.
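For example, a hypothetical sketch of tracking weekly deployments per developer over time:

```python
# Hypothetical weekly figures: (week, deployments, developers).
# A falling per-developer rate while the team grows can reveal scaling problems
# that a raw deployment count would hide.
weekly_figures = [
    ("2024-W10", 24, 8),
    ("2024-W11", 27, 9),
    ("2024-W12", 21, 10),
]

for week, deployments, developers in weekly_figures:
    per_developer = deployments / developers
    print(f"{week}: {per_developer:.1f} deployments per developer")
```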

Change failure rate

Your change failure rate is the percentage of changes resulting in a fault, incident, or rollback. To track change failure rates, you must keep a log of all changes that cause a Production issue.

Your work-tracking tools may have a feature to link a bug request to the original change. Otherwise, you may be able to add a custom field to retrospectively mark a change as ‘failed’ to use in reporting.
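A minimal sketch of the calculation, assuming you can export the list of changes with a flag marking those that caused a Production issue (the field names are hypothetical):

```python
# Hypothetical change log exported from a work-tracking tool.
changes = [
    {"id": "CH-101", "failed": False},
    {"id": "CH-102", "failed": True},   # caused an incident, marked retrospectively
    {"id": "CH-103", "failed": False},
    {"id": "CH-104", "failed": False},
]

def change_failure_rate(changes):
    """Percentage of changes resulting in a fault, incident, or rollback."""
    failed = sum(1 for change in changes if change["failed"])
    return 100 * failed / len(changes)

print(f"Change failure rate: {change_failure_rate(changes):.0f}%")  # 25%
```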

Here are the 2022 performance levels for change failure rate:

  • Low - 46%-60%
  • Medium - 16%-30%
  • High - 0%-15%

Your change failure rate is context specific. If you’re in an early stage of product development, you might encourage risk-taking and experimentation by tolerating a higher change failure rate. Where people depend on the software, you’ll want to achieve a lower change failure rate. Your deployment pipeline (and not policy constraints) should be the primary mechanism for reducing failures introduced by changes to the software.

Mean time to recovery

Your mean time to recovery is the average time between a failure and full recovery, whether due to a code change or something else. You can collect this from your work-tracking tools by marking work items as a Production fix and measuring the time it takes to complete the work.

When you need a code change to resolve a fault, your lead time will factor in the recovery time. A short lead time can be helpful as it allows you to deploy fixes without a special process to fast-track the change.

Traditionally, you measure operations on availability, which assumes you can prevent all failures. In DevOps, we accept there will always be failures outside our control, so the ability to spot an issue early and recover quickly is valuable.

When you measure recovery times for your team, you should plot all values on a scatter chart rather than aggregating to a mean or median value. Aggregation hides outliers that could spark a conversation that leads to improvement. It’s also useful to review restore times for change failures separately from those caused by unexpected Production environment issues, such as network outages and infrastructure failures.

You can still use the mean time to compare performance to the industry performance clusters.
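A sketch of that approach, assuming each incident record has a start time, a resolution time, and a flag for whether a failed change caused it (all field names are hypothetical):

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records from a work-tracking tool.
incidents = [
    {"started": datetime(2024, 3, 1, 9, 0), "resolved": datetime(2024, 3, 1, 11, 30), "change_failure": True},
    {"started": datetime(2024, 3, 8, 14, 0), "resolved": datetime(2024, 3, 9, 10, 0), "change_failure": False},
    {"started": datetime(2024, 3, 15, 16, 0), "resolved": datetime(2024, 3, 15, 17, 0), "change_failure": True},
]

# Keep individual restore times (for a scatter chart), split by cause,
# so outliers stay visible instead of disappearing into an average.
for incident in incidents:
    duration = incident["resolved"] - incident["started"]
    cause = "change failure" if incident["change_failure"] else "environment issue"
    print(f"Restore time: {duration} ({cause})")

# Use the mean only when comparing against the industry performance clusters.
mean_hours = mean(
    (i["resolved"] - i["started"]).total_seconds() for i in incidents
) / 3600
print(f"Mean time to recovery: {mean_hours:.1f} hours")
```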

Here are the 2022 performance levels for mean time to recover:

  • Low - between one week and one month
  • Medium - between one day and one week
  • High - less than one day

Change lead times can impact recovery times, as a code change needs to move through your deployment pipeline before it can go live. If you hurry a change by skipping steps in the deployment pipeline, you increase the risk of unexpected side effects. For Production faults unrelated to changes, resolution time may reflect:

  • A weak spot in the infrastructure design
  • A low service level from a supplier
  • Missing documentation that would speed up recovery

Software delivery performance

Based on survey responses, DORA grouped organizations into performance levels. Organizations in the higher performance groups not only have better software delivery, they often achieve better outcomes at an organizational level. Each report groups the respondents to that year’s survey, so industry trends show up alongside demographic changes.

2021 performance groups

For several years, there have been 4 performance clusters. Here are the 2021 groups:

Performance level | Lead time      | Deployment frequency           | Change failure rate | Mean time to resolve
Elite             | < 1 hour       | Multiple times per day         | 0-15%               | < 1 hour
High              | 1 day - 1 week | Weekly to monthly              | 16-30%              | < 1 day
Medium            | 1-6 months     | Monthly to biannually          | 16-30%              | 1 day - 1 week
Low               | > 6 months     | Fewer than once every 6 months | 16-30%              | > 6 months

2022 performance groups

In 2022, only 3 clusters emerged from the data. You can learn why the clusters changed on our blog. Here are the 2022 groups:

Performance level | Lead time        | Deployment frequency   | Change failure rate | Mean time to resolve
High              | 1 day - 1 week   | Multiple times per day | 0-15%               | < 1 day
Medium            | 1 week - 1 month | Weekly to monthly      | 16-30%              | 1 day - 1 week
Low               | 1-6 months       | Monthly to biannually  | 46-60%              | 1 week - 1 month

Though performance clusters from the annual report are useful to see how you compare to the industry, your goal isn’t elite performance. Instead, look at what your cross-functional team aims to achieve and set an appropriate ambition for their performance. There’s more information on this in the operational performance section below.

Also, remember that though the 4 key metrics measure software delivery performance, they reflect a broader set of skills than programming alone. Everyone involved in creating software plays a part in performance, including:

  • Business experts
  • Testers
  • Operations folks
  • Security team members

If you use the metrics to assess software developers, you’ll take your team back to skill-based silos where they have conflicting goals.

Operational performance

The DORA metrics focus on software delivery performance. However, the State of DevOps Report found the operational capability of reliability drives benefits across many outcomes. Reliability refers to teams prioritizing meeting or exceeding their reliability targets.

The quality of your internal documentation is key to high performance against the reliability metric. Teams with high-quality documentation were more than twice as likely to meet or exceed their reliability targets. Documentation also improved performance against the other DORA metrics. You should measure reliability against the service level objectives of your software.

If you exceed service level objectives by too much or for too long, other systems will start to depend on the higher service level you achieve. Rather than expecting downtime and handling it gracefully, many may assume your service will always be available. This causes problems when you experience an outage.

You can use short and deliberate outages to bring availability closer to the service level objective and test system resilience. This helps ensure other systems handle outages gracefully.
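As a rough sketch, assuming you record failed and total requests for the period, you can compare achieved availability to the service level objective and see how much error budget remains (all figures are hypothetical):

```python
# Hypothetical monthly figures for a service with a 99.5% availability SLO.
slo = 0.995
total_requests = 1_200_000
failed_requests = 4_800

availability = 1 - failed_requests / total_requests   # 0.996
error_budget = (1 - slo) * total_requests             # failures the SLO allows this period
budget_remaining = error_budget - failed_requests

print(f"Availability: {availability:.3%} (objective {slo:.1%})")
print(f"Error budget remaining: {budget_remaining:,.0f} failed requests")
```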

You can use the 5 metrics to determine software delivery and operational (SDO) performance. DORA doesn’t divide SDO clusters into low, medium, and high performance levels. Instead, the groups reflect an appropriate level of performance based on a classification of the software system.

When you develop a new product, you can trade stability to get fast, high-value learning cycles. Or, if you intend to retire a product, you’re likely to reduce the development frequency as you’ll no longer make improvements or introduce new features - only resolve critical issues. The clusters represent these different stages and goals.

Often, you’ll have a well-established and business-critical system not meeting the goals of the flowing group. This indicates critical gaps in your deployment pipeline or opportunities to adopt more capabilities to improve performance.

Rather than discovering your classification through metrics, classify your system and then use metrics to identify areas for improvement.

Cluster  | Lead time        | Deployment frequency   | Failure rate | MTTR       | Reliability
Starting | 1 week - 1 month | Weekly or monthly      | 31-45%       | 1-7 days   | Sometimes
Flowing  | < 1 day          | On demand              | 0-15%        | < 1 hour   | Usually
Slowing  | 1 week - 1 month | Weekly or monthly      | 0-15%        | < 1 day    | Usually
Retiring | 1-6 months       | Monthly or bi-annually | 46-60%       | 1-6 months | Usually

These groups are more descriptive than the software delivery performance clusters. Rather than striving for high performance across all products and teams, take a more balanced approach. You can plan for different performance levels from a team working on an active product to one working on a new product. The flowing cluster is a close match for the 2021 elite performance cluster, with lead times being a little longer. In 2022, the report only classified 17% of organizations as flowing.

DORA metric summary

The DORA metrics use system-level outcomes to measure software delivery and operational performance. How an organization performs against these measures predicts performance against its broader goals. The top performers outpace competitors in their industry.

By removing obstacles to the fast flow of changes to Production, you can:

  • Deliver value to customers
  • Experiment with features
  • Get feedback quickly

With the DORA software delivery and operations metrics in place, you can experiment with your deployment pipeline and answer the questions:

  • Does this help us deliver software faster or more often?
  • Does this make our software more stable?

Often, improvements in software delivery performance result in increased speed and stability. This is one of the key findings of the State of DevOps report. Teams who deliver software faster also write better quality software. Top performers can:

  • Change their applications faster
  • Deploy them more often
  • Have fewer failures
  • Recover quickly from faults

Though it may seem counter-intuitive, it’s rare to find a trade-off between speed and stability. If you find a trade-off emerging, use the Continuous Delivery statements and DevOps capabilities list in DORA’s structural equation model to check if you’re missing a critical practice.

Continuous Delivery helps you achieve high performance. The relationship between speed and stability will help amplify your improvements.
