The DevOps engineer's handbook

DevOps team structures

As Jim Benson says in The Collaboration Equation, ‘individuals in teams create value’. Individual skill combined with collaboration is where great things happen. Teams underperform when you don’t build in the need for people to work together, because working together is what unlocks their unique talents.

Since DevOps became an industry buzzword, many anti-patterns have emerged. However, patterns and anti-patterns mean nothing when you put people into a room and ask them to solve a problem or deliver value. Rather than looking for one true way of organizing teams, you need to pay attention to:

  • Culture and context
  • The skills of each individual
  • What systemic issues might be stifling collaboration

You don’t need to start from a blank page. Look at existing DevOps team structures that other organizations use in certain circumstances. Interaction models can help you understand the nature of dependencies between teams.

DevOps

The original idea for DevOps wasn’t to change team structures at all. It was about development and operations teams working more closely to deliver software. To make this happen, you had to remove conflicting goals. Once you identify and fix the systemic behaviors that damage value, collaboration becomes possible.

Many organizations were already familiar with cross-functional teams. Unsurprisingly, operations folks began moving into existing software delivery teams to work with other disciplines, like software developers, testers, and product managers. When everyone is on one team, conflicting goals are much less likely to arise.

Other organizations formed a team between development and operations, often called a ‘DevOps team.’ This is generally an anti-pattern, but if it results in more collaboration and shared understanding, it’s a step toward better outcomes. It’s most likely to succeed when the team includes members from both existing teams and serves as a stepping stone to cross-functional teams.

DevSecOps, BizOps, and others

Your organization’s primary silo boundary might not be between development and operations. Many organizations have used variations on DevOps as internal campaigns to increase collaboration. This is how DevSecOps and BizOps came to encourage specialists to work more closely together.

You might use BizOps to highlight a disconnect between the business and the teams supplying their tools. To make this successful, you must repeat the DevOps process of finding conflicting goals and other barriers preventing teams from working together. The name alone won’t change anything.

You can expand the idea wherever you find silos separating people that need to work together. If you have many silos, you must address the core cultural issues causing these defensive barriers. The section on Team Topologies can help you redesign your teams and interactions.

Measure all DevOps initiatives on organizational outcomes rather than local measures.

In all cases, the DevOps research and modelling covers leadership, culture, and technical practices. DevOps bakes in collaboration, with many organizations opting for cross-functional, autonomous teams. These other names reflect pressing concerns for specific organizations.

DevOps PATHS

There are two common mistakes in DevOps team design:

  1. Hiring one person for each skill, which creates silos-of-one
  2. Loading a single person with too many skills and causing cognitive overload

You can only avoid these two extremes by adopting a position somewhere in the middle. You must find a mix of people who bring different skill combinations to the team. It’s a complex task as each person you add changes what you need from the next person.

To help you design your teams, you can use DevOps PATHS.

PATHS stands for:

  • Process (Lean, Agile, Kanban, DevOps, Continuous Delivery, Extreme Programming)
  • Automation (scripting, tools)
  • Technical (programming, architecture, cloud)
  • Human (communication, culture, empathy)
  • Specialist (development, operations, security, test)

Using PATHS can help you steer between silos-of-one and cognitive overload.

You can use DevOps PATHS to detect common accidental team structures and fix them before they cause long-term problems.
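A skill map can start as a simple data structure. The sketch below is a minimal illustration, not part of the PATHS model itself: it assumes a hypothetical team where each person lists the PATHS areas they are strong in, and it counts how many people cover each area.

```python
from collections import Counter

# The five DevOps PATHS skill areas.
PATHS = ("Process", "Automation", "Technical", "Human", "Specialist")

# Hypothetical team: each person maps to the PATHS areas they are strong in.
team = {
    "Ana": {"Technical", "Specialist", "Automation"},
    "Ben": {"Technical", "Specialist"},
    "Cai": {"Human", "Process", "Technical"},
}


def coverage(skill_map: dict[str, set[str]]) -> dict[str, int]:
    """Count how many team members cover each PATHS skill area."""
    counts = Counter(area for areas in skill_map.values() for area in areas)
    # List every area, including ones nobody covers, so gaps show up as 0.
    return {area: counts[area] for area in PATHS}


print(coverage(team))
# {'Process': 1, 'Automation': 1, 'Technical': 3, 'Human': 1, 'Specialist': 2}
```

In this made-up example, a count of 1 marks a silo-of-one and a count of 0 would mark a gap.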

Hero teams

A ‘hero team’ is filled with specialists, such as software developers, and relies on one highly skilled team member to manage builds and deployments and respond to service outages.

In some cases, the hero gets a role such as DevOps engineer. Often they are just passionate about the broader software delivery process and want to improve it.

In the long run, the burden on this team member can lead to burnout. It’s not great for the individual, and it’s bad news for the organization when their performance drops or they leave.

Use DevOps PATHS to detect dense skill clusters and encourage team members to explore other areas they have an interest in. It helps you spread the knowledge and load more evenly.
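To make "dense skill clusters" more concrete, here is one possible heuristic, assuming the same hypothetical skill-map shape (person to set of PATHS areas): flag anyone who is the only person covering two or more areas. The function names and the threshold of two are illustrative assumptions, not part of DevOps PATHS.

```python
from collections import defaultdict


def sole_holders(skill_map: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each person to the PATHS areas that nobody else on the team covers."""
    holders = defaultdict(list)
    for person, areas in skill_map.items():
        for area in areas:
            holders[area].append(person)
    result = defaultdict(set)
    for area, people in holders.items():
        if len(people) == 1:
            result[people[0]].add(area)
    return dict(result)


def likely_heroes(skill_map: dict[str, set[str]], threshold: int = 2) -> list[str]:
    """Flag people who are the only cover for `threshold` or more areas (assumed cut-off)."""
    return sorted(
        person
        for person, areas in sole_holders(skill_map).items()
        if len(areas) >= threshold
    )


# Hypothetical team of developers where one person also owns automation and process.
team = {
    "Dee": {"Technical", "Specialist"},
    "Eli": {"Technical", "Specialist"},
    "Fay": {"Technical", "Specialist", "Automation", "Process"},
}
print(likely_heroes(team))  # ['Fay']
```

A flagged name is a prompt for a conversation about sharing the load, not a verdict on the person.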

Blinkers

A team with blinkers is performing well against many of the PATHS skills, but there are massive blind spots. They may be delivering a great product but with low automation. The lack of automation isn’t clear during regular operation, but it takes a long time to deploy a fix when you discover a critical production issue.

Over the long term, cracks start to appear, spreading from the blind spots into areas the team initially did well. Many low-performing teams were previously blinkered teams that were delivering well.

Use DevOps PATHS to detect skill areas with little or no coverage and look for champions in the team to grow into those subjects.
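Blind spots are the inverse problem. This minimal sketch, again using a hypothetical skill map, lists the PATHS areas covered by at most `at_most` people. The function name and the default threshold are assumptions chosen for illustration.

```python
PATHS = ("Process", "Automation", "Technical", "Human", "Specialist")


def blind_spots(skill_map: dict[str, set[str]], at_most: int = 0) -> list[str]:
    """Return the PATHS areas covered by `at_most` people or fewer, in PATHS order."""
    counts = {area: 0 for area in PATHS}
    for areas in skill_map.values():
        for area in areas:
            if area in counts:  # ignore labels outside the five PATHS areas
                counts[area] += 1
    return [area for area in PATHS if counts[area] <= at_most]


# Hypothetical blinkered team: strong product skills, no automation.
team = {
    "Gus": {"Process", "Technical", "Human", "Specialist"},
    "Hal": {"Technical", "Specialist"},
    "Ivy": {"Process", "Technical", "Specialist"},
}
print(blind_spots(team))             # ['Automation']
print(blind_spots(team, at_most=1))  # ['Automation', 'Human']
```

Raising `at_most` to 1 also surfaces thinly covered areas, which are natural places to look for a champion.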

Using DevOps PATHS

As well as these examples, many other designs are problematic over the longer term. DevOps PATHS provides a way to address overloaded team members and skill gaps.

By creating communities of practice around each of the five skill areas, you can:

  • Encourage knowledge sharing
  • Highlight critical skills you need in the team
  • Spread ideas across team boundaries

You can use your skill map when team members are looking for growth opportunities or during the hiring process.

Team size and composition are part of management’s broader system design. As teams grow, individual productivity decreases, but you’re more resilient to sickness, holidays, and team members moving on to new roles.
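One common rule of thumb for why individual productivity drops as teams grow is to count the possible person-to-person communication paths, which grow as n(n-1)/2 for a team of n people. This is an illustrative aside rather than part of the handbook's model, and the function name below is hypothetical.

```python
def communication_paths(team_size: int) -> int:
    """Distinct person-to-person pairs in a team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


for size in (3, 5, 8, 12):
    print(f"{size} people -> {communication_paths(size)} paths")
# 3 people -> 3 paths
# 5 people -> 10 paths
# 8 people -> 28 paths
# 12 people -> 66 paths
```

Doubling a team from 5 to 10 people raises the count from 10 to 45, one reason to keep a team as small as its skill mix allows.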

It’s easy to create a team with all the needed skills by hiring many people, but the team won’t have resilience as each member handles a small, isolated area. A professional manager’s job is to build a team with a strong mix of skills with overlap while keeping the team as small as possible.

Team Topologies

You can revisit your understanding of these DevOps team structures using Team Topologies. This model recognizes that communication within a team is high-bandwidth. How closely aligned two teams are can affect the speed at which information moves between them. In some cases, information flows very slowly.

You can take this into account when you design teams. This doesn’t just mean putting people together when they need to share information regularly. You must also solve the problems that cause unnecessary communication.

For example, if everyone needs to speak to a team developing an API, it’s possible the API:

  • Isn’t well designed
  • Has many problems
  • Isn’t self-documenting

You can also use the 4 fundamental team topologies to understand a team’s role, responsibilities, and interaction mode. The 4 team topologies are:

  • Stream-aligned
  • Complicated subsystem
  • Enabling
  • Platform

You don’t need a team of each type, but any given team should resemble one of the 4 types. The authors describe this as a series of magnetic poles, with each team attracted to one type.

Stream-aligned teams

Stream-aligned teams work on a single valuable stream of work, usually aligned to a business domain. They might focus on a specific feature or group of features, work only on one user journey, or align with a particular persona.

A stream-aligned team works on everything needed for delivery. For example, the team would both discover user problems and operate and monitor the system in production. A stream-aligned team has no critical dependencies on any other team because it has all the skills necessary to deliver value.

Finding the right mix of individuals to create a small team with the necessary skills is challenging. Still, the results are high-bandwidth information flow and increasingly brilliant collaboration.

Complicated subsystem teams

Where part of your system is highly specialized, you might use a complicated subsystem team to manage it. You shouldn’t create this team unless circumstances force you to; for example, when the skills needed are so specialized that you must pool them.

Stick with stream-aligned teams wherever possible. If you have to create a groundbreaking 3D rendering engine, you may need a complicated subsystem team to handle the challenges.

Enabling teams

An enabling team takes a long-term view of technology to bring a competitive advantage to organizations.

They research potential new:

  • Tools
  • Capabilities
  • Practices
  • Other technologies

They protect the autonomy of stream-aligned teams by helping them build skills and adopt new technology. An enabling team’s goal is to give knowledge to teams, not to dictate what they do with it.

Enabling teams are helpful as part of a scaling strategy, as stream-aligned teams are often too busy to research and prototype new tools and technology. The enabling team can explore the new territory and package the knowledge for general use within the organization.

Platform teams

A platform team acts like an enabling team that packages the knowledge into a self-service offering. Stream-aligned teams can use the products created by platform teams to simplify and accelerate their work.

Platform teams promote good technical practices by making good decisions easier to access. They help stream-aligned teams achieve more.

Interaction modes

Even more useful than the team types are the interaction modes. The interaction between two teams should be one of the 3 interaction types:

  • Collaborating
  • Facilitating
  • Providing a service

Classifying each interaction can help you understand the nature of the dependency and the level of service offered. Your team will likely interact with different teams in different ways, but each relationship should be identifiable as one of these modes.
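The team types and interaction modes can be written down as a small model so that unclassified relationships stand out. This is an illustrative sketch only: the team names, the enum names, and the convention that an undeclared mode (`None`) counts as unclassified are all assumptions.

```python
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"
    ENABLING = "enabling"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATING = "collaborating"
    FACILITATING = "facilitating"
    PROVIDING_A_SERVICE = "providing a service"


# Hypothetical organization: every team is attracted to exactly one type.
teams = {
    "checkout": TeamType.STREAM_ALIGNED,
    "rendering-engine": TeamType.COMPLICATED_SUBSYSTEM,
    "delivery-platform": TeamType.PLATFORM,
    "cloud-enablement": TeamType.ENABLING,
}

# (team, other team) -> mode; None means nobody can say what the relationship is.
interactions = {
    ("delivery-platform", "checkout"): InteractionMode.PROVIDING_A_SERVICE,
    ("cloud-enablement", "checkout"): InteractionMode.FACILITATING,
    ("checkout", "rendering-engine"): None,
}


def unclassified(links):
    """Return relationships that don't map to one of the three interaction modes."""
    return [pair for pair, mode in links.items() if not isinstance(mode, InteractionMode)]


def undeclared_teams(links, known):
    """Return team names used in relationships but missing a declared team type."""
    return sorted({name for pair in links for name in pair} - set(known))


print(unclassified(interactions))             # [('checkout', 'rendering-engine')]
print(undeclared_teams(interactions, teams))  # []
```

An unclassified relationship is a signal to talk with the other team about which mode you actually need.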

Platform Engineering

Platform Engineering is often found alongside DevOps and has a strong link with software delivery performance. It intersects with team topologies, as platform teams have many ‘as-a-service’ interactions with the other team types.

Platform teams work with development teams to create one or more golden pathways. These pathways don’t prevent teams from using something else but offer supported self-service products that help teams improve delivery capability. The pathways encourage alignment without removing team autonomy.

The Accelerate State of DevOps Report shows that you commonly find Platform Engineering teams in high-performance organizations.

| Performance category | % with Platform Engineering |
|---|---|
| Low | 8% |
| Mid | 25% |
| High | 48% |

If you’re expanding the number of teams delivering software, Platform Engineering offers consistency without stifling team choice. Because your teams don’t have to use the platform, it benefits from competition with other software delivery pathways.

Site Reliability Engineering

Site Reliability Engineering (SRE) treats operations as a software problem. The SRE team strongly focuses on performance, capacity, availability, and latency for products operating at massive scale. Google pioneered this approach to manage continental-level service capacity.

Although SRE focuses on reliability, many aspects of Site Reliability Engineering align with DevOps concepts:

  • Cross-functional collaboration
  • Extensive automation
  • Empirical learning
  • The use of measurement techniques

SRE practices are common in DevOps teams, whether or not they formally adopt them. DORA’s research has found that reliability unlocks the effect of software delivery performance on organizational outcomes.

Summary

You shouldn’t judge team design from an external perspective. Every organization is on a journey of continuous improvement. You can only assess their current state relative to how things were before. The objective is higher collaboration and continuous improvement. If an organization achieves these goals, it’s irrelevant that it looks like an anti-pattern from the outside.

Problematic team designs (like hero teams or dedicated DevOps teams) can be necessary steps on the way to stable long-term solutions.

You can use DevOps PATHS and Team Topologies to inform your team design. Take inspiration from Platform Engineering and Site Reliability Engineering when you need to scale.

The conditions needed for success are:

  • A high-trust, low-blame culture
  • Transformational leadership
  • Lean management

The technical capabilities will have less effect on your outcomes without these prerequisites.

Further reading

  • Team Topologies by Matthew Skelton and Manuel Pais (2019)
  • The Collaboration Equation by Jim Benson (2022)
