Site Reliability Engineer
Octopus Deploy is looking for a Senior Site Reliability Engineer (SRE) who can:
- Use their SRE skills to keep systems running with high reliability.
- Help improve and iterate on our existing reliability practices.
- Bring new ideas/practices to increase reliability and reduce toil.
- Spearhead implementation of new capabilities.
- Share SRE expertise with other teams in the company.
You will be a great fit for this role if:
- The way of working outlined here (https://github.com/OctopusDeploy/People/tree/main/Engineering/Site-Reliability-Engineering) is your natural way of getting things done.
- You excel in an environment focused on availability, reliability, and observability.
- You are skilled in systems engineering and may have specialized expertise in specific areas.
- You find value in applying safety culture lessons from other industries to your work.
- You are adept at leading postmortems and designing deployment and monitoring pipelines.
- You have a passion for automating builds, tests, deployments, infrastructure, and operational tasks.
- You embrace a "you built it, you run it" culture, with a commitment to quality and system availability, participating in a humane on-call program.
- You are self-motivated, work independently with high-quality output, and seek help or new tasks when needed.
- You collaborate effectively to solve problems, combining passion, pragmatism, and empathy.
- You are results-oriented, adaptive to business direction changes, and encourage the same approach in others.
- You thrive on candid feedback, solving complex problems, and helping fellow engineers succeed while working on valuable projects.
Our Tech Stack:
- Please note: this is to give you an idea of our tools; we don't expect expertise in everything.
Octopus Server:
- Our primary focus and flagship product.
- Written in .NET and uses a SQL database.
CI/CD:
- TeamCity is our build system for Octopus Server.
- GitHub Actions is used for some internal tools.
- CD - Octopus Deploy.
Workloads:
- A mix of internally developed applications and third-party software (e.g. TeamCity).
- Run in Azure with a mix of App Services, AKS clusters, and Azure Functions.
- We mostly use Linux containers, with a few Windows containers.
- Container workloads are run on AKS.
- Docker Hub and Artifactory are our container registries.
Infrastructure as Code (IaC):
- We use Terraform as our primary IaC tool.
- IaC workloads run in Octopus Deploy, with a few running as GitHub Actions.
Observability:
- We have adopted OpenTelemetry for many of our build systems.
- We are adopting OpenTelemetry for more use cases company-wide, delivering a full telemetry pipeline.
- Sumo Logic and Honeycomb for analysis.
A typical day might include:
- Working on building new capabilities to increase reliability (we don’t want you staring at monitoring dashboards all day).
- Working where you work best, in a home office designed by you, using a device of your choosing, with or without music, in an atmosphere you create for yourself.
- Handling a request from an internal team, helping solve a challenging build, test or packaging issue, or offering advice to an engineer to help them fall into the pit of success.
- Pairing with another engineer on a Zoom call to solve a complex technical problem or explore and define the problem space for future innovation.
- Responding to an actionable alert and working to maintain the reliability of the platform used across the company.
- Improving our documentation to help engineers discover solutions for themselves and reduce lead time.
- Writing a blog post about something interesting for other engineers.
- Facilitating an incident review or preparing a presentation on what was learned from a recent incident.
- Proactively reducing future toil by building automation.
Why Octopus Deploy
If you join Octopus, you’ll be joining a high-trust, remote-first team that helps over 150,000 people around the world deliver working software to production.
We’ve been around since 2012 and we’re used by over 25,000 companies, including ASOS, Xero, Stack Overflow, NASA and Disney.
Growing, sustainable company
We’re profitable and long-term oriented. And yet we’re growing quickly. It’s a unique opportunity to join a growing company without all the craziness. With that growth comes all kinds of opportunities for career progression and professional and personal development.
What to expect
What’s it really like here? We’ve made our internal handbook public, so you can see for yourself. It will answer many of the questions you might have.
Benefits & perks
Impact
Trust and autonomy to do the best work of your life, in work that makes a meaningful difference.
Competitive salary
Along with a proactive salary review and feedback system.
Opportunities
As we grow, new opportunities to learn, advance and contribute are continuously opening.
Laptop & home office
Choose your own laptop and home office program.
Health & dental
Excellent health care, dental and vision.
(US only)
Retirement
Generous 401(k) / pension retirement plan matching.
(US & UK only)
Work/life harmony
No crazy hours or arbitrary deadlines.
Parental Leave
Paid parental leave for primary & secondary carers.