Job Description
About the Role:
We're looking for an experienced Service Delivery Manager to take ownership of our service operations.
This is a hands-on role, not a pure governance role. You will be close to real incidents, engineers, and customers, and you'll be expected to bring in practices you've already used successfully in previous service or managed-services environments.
This includes:
1. Designing and maintaining an on-call and coverage plan that ensures all critical skills are available when needed;
2. Owning the incident management process for your accounts: priorities, roles, communication cadence, escalations, and post-incident reviews;
3. Defining and monitoring key service metrics (e.g., MTTA, MTTR, SLA compliance, backlog health) and driving improvements based on them;
4. Acting as incident lead / coordinator during major incidents, keeping engineers focused and customers informed;
5. Creating and maintaining SOPs, runbooks, and triage guides for SRE engineers, covering common incident types and operational tasks;
6. Training and coaching first-line/SRE teams so they can confidently handle initial triage, basic troubleshooting, and clear communication, escalating only when needed;
7. Establishing and running a configuration management process that keeps track of each customer's environment (platforms in use, clusters, regions, configs, access, monitoring, key contacts);
8. Proactively closing information gaps by working directly with customers and engineers;
9. Ensuring configuration information is available and trustworthy during incidents and for onboarding new engineers;
10. Being the primary operational contact for a set of enterprise customers;
11. Leading regular service reviews and status calls, presenting SLA performance, key incidents, risks, and improvement actions;
12. Presenting and agreeing on the incident management process with customers (channels, priorities, escalation paths, expectations);
13. Working closely with Account Management / Sales on renewals, expansions, and expectation management;
14. Clarifying what is in scope vs. out of scope and working with customers and Sales to shape paid change requests when additional work is needed;
15. Monitoring effort vs. contract, helping protect margins, and flagging risks early (under-scoped contracts, chronic over-use, under-utilized capacity);
16. Working in a matrix environment, coordinating with different technical teams (e.g., database engineering, DevOps, SRE) to staff and deliver engagements effectively;
17. Designing and maintaining onboarding paths for new engineers joining support/delivery (shadowing, training on SOPs, environment overviews,