About this job
Job description:
We're looking for an experienced Service Delivery Manager to oversee our service operations, including:
* SLAs and incident processes
* On-call and skills coverage
* SOPs and first-line/SRE enablement
* Configuration management
* SLA metrics and reporting
* Coordination between customers and engineering teams
This is a hands-on role, not a pure governance role. You'll be close to real incidents, engineers, and customers, and you'll be expected to bring in practices you've already used successfully in previous service or managed-services environments.
Responsibilities:
1. Service operations, on-call & incidents
* Design and maintain an on-call and coverage plan that ensures all critical skills are available when needed (initially weekdays, evolving to full 24/7 where required).
* Own the incident management process for your accounts: priorities, roles, communication cadence, escalations, and post-incident reviews.
* Define and monitor key service metrics (e.g., MTTA, MTTR, SLA compliance, backlog health) and drive improvements based on them.
* Act as incident lead/coordinator during major incidents, keeping engineers focused and customers informed.
2. SOPs, runbooks & first-line enablement
* Create and maintain SOPs, runbooks, and triage guides for first-line and SRE engineers, covering common incident types and operational tasks.
* Train and coach first-line/SRE teams so they can confidently handle initial triage, basic troubleshooting, and clear communication, escalating only when needed.
* Continuously refine documentation based on real incident experience and feedback.
3. Configuration management & readiness
* Establish and run a configuration management process that keeps track of each customer's environment (platforms in use, clusters, regions, configs, access, monitoring, key contacts).
* Proactively close information gaps by working directly with customers and engineers.
* Ensure configuration information is available and trustworthy during incidents and for onboarding new engineers.
4. Customer communication & governance
* Be the primary operational contact for a set of enterprise customers.
* Lead regular service reviews and status calls, presenting SLA performance, key incidents, risks, and improvement actions.
* Present and agree on the incident management process with customers (channels, priorities, escalation paths, expectations).
* Work closely with Account Management/Sales on renewals, expansions, and expectation management.
5. Commercial & delivery management
* Clarify what is in scope vs. out of scope and work with customers and Sales to shape paid change requests when additional work is needed.
* Monitor effort vs. contract, help protect margins, and flag risks early (under-scoped contracts, chronic over-use, under-utilized capacity).
* Work in a matrix environment, coordinating with different technical teams (e.g., database engineering, DevOps, SRE) to staff and deliver engagements effectively.
6. Onboarding & training
* Design and maintain onboarding paths for new engineers joining support/delivery (shadowing, training on SOPs, environment overviews,