About Our Team
We're a world-class, remote-first team of experts working across multiple time zones to provide enterprise support and consulting for open-source analytics and data infrastructure platforms.
Job Description:
We're looking for an experienced Service Delivery Manager to take ownership of our service operations. This includes SLAs and incident processes, on-call and skills coverage, SOPs and first-line/SRE enablement, configuration management, SLA metrics and reporting, and coordination between customers and our engineering teams.
Key Responsibilities:
* Designing a clear on-call plan that ensures all critical skills are available when needed;
* Owning the incident management process for your accounts;
* Tracking key service metrics (e.g., MTTA, MTTR) and driving improvements based on them;
* Acting as incident lead/coordinator during major incidents.
SOPs & Runbooks:
1. Create and maintain SOPs, runbooks, and triage guides for SRE engineers, covering common incident types and operational tasks;
2. Train and coach first-line/SRE teams so they can handle initial triage, troubleshooting, and clear communication, escalating only when needed.