Job Description
We provide enterprise support and consulting services for open-source analytics and data infrastructure platforms. Our customers run mission-critical systems and rely on us to keep them fast, stable, and available. We are a remote-first team working across multiple time zones, supporting 100+ customer environments at various service levels.
About the Role
* We seek an experienced Service Delivery Manager to take ownership of our service operations, including SLAs, incident processes, on-call coverage, SOP enablement, configuration management, SLA metrics reporting, and coordination between customers and engineering teams.
Key Responsibilities:
* Service Operations, On-Call & Incidents:
• Design and maintain an on-call plan that ensures all critical skills are available when needed, initially on weekdays and evolving to full 24/7 coverage where required.
• Own the incident management process for accounts, including prioritization, roles, communication cadence, escalations, and post-incident reviews.
• Define and monitor key service metrics, and drive improvements based on them.
• Act as incident lead/coordinator during major incidents, keeping engineers focused and customers informed.
* SOPs, Runbooks & First-Line Enablement:
• Create and maintain SOPs, runbooks, and triage guides for SRE engineers, covering common incident types and operational tasks.
• Train and coach first-line/SRE teams to handle initial triage, basic troubleshooting, and clear communication, escalating when needed.
• Continuously refine documentation based on real experience and feedback.