Job Description:
About us
We provide enterprise support and consulting services for open-source analytics and data infrastructure platforms.
Our customers rely on us to keep their systems fast, stable, and available. We're a small team working across multiple time zones, supporting 100+ customer environments with SLAs ranging from advisory support to 24/7 incident coverage.
About the Role
The role involves taking ownership of service operations: SLAs and incident processes; on-call and skills coverage; SOPs and first-line/SRE enablement; configuration management; SLA metrics and reporting; coordination between customers and our engineering teams.
Responsibilities:
• SERVICE OPERATIONS, ON-CALL & INCIDENTS
Design an on-call plan that ensures all critical skills are available when needed.
Own the incident management process for your accounts: priorities, roles, communication cadence, escalations, and post-incident reviews.
Define key service metrics (e.g., MTTA, MTTR, SLA compliance) and drive improvements based on them.
Act as incident lead/coordinator during major incidents,