We are looking for a highly skilled Service Operations Manager to oversee the delivery of our enterprise support services. This is a key role that requires strong leadership and technical skills to ensure seamless service operations.
Job Description
In this role, you will design and maintain on-call and coverage plans, own incident management processes, define and monitor service metrics, and act as incident lead/coordinator during major incidents. You will also create and maintain SOPs, runbooks, and triage guides for SRE engineers; train and coach first-line/SRE teams; and continuously refine documentation based on real incident experience and feedback.
1. Designing on-call coverage plans that ensure all critical skills are available when needed.
2. Owning the incident management process for your accounts: priorities, roles, communication cadence, escalations, and post-incident reviews.
3. Defining and monitoring key service metrics (e.g., MTTA, MTTR, SLA compliance, backlog health) and driving improvements based on them.
4. Acting as incident lead/coordinator during major incidents, keeping engineers focused and customers informed: for example, setting up [...] during onsite visits, leading individuals through root cause analysis and problem resolution discussions, prioritizing efforts, and ensuring clear, transparent communication with customers throughout the process.