We are seeking a skilled Site Reliability Engineer II to join our team and help transform the insurance industry with our leading cloud platform. In this role, you will be responsible for the reliability, performance, and scalability of our cloud-based applications, applying your skills in automation, software engineering, and operational discipline to support business growth.
Key Areas of Focus
* Collaboration with development teams to troubleshoot and resolve issues, minimizing customer impact and ensuring high-quality service delivery.
* Development and maintenance of automated runbooks to address common issues proactively, enhancing operational efficiency and reducing downtime.
* Application of engineering principles and automation to optimize operating environments, ensuring high-performance service delivery and meeting business objectives.
* Provision of expert guidance on reliability and performance improvements to development and operations teams.
Essential Qualifications
* Proven experience in SRE or similar roles with a focus on system reliability and availability, together with strong problem-solving skills and the ability to collaborate effectively with cross-functional teams.
* Familiarity with automation, monitoring, and performance optimization tools and techniques, including service level indicators (SLIs), service level objectives (SLOs), and error budgets, as well as a commitment to maximizing uptime and scalability and to delivering an exceptional end-user experience.