As a Site Reliability Engineer II, you will play a critical role in ensuring the reliability, performance, and scalability of applications running on our cloud platform. In this position, you will apply your skills in automation, software engineering, and operational discipline to support cloud-based solutions.
Key Responsibilities
* Work with development teams to troubleshoot and resolve issues and reduce customer impact.
* Develop and maintain automated runbooks to address common issues proactively.
* Apply engineering principles and basic automation to enhance operating environments.
* Monitor applications and help improve their reliability and performance on the cloud platform.
* Use software engineering skills to optimize systems and reduce manual tasks.
* Document incidents and assist in refining processes to prevent future occurrences.
* Stay informed about industry trends, tools, and best practices in site reliability engineering.
Requirements
* Experience as an SRE or in a similar role, with a focus on improving system reliability.
* Strong problem-solving skills and the ability to help analyze complex systems and devise effective solutions.
* Effective collaboration and communication skills to work cross-functionally and document processes.
* Experience with automation, monitoring, and performance optimization tools and techniques.
* Commitment to maximizing uptime and scalability and to delivering an exceptional end-user experience.