Job Overview:

As a Site Reliability Engineer, you will be responsible for managing and maintaining the reliability of our software systems.

Key responsibilities include:

- Handling major incidents via critical issue response protocols and providing regular updates until resolution.
- Performing in-depth troubleshooting and analysis to identify root causes and implement preventive actions.
- Managing requests related to system deployments, feature toggles, and data fixes.
- Coordinating with cross-functional teams to resolve production issues.
- Enhancing monitoring capabilities using tools like Dynatrace, Kibana, and Splunk.
- Developing and improving monitoring scripts and alerts based on incident learnings.
- Handling customer escalations and coordinating with support and engineering teams.
- Supporting planned activities and responding to ad-hoc requests from various teams.

Requirements and Qualifications:

To succeed in this role, you should have:

- Deep experience in DevOps and production support.
- Proficiency in automation and CI/CD practices.
- Familiarity with cloud platforms (GCP, AWS, or Azure).
- Hands-on experience with monitoring tools such as Dynatrace, Kibana, and Splunk.
- Strong analytical and problem-solving skills.
- Excellent communication and coordination skills across teams.