**Job Title:** Site Reliability Engineer
This is a challenging role that requires strong technical skills and a mindset focused on development and automation to solve complex problems.
The ideal candidate will be eager to tackle technology's greatest challenges and make an impact on thousands of users.
This position involves working as part of a team to enable and enhance the day-to-day operational workflows of critical applications and services in a 24x7x365 environment spanning cloud and physical data centers.
The successful candidate will continuously improve application observability to ensure uptime and reliability of our applications and infrastructure.
The role draws on a wide variety of open-source technologies to build fault-tolerant, scalable, and secure high-performance services and pipelines on a global scale.
Responsibilities:
* Enable and enhance day-to-day operational workflows of critical applications and services
* Continuously improve application observability to ensure uptime and reliability
* Use open-source technologies to create high-performance services and pipelines
Requirements:
* Strong experience building scalable production environments
* Experience with source code control systems, versioning, branching, and merging
* Extensive experience with Continuous Integration and Continuous Delivery (CI/CD) practices and tools
* Strong scripting skills
* Experience using Docker containers and container orchestration tools
* Experience with infrastructure as code tools
* Ability to work as part of a distributed team
* Experience with monitoring and metrics systems
Nice To Have:
* Familiarity with database technologies
* Hands-on experience configuring and troubleshooting messaging technologies such as RabbitMQ and Apache Kafka
* Programming skills in Java or .NET