Site Reliability Engineer (SRE)

Santa Cruz do Sul

KTO Group

Posted on July 17

Description

Welcome to KTO Group, where innovation drives excitement in iGaming. Founded in 2018 by Andreas Bardun, we’re transforming online gaming with a focus on transparency and player satisfaction.

At KTO.com, we blend the thrill of sports betting with online casino entertainment, tailored to local markets and powered by our proprietary platform for a seamless, personalized experience.

KTO is a rising leader in LATAM, proudly ranked among Brazil’s top 10 iGaming brands. Join us as we set new standards in trust, innovation, and the future of iGaming.

SUMMARY OF THE POSITION

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our technology team at KTO. The successful candidate will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure while ensuring seamless deployments and system stability. You will work closely with development teams, applying SRE principles to optimize system performance, observability, and automation.

MAIN RESPONSIBILITIES

* Design, develop, and maintain automation solutions using Terraform for centralized Infrastructure as Code (IaC) management.
* Implement and manage CI/CD pipelines with GitHub Actions and ArgoCD to support continuous and secure application deployments.
* Enhance system stability and reliability by establishing advanced observability practices, using Elastic Cloud, Grafana, and Prometheus to monitor APM data, logs, and metrics and to correlate events across them.
* Proactively identify and resolve performance and availability issues in distributed systems to ensure minimal downtime and high reliability.
* Manage and optimize containerized environments with Kubernetes, ensuring scalability and high availability of applications.
* Collaborate with development teams to align operations with SRE best practices, including SLIs, SLOs, and Error Budgets.
* Advocate for and implement strategies for blue/green deployments and Continuous Deployment to minimize risk during releases.

EXPERIENCE & QUALIFICATIONS REQUIRED

* Proven experience with Terraform for managing Infrastructure as Code.
* Strong expertise in container orchestration with Kubernetes and Helm for centralized management.
* Proficiency in observability tools, particularly Elastic Cloud, Grafana, and Prometheus, for monitoring and troubleshooting.
* Expertise in CI/CD pipelines with GitHub Actions and ArgoCD.
* Solid experience with Linux systems and shell scripting.
* Hands-on experience with AWS or similar cloud platforms.
* Knowledge of programming languages such as Python, Go, or Java.
* Experience with SQL and NoSQL databases.
* Background in deploying and managing highly available, scalable production environments.

NICE TO HAVE

* Experience with advanced deployment strategies such as canary releases or feature flags.
* Knowledge of distributed tracing and correlation techniques.
* Exposure to DevOps practices and tools applied to reliability engineering.
* Certifications in Cloud Computing, Kubernetes, or DevOps-related areas.