Job Title: Senior Software Engineer, Platform Reliability
Location: Remote-LATAM
We are looking for a Senior Software Engineer with 5+ years of experience in platform reliability, backend systems, and cloud infrastructure. This role focuses on improving the reliability, scalability, and performance of a large-scale consumer video streaming platform.
Key Responsibilities:
* Design and enhance highly reliable, scalable, and self-healing systems
* Build and maintain observability tooling (monitoring, logging, tracing, and alerting)
* Collaborate with engineering teams to improve system performance and architecture
* Define and maintain SLAs/SLOs for backend services
* Troubleshoot production issues and drive long-term solutions
* Act as the subject-matter expert (SME) for platform reliability across streaming systems
Required Skills:
* Strong coding experience in Go (Golang) and JavaScript/TypeScript
* Expertise in microservices and distributed systems
* Hands-on experience with Kubernetes, Terraform, and Linux systems
* Knowledge of networking (TCP/IP, load balancing)
* Experience with observability tools (Datadog, OpenTelemetry)
Good to Have:
* Experience with eBPF
* Knowledge of video streaming (HLS, transcoding, playback)