100% Remote
US Time Zone
Contractor / PJ position
Role Overview
In this role, you will shape the reliability and scalability of mission-critical platforms built on Azure, Kubernetes, and modern DevOps toolchains. You will solve complex infrastructure challenges, automate end-to-end operations, and keep systems highly available, performant, and secure.
What You Will Do
* Design, build, and improve CI/CD pipelines for applications and infrastructure.
* Develop automation frameworks that reduce manual effort and increase consistency.
* Configure and optimize cloud infrastructure to align with security, scalability, and performance best practices.
* Collaborate with development teams to remove deployment blockers and improve delivery workflows.
* Monitor reliability and performance, identify issues early, and implement data-driven improvements to increase uptime and efficiency.
* Participate in on-call rotations and drive incident resolution with clear postmortems and preventive actions.
* Maintain technical documentation for pipelines, configurations, and runbooks.
* Perform readiness assessments and validation tests before production rollouts.
* Implement Infrastructure as Code using Terraform and ARM templates with version control and reproducibility.
* Troubleshoot complex deployment, provisioning, and performance issues across multi-cloud and containerized environments.
Minimum Qualifications
* Proven track record in SRE or DevOps roles operating production systems.
* Hands-on experience running production workloads on Kubernetes in a cloud environment, including cluster design, autoscaling, upgrades, and network policies.
* Proven CI/CD delivery using GitHub Actions or Jenkins, including promotion across environments, approvals, and rollback strategies.
* Infrastructure as Code expertise with Terraform and ARM templates, including modules, remote state, workspaces, and policy enforcement.
* Strong scripting in PowerShell, Bash, or Python for automation and diagnostics.
* GitOps experience with Argo CD or Flux, managing multi-environment application delivery and drift remediation.
* Containerization with Docker and Kubernetes, including health probes, PodDisruptionBudgets, resource quotas, HorizontalPodAutoscaler, and operators.
* Networking fundamentals with cloud network security practices such as VNet design, NSGs, Private Link, and ingress controllers.
* Working knowledge of cloud security and compliance, including least privilege, secrets management, audit trails, and control evidence.
* Excellent written and spoken English.
* Ability to collaborate across US time zones.
Preferred Qualifications
* Microsoft Azure certification, such as Azure Developer Associate, Azure Administrator Associate, or DevOps Engineer Expert.
* Observability using Application Insights, Elastic Stack (ELK), Grafana, and Prometheus for metrics, logs, and traces.
* Experience with log aggregation and alerting at scale using Elastic and Prometheus.
* Understanding of high availability, scalability, disaster recovery, and cost optimization strategies.
* Experience managing Windows-based containerized applications.
Plus
* Experience with Google Cloud Platform (GCP) or AWS.