Job Title: Lead DevOps Engineer
Location: Remote
Experience Level: 10+ years
About the Role
We are seeking a highly experienced Lead DevOps Engineer to spearhead our cloud infrastructure and DevOps initiatives. In this role, you will lead a small but growing team of engineers and drive the strategic direction of our DevOps practices. The ideal candidate has a proven track record of building reliable, scalable, and secure platforms, coupled with the leadership skills to mentor engineers and align infrastructure with business goals.
Key Responsibilities
Leadership & Strategy
* Lead, mentor, and grow a DevOps engineering team, fostering a culture of automation, reliability, and continuous improvement.
* Define best practices, standards, and architectural direction for DevOps across the organization.
* Partner with engineering, security, and product teams to ensure infrastructure supports business needs.
Cloud & Infrastructure
* Design, implement, and manage large-scale cloud infrastructure (AWS, Azure, or GCP).
* Architect and maintain infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation.
* Establish and enforce high-availability and disaster recovery strategies.
Automation & CI/CD
* Build and optimize CI/CD pipelines for efficient, secure, and reliable software delivery.
* Automate operational tasks, deployments, monitoring, and scaling.
* Ensure fast feedback loops and minimal downtime through advanced release strategies (blue/green, canary).
Reliability & Observability
* Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK/EFK, Datadog, New Relic).
* Drive service-level objectives (SLOs) and error budgets to enhance reliability.
* Perform root cause analysis and lead postmortems to prevent recurrence.
Security & Compliance
* Embed security practices into infrastructure and CI/CD pipelines (“shift-left” security).
* Ensure compliance with industry standards and regulations (ISO 27001, SOC 2, HIPAA, GDPR, etc.).
* Implement secrets management, access controls, and vulnerability scanning.
Collaboration & Mentorship
* Provide technical guidance, code reviews, and hands-on support to team members.
* Collaborate cross-functionally with developers, QA, and operations teams.
* Advocate for DevOps culture, evangelizing best practices throughout the organization.
Required Qualifications
* 10+ years of experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering.
* 5+ years of leadership experience (team lead, manager, or architect role).
* Expert-level proficiency in at least one major cloud provider (AWS, Azure, or GCP).
* Strong hands-on experience with:
  * IaC: Terraform, Pulumi, or CloudFormation
  * CI/CD: Jenkins, GitHub Actions, GitLab CI, Argo CD, Spinnaker, etc.
  * Containers & Orchestration: Docker, Kubernetes, Helm
  * Observability Tools: Prometheus, Grafana, ELK/EFK, Datadog, Splunk, New Relic
  * Security Tools: HashiCorp Vault, AWS IAM, OPA, Prisma Cloud, Aqua Security
* Proven track record of designing and maintaining large-scale, highly available, secure systems.
* Strong background in Linux/Unix systems administration and networking fundamentals.
* Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.).
* Excellent communication, leadership, and collaboration skills.
Nice to Have
* Experience with hybrid or multi-cloud environments.
* Familiarity with service meshes (Istio, Linkerd) and API gateways.
* Background in cost optimization and FinOps practices.
* Contributions to open-source DevOps or cloud-native projects.