About Our Client
Our client is a Gartner-recognized leader in AI security and AI TRiSM (trust, risk, and security management) solutions. They help organizations in sectors such as pharmaceuticals, healthcare, logistics, finance, and tourism build, deploy, and monitor AI systems with best-in-class security and compliance. They are looking for a Senior DevOps Engineer to join their platform team as they scale their security-first infrastructure to support cloud, hybrid, and on-premises deployments of machine-learning workloads, including large language models (LLMs).
Role Overview
As a Senior DevOps Engineer, you will design, build, and maintain the end-to-end infrastructure that powers our client's AI security platform. You'll collaborate closely with data scientists, ML engineers, and security architects to ensure that environments are scalable, reliable, and aligned with industry best practices.
This is a hands-on role that includes architecting containerized pipelines, automating deployments, and monitoring production workloads, particularly those serving mission-critical LLMs.
Core Responsibilities
Containerization & Orchestration
Lead Docker image design and hardening for AI workloads.
Operate and optimize Kubernetes clusters (EKS, AKS, GKE) across cloud, hybrid, and on-premises environments.
Customer-Facing Demos, Deployments & Training
Deploy AI security solutions into customers' private cloud environments, managing end-to-end implementation for both proof-of-concept (PoC) and production deployments.
Deliver comprehensive training, including hands-on workshops, documentation, and video tutorials, for both technical and non-technical users.
CI/CD & Release Management
Build and maintain CI/CD pipelines (Jenkins, GitLab CI, Azure DevOps) with automated testing, security scans, and canary deployments.
Enforce Git-based workflows, including branching strategies, pull-request reviews, and artifact versioning.
Monitoring, Logging & Alerting
Design observability stacks using Datadog and ELK/EFK, integrating metrics, logs, and traces.
Define SLIs/SLOs and configure automated alerting to meet a 99.9% uptime target.
Security & Compliance
Collaborate with security teams to implement practices such as network segmentation, IAM policies, secrets management, and vulnerability scanning.
Document and enforce compliance controls aligned with SOC 2, GDPR, HIPAA, and other regulatory and industry frameworks.
Collaboration & Mentorship
Serve as a technical leader by mentoring junior engineers, conducting architecture reviews, and driving DevOps best practices in an Agile/Scrum environment.
Communicate complex infrastructure concepts clearly to cross-functional stakeholders.
Must-Have Qualifications
5+ years of experience in DevOps, SRE, or related roles.
Deep expertise in Docker and Kubernetes (EKS, AKS, or GKE).
Strong knowledge of at least one major cloud platform (AWS, Azure, or GCP), especially in hybrid and on-premises environments.
Hands-on experience with CI/CD tools like Jenkins, GitLab CI, or Azure DevOps.
Proficiency with monitoring tools (Prometheus, Grafana, ELK/EFK) and in defining SLIs/SLOs.
Strong Linux and scripting skills (Bash, Python).
Excellent communication and collaboration skills in Agile/Scrum environments.
Nice-to-Have Skills
Experience with LLM deployment tools (e.g., vLLM, Ollama).
Knowledge of service mesh technologies (Istio, Linkerd).
Familiarity with serverless (AWS Lambda, Azure Functions).
Exposure to SQL (PostgreSQL, MySQL) and NoSQL (MongoDB, Cassandra) databases.
Advanced IaC experience: Terraform, CloudFormation, and policy-as-code.
What Our Client Offers
Competitive salary plus a performance-based bonus (5–20%).
Equity in a fast-growing, Gartner-recognized AI security startup.
Fully remote work environment with flexible hours.
Generous professional development and conference budget.
A collaborative culture where your contributions shape the future of secure AI.