Overview
Machine Learning Ops Engineer (LATAM) at Lateral Group.
Lateral is a profitable, award-winning design and technology company with over 20 years of experience launching bold ventures and transforming businesses. We are a globally distributed team of 200+ experts united by the pursuit of quality.
We do things differently at Lateral. Our mission is simple: design and build great products. We work with speed, focus, and integrity, delivering high-quality work and continuous improvement.
What You'll Do
 * Infrastructure Management: define and propose an infrastructure management stack that drives business objectives.
 * Troubleshooting and Optimization: identify and mitigate AI infrastructure issues and improve model training speed on specific hardware.
 * Platform Evaluation and Implementation: evaluate and implement new AI training and development platforms.
 * Automation and Orchestration: automate model training and checkpointing using MLOps tools; maintain containerization tools (Docker, Singularity) for reproducibility.
 * Deployment and Lifecycle Management: transfer and replicate models from R&D to production; manage the model lifecycle and tracking; ensure compatibility with evolving training packages (e.g., CUDA, PyTorch, drivers), updating them proactively to avoid regressions.
What We're Looking For
 * 5+ years of hands-on experience with ML Ops tools such as SLURM, MLflow, Kubeflow, SageMaker, or Vertex AI.
 * Experience managing Kubernetes clusters and distributed training workloads at scale.
 * Proficiency with containerization (Docker, Singularity) and reproducible ML environments.
 * Familiarity with popular deep learning frameworks (PyTorch, TensorFlow) and how they operate at the infrastructure level.
 * Strong understanding of model lifecycle best practices (training, validation, deployment, tracking).
 * Strong scripting and automation skills in Python, Bash, or similar.
 * Comfort working closely with ML researchers to translate needs into scalable, production-grade systems.
 * A proactive mindset: you're excited to take ownership of infra problems others avoid.
Bonus points for
 * Experience with multi-node, hardware-optimized training setups (e.g., GPU clusters, TPUs).
 * Contributions to internal tools or open-source projects in the ML Infra space.
 * Prior experience helping bring ML systems through regulatory, safety, or quality review stages.
Why You'll Love Working Here
 * Real Impact: meaningful products across healthcare, commerce, sustainability, and next-gen tech.
 * Remote-First, Office Friendly: work from anywhere; offices available for in-person collaboration if desired.
 * Async Collaboration: a way of working that respects time zones and values outcomes over hours.
 * Outstanding Team: talented, kind professionals who care about craft and each other.
 * Growth: opportunities to grow your craft and take on greater responsibility at your pace.
 * Culture of Excellence: high-quality work delivered sustainably with no burnout or crunch.
 * Variety & Stability: a profitable, independent company with over 20 years of experience and new challenges in each project.
How to Apply and What to Expect
 1. Express Your Interest: send your resume, a short note about what excites you, and links to your work that show how you think and build.
 2. Talent Partner Conversation: a structured discussion with our People Experience team about your career trajectory and fit with Lateral's values.
 3. Technical Interview: we evaluate your technical proficiency using real-world scenarios relevant to the role.
 4. Client Interview: we assess how you collaborate and communicate with external stakeholders.
 5. Operational Interview: we explore your approach to prioritizing, collaborating, shipping, and iterating.
 6. Reference Checks: we may contact 2–3 references who've worked closely with you to understand patterns, strengths, and fit.
 7. Offer: if selected, you'll receive a comprehensive offer detailing compensation and other pertinent information.
Join us and let's build something extraordinary.