Machine learning ops engineer

São Paulo (SP)

Lateral Group

Posted October 4

Description

Overview

Apply for the Machine Learning Ops Engineer (LATAM) role at Lateral Group.

Lateral is a profitable, award-winning design and technology company with over 20 years of experience launching bold ventures and transforming businesses. We are a globally distributed team of 200+ experts united by the pursuit of quality.

We do things differently at Lateral. Our mission is simple: design and build great products. We work with speed, focus, and integrity, delivering high-quality work and continuous improvement.

What You'll Do

* Infrastructure Management: define and propose an infrastructure management stack that drives business objectives.
* Troubleshooting and Optimization: identify and mitigate AI infrastructure issues and improve model training speed on specific hardware.
* Platform Evaluation and Implementation: evaluate and implement new AI training and development platforms.
* Automation and Orchestration: automate model training and checkpointing using MLOps tools; maintain containerization tools (Docker, Singularity) for reproducibility.
* Deployment and Lifecycle Management: transfer and replicate models from R&D to production; manage the model lifecycle and tracking; ensure compatibility with evolving training packages and dependencies (e.g., CUDA, PyTorch, drivers), updating them proactively to avoid regressions.

What We're Looking For

* 5+ years of hands-on experience with ML Ops tools such as SLURM, MLflow, Kubeflow, SageMaker, or Vertex AI.
* Experience managing Kubernetes clusters and distributed training workloads at scale.
* Proficiency with containerization (Docker, Singularity) and reproducible ML environments.
* Familiarity with popular deep learning frameworks (PyTorch, TensorFlow) and how they operate at infra level.
* Strong understanding of model lifecycle best practices (training, validation, deployment, tracking).
* Strong scripting and automation skills in Python, Bash, or similar.
* Comfort working closely with ML researchers to translate needs into scalable, production-grade systems.
* A proactive mindset: you're excited to take ownership of infra problems others avoid.

Bonus points for

* Experience with multi-node, hardware-optimized training setups (e.g., GPU clusters, TPUs).
* Contributions to internal tools or open-source projects in the ML Infra space.
* Prior experience helping bring ML systems through regulatory, safety, or quality review stages.

Why You'll Love Working Here

* Real Impact: meaningful products across healthcare, commerce, sustainability, and next-gen tech.
* Remote-First, Office Friendly: work from anywhere; offices available for in-person collaboration if desired.
* Async collaboration that respects time zones and outcomes over hours.
* Outstanding Team: talented, kind professionals who care about craft and each other.
* Growth: opportunities to grow your craft and take on greater responsibility at your pace.
* Culture of Excellence: high-quality work delivered sustainably with no burnout or crunch.
* Variety & Stability: a profitable, independent company with over 20 years of experience, and new challenges in each project.

How to Apply and What to Expect

1. Express Your Interest: send your resume, a short note about what excites you, and links to your work that show how you think and build.
2. Talent Partner Conversation: a structured discussion with our People Experience team about your career trajectory and fit with Lateral's values.
3. Technical Interview: we evaluate your technical proficiency using real-world scenarios relevant to the role.
4. Client Interview: we assess how you collaborate and communicate with external stakeholders.
5. Operational Interview: we explore your approach to prioritizing, collaborating, shipping, and iterating.
6. Reference Checks: we may contact 2–3 references who've worked closely with you to understand patterns, strengths, and fit.
7. Offer: if selected, you'll receive a comprehensive offer detailing compensation and other pertinent information.

Join us and let's build something extraordinary.
