Location: Remote - LATAM
Schedule: Full-time (8 hrs/day); must have 4 hrs overlap with PST
About the Role
We're looking for a hands-on Machine Learning Engineering Manager to lead cross-functional teams in designing, training, and deploying large-scale ML and LLM systems.
You'll drive the full lifecycle of AI development — from research and experimentation to distributed training and production deployment — while mentoring top-tier engineers and partnering closely with product, research, and infra leaders.
This role blends deep ML/MLOps expertise with strong leadership and execution, ensuring all AI initiatives translate into measurable business impact.
Key Responsibilities
Lead and mentor ML engineers, data scientists, and MLOps professionals.
Manage end-to-end ML/LLM project lifecycle: data pipelines, training, evaluation, deployment, and monitoring.
Provide technical direction for distributed training, large-scale model optimization, and system architecture.
Collaborate with Research, Product, and Infrastructure teams to define objectives, milestones, and KPIs.
Implement MLOps best practices: experiment tracking, CI/CD, model governance, and observability.
Manage compute resources and cloud budgets, and enforce Responsible AI and data security standards.
Communicate technical progress, blockers, and results clearly to leadership and stakeholders.
Required Skills & Qualifications
5+ years of experience in Machine Learning, NLP, and Deep Learning (Transformers, LLMs).
2+ years leading teams delivering ML/LLM systems in production.
Strong proficiency in Python and frameworks such as PyTorch, TensorFlow, Hugging Face, and DeepSpeed.
Experience with distributed training, GPU/TPU optimization, and cloud platforms (AWS, GCP, Azure).
Knowledge of MLOps tools (MLflow, Kubeflow, Vertex AI, etc.).
Excellent leadership, communication, and cross-functional collaboration skills.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (PhD preferred).
Nice to Have
Experience training or fine-tuning foundation models.
Contributions to open-source ML/LLM frameworks.
Knowledge of Responsible AI practices, bias mitigation, and model interpretability.