Research Engineer, RL Environments

São Gonçalo dos Campos

Jobrapido

Posted 14 hours ago

Description

About Turing

Turing builds large-scale datasets and reinforcement learning (RL) environments that power post-training for the world's leading AI labs and enterprises, including OpenAI, Anthropic, Google DeepMind, Microsoft AI, Amazon, Apple, and many more. We create RL environments to evaluate and improve our customers' models on complex, multi-step workflows across high-value domains — designing the tasks, reward signals, and verifiers that drive measurable model improvement through RL training.

The Role

We are looking for a Research Engineer, RL Environments to help design and build frontier-quality RL environments, reward systems, and evaluations that improve state-of-the-art models for leading AI labs and enterprise clients.

This is a hands-on, research-facing technical role. You will work directly with customer researchers and engineers to translate their post-training and RL goals into concrete environment specs — designing tasks with verifiable outcomes, building reward and verifier systems, and ensuring environments are robust against reward hacking and distribution collapse.

We're targeting candidates with roughly 4–5 years of experience building deep learning or RL systems — especially where strong results depended on environment design, reward engineering, training infrastructure, or rigorous evaluation.

What You'll Do:

1. Design and build RL environments and verifier systems

* Work with customer researchers to translate RL training goals into environment specs: target capabilities, task distributions, difficulty curves, and success criteria.
* Design task suites with verifiable outcomes — single-step and long-horizon workflows with clear ground-truth signals.
* Build reward functions, verifiers, automatic validators, and evaluation harnesses that provide reliable training signal and resist gaming.
* Define environment interfaces: APIs, tool schemas, state abstractions, database schemas, and simulator-like dynamics that give agents realistic constraints.

2. Build quality systems that keep environments robust at scale

* Audit environments and trajectories for subtle failure modes: reward hacking, leakage, ambiguity, distribution shifts, and degenerate task distributions.
* Implement automated validation: deduplication, decontamination, consistency checks, difficulty controls, and diversity monitoring.
* Develop synthetic task generation and augmentation pipelines: programmatic generators, controlled perturbations, hard negatives, and scenario templating with diversity constraints.

3. Prove impact through evaluations and training runs

* Design and run evals reflecting customer-intended model capabilities.
* Produce analysis connecting environment design choices to training outcomes: pre/post comparisons, error breakdowns, and ablations identifying which environment and reward attributes drive model lift.
* When needed, run RL or fine-tuning experiments (or partner with research) to validate that your environments produce measurable improvement.

4. Collaborate with cross-functional delivery teams

* Provide clear specs, examples, and edge cases to engineers, domain SMEs, and data production groups.
* Run fast feedback loops based on trajectory analysis and quantitative signals.
* Review and improve outputs from large-scale task and environment creation efforts.

Who We're Looking For:

1. Required Qualifications

* 4–5 years of experience building or improving deep learning or RL systems where environment design, data quality, or reward engineering mattered materially.
* Hands-on experience with reinforcement learning — any of: environment/gym development, reward design, PPO/GRPO/DPO, trajectory collection, or RL training infrastructure.
* Strong intuition for what makes a good training task: verifiability, appropriate difficulty, resistance to shortcuts, and realistic constraints.
* Demonstrated attention to detail when diagnosing subtle quality issues and failure modes in data or environments.
* Python proficiency required; comfort with SQL and structured data workflows strongly preferred.
* Ability to communicate clearly with researchers and engineers — turning RL training objectives into concrete environment specs.

2. Highly valued experience

* Post-training experience: RLHF/RLAIF, verifier training, reward modeling, or constitutional AI methods.
* Experience building custom gym environments, simulators, or distributed RL training systems.
* Agentic evaluation: tool use, multi-step workflows, long-horizon tasks, trajectory analysis.
* Systems thinking: ability to simulate an application's API and data schema and design tasks reflecting real-world constraints.
* Experience with LLM fine-tuning at scale.

Why Turing

* Work directly with the world's leading AI labs on the RL environments powering post-training for frontier models.
* Real impact: your environments and reward systems will directly shape how models learn to reason, act, and improve.
* Talent-dense team with high autonomy, rapid iteration, and an exceptional learning curve.

Values:

* We are client first: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
* We work at Start-Up Speed: We move fast, stay agile, and favor action because momentum is the foundation of perfection.
* We are AI forward: We help our clients build the future of AI and implement it in our own roles and workflows to amplify productivity.

Advantages of joining Turing:

* Amazing work culture (super collaborative and supportive work environment; 5 days a week)
* Awesome colleagues (surround yourself with top talent from Meta, Google, LinkedIn, etc., as well as people with deep startup experience)
* Competitive compensation
* Flexible working hours

Don’t meet every single requirement?

Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. Turing is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, or any other legally protected characteristic. At Turing, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

For applicants from the European Union, please review Turing's GDPR notice here.
