About Turing:
Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises turn AI proofs of concept into proprietary intelligence, with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.
Role Overview:
We are looking for a SwarmBench Task Engineer specializing in planning and operations to design and build complex, multi-agent benchmark tasks that simulate real-world planning, scheduling, and operational decision-making scenarios. This role focuses on creating constraint-rich problems that evaluate multi-agent reasoning, decomposition, and optimization capabilities in realistic environments.
What does day-to-day life look like?
* Design and develop multi-agent benchmark tasks involving:
  * Planning, scheduling, and resource allocation
  * Operational decision-making (project management, logistics, incident response, capacity planning)
* Create constraint-rich problem statements with multiple interacting variables
* Develop verification scripts to evaluate:
  * Feasibility (all constraints satisfied)
  * Completeness (all requirements addressed)
  * Optimality (efficient solutions)
* Build decomposition strategies:
  * Split tasks across specialized sub-agents (resource-based, constraint-based, conflict resolution, optimization)
* Model real-world operational scenarios with dependencies, timelines, and resource constraints
* Collaborate on improving task quality, coverage, and evaluation rigor
Requirements:
* 5+ years of experience in operations, project management, logistics, or supply chain
* Strong ability to formalize constraints, dependencies, and scheduling logic
* Proficiency in Python for building verification and validation scripts
* Strong structured problem-solving and decomposition skills
* Clear and precise technical writing skills
* Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-Bench)
* Hands-on experience with Docker (Dockerfiles, image builds, debugging)
Nice to have:
* Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
* Background in operations research
* Experience with simulation or modeling tools
* Knowledge of AI planning systems or automated reasoning
* Project management experience or certifications (PMP, Agile, etc.)
Perks of Freelancing With Turing:
* Work in a fully remote environment.
* Opportunity to work on cutting-edge AI projects with leading LLM companies.
Offer Details:
* Commitment Required: 40 hours per week, with 4 hours of overlap with PST.
* Engagement Type: Contractor assignment (no medical/paid leave)
* Duration of Contract: 4 weeks (adjustable based on engagement)
* Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
Evaluation Process:
* Take home assessment
After applying, you will receive an email with a login link. Please use that link to access the portal and complete your profile.
Know amazing talent? Refer them at turing.com/referrals and earn money from your network.