High-Quality Evaluation and Training Datasets
About Us
Turing is a leading AI company pushing the boundaries of AI-assisted software development. Our mission is to empower next-gen AI systems to reason about real-world software repositories.
Project Overview
We're building high-quality evaluation and training datasets to improve how LLMs handle realistic software engineering tasks. A key focus is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.
Why This Role Is Unique
* Collaborate directly with the AI researchers shaping the future of AI-powered software development.
* Work with high-impact open-source projects and evaluate how LLMs perform on real bugs, issues, and developer tasks.
* Influence dataset design that will train and benchmark next-gen LLMs.
Required Skills and Qualifications
* 7+ years of professional software engineering experience at top-tier product companies.
* Strong fundamentals in software design, coding best practices, and debugging.
* Excellent ability to assess code quality, correctness, and maintainability.
* Proficient with code review processes and reading diffs in real-world repositories.
* Exceptional written communication skills, with the ability to articulate evaluation rationale clearly.
Engagement Details
* Commitment: ~20 hours/week (partial PST overlap required)
* Type: Contractor (no medical or paid leave benefits)
* Duration: 1 month (potential extensions based on performance and fit)