Senior Software Engineer - LLM Evaluation and Training
We are looking for experienced software engineers to help build datasets for evaluating and training LLMs.
The goal is to train LLMs to solve realistic software engineering problems. To do this, we create verifiable SWE tasks from public repository histories, using a synthetic approach with humans in the loop.
This role involves hands-on software engineering work, including development environment automation, issue triaging, and evaluating test coverage and quality.
Main Responsibilities:
* Analyze and triage GitHub issues across trending open-source libraries
* Set up and configure code repositories, including Dockerization and environment setup
* Evaluate unit test coverage and quality
* Modify and run codebases locally to assess LLM performance in bug-fixing scenarios
* Collaborate with researchers to design tasks and identify repositories and issues that are challenging for LLMs
Requirements:
* Strong experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby
* Experience working with well-maintained, widely used repositories (500+ stars)
* Proficiency with Git, Docker, and basic software pipeline setup
* Ability to understand and navigate complex codebases
* Comfortable running, modifying, and testing real-world projects locally
Benefits:
* Fully remote work environment
* Opportunity to work on cutting-edge AI projects
Commitment: 20 hours per week, with some overlap with PST working hours