Senior Software Engineer for LLM Evaluation and Repository Validation
You will build evaluation and training datasets that teach LLMs to solve realistic software engineering problems. This involves creating verifiable SWE tasks from public repository histories using a synthetic, human-in-the-loop approach, while expanding dataset coverage across programming languages, difficulty levels, and other task dimensions.
This role requires experienced software engineers who are familiar with high-quality public GitHub repositories. You should have experience working with well-maintained, widely used repos (500+ stars). Your responsibilities will include hands-on software engineering work such as development environment automation, issue triaging, and evaluating test coverage and quality.
You should have strong experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby. You should also be proficient with Git, Docker, and basic software pipeline setup, and be able to understand and navigate complex codebases. You should be comfortable running, modifying, and testing real-world projects locally. Experience contributing to or evaluating open-source projects is a plus.
You will have the opportunity to work on cutting-edge AI projects with leading LLM companies. As a contractor, you will be required to commit 20 hours per week, with some overlap with PST working hours. The contract lasts 1 month, with an expected start date of next week.