About LLM Evaluation and Repository Validation:
Our project aims to build evaluation and training datasets that teach LLMs to work on realistic software engineering problems.
We do this by constructing verifiable SWE tasks from public repository histories through a synthetic, human-in-the-loop pipeline, while expanding dataset coverage across programming languages, difficulty levels, and other task dimensions.
Key Responsibilities:
* Analyze and triage GitHub issues across trending open-source libraries
* Set up and configure code repositories, including Dockerization and environment setup
* Evaluate unit test coverage and quality
* Modify and run codebases locally to assess LLM performance in bug-fixing scenarios (see the sketch below)
* Collaborate with researchers to design and identify repositories and issues that are challenging for LLMs
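To make the verification loop behind these responsibilities concrete, here is a minimal, illustrative Python sketch of one way a candidate bug fix could be validated against a Dockerized repository. The repository path, image tag, patch file, and use of `pytest` are hypothetical assumptions for illustration, not part of any specific project setup.

```python
import os
import subprocess

REPO_DIR = os.path.abspath("path/to/checked-out-repo")  # hypothetical local checkout
IMAGE_TAG = "swe-task-env"                               # hypothetical image tag
PATCH_FILE = os.path.abspath("candidate.patch")          # hypothetical LLM-generated patch


def run(cmd, cwd=None):
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(cmd, cwd=cwd).returncode


def tests_pass():
    """Run the repo's test suite inside the container; True if every test passes."""
    return run([
        "docker", "run", "--rm",
        "-v", f"{REPO_DIR}:/workspace", "-w", "/workspace",
        IMAGE_TAG, "pytest", "-q",
    ]) == 0


# 1. Build a reproducible environment from the repository's Dockerfile.
run(["docker", "build", "-t", IMAGE_TAG, REPO_DIR])

# 2. Confirm the bug is observable: the test suite should fail before the patch.
failing_before = not tests_pass()

# 3. Apply the candidate patch and re-run the same tests.
run(["git", "apply", PATCH_FILE], cwd=REPO_DIR)
fixed_after = tests_pass()

# A task is verifiable when a previously failing suite now passes.
print("verified fix" if failing_before and fixed_after else "not verified")
```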
Requirements:
* Experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby
* Proficiency with Git, Docker, and basic software pipeline setup
* Ability to understand and navigate complex codebases
* Comfortable running, modifying, and testing real-world projects locally
* Able to commit 20 hours per week, with some overlap with PST
Benefits:
* Work in a fully remote environment
* Opportunity to work on cutting-edge AI projects with leading LLM companies