About This Role:
We're seeking a skilled software engineer to collaborate with AI researchers on high-impact open-source projects. You'll evaluate how Large Language Models (LLMs) perform on real-world bugs, issues, and developer tasks.
Key Responsibilities:
* Evaluate model-generated code responses for each task using a structured ranking system.
* Assess code quality, correctness, style, and efficiency.
* Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
Requirements:
* 7+ years of professional software engineering experience at top-tier product companies.
* Strong fundamentals in software design, coding best practices, and debugging.
* Excellent ability to assess code quality and maintainability.
* Proficient with code review processes and reading diffs in real-world repositories.
Bonus Points:
* Experience in LLM research, developer agents, or AI evaluation projects.
* Background in building or scaling developer tools or automation systems.
Engagement Details:
* Commitment: ~20 hours/week (partial overlap with PST working hours required)
* Type: Contractor
* Duration: 1 month (potential extensions based on performance and fit)