About the Role
We are seeking a highly skilled Senior Software Engineer - Large Language Model (LLM) to join our team. In this role, you will work at the intersection of software engineering, open-source ecosystems, and frontier AI.

Project Overview
We are building high-quality evaluation and training datasets to improve how LLMs interact with realistic software engineering tasks. A key focus of this project is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.

Key Responsibilities
* Review and compare model-generated code responses for each task using a structured ranking system.
* Evaluate code diffs for correctness, code quality, style, and efficiency.
* Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
* Maintain high consistency and objectivity across evaluations.
* Collaborate with the team to identify edge cases and ambiguities in model behavior.

Requirements
* 7+ years of professional software engineering experience.
* Strong fundamentals in software design, coding best practices, and debugging.
* Excellent ability to assess code quality, correctness, and maintainability.
* Proficiency with code review processes and with reading diffs in real-world repositories.
* Exceptional written communication skills to articulate evaluation rationale clearly.
* Prior experience with LLM-generated code or evaluation work is a plus.

Bonus Points
* Experience in LLM research, developer agents, or AI evaluation projects.
* Background in building or scaling developer tools or automation systems.

Engagement Details
* Commitment: ~20 hours/week.
* Type: Contractor.
* Duration: 1 month (potential extensions based on performance and fit).