Senior Software Engineer for Large Language Model Evaluation

Timbó

beBee Careers

Modelista

Posted on June 14

Description

About the Project

We are building high-quality evaluation and training datasets to improve how Large Language Models (LLMs) interact with realistic software engineering tasks.

A key focus of this project is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.

What You'll Do

* Review and compare 3–4 model-generated code responses for each task using a structured ranking system.
* Evaluate code diffs for correctness, code quality, style, and efficiency.
* Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
* Maintain high consistency and objectivity across evaluations.
* Collaborate with the team to identify edge cases and ambiguities in model behavior.

Requirements

* 7+ years of professional software engineering experience.
* Strong fundamentals in software design, coding best practices, and debugging.
* Excellent ability to assess code quality, correctness, and maintainability.
* Proficiency with code review processes and with reading diffs in real-world repositories.
* Exceptional written communication skills to articulate evaluation rationale clearly.
* Prior experience with LLM-generated code or evaluation work is a plus.

Engagement Details

* Commitment: ~20 hours/week.
* Type: Contractor.
* Duration: 1 month, with potential extensions based on performance and fit.
