**Large Language Model Evaluation Specialist**
We are seeking a skilled and linguistically aware professional to evaluate and enhance multilingual prompt-response datasets for large language models. This role involves designing evaluation rubrics, assessing translations and model outputs, creating prompts, and identifying cultural nuances and biases in LLM behavior.
Key Responsibilities:
* Create region/language-specific rubric definitions to ensure cultural and linguistic relevance.
* Identify the need for additional rubrics tailored to specific languages or regional contexts.
* Review prompts translated from English into the target language and revise any that read unnaturally or are inaccurate.
* Develop thoughtful prompts to test the cultural awareness of LLMs.
* Evaluate prompt-response pairs using a standardized template based on rubrics and provide detailed justifications.
* Document problematic outputs with clear explanations of rubric violations or cultural insensitivities.
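The rubric workflow above, scoring each prompt-response pair on defined dimensions, requiring a written justification for every rating, and flagging problematic outputs, can be sketched as a minimal data structure. All names, dimensions, and the 1-5 scale below are illustrative assumptions, not details from this posting:

```python
from dataclasses import dataclass, field

@dataclass
class RubricRating:
    dimension: str      # hypothetical dimension, e.g. "cultural_appropriateness"
    score: int          # assumed 1 (poor) to 5 (excellent) scale
    justification: str  # written rationale required for every rating

@dataclass
class Evaluation:
    prompt: str
    response: str
    language: str
    ratings: list = field(default_factory=list)

    def add_rating(self, dimension: str, score: int, justification: str) -> None:
        # Enforce the "detailed justifications" requirement from the rubric.
        if not justification.strip():
            raise ValueError("Every rating must include a justification")
        self.ratings.append(RubricRating(dimension, score, justification))

    def flagged(self, threshold: int = 2) -> list:
        """Dimensions at or below the threshold, to be documented as violations."""
        return [r.dimension for r in self.ratings if r.score <= threshold]

# Example: a response that misses the regional context of the prompt.
ev = Evaluation(
    prompt="¿Qué se come en Navidad?",
    response="Turkey and pumpkin pie.",
    language="es-MX",
)
ev.add_rating("cultural_appropriateness", 2,
              "Answer reflects US customs rather than typical Mexican Christmas dishes.")
ev.add_rating("fluency", 5, "Response is grammatically sound.")
print(ev.flagged())  # -> ['cultural_appropriateness']
```

In practice this record would live in a spreadsheet or evaluation template rather than code; the sketch only makes the shape of the data explicit.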
Required Qualifications:
* Native proficiency in the target language and deep familiarity with cultural norms in the corresponding region.
* Experience in LLM evaluation, content moderation, or linguistic QA.
* Strong attention to detail with the ability to identify subtle issues in language use, tone, and cultural references.
* Comfortable working in spreadsheets and evaluation templates.
* A Master's degree in a relevant field.
Preferred Qualifications:
* Prior experience with prompt engineering or LLM testing.
* Familiarity with tools such as Gemini, ChatGPT, or similar LLM platforms.
* Ability to clearly articulate reasoning behind rubric ratings or prompt edits.
The ideal candidate will be able to work independently and as part of a team, with strong communication skills and attention to detail.