Job Summary:
We seek highly detail-oriented and analytically skilled professionals to evaluate AI system outputs across various modalities.
Responsibilities:
1. Evaluate outputs generated by large language models and other generative AI systems across multiple formats, including text, images, video, and multimodal interactions.
2. Assess quality against project-specific criteria such as correctness, coherence, completeness, style, cultural appropriateness, and safety.
3. Identify subtle errors, hallucinations, or biases in AI responses.
4. Apply domain expertise and logical reasoning to adjudicate ambiguous or unclear outputs.
5. Provide detailed written feedback, tags, and scores for outputs to ensure consistency across the evaluation team.
6. Escalate unclear cases and contribute to refining evaluation guidelines.
7. Collaborate with project managers and quality leads to meet accuracy, reliability, and turnaround benchmarks.
Requirements:
* Bachelor's degree or equivalent educational qualification.
* At least one year of experience in data annotation, LLM evaluation, content moderation, or related AI/ML domains.
* Demonstrated experience working with data annotation tools and software platforms.
* Strong understanding of language and multimodal communication (e.g., instruction following in image generation, fact-checking, narrative coherence in video).
* Ability to adapt quickly to changing project directions and fast-paced work environments.
* Previous experience creating or annotating complex data specifically for large language model training.
* Prior exposure to generative AI, prompt engineering, or LLM fine-tuning workflows is an asset.