Are you looking for a challenging role where you can utilize your analytical skills to ensure the highest quality of AI-generated content?
* Our ideal candidate will help us evaluate and stress-test AI prompts across multiple domains, developing and applying test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.
Key Responsibilities:
* Conduct thorough evaluations of AI-generated content to identify vulnerabilities and risks.
* Develop and apply test cases to assess the accuracy and reliability of AI-generated responses.
* Collaborate with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.
* Perform manual QA and content validation across model versions, ensuring factual consistency, coherence, and guideline adherence.
Requirements:
* Proven experience in AI evaluation, LLM testing, or adversarial prompt design.
* Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.
* Strong background in Quality Assurance, content review, or test case development for AI/ML systems.
* Understanding of LLM behaviors, failure modes, and model evaluation metrics.
* Excellent critical thinking, pattern recognition, and analytical writing skills.