Our team is seeking skilled professionals with expertise in red teaming, prompt evaluation, and AI/LLM quality assurance.
The ideal candidate will rigorously test and evaluate AI-generated content to identify vulnerabilities, assess risks, and ensure compliance with safety, ethical, and quality standards.
* Conduct thorough analyses of AI systems to identify potential security threats and flaws.
* Evaluate and stress-test AI prompts across multiple domains to uncover potential failure modes.
* Develop and apply test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.
* Collaborate with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.
* Perform manual QA and content validation across model versions, ensuring factual consistency, coherence, and guideline adherence.
* Create evaluation frameworks and scoring rubrics for prompt performance and safety compliance.
* Document findings, edge cases, and vulnerability reports with high clarity and structure.