Job Overview
We are seeking a highly analytical and detail-oriented professional with hands-on experience in AI/LLM quality assurance, red teaming, and prompt evaluation.
About the Role:
 * Conduct comprehensive security tests to identify harmful or adversarially induced outputs from large language models (LLMs).
 * Evaluate and stress-test AI prompts across multiple domains (e.g., finance, healthcare, security) to uncover potential failure modes.
 * Develop and apply test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.
 * Collaborate with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.
 * Perform manual content validation across model versions, ensuring factual consistency, coherence, and guideline adherence.
 * Create evaluation frameworks and scoring rubrics for prompt performance and safety compliance.
 * Document findings, edge cases, and vulnerability reports with high clarity and structure.
Necessary Skills and Qualifications:
 * Prior experience in AI red teaming, LLM safety testing, or adversarial prompt design.
 * Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.
 * Strong background in quality assurance, content review, or test case development for AI/ML systems.
 * Understanding of LLM behavior, failure modes, and model evaluation metrics.
 * Excellent critical thinking, pattern recognition, and analytical writing skills.
 * Ability to work independently, follow detailed evaluation protocols, and meet tight deadlines.
Desirable Attributes:
 * Prior work with teams focused on LLM safety initiatives.
 * Experience in risk assessment, red team security testing, or AI policy & governance.
 * A background in linguistics, psychology, or computational ethics is advantageous.
Kindly proceed with the assessment below.