About FirstIgnite

FirstIgnite is the AI-powered business development platform for university technology transfer offices (TTOs). We help research institutions turn breakthroughs into partnerships, licenses, and companies by combining deep LLM-driven workflows with the relationships that actually move deals forward. Our product suite spans expert discovery, grants search, and AI-driven outreach, all built on a modern, agentic stack. We ship fast, we measure everything, and we believe evaluations are the difference between AI features that demo well and AI features that work in production.

The Role

We're hiring an AI Evaluation Engineer to own the quality bar for every LLM-powered feature we ship. You'll design, build, and scale the infrastructure that tells us, with evidence, whether a prompt change, model swap, or agent refactor made things better or worse.

This is a high-leverage role. Every customer-facing AI capability at FirstIgnite flows through your evals. You'll work directly with the Head of Engineering and partner closely with product, applied AI, and the full-stack team to establish evaluation as a first-class discipline across the company.

What You'll Do

- Build evaluation infrastructure: Design and maintain eval suites using Promptfoo, LLM-as-judge methodologies, and custom harnesses for features like our expert search system, natural language grants search, and AI SDR agents. (A minimal sketch of an LLM-as-judge check appears after this list.)
- Define what 'good' means: Partner with product and domain experts to translate fuzzy customer outcomes ("does this surface the right principal investigator?") into precise, measurable rubrics.
- Own the feedback loop: Instrument production traffic, curate golden datasets from real customer interactions, and build pipelines that turn user behavior into regression tests. (The second sketch below shows this loop in miniature.)
- Ship quickly under uncertainty: We routinely run 48-hour eval sprints for greenfield features with no production traffic. You'll be c
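For illustration, here is a minimal sketch of the LLM-as-judge pattern named above: a judge model grades one expert-search result against a plain-English rubric. The rubric text, model name, and example data are placeholders, not FirstIgnite's actual configuration; the sketch assumes the openai Python SDK and an OPENAI_API_KEY in the environment.

```python
# Minimal LLM-as-judge check: grade a search result against a plain-English
# rubric and return a pass/fail verdict plus the judge's reasoning.
# All names and data below are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading an expert-search result.
Rubric: the returned researcher should plausibly be a principal investigator
for the query topic, based on the evidence shown.

Query: {query}
Returned researcher: {result}

Respond with JSON: {{"pass": true or false, "reason": "..."}}"""

def judge(query: str, result: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the judge model to grade one (query, result) pair."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(query=query, result=result)}],
        response_format={"type": "json_object"},  # force parseable output
        temperature=0,  # keep grading as deterministic as the API allows
    )
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    verdict = judge(
        query="CRISPR delivery in plant genomes",
        result="Dr. Jane Roe, Dept. of Plant Biology; 14 papers on CRISPR vectors",
    )
    print(verdict["pass"], "-", verdict["reason"])
```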
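And a sketch of the feedback loop in miniature: high-signal production events are frozen into a golden JSONL dataset, then replayed as a recall@10 regression on every change. The event schema and the search_experts callable are hypothetical stand-ins, not our actual instrumentation.

```python
# Production-to-regression loop sketch: curate implicit positives from logs
# into a golden set, then replay them against the current system.
# Log schema and search_experts() are hypothetical placeholders.
import json
from pathlib import Path

GOLDEN = Path("golden/expert_search.jsonl")

def curate(events: list[dict]) -> None:
    """Append high-signal interactions (clicked AND saved) to the golden set."""
    GOLDEN.parent.mkdir(exist_ok=True)
    with GOLDEN.open("a") as f:
        for event in events:
            if event.get("clicked") and event.get("saved"):  # implicit positive label
                f.write(json.dumps({"query": event["query"],
                                    "expected_expert": event["result_id"]}) + "\n")

def run_regression(search_experts) -> float:
    """Replay the golden set against the current system; return recall@10."""
    cases = [json.loads(line) for line in GOLDEN.open()]
    hits = sum(
        case["expected_expert"] in [r["id"] for r in search_experts(case["query"], k=10)]
        for case in cases
    )
    return hits / max(len(cases), 1)
```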