About FirstIgnite:
FirstIgnite builds best-in-class software that accelerates commercialization and innovation by connecting scientific research with industry partners. We support universities, national labs, and research hospitals worldwide with tools that simplify complex science and identify real partnership opportunities.
Role Overview:
The Data Engineer will lead the design, development, and maintenance of scalable data systems that integrate diverse datasets. This role includes deploying AI and machine learning solutions to enhance insights, manage complex data relationships, and drive actionable intelligence for FirstIgnite’s platform. You will mentor and manage a team of data engineers while working cross-functionally to innovate and deliver impactful AI-driven solutions.
Responsibilities:
* Lead the architecture, development, and optimization of scalable data pipelines for patents, grants, clinical trials, publications, labs, and firmographic sources.
* Build and maintain ETL workflows on AWS using Glue (PySpark), Lambda, S3, and RDS PostgreSQL.
* Explore and implement complementary data storage solutions as needed.
* Apply machine learning and LLM-based techniques to extract, structure, and enrich data at scale.
* Ensure data integrity, security, accuracy, and governance across all systems.
* Implement tools for data exploration and visualization that surface actionable insights.
* Collaborate with Product, Engineering, and Customer Success teams to drive AI-driven innovation.
* Evaluate emerging AI/ML techniques and tools to enhance platform capabilities.
* Maintain version control, CI/CD pipelines, and reproducible workflows for the data engineering team.
Requirements:
* Strong programming expertise in Python, with experience building production data pipelines.
* Hands-on experience with AWS data services (Glue/PySpark, Lambda, S3, RDS PostgreSQL, SSM) or equivalent cloud data stacks.
* Familiarity with RESTful and GraphQL APIs, ETL processes, and data integration strategies.
* Understanding of Retrieval-Augmented Generation (RAG) and AI-enhanced data applications.
* Strong analytical skills, including evaluating trade-offs of data ingestion and storage methods.
* Leadership experience managing data engineering or AI teams in SaaS or tech environments.
* Solid knowledge of version control (Git) and CI/CD practices.
* Exposure to data science / ML libraries (Pandas, Scikit-learn, PyTorch) and managed ML platforms (SageMaker, Vertex AI) is a plus.
* Excellent problem-solving, collaboration, and communication skills in cross-functional teams.
* Ability to work independently, prioritize tasks, and drive projects from conception to deployment.
Additional Qualifications:
* Experience with web scraping at scale (Firecrawl or similar).
* Experience with LLM-based data extraction and prompt engineering for structured output.
* Experience with data quality instrumentation and observability for pipelines.