Overview

At Zyte, we eat data for breakfast and you can eat your breakfast anywhere and work for Zyte. Founded in 2010, we are a globally distributed team of over 250 Zytans working from over 28 countries who are on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. We believe that all businesses deserve a smooth pathway to data.

For more than a decade, Zyte has led the way in building powerful, easy-to-use tools to collect, format, and deliver web data, quickly, dependably, and at scale. Today, the data we extract helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Today, over 3,000 companies and 1 million developers rely on our tools and services to get the data they need from the web.

Responsibilities

Design and implement AI-driven quality checks: build models to detect anomalies, identify schema drift, and classify data errors in real time
Automate and scale QA: replace manual and rule-based validation with ML-powered solutions that continuously improve
Leverage GenAI for validation: use embedding models, LLMs, and prompt-driven pipelines to perform semantic checks on scraped data
Develop monitoring & alerting pipelines: quantify data quality via KPIs, dashboards, and automated reports for stakeholders
Experiment & innovate: research and prototype new AI techniques for QA, e.g. using embeddings, synthetic data, and reinforcement learning to stress-test scrapers
Collaborate cross-functionally: work with developers, product managers, and account teams to integrate AI-based QA into production workflows
Communicate insights: present findings with clear visualizations, metrics, and evidence-based recommendations to technical and non-technical audiences

Qualifications

Proficiency in Python & PyData stack (NumPy, pandas, scikit-learn, PyTorch/TensorFlow preferred)
3+ years in a data science, applied ML, or data engineering role (ideally with exposure to QA or data validation at scale)
Hands-on experience with GenAI tools: LLM APIs (OpenAI, Anthropic, Google), prompt engineering, cost/token optimization
Strong ML fundamentals: anomaly detection, classification, clustering, embeddings, evaluation metrics
Experience with big data frameworks (Spark, BigQuery, or similar)
Ability to work with very large datasets (millions+ of records)
Version control skills (GitHub/Bitbucket)
Excellent communication in English, both technical and non-technical

Desired Skills

Prior experience in data quality automation or web data QA
Familiarity with LangChain, MCP, Marvin, or similar orchestration frameworks
Experience building QA dashboards or visualization layers
Background in statistics or applied mathematics
Previous remote/distributed work experience

Benefits

Become part of a self-motivated, progressive, multi-cultural team.
Have the freedom and flexibility to work from where you do your best work.
Attend conferences and meet with team members from across the globe.
Work with cutting-edge open source technologies and tools.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: IT Services and IT Consulting