We need someone who understands data deeply and uses Python to wrangle it — not a platform engineer, not a pure pipeline builder, but a data specialist who is comfortable with research, investigation, and the unglamorous work of making messy energy market data actually usable.
You’ll spend significant time on tasks like mapping BM units to power plants and fuel types, reconciling legacy data formats with current ones, ensuring consistency between different Elexon message types, and cleaning time-series data (outliers, gaps, overlaps). Some of this requires genuine investigation — cross‑referencing sources, making judgment calls, documenting edge cases. There’s no API that solves these problems for you. Python is your primary tool (pandas, NumPy, standard libraries) to minimise manual effort, but you should accept that some detective work is unavoidable. If you find satisfaction in truly understanding a dataset’s structure and quirks — rather than just piping data through and hoping for the best — this role is for you.
Data Mapping and Research
Map BM units from Elexon to their corresponding power plants, substations, and fuel types — combining API data, public registers, and manual research
Map substations to ETYS zones and grid supply points
Build and maintain reference/master datasets that link identifiers across disparate sources (Elexon, National Grid ESO, TEC register, etc.)
Document mappings, assumptions, and known limitations clearly for downstream users
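To give a flavour of this kind of mapping work, here is a minimal sketch in pandas. All identifiers and values are invented toy data standing in for real sources (Elexon BM unit lists, plant registers); the real job is the research that fills the gaps this flags.

```python
import pandas as pd

# Hypothetical toy data: in practice these come from the Elexon API,
# public registers, and manual research.
bm_units = pd.DataFrame({
    "bm_unit_id": ["T_DRAXX-1", "T_DRAXX-2", "E_ABCD-1"],
    "lead_party": ["Drax Power Ltd", "Drax Power Ltd", "Acme Energy"],
})
plants = pd.DataFrame({
    "bm_unit_id": ["T_DRAXX-1", "T_DRAXX-2"],
    "plant_name": ["Drax", "Drax"],
    "fuel_type": ["BIOMASS", "BIOMASS"],
})

# Left-join so every BM unit survives; the merge indicator flags units
# with no match, i.e. the ones that need manual investigation.
mapping = bm_units.merge(plants, on="bm_unit_id", how="left", indicator=True)
unmapped = mapping.loc[mapping["_merge"] == "left_only", "bm_unit_id"].tolist()
print(unmapped)  # units still needing manual mapping
```

The `indicator=True` flag is the useful part: it turns "rows that silently fell out of a join" into an explicit to-do list.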
Data Reconciliation and Consistency
Reconcile legacy data formats with current formats (e.g., historical operational data stored in different schemas or granularities)
Ensure consistency between different Elexon message types — understand the market data structure well enough to know why BOALF, BOD, and DISBSAD might not perfectly align and how to handle it
Investigate discrepancies between data sources and determine authoritative values
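A reconciliation pass often starts as simply as the sketch below: align two sources on a shared key and surface disagreements beyond a tolerance. The data and the 1 MWh tolerance are invented for illustration; real Elexon message types need domain judgment about which source is authoritative.

```python
import pandas as pd

# Hypothetical toy data standing in for two message types that should
# agree on the same settlement periods but may not.
source_a = pd.DataFrame({
    "settlement_period": [1, 2, 3],
    "volume_mwh": [100.0, 250.0, 310.0],
})
source_b = pd.DataFrame({
    "settlement_period": [1, 2, 3],
    "volume_mwh": [100.0, 249.5, 340.0],
})

merged = source_a.merge(source_b, on="settlement_period", suffixes=("_a", "_b"))
merged["diff"] = (merged["volume_mwh_a"] - merged["volume_mwh_b"]).abs()
# Flag rows whose disagreement exceeds the tolerance and needs investigation.
discrepancies = merged[merged["diff"] > 1.0]
print(discrepancies["settlement_period"].tolist())
```

Small rounding differences (period 2) pass; the large gap in period 3 is flagged for investigation.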
Data Cleaning and Quality
Clean time‑series data: detect outliers (price spikes, meter errors), fill gaps appropriately, resolve overlapping or duplicate timestamps
Develop reusable Python‑based cleaning routines that can be applied across datasets
Understand why data quality issues occur (settlement reruns, late submissions, format changes), not just patch them
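The cleaning tasks above — duplicates from restatements, spikes, and gaps — can be sketched in a few lines of pandas. The series below is invented, and the fixed spike threshold is a deliberate simplification (real routines would use something robust like a rolling median):

```python
import pandas as pd

# Hypothetical half-hourly price series with a duplicated timestamp,
# a spike, and a missing period.
idx = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:30", "2024-01-01 00:30",  # duplicate
    "2024-01-01 01:30",                                          # 01:00 missing
    "2024-01-01 02:00",
])
prices = pd.Series([50.0, 52.0, 52.0, 9999.0, 51.0], index=idx)

# 1. Resolve duplicate timestamps (keep the later restatement).
prices = prices[~prices.index.duplicated(keep="last")]
# 2. Mask obvious outliers (simplistic fixed threshold for illustration).
prices = prices.mask(prices > 500)
# 3. Reindex to a regular half-hourly grid and interpolate small gaps only.
full_idx = pd.date_range(prices.index.min(), prices.index.max(), freq="30min")
prices = prices.reindex(full_idx).interpolate(limit=2)
```

The `limit=2` keeps interpolation honest: long outages stay as gaps rather than being papered over with a straight line.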
Pipeline Development (Supporting the Above)
Write and maintain Python data grabbers for energy market APIs
Build dbt models to transform raw data into clean, analysis‑ready datasets
Orchestrate workflows via GitHub Actions
Design PostgreSQL schemas that reflect your understanding of the domain
Requirements
Strong Python skills for data work — you’re fluent with pandas, comfortable writing clean, testable code, and can build reusable data processing logic. This is not an Excel role
Solid SQL skills — complex queries, window functions, CTEs in PostgreSQL
Experience with messy, real‑world data — you’ve done reconciliation, cleaning, or mapping work before and understand it’s not always automatable
Methodical and detail‑oriented — you notice inconsistencies and want to understand root causes
Good documentation habits — you know that undocumented mappings and assumptions are technical debt
Self‑directed — you can own ambiguous problems, do your own research, and communicate findings clearly
Nice to Have
Experience with energy, utilities, or market data (any geography)
Familiarity with UK energy markets, Elexon data, or grid operations
dbt experience for transformation pipelines
Exposure to time‑series data challenges (irregular timestamps, gaps, restatements)
Highly Desirable — Agentic AI Coding Experience
We value candidates who can build software using agentic AI coding systems. This is fundamentally different from using code completion tools or chat‑based assistants.
What We Are Looking For
Hands‑on experience with agentic coding systems such as Claude Code, Codex, Open Code, or Cursor.
Ideal candidates will demonstrate:
Breadth of experience — proficiency with at least 2 agentic systems (experience with only one is insufficient)
End‑to‑end development — ability to design and build software from the ground up using these tools, not just generating isolated snippets
Multi‑agent orchestration — demonstrated experience coordinating multiple agents with skills and tools, not just one‑shot problem solving
Deep system knowledge — familiarity with hooks, permission systems, MCP (Model Context Protocol) servers, custom skills and tool definitions, and context management
Not What We're Looking For
Platform/infrastructure engineers who prefer to stay above the data layer
People who expect clean, well‑documented data as input
Those uncomfortable with research, ambiguity, or "manual" investigation work
Practicalities
Remote‑first with async collaboration (Slack, GitHub, documented decisions)
Core overlap with UK business hours expected (at least 4 hours daily)
Competitive compensation based on location and experience
Benefits
Plenty of opportunities for learning and professional growth
B2B contract with paid vacation