We're seeking a Data Pipeline Engineer to own and evolve our exercise recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training.
This role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.
What You'll Do
Pipeline Operations & Improvement
* Maintain and enhance our multi-source data collection system: IMU sensors (via mobile app) and synchronized video streams from gym-based cameras
* Improve the robustness of our video capture software, particularly its handling of network interruptions and its operational monitoring
* Deploy and monitor services in remote Linux environments using appropriate DevOps practices
Data Quality & Validation
* Evolve our Python-based QC engine that validates data pre- and post-annotation
* Implement checks for IMU-video time synchronization, sensor health, and measurement consistency
* Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities
* Develop validation logic that compares annotations against sensor data to verify temporal alignment
Analysis & Troubleshooting
* Perform ad-hoc analysis on 1,200+ workout tasks to classify failure modes
* Identify whether issues stem from pipeline bugs, sensor problems, or annotation errors
* Prioritize engineering work based on data quality impact and coordinate with the annotation team on fixes
Tooling & Visualization
* Maintain and extend our Next.js UI, which serves annotators, data scientists, and stakeholders
* Create visualizations (Chart.js) for QC metrics and signal analysis
* Integrate with the Label Studio annotation interface
What You Bring
Required
* Strong Python programming skills, particularly for data processing pipelines
* Experience with time-series data and digital signal processing
* Comfort working in Linux environments and deploying and monitoring remote services
* Ability to debug complex multi-component systems (sensors, video, networks, sync)
* Data quality mindset: designing validation rules, tracking metrics, investigating anomalies
* SQL/database experience for managing pipeline metadata
Highly Valued
* Video processing experience (RTSP streams, encoding, OCR)
* Experience working with sensor/IoT data and handling connectivity challenges
* Experience with Next.js or other modern web frameworks for data tooling
* DevOps practices: containerization, monitoring, logging, alerting
* Experience with annotation pipelines and ML training data workflows
* Background in biomechanics, sports science, or wearable sensors
Tech Stack
* Languages: Python (primary), JavaScript/TypeScript (Next.js UI)
* Data: IMU sensor streams, video (RTSP), time-series analysis, DSP
* Tools: Label Studio, Chart.js, Linux/bash, OCR libraries
* Infrastructure: Remote deployment, monitoring systems
You'll Thrive Here If You
* Enjoy detective work: diagnosing why data doesn't match expectations
* Balance pragmatism with quality: shipping improvements while maintaining reliability
* Communicate well with both technical and non-technical stakeholders
* Can work autonomously in a small, mission-driven team