Senior Data Engineer (Databricks & AWS)
Remote (Latin America / Europe) | 9 AM - 5 PM EST | Full-time
At CloudGeometry, we partner with industry leaders like AWS, Google, and Databricks to deliver cutting-edge cloud-native solutions. We are looking for a Senior Data Engineer to join our flagship project: a modern Data Platform for the life sciences industry, supporting global leaders like Pfizer, Moderna, and Novartis in developing innovative RNA-based solutions using cloud computing and advanced AI.
If you are an experienced data engineer who thrives in high-impact environments, prefers working free of legacy systems, and wants to play a key technical leadership role in building scalable lakehouse architectures, let's talk!
Key Responsibilities
* Pipeline Engineering: Design, develop, and optimize high-performance ETL pipelines within Databricks to connect analytics-ready data back to operational services.
* Architecture Leadership: Lead technical architecture discussions with engineers, product managers, and data scientists to implement advanced analytics.
* Workflow Optimization: Build, fine-tune, and monitor Databricks workflows to ensure system reliability, performance, and data integrity.
* Data Quality & Security: Collaborate with ML teams to ensure secure, rigorous, and accurate data ingestion across all processing stages.
* Agile Execution: Actively participate in daily Scrum ceremonies within a globally distributed engineering team.
Technical Requirements & Stack
1. Core Data Engineering
* Databricks Ecosystem: 2+ years of hands-on experience (Delta tables/Iceberg, Spark jobs, MLflow, Unity Catalog, Model Registry).
* Architecture: Expert-level understanding of modern Lakehouse architectural design principles.
* Languages: Expert-level Python (for data processing/ETL) AND TypeScript / Node.js (for backend services using HapiJS, Zod, and Jest).
2. Cloud Infrastructure & DevOps (AWS)
* Compute & Storage: ECS (Fargate/EC2), Lambda, S3, and Athena.
* Messaging & Orchestration: SQS/SNS and Airflow.
* DevOps & CI/CD: GitHub Actions, CodeBuild, Docker, and repository templates via Cruft.
3. Data Stores & MLOps
* Databases: PostgreSQL (ACID/Migrations), DynamoDB (High-scale Key-Value), and Redis (Caching/Rate limiting).
* Search: OpenSearch / Elasticsearch for full-text search and aggregations.
* GenAI: Practical knowledge of LLMs, agents, function calling, and RAG architectures.
Qualifications
* Experience: 5+ years in software development, with a strong focus on data engineering and analytics.
* Senior Autonomy: Proven ability to challenge decisions, propose architectural improvements, and deliver complex features end-to-end.
* Communication: Exceptional English skills (written and spoken) to articulate complex data ideas to global stakeholders.
* Availability: Required online presence from 9 AM to 5 PM EST.
Nice to Have
* Professional Databricks or AWS certifications.
* Experience building internal SDKs or developer experience tooling.
* Experience working directly alongside Data Scientists and ML Developers.
What We Offer (Our Commitment to You)
* Comprehensive compensation and benefits package.
* Zero legacy systems – work exclusively with cutting-edge technologies.
* Continuous Learning: Extensive training, certifications, hackathons, and Udemy access.
* Premium Tooling: Developer Pro access to Claude Code, Codex, and Antigravity.
* Top-tier Culture: A collaborative, supportive environment with global experts.
Ready to build the future of Life Sciences?
Click Apply or send us your resume. Let's build something massive together!
#DataEngineering #Databricks #AWS #Python #TypeScript #RemoteJobs #Lakehouse