Data engineer

Sinop

CloudGeometry

Posted June 6

Description

Data Engineer (Databricks & AWS)

Remote (Latin America / Europe) | 9 AM - 5 PM EST | Full-time

At CloudGeometry, we partner with industry leaders like AWS, Google, and Databricks to deliver cutting-edge cloud-native solutions. We are looking for a Senior Data Engineer to join our flagship project: a modern Data Platform for the life sciences industry, supporting global leaders like Pfizer, Moderna, and Novartis in developing innovative RNA-based solutions using cloud computing and advanced AI.

If you are an experienced data engineer who thrives in high-impact environments, zeroes out legacy systems, and wants to play a key technical leadership role in building scalable lakehouse architectures, let’s talk!

Key Responsibilities

Pipeline Engineering: Design, develop, and optimize high-performance ETL pipelines within Databricks to connect analytics-ready data back to operational services.
Architecture Leadership: Lead technical architecture discussions with engineering, product managers, and data scientists to implement advanced analytics.
Workflow Optimization: Build, fine-tune, and monitor Databricks workflows to ensure system reliability, performance, and data integrity.
Data Quality & Security: Collaborate with ML teams to ensure secure, rigorous, and accurate data ingestion across all processing stages.
Agile Execution: Actively participate in daily Scrum ceremonies within a globally distributed engineering team.

Technical Requirements & Stack

1. Core Data Engineering

Databricks Ecosystem: 2+ years of hands-on experience (Delta tables/Iceberg, Spark jobs, MLflow, Unity Catalog, Model Registry).
Architecture: Expert-level understanding of modern Lakehouse architectural design principles.
Languages: Expert-level Python (for data processing/ETL) AND TypeScript / Node.js (for backend services using HapiJS, Zod, and Jest).

2. Cloud Infrastructure & DevOps (AWS)

Compute & Storage: ECS (Fargate/EC2), Lambda, S3, and Athena.
Messaging & Orchestration: SQS/SNS and Airflow.
DevOps & CI/CD: GitHub Actions, CodeBuild, Docker, and repository templates via Cruft.

3. Data Stores & MLOps

Databases: PostgreSQL (ACID/Migrations), DynamoDB (High-scale Key-Value), and Redis (Caching/Rate limiting).
Search: OpenSearch / Elasticsearch for full-text search and aggregations.
GenAI: Practical knowledge of LLMs, agents, function calling, and RAG architectures.

Qualifications

Experience: 5+ years in software development with a strong focus on data engineering/analytics teams.
Senior Autonomy: Proven ability to challenge decisions, propose architectural improvements, and deliver complex features end-to-end.
Communication: Exceptional English skills (written and spoken) to articulate complex data ideas to global stakeholders.
Availability: Required online presence from 9 AM to 5 PM EST.

⭐ Nice to Have:

Professional Databricks or AWS certifications.
Experience building internal SDKs or developer experience tooling.
Experience working directly alongside Data Scientists and ML Developers.

What We Offer (Our Commitment to You)

Comprehensive compensation and benefits package.
⚡ Zero legacy systems: work exclusively with cutting-edge technologies.
Continuous Learning: Extensive training, certifications, hackathons, and Udemy access.
Premium Tooling: Developer Pro access to Claude Code, Codex, and AntiGravity.
Top-tier Culture: A collaborative, supportive environment with global experts.

Ready to build the future of Life Sciences?

Click Apply or send us your resume. Let’s build something massive together!

#DataEngineering #Databricks #AWS #Python #TypeScript #RemoteJobs #Lakehouse
