Data Warehouse Architect Needed for Scalable Lakehouse Solutions
We are seeking a highly skilled Data Warehouse Architect to spearhead the design and implementation of a new data warehouse instance for a major product line. You will be responsible for building scalable pipelines, optimizing lakehouse performance, and integrating the platform with diverse real-time and batch data sources across cloud platforms.
Key Responsibilities
* Design and deploy a new cloud-based Databricks Lakehouse instance tailored to the client's product-level data needs.
* Architect and implement robust data ingestion pipelines using Spark (PySpark/Scala) and Delta Lake (see the ingestion sketch after this list).
* Integrate AWS-native services (S3, Glue, Athena, Redshift, Lambda) with Databricks for optimized performance and scalability.
* Define data models, optimize query performance, and establish warehouse governance best practices.
* Collaborate cross-functionally with product teams, data scientists, and DevOps to streamline data workflows.
* Maintain CI/CD for data pipelines (preferably with DBX), following GitOps and Infrastructure-as-Code practices.
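To give a concrete sense of the ingestion work described above, here is a minimal PySpark + Delta Lake sketch. It is illustrative only: the bucket path, table name, and the `event_timestamp` field are hypothetical placeholders, not part of the client's actual environment.

```python
# Minimal PySpark + Delta Lake ingestion sketch.
# All paths, table names, and schema fields are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("raw-events-ingestion")
    .getOrCreate()
)

# Batch read of raw JSON landed in S3 (path is a placeholder).
raw = (
    spark.read
    .format("json")
    .load("s3://example-bucket/raw/events/")
)

# Light standardization before writing to the bronze layer.
bronze = (
    raw
    .withColumn("ingested_at", F.current_timestamp())
    .withColumn("event_date", F.to_date("event_timestamp"))
)

# Append into a partitioned Delta table; Delta's schema enforcement
# rejects writes whose schema does not match the table.
(
    bronze.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")
    .saveAsTable("bronze.events")
)
```

In production this would typically run as a scheduled or streaming Databricks job rather than a one-off script, but the shape of the work (read, standardize, land in Delta) is the same.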
Required Skills & Experience
1. Databricks / Lakehouse Architecture
* End-to-end setup of Databricks workspaces and Unity Catalog.
* Expertise in Delta Lake internals, file compaction, and schema enforcement (see the Delta maintenance sketch after this list).
* Advanced PySpark/SQL skills for ETL and transformations.
2. AWS Native Integration
* Deep experience with AWS Glue, S3, Redshift Spectrum, Lambda, and Athena.
* Knowledge of IAM and VPC configuration for secure cloud integrations.
3. Data Warehousing & Modeling
* Strong grasp of modern dimensional modeling (star/snowflake schemas); see the star-schema sketch after this list.
* Experience applying lakehouse design patterns to mixed workloads.
4. Automation & DevOps
* Familiarity with CI/CD for data engineering using tools like DBX, Terraform, GitHub Actions, or Azure DevOps.
* Proficiency with pipeline monitoring tools such as CloudWatch, Datadog, or New Relic.
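The Delta Lake maintenance sketch below illustrates the compaction and schema-enforcement points from the Databricks/Lakehouse section. It is a minimal example, assuming the hypothetical `bronze.events` table from the earlier ingestion sketch and a Databricks runtime where `OPTIMIZE`/`ZORDER` and `VACUUM` are available.

```python
# Delta Lake maintenance sketch: file compaction and schema enforcement.
# Table names, paths, and the event_date column are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-maintenance").getOrCreate()

# File compaction: OPTIMIZE rewrites many small files into fewer large ones;
# ZORDER co-locates rows on the most common filter column to prune reads.
spark.sql("OPTIMIZE bronze.events ZORDER BY (event_date)")

# Remove data files no longer referenced by the table
# (subject to the default 7-day retention window).
spark.sql("VACUUM bronze.events")

# Schema enforcement: an append whose schema does not match the table fails
# unless schema evolution is explicitly allowed with mergeSchema.
new_batch = spark.read.format("json").load("s3://example-bucket/raw/events_v2/")
(
    new_batch.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")  # opt-in evolution for additive columns only
    .saveAsTable("bronze.events")
)
```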
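For the modeling side, the star-schema sketch below shows the kind of dimensional layout we have in mind, expressed as Spark SQL DDL over Delta tables. All schema, table, and column names (`gold.*`, surrogate keys, etc.) are illustrative assumptions, not the client's actual model.

```python
# Star-schema sketch: one fact table with foreign keys into two dimensions.
# All schema, table, and column names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dimensional-model").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS gold.dim_customer (
        customer_key  BIGINT,   -- surrogate key
        customer_id   STRING,   -- natural/business key
        segment       STRING,
        valid_from    DATE,     -- slowly changing dimension (type 2) window
        valid_to      DATE
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS gold.dim_date (
        date_key   INT,
        full_date  DATE,
        year       INT,
        month      INT
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS gold.fact_orders (
        order_id      STRING,
        customer_key  BIGINT,         -- FK to gold.dim_customer
        date_key      INT,            -- FK to gold.dim_date
        amount        DECIMAL(18, 2)
    ) USING DELTA
    PARTITIONED BY (date_key)
""")
```

In a lakehouse setting these "gold" tables would typically be built from the bronze/silver layers via PySpark or SQL transformations and surfaced to BI tools through Databricks SQL or Redshift Spectrum.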