Job Opportunity
The organization seeks a skilled Data Warehouse Architect with expertise in Databricks (DBX) and AWS-native data services to design and implement a new data warehouse instance for a major product line.
This role involves building scalable pipelines, optimizing lakehouse performance, and integrating with diverse real-time and batch data sources across AWS.
Beyond the technical skills, the role requires strong business acumen, the ability to work under pressure, and excellent communication skills to collaborate effectively with a range of stakeholders.
Key Responsibilities:
* Design and deploy a new Databricks Lakehouse instance tailored to client needs.
* Architect and implement robust data ingestion pipelines using Spark and Delta Lake (see the sketch after this list).
* Integrate AWS-native services (S3, Glue, Athena, Redshift, Lambda) with Databricks for optimized performance and scalability.
* Define data models, optimize query performance, and establish warehouse governance best practices.
* Collaborate with product teams, data scientists, and DevOps to streamline data workflows.
* Maintain CI/CD for data pipelines, preferably with dbx, following GitOps and Infrastructure-as-Code practices.
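
To make the ingestion-pipeline expectations concrete, below is a minimal sketch of a batch PySpark job writing to Delta Lake. The S3 paths, table layout, and column names (`event_id`, `event_date`) are hypothetical placeholders, not details of an actual client environment.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession is provided; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical source and target locations.
SOURCE_PATH = "s3://example-raw-bucket/events/"         # assumption: raw JSON drops
TARGET_PATH = "s3://example-lake-bucket/bronze/events"  # assumption: Delta bronze layer

# Read raw JSON, apply light cleaning, and stamp an ingestion time.
raw = spark.read.json(SOURCE_PATH)
cleaned = (
    raw
    .dropDuplicates(["event_id"])                  # assumes an event_id column
    .withColumn("ingested_at", F.current_timestamp())
)

# Append into a partitioned Delta table; Delta enforces the existing schema.
(
    cleaned.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")                     # assumes an event_date column
    .save(TARGET_PATH)
)
```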
Required Skills & Experience:
Technical Proficiency
* Expertise in end-to-end setup of Databricks workspaces and Unity Catalog.
* Strong understanding of Delta Lake internals, file compaction, and schema enforcement (illustrated in the sketch after this list).
* Advanced PySpark/SQL skills for ETL and transformations.
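
As a concrete illustration of the Delta Lake behaviors named above, the sketch below shows schema enforcement rejecting a mismatched append, followed by file compaction via `OPTIMIZE`. The table path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()

TABLE_PATH = "s3://example-lake-bucket/bronze/events"  # hypothetical Delta table

# Schema enforcement: appending a DataFrame whose schema does not match
# the table's schema fails instead of silently corrupting the data.
bad_rows = spark.createDataFrame([(1, "oops")], ["event_id", "unexpected_col"])
try:
    bad_rows.write.format("delta").mode("append").save(TABLE_PATH)
except AnalysisException as e:
    print(f"Rejected by schema enforcement: {e}")

# File compaction: OPTIMIZE rewrites many small files into fewer large ones;
# ZORDER additionally co-locates data to speed up selective queries.
spark.sql(f"OPTIMIZE delta.`{TABLE_PATH}` ZORDER BY (event_id)")
```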
AWS Native Integration
* Deep experience with AWS Glue, S3, Redshift Spectrum, Lambda, and Athena (see the integration sketch after this list).
* Knowledge of IAM and VPC configuration for secure cloud integrations.
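
One common integration pattern across these services is orchestrating an Athena query over S3 data from Python via boto3, relying on an attached IAM role rather than embedded credentials. The region, Glue database, query, and output bucket below are all hypothetical.

```python
import time
import boto3

# Credentials come from the attached IAM role/instance profile, not from code.
athena = boto3.client("athena", region_name="us-east-1")  # assumption: region

# Hypothetical Glue database and result location.
response = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "example_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes; production code would add backoff and timeouts.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(f"Athena query {query_id} finished with state {state}")
```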
Data Warehousing & Modeling
* Strong grasp of modern dimensional modeling (star/snowflake schemas); see the schema sketch after this list.
* Experience setting up lakehouse design patterns for mixed workloads.
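
As a minimal sketch of the dimensional modeling expected here, the Spark SQL below defines a star schema as Delta tables: one fact table keyed to two dimensions. All table and column names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dimension tables hold descriptive attributes keyed by surrogate keys.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key BIGINT,
        customer_name STRING,
        segment STRING
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_date (
        date_key INT,
        calendar_date DATE,
        fiscal_quarter STRING
    ) USING DELTA
""")

# The fact table stores measures plus foreign keys into the dimensions;
# queries join facts to dimensions in the classic star pattern.
spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        customer_key BIGINT,
        date_key INT,
        quantity INT,
        revenue DECIMAL(18, 2)
    ) USING DELTA
    PARTITIONED BY (date_key)
""")
```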
Automation & DevOps
* Familiarity with CI/CD for data engineering using tools like dbx (Databricks Labs), Terraform, GitHub Actions, or Azure DevOps (see the testing sketch after this list).
* Proficiency with monitoring tools such as CloudWatch, Datadog, or New Relic for data pipeline observability.
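
On the CI/CD side, a common pattern is unit-testing pipeline transformations with pytest so that a GitHub Actions (or similar) workflow can gate merges. The transformation and test below are a hypothetical illustration, runnable on a local Spark session with no cluster required.

```python
# test_transformations.py -- run locally or in a CI job via `pytest`.
import pytest
from pyspark.sql import SparkSession


def dedupe_events(df):
    """Hypothetical pipeline step: keep one row per event_id."""
    return df.dropDuplicates(["event_id"])


@pytest.fixture(scope="session")
def spark():
    # A local Spark session is enough for unit tests.
    return SparkSession.builder.master("local[1]").getOrCreate()


def test_dedupe_events_removes_duplicates(spark):
    df = spark.createDataFrame(
        [(1, "click"), (1, "click"), (2, "view")],
        ["event_id", "event_type"],
    )
    result = dedupe_events(df)
    assert result.count() == 2
```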