We are seeking a skilled Data Architect to design and implement a new data warehouse instance.
The ideal candidate will have experience with Databricks Lakehouse and AWS-native services, including Spark, Delta Lake, S3, Glue, Athena, Redshift, and Lambda.
Key Responsibilities:
* Design and deploy a new Databricks Lakehouse instance tailored to the client's product-level data needs.
* Architect and implement robust data ingestion pipelines using Spark (PySpark/Scala) and Delta Lake (see the illustrative sketch after this list).
* Integrate AWS-native services with Databricks for optimized performance and scalability.
* Define data models, optimize query performance, and establish warehouse governance best practices.
* Collaborate cross-functionally with product teams, data scientists, and DevOps to streamline data workflows.
* Maintain CI/CD for data pipelines (preferably with DBX), using GitOps and Infrastructure-as-Code practices.
* Monitor data jobs and resolve performance bottlenecks or failures across environments.
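To give candidates a feel for the day-to-day work, the sketch below shows a minimal PySpark job landing raw S3 events into a Delta Lake table. It is illustrative only; the bucket paths, column names, and table locations are hypothetical placeholders, not the client's actual pipeline.

# Illustrative only: a minimal PySpark ingestion job writing to a Delta table.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("product-events-ingest")
    .getOrCreate()
)

# Read raw product-level events landed in S3 (placeholder path).
raw = spark.read.json("s3://example-raw-bucket/product_events/")

# Light cleanup: drop obvious duplicates and stamp ingestion time.
events = (
    raw.dropDuplicates(["event_id"])
       .withColumn("ingested_at", F.current_timestamp())
)

# Append into a Delta Lake table; Delta's schema enforcement rejects mismatched writes.
(
    events.write
          .format("delta")
          .mode("append")
          .save("s3://example-curated-bucket/delta/product_events/")
)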
Required Skills & Experience:
Databricks & Spark:
* End-to-end setup of Databricks workspaces and Unity Catalog.
* Expertise in Delta Lake internals, file compaction, and schema enforcement.
* Advanced PySpark/SQL skills for ETL and transformations.
AWS-Native Integration:
* Deep experience with AWS Glue, S3, Redshift Spectrum, Lambda, and Athena.
* IAM and VPC configuration knowledge for secure cloud integrations.
Data Warehousing & Modeling:
* Strong grasp of modern dimensional modeling (star/snowflake schemas).
* Experience implementing lakehouse design patterns for mixed workloads.
Automation & DevOps:
* Familiarity with CI/CD for data engineering using tools like DBX, Terraform, GitHub Actions, or Azure DevOps.
* Proficiency with monitoring tools such as CloudWatch, Datadog, or New Relic for data pipelines.
Bonus/Nice to Have:
* Experience supporting gaming or real-time analytics workloads.
* Familiarity with Airflow, Kafka, or EventBridge.
* Exposure to data privacy and compliance practices (GDPR, CCPA).