Chief Data Infrastructure Specialist
This role involves spearheading the design and implementation of a new data warehouse instance for a major product line.
You will lead the development of scalable pipelines, optimize Lakehouse performance, and integrate with diverse real-time and batch data sources across AWS.
The ideal candidate should have expertise in Databricks (DBX) and AWS-native data services.
Key Responsibilities:
* Design and deploy a tailored Databricks Lakehouse instance to meet the client's product-level data needs.
* Architect and implement robust data ingestion pipelines using Spark (PySpark/Scala) and Delta Lake.
* Integrate AWS-native services (S3, Glue, Athena, Redshift, Lambda) with Databricks for optimized performance and scalability.
* Define data models, optimize query performance, and establish warehouse governance best practices.
* Collaborate cross-functionally with product teams, data scientists, and DevOps to streamline data workflows.
* Maintain CI/CD for data pipelines using GitOps and Infrastructure-as-Code, preferably with Databricks (DBX) tooling.
* Monitor data jobs and resolve performance bottlenecks or failures across environments.
Required Skills and Qualifications:
* Expertise in Databricks (DBX) and AWS-native data services.
* Proficiency in designing and deploying Databricks Lakehouse instances.
* Strong knowledge of data ingestion pipelines using Spark (PySpark/Scala) and Delta Lake.
* Experience with integrating AWS-native services (S3, Glue, Athena, Redshift, Lambda) with Databricks.
* Ability to define data models and optimize query performance.
* Collaborative mindset for working with product teams, data scientists, and DevOps.
* Understanding of CI/CD principles and experience with GitOps and Infrastructure-as-Code.
Benefits:
* Opportunity to work on a cutting-edge data infrastructure project.
* Collaborative and dynamic work environment.
* Professional growth and development opportunities.
* Competitive compensation package.
Other Information: