We are seeking an experienced Data Engineer to design, build, and scale modern data solutions using Azure Databricks. In this role, you will own the end-to-end development of data pipelines, drive best practices in data architecture, and ensure high-quality, reliable data is delivered to support analytics and business decision-making.
Key Responsibilities
* Architect, build, and optimize scalable data pipelines and analytics solutions on Azure Databricks, ensuring alignment with lakehouse architecture and medallion design patterns
* Design and implement end-to-end ETL/ELT pipelines that ingest data from diverse sources, transform it based on business requirements, and deliver it to downstream consumers
* Develop, optimize, and maintain Apache Spark and PySpark jobs to efficiently process large-scale datasets
* Implement data quality frameworks, validation processes, and monitoring/alerting systems to ensure data accuracy, reliability, and availability
* Drive best practices by developing reusable frameworks, libraries, and standardized approaches for data engineering across the team
* Optimize query performance and resource utilization within Azure environments to ensure cost efficiency and scalability
* Maintain comprehensive documentation of data pipelines, workflows, and architecture while ensuring compliance with security and governance policies
* Participate in code reviews and CI/CD processes, and promote engineering excellence within the data organization
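To illustrate the kind of medallion-style cleansing and data-quality work the responsibilities above describe, here is a minimal sketch. It uses plain Python rather than PySpark so it is self-contained; the field names, rejection rule, and drop-ratio threshold are hypothetical, not part of the role's actual codebase:

```python
# Sketch of a bronze -> silver cleansing step plus a simple data-quality
# gate. Plain Python stands in for PySpark here; field names and the
# 50% drop threshold are illustrative assumptions.

def to_silver(bronze_records):
    """Drop malformed rows and normalize fields (bronze -> silver)."""
    silver = []
    for rec in bronze_records:
        if rec.get("customer_id") is None:
            continue  # reject rows missing the business key
        silver.append({
            "customer_id": rec["customer_id"],
            "email": (rec.get("email") or "").strip().lower(),
        })
    return silver

def quality_gate(bronze_records, silver_records, max_drop_ratio=0.5):
    """Fail the pipeline run if cleansing rejected too many rows."""
    if not bronze_records:
        return True
    dropped = len(bronze_records) - len(silver_records)
    return dropped / len(bronze_records) <= max_drop_ratio

bronze = [
    {"customer_id": 1, "email": " Alice@Example.com "},
    {"customer_id": None, "email": "malformed row"},
    {"customer_id": 2, "email": None},
]
silver = to_silver(bronze)
assert quality_gate(bronze, silver)  # 1 of 3 rows dropped, gate passes
```

In a real Databricks pipeline the same pattern would be expressed with PySpark DataFrame filters writing to a silver Delta table, with the quality metrics surfaced to monitoring/alerting.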
Need-to-Haves
* 3–5+ years of hands-on experience in data engineering, including at least 2 years working with Azure Databricks and Apache Spark
* Strong programming skills in Python and PySpark, along with advanced SQL (especially Spark SQL)
* Proven experience building configuration-driven ETL pipelines and orchestrating workflows using tools such as Azure Data Factory, Databricks Workflows, or Apache Airflow
* Solid understanding of data modeling concepts, including dimensional modeling and data vault methodologies
* Hands-on experience with Delta Lake and lakehouse/medallion architecture patterns
* Experience with Azure services including Azure Data Lake Storage (ADLS), Azure Data Factory, Azure SQL Database, and Azure Key Vault, plus integration patterns such as REST APIs and SFTP
* Proficiency in Git for version control, including branching strategies, pull requests, and collaborative development workflows
* Experience with CI/CD practices for deploying and managing data pipelines
* Fluent English to interact with international teams
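To make "configuration-driven ETL pipelines" from the requirements above concrete, here is a hedged sketch of the pattern in plain Python. The config keys, step names, and transforms are hypothetical; in practice the sources and sinks would be ADLS paths or Delta tables and the steps PySpark transformations:

```python
# Sketch of a configuration-driven pipeline: the config declares *what*
# to run, and generic code interprets it. Step names and config keys
# are illustrative assumptions, not a real framework's API.

TRANSFORMS = {
    "uppercase_name": lambda row: {**row, "name": row["name"].upper()},
    "add_ingest_flag": lambda row: {**row, "ingested": True},
}

def run_pipeline(config, source_rows):
    """Apply the configured transform steps, in order, to every row."""
    rows = list(source_rows)
    for step in config["steps"]:
        transform = TRANSFORMS[step]  # look up each declared step
        rows = [transform(row) for row in rows]
    return rows

config = {"steps": ["uppercase_name", "add_ingest_flag"]}
result = run_pipeline(config, [{"name": "ada"}])
# result[0] == {"name": "ADA", "ingested": True}
```

The appeal of this design is that adding a new pipeline becomes a config change rather than new code, which is what makes the approach reusable across a team.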
Nice-to-Haves
* Experience working with large-scale distributed data systems and performance optimization techniques
* Familiarity with data quality, testing, and observability frameworks
* Experience supporting analytics or data science teams with curated and accessible datasets
* Exposure to multi-cloud or hybrid cloud data environments
* Knowledge of data governance, security, and compliance best practices in cloud platforms