Data Engineer Job Description
Job Title: Data Engineer - ETL/ELT Pipelines in Databricks
Key Responsibilities:
* Design, build, and optimize Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) pipelines in Databricks for large-scale data processing (see the illustrative sketch after this list).
* Ingest data from diverse sources such as APIs, SQL databases, cloud storage, SAP systems, and streaming platforms, ensuring quality and reliability.
* Develop reusable pipeline frameworks, implement data validation logic, and apply performance-tuned transformations for efficient processing.
* Create curated datasets and deliver insights through Power BI dashboards to support business decisions.
* Apply best practices for lakehouse development, orchestration, and version control to keep deliverables consistent and high quality.
* Analyze and resolve pipeline performance issues, and ensure the accuracy, reliability, and consistency of processed data.
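
For illustration only, a minimal sketch of the kind of pipeline this role builds, assuming a Databricks/Spark runtime; the storage path and table names (`curated.orders`, `quarantine.orders`) are placeholders, not references to any existing system:

```python
# Illustrative ETL sketch with placeholder names; assumes a Databricks/Spark runtime.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Extract: ingest raw JSON files from cloud storage (hypothetical landing path).
raw = spark.read.json("abfss://landing@storageaccount.dfs.core.windows.net/orders/")

# Validate: keep rows with a non-null key and a positive amount; quarantine the rest.
valid = raw.filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
rejected = raw.subtract(valid)

# Transform: add load metadata and select a curated schema.
curated = (valid
           .withColumn("ingested_at", F.current_timestamp())
           .select("order_id", "customer_id", "amount", "ingested_at"))

# Load: write Delta Lake tables consumed downstream by Power BI.
curated.write.format("delta").mode("append").saveAsTable("curated.orders")
rejected.write.format("delta").mode("append").saveAsTable("quarantine.orders")
```

In practice this pattern would be wrapped in a reusable, configurable framework covering source onboarding, validation rules, and orchestration.
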
Required Skills:
* Extensive hands-on experience with the Databricks technology stack, including Spark, PySpark, and Delta Lake.
* Demonstrated expertise in advanced SQL concepts for large-scale data processing.
* Ability to integrate disparate data sources into unified datasets, working with both structured and unstructured data.
* Comprehensive understanding of distributed computing concepts, performance tuning strategies, and debugging techniques for optimizing Spark jobs (see the example after this list).
* Proficiency in creating reports, models, and DAX calculations within Power BI.
* Knowledge of implementing Continuous Integration/Continuous Deployment (CI/CD) pipelines in Azure environments.
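
As a rough indication of the Spark/SQL level expected, candidates should be comfortable reading and writing code like the following sketch (hypothetical table names; a window function plus an explicit broadcast hint to avoid shuffling a small dimension table):

```python
# Illustrative only: window function + broadcast join hint on hypothetical tables.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

orders = spark.table("curated.orders")        # large fact table (placeholder)
customers = spark.table("curated.customers")  # small dimension table (placeholder)

# Rank each customer's orders by amount using a window function.
w = Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
ranked = orders.withColumn("rank", F.row_number().over(w))

# Broadcast the small dimension to avoid shuffling the large fact table.
top_orders = ranked.filter("rank = 1").join(F.broadcast(customers), "customer_id")
top_orders.show()
```
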
Desirable Qualifications:
* Familiarity with data quality frameworks, lakehouse monitoring tools, or DQX.
* Prior experience with Airflow, Azure Data Factory, IoT, Kafka, or similar technologies is advantageous.
* Background knowledge of SAP data handling can be beneficial.