Design and Implement Data Engineering Solutions
The role involves designing, developing, and maintaining large-scale data pipelines on cloud services. This includes building efficient ETL/ELT workflows with Azure Data Factory, Databricks, and Synapse, and developing high-performance data processing solutions with Apache Spark (PySpark) on Databricks.
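As a rough illustration of the kind of PySpark work involved (not a prescription for this role), a Databricks ETL step might look like the sketch below; the paths and column names (`/mnt/raw/orders`, `order_ts`, `amount`) are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession named `spark` is already provided;
# getOrCreate() returns it there, or builds a local session elsewhere.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read the raw input (hypothetical mount path).
raw = spark.read.parquet("/mnt/raw/orders")

# Transform: drop rows missing key fields, then aggregate revenue by day.
daily = (
    raw.dropna(subset=["order_id", "amount"])
       .withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date")
       .agg(F.sum("amount").alias("daily_revenue"))
)

# Load: write Delta output partitioned by date for efficient downstream reads.
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .save("/mnt/curated/daily_revenue"))
```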
1. Optimize data workflows for maximum performance and cost efficiency in a cloud environment.
2. Implement best practices for data security, governance, and compliance.
3. Develop continuous integration and deployment (CI/CD) pipelines for data engineering workloads (see the test sketch after this list).
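One concrete way item 3 plays out is unit-testing transformation logic so the CI pipeline (Azure DevOps, GitHub Actions, or similar) can gate deployments. A minimal pytest sketch, with a hypothetical `add_order_date` transformation, might look like this:

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_order_date(df):
    """Hypothetical transformation under test: derive a date from a timestamp string."""
    return df.withColumn("order_date", F.to_date("order_ts"))


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the test runs on any CI agent, no cluster needed.
    return (SparkSession.builder
            .master("local[1]")
            .appName("ci-tests")
            .getOrCreate())


def test_add_order_date(spark):
    df = spark.createDataFrame(
        [("o1", "2024-01-15 10:30:00")], ["order_id", "order_ts"]
    )
    row = add_order_date(df).collect()[0]
    assert str(row["order_date"]) == "2024-01-15"
```

Running such tests against a local Spark session keeps the CI check fast and independent of any shared cluster.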
The role also requires close collaboration with data scientists, analysts, and business stakeholders to identify data requirements and deliver accurate, reliable datasets. The ideal candidate has expertise in building scalable data systems and implementing robust security measures, and works effectively within cross-functional teams.
Key skills include:
* Azure Databricks
* Azure Data Factory
* Azure Data Services
* Data quality tools (Great Expectations, Soda; see the validation sketch after this list)
* Knowledge of AWS orchestration and streaming services (Step Functions, EventBridge, Kinesis)
* Best practices for API security (Amazon Cognito, WAF, IAM policies)
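Great Expectations is one of the named data quality tools; the following minimal sketch uses its classic pandas-backed API (pre-1.0 releases; later versions restructure this around data contexts and validators), with a hypothetical orders sample:

```python
import pandas as pd
import great_expectations as ge  # classic pre-1.0 API assumed here

# Hypothetical sample of an orders dataset.
orders = ge.from_pandas(pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "amount": [19.99, 5.00, 42.50],
}))

# Declarative checks: no missing keys, amounts within a plausible range.
checks = [
    orders.expect_column_values_to_not_be_null("order_id"),
    orders.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000),
]

# Fail the run if any expectation is not met.
assert all(check.success for check in checks)
```

In a pipeline, the same expectations would typically run against the full dataset and fail the job on any unmet check.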