Role Summary:
As a skilled Data Engineer, you will rebuild five Data Quality scorecards using SAP ECC data available in the Lakehouse. You will design and validate profiling logic, build rule-based data quality checks in PySpark, generate field-level and row-level results, and publish business-facing scorecards in Power BI.
The role also involves defining reusable templates, naming conventions, and repeatable processes to support future scorecard expansion (47 more scorecards are planned) and to help transition the organization away from Informatica IDQ.
Key Responsibilities:
* Rebuild the five Data Quality scorecards in Databricks
* Develop profiling logic (null rates, distinct counts, pattern checks)
* Build PySpark-based Data Quality rules and row/column-level metrics (a minimal sketch follows this list)
* Create curated DQ datasets for Power BI scorecards
* Establish reusable DQ rule templates and standardized development patterns
* Work with SAP ECC data models
* Support and mentor a junior developer on rule logic and development standards
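
To give a flavor of the profiling and rule logic involved, here is a minimal PySpark sketch. It assumes a hypothetical SAP ECC customer master table (KNA1) already landed in the Lakehouse; the catalog path, column names, and the 10-digit KUNNR format rule are illustrative assumptions, not project specifics.

```python
# Minimal sketch only: the table path, column names, and the 10-digit
# KUNNR format rule are illustrative assumptions, not project specifics.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("lakehouse.sap_ecc.kna1")  # hypothetical SAP ECC customer master

# Field-level profiling: row count, null count, distinct count, pattern failures.
profile = df.agg(
    F.count("*").alias("row_count"),
    F.sum(F.col("KUNNR").isNull().cast("int")).alias("kunnr_nulls"),
    F.countDistinct("KUNNR").alias("kunnr_distinct"),
    F.sum((~F.col("KUNNR").rlike(r"^\d{10}$")).cast("int")).alias("kunnr_pattern_fails"),
)

# Row-level results: a named pass/fail flag per rule, so failures can be
# aggregated into the field-level pass rates a scorecard reports.
row_results = df.withColumn(
    "dq_kunnr_format_pass", F.col("KUNNR").rlike(r"^\d{10}$")
)

profile.show()
row_results.select("KUNNR", "dq_kunnr_format_pass").show(5)
```

Persisting profile and row_results as curated Delta tables is the kind of output that would feed the Power BI scorecards.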
Qualifications:
* Strong Databricks engineering experience (PySpark, SQL, Delta Lake)
* Hands-on experience building Data Quality rules, frameworks, or scorecards
* Experience profiling large datasets and implementing metadata-driven DQ logic (a sketch of this pattern follows this list)
* Ability to mentor, review code, and explain concepts clearly
* Excellent communication skills in English
* Familiarity with SAP ECC tables and key fields (preferred)
* Experience with Unity Catalog or Purview (nice to have)
* Exposure to Lakehouse Monitoring or DQX accelerators (bonus)
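
For the metadata-driven DQ logic and reusable rule templates mentioned above, one plausible shape is rules stored as metadata and compiled into PySpark expressions by a shared engine. This is a sketch under assumptions: the rule schema, rule IDs, check types, and the MARA table reference are all hypothetical.

```python
# Sketch only: the rule schema, check types, and table are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql import Column, DataFrame

spark = SparkSession.builder.getOrCreate()

# Rule metadata could live in a Delta table or YAML; inlined here for brevity.
RULES = [
    {"rule_id": "R001", "column": "MATNR", "check": "not_null"},
    {"rule_id": "R002", "column": "MEINS", "check": "in_set", "values": ["EA", "KG", "L"]},
    {"rule_id": "R003", "column": "MATNR", "check": "pattern", "regex": r"^[A-Z0-9]{1,18}$"},
]

def rule_condition(rule: dict) -> Column:
    """Translate one rule definition into a boolean pass/fail Column."""
    col = F.col(rule["column"])
    if rule["check"] == "not_null":
        return col.isNotNull()
    if rule["check"] == "in_set":
        return col.isin(rule["values"])
    if rule["check"] == "pattern":
        return col.rlike(rule["regex"])
    raise ValueError(f"Unknown check type: {rule['check']}")

def apply_rules(df: DataFrame, rules: list) -> DataFrame:
    """Append one pass/fail flag per rule; the result feeds curated DQ datasets."""
    for rule in rules:
        df = df.withColumn(f"dq_{rule['rule_id']}_pass", rule_condition(rule))
    return df

df = spark.table("lakehouse.sap_ecc.mara")  # hypothetical SAP ECC material master
scored = apply_rules(df, RULES)

# Field-level pass rates per rule, ready for a Power BI scorecard.
scored.agg(
    *[F.avg(F.col(f"dq_{r['rule_id']}_pass").cast("double")).alias(r["rule_id"])
      for r in RULES]
).show()
```

Keeping rules as data rather than code is what makes the next 47 scorecards cheap: each new scorecard becomes a new set of rule rows plus a Power BI model, not a new codebase.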