Job Description
We are seeking a skilled Data Architect to lead the development of our data quality scorecards, built on SAP ECC data in the Lakehouse. Key workstreams:
1. Create profiling logic and validate its effectiveness.
2. Build bulk rule-based data quality checks in PySpark.
3. Design a field-level and row-level results reporting framework.
4. Publish business-facing dashboards in Power BI.
This role will also design reusable templates, naming conventions, and repeatable processes for future scorecard expansion.
Responsibilities:
* Develop and deploy Databricks-based Data Quality scorecards.
* Define and implement profiling logic for nulls, distinct counts, and pattern checks.
* Build PySpark-based Data Quality rules and metrics at scale.
* Produce curated datasets for Power BI scorecards.
* Establish reusable DQ rule templates and standardized patterns.
* Work with existing SAP ECC data models.
* Mentor junior developers on rule logic and best practices.
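To give a rough sense of the profiling logic mentioned above (nulls, distinct counts, pattern checks), here is a minimal, dependency-free Python sketch of the rule calculations. In the role itself these rules would be expressed with PySpark over Lakehouse tables; the function names and the 18-digit material-number format below are hypothetical examples, not part of the actual scorecard design.

```python
import re

def null_rate(values):
    """Fraction of values in a column that are null (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def distinct_count(values):
    """Number of distinct non-null values in a column."""
    return len({v for v in values if v is not None})

def pattern_pass_rate(values, pattern):
    """Fraction of non-null values fully matching a regex pattern."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    rx = re.compile(pattern)
    return sum(bool(rx.fullmatch(v)) for v in non_null) / len(non_null)

# Hypothetical SAP ECC-style material-number column (18-digit format assumed)
matnr = ["000000000012345678", "000000000098765432", None, "BAD-VALUE"]
print(null_rate(matnr))                     # 0.25
print(distinct_count(matnr))                # 3
print(pattern_pass_rate(matnr, r"\d{18}"))  # 2 of 3 non-null values match
```

In a PySpark implementation, the same metrics would typically be computed column-wise with aggregate expressions rather than Python loops, so the rules scale to large datasets.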
The ideal candidate has strong proficiency in Python and PySpark, along with experience working with large-scale datasets.