We are looking for a data engineer to design and implement large-scale data systems, including data pipelines, data warehouses, and data lakes. This role requires expertise in tools such as Apache Beam, Apache Spark, AWS Glue, Amazon Redshift, Google BigQuery, and Snowflake.
About the Job
This is a remote opportunity with the potential to become a permanent position. The successful candidate will work closely with data architects, data scientists, and other stakeholders to ensure that our data systems meet the needs of the business.
Key Responsibilities:
* Design, build, and maintain large-scale data systems.
* Design and implement data warehouses using tools like Amazon Redshift, Google BigQuery, and Snowflake.
* Develop and maintain data pipelines using tools such as Apache Beam, Apache Spark, and AWS Glue.
* Work with data architects to design and implement data models and architectures.
* Collaborate with data scientists to develop and deploy machine learning models and data products.
* Ensure data quality and integrity by developing and implementing data validation and cleansing processes.
* Collaborate with other teams to ensure that data systems meet the business's needs.
* Stay up-to-date with new technologies and trends in data engineering and make recommendations for adoption.
Requirements
To be considered for this position, you should have:
* 5+ years of experience in data engineering or a related field.
* 2-4 years of experience with Ruby, including the Ruby on Rails framework.
* 5+ years of experience with programming languages such as Python, Java, and Scala.
* 3+ years of experience with data modeling and architecture.
* 3+ years of experience with data engineering tools such as Apache Beam, Apache Spark, AWS Glue, Amazon Redshift, Google BigQuery, and Snowflake.
* Strong experience with data warehousing and data lakes.
* Strong experience with data validation and cleansing.
* Bachelor's degree in Computer Science, Engineering, or a related field.
Nice to Have:
* Experience with machine learning and data science.
* Experience with cloud-based data platforms such as AWS, GCP, or Azure.
* Experience with containerization using Docker and Kubernetes.
* Experience with agile development methodologies such as Scrum or Kanban.
* Experience with data governance and security.