Job Description:

Data Engineer:

Responsibilities:

  • Architect, implement, and maintain scalable data architectures to meet our data processing and analytics requirements.
  • Collaborate with cross-functional teams to understand and translate data needs into effective data pipeline solutions.
  • Develop, optimize, and maintain ETL processes to facilitate smooth and accurate data movement across systems (a short PySpark sketch follows this list).
  • Implement best practices for data pipeline orchestration and automation using Databricks on AWS.
  • Leverage AWS services to build and optimize data solutions.
  • Utilize Databricks for big data processing, analytics, and machine learning workflows.
  • Establish data quality checks and ensure data integrity and accuracy throughout the data lifecycle.
  • Implement and enforce data governance policies and procedures.
  • Optimize data processing and query performance for large-scale datasets within AWS and Databricks environments.
  • Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and provide the necessary infrastructure.
  • Document data engineering processes, architecture, and configurations.
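
For a concrete sense of the ETL and Medallion-architecture work described above, here is a minimal PySpark sketch of a bronze-to-silver step with a basic data quality check. It is illustrative only; the table names and columns (bronze.orders_raw, silver.orders, order_id, order_ts, amount) are hypothetical and not part of this posting.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read raw records from a hypothetical bronze Delta table.
    bronze = spark.read.table("bronze.orders_raw")

    # Silver step: deduplicate, enforce basic quality rules, normalize types.
    silver = (
        bronze
        .dropDuplicates(["order_id"])
        .filter(F.col("order_id").isNotNull())               # quality check: no null keys
        .withColumn("order_ts", F.to_timestamp("order_ts"))  # string -> timestamp
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    )

    # Persist the curated result to the silver layer as a Delta table.
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

In practice a step like this would run under an orchestration tool and sit alongside the quality checks and monitoring called out in the bullets above.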

Qualifications:

  • Proven experience in data engineering roles, with a focus on AWS and Databricks.
  • Strong programming skills in Python (including PySpark) and SQL.
  • In-depth knowledge of the Medallion architecture for data pipelines is required.
  • Experience with Delta Lake, Unity Catalog, Delta Sharing, and Delta Live Tables (DLT); a short DLT sketch follows this list.
  • Experience with data modeling, schema design, and database optimization.
  • Experience with data pipeline orchestration tools.
  • Experience with CI/CD on Databricks using tools such as GitHub Actions and the Databricks CLI.
  • Understanding of data management principles (quality, governance, security, privacy, life-cycle management, cataloging).
  • Ability to troubleshoot complex data issues and implement effective solutions.
  • Build and support data infrastructure that will enable the automation of processes across a variety of ingestion, transformation, and consumption methods.
  • Design, build, publish, and maintain performant, reliable data models and pipelines that enable self-service consumption throughout the enterprise.
  • Publish data models that enable flexible querying and visualization of data.
  • Work at the intersection of upstream data producers and downstream consumers to drive successful analytical outcomes.
  • Create processes for monitoring and alerting on the health of data pipelines.
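
As an illustration of the Delta Live Tables and data quality expectations mentioned above, here is a minimal DLT sketch in Python. It is a sketch only: the storage path and table names are hypothetical, and it assumes it runs inside a Databricks DLT pipeline notebook, where the spark session is provided by the runtime.

    import dlt
    from pyspark.sql import functions as F

    # Bronze: incrementally ingest raw JSON files with Auto Loader.
    # The storage path below is hypothetical.
    @dlt.table(comment="Raw orders ingested from cloud storage.")
    def orders_bronze():
        return (
            spark.readStream.format("cloudFiles")   # spark is provided by the DLT runtime
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/orders")
        )

    # Silver: apply a data quality expectation; rows with null keys are
    # dropped, and drop counts surface in the DLT event log for monitoring.
    @dlt.table(comment="Cleaned orders with valid keys.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_silver():
        return (
            dlt.read_stream("orders_bronze")
            .withColumn("order_ts", F.to_timestamp("order_ts"))
        )

A pipeline like this would typically be deployed through a Databricks DLT pipeline definition, with its expectation metrics feeding the monitoring and alerting processes noted above.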
             
