Job Description:

Core Responsibilities

· Develop and maintain PySpark-based ETL pipelines for batch and incremental data processing

· Build and operate AWS Glue Spark jobs (batch and event-driven), including:

o Job configuration, scaling, retries, and cost optimization

o Glue Catalog and schema management

· Design and maintain event-driven data workflows triggered by S3, EventBridge, or streaming sources

· Load and transform data into Amazon Redshift, optimizing for:

o Distribution and sort keys

o Incremental loads and upserts

o Query performance and concurrency

· Design and implement dimensional data models (star/snowflake schemas), including:

o Fact and dimension tables

o Slowly Changing Dimensions (SCDs)

o Grain definition and data quality controls

· Collaborate with analytics and reporting teams to ensure the warehouse is BI-ready

· Monitor, troubleshoot, and optimize data pipelines for reliability and performance

Required Technical Experience

· Strong PySpark experience (Spark SQL, DataFrames, performance tuning)

· Hands-on experience with AWS Glue (Spark jobs, not just crawlers)

· Experience loading and optimizing data in Amazon Redshift

· Proven experience designing dimensional data warehouse schemas

· Familiarity with AWS-native data services (S3, IAM, CloudWatch)

· Production ownership mindset (debugging, failures, reprocessing)

We are an equal opportunity employer. All aspects of employment, including the decision to hire, promote, discipline, or discharge, will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, citizenship/immigration status, veteran status, or any other status protected under federal, state, or local law.
