Description/Job Summary:
The ideal candidate will have hands-on technical experience developing data ingestion and transformation ETL processes for analytical data loads.
Responsibilities:
• Select and integrate the Big Data tools and frameworks required to deliver requested capabilities.
• Transition legacy Java and Hive ETL jobs to Spark ETLs.
• Design, develop, test, and release ETL solutions, including data quality validations and metrics that follow data governance and standardization best practices (a minimal sketch of such a pipeline follows this list).
• Design, develop, test, and release ETL mappings, mapplets, and workflows using StreamSets, Java MapReduce, Spark, and SQL.
• Tune the performance of end-to-end ETL integration processes.
• Monitor performance and advise on any necessary infrastructure changes.
• Analyze and recommend the optimal approach for obtaining data from diverse source systems.
• Work closely with the data architects who maintain the data models, including data dictionaries and the metadata registry.
• Interface with business stakeholders to understand requirements and offer solutions.
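A minimal sketch of the kind of Spark ETL with an embedded data quality metric referenced above (the CustomerEtl object, paths, and column names are hypothetical; reading Avro assumes the spark-avro package is on the classpath):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  object CustomerEtl {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("customer-etl").getOrCreate()

      // Ingest: raw Avro input (hypothetical path; requires the spark-avro package)
      val raw = spark.read.format("avro").load("s3://example-bucket/raw/customers/")

      // Transform: standardize fields and drop rows failing a basic quality rule
      val cleaned = raw
        .withColumn("email", lower(trim(col("email"))))
        .filter(col("customer_id").isNotNull)
        .withColumn("load_date", current_date())

      // Data quality metric: report how many rows were rejected
      val rejected = raw.count() - cleaned.count()
      println(s"Rejected $rejected rows that failed quality checks")

      // Load: write curated Parquet partitioned by load date
      cleaned.write.mode("overwrite")
        .partitionBy("load_date")
        .parquet("s3://example-bucket/curated/customers/")

      spark.stop()
    }
  }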
Requirements:
Required Skills:
• Proficient understanding of distributed computing principles and hands-on experience in Big Data analytics and development.
• Good knowledge of the Hadoop and Spark ecosystems, including HDFS, Hive, Spark, YARN, MapReduce, and Sqoop.
• Experience designing and developing Spark applications in Scala that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
• Experience using the build tools Ant, SBT, and Maven (see the build sketch after this list).
• Strong SQL coding skills and an understanding of SQL and NoSQL statement optimization and tuning.
• Ability to lead the design and implementation of ETL data pipelines.
• Experience developing data quality checks and reporting to verify ETL rules and identify data anomalies.
• Experience with AWS development using big data technologies.
• Familiarity with techniques for testing ETL data pipelines, whether manually or with tools.
• AWS cloud certification, CMS experience, and Databricks or Snowflake experience are a plus.
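As a point of reference for the SBT requirement, a minimal build sketch for a Spark/Scala ETL project (the project name, library versions, and dependency set are illustrative assumptions):

  // build.sbt -- hypothetical project definition
  name := "etl-pipelines"
  scalaVersion := "2.12.18"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"  % "3.5.1" % Provided, // Spark SQL/DataFrame APIs
    "org.apache.spark" %% "spark-avro" % "3.5.1"             // Avro file-format support
  )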
Education/Experience Level:
• Bachelor’s degree with 5 years of experience, or 10+ years of experience, in the software development field.
• 5+ years of Big Data ETL development experience.
• 4+ years of AWS big data experience.
• 3+ years of experience developing data validation checks and quality reporting.
• 4+ years of experience tuning Spark/Java code, SQL, and NoSQL.