Job Description:

Job Title: Big Data ETL Developer (Hadoop) 

Job Location: Columbia, MD

 

Job Summary: 

This position supports one of Sparksoft’s technical projects. The ideal candidate will have hands-on experience developing data ingestion and transformation ETL processes for analytical data loads.


Responsibilities: 

• Select and integrate the Big Data tools and frameworks required to provide requested capabilities.

• Transition legacy ETLs built with Java and Hive queries to Spark ETLs.

• Design, develop, test, and release ETL solutions, including data quality validations and metrics that follow data governance and standardization best practices.

• Design and develop Databricks engineering solutions on the AWS Cloud.

• Apply strong hands-on AWS Cloud experience across development and deployment activities.

• Design, develop, test, and release ETL mappings, mapplets, and workflows using StreamSets, Java MapReduce, Spark, and SQL.

• Performance-tune end-to-end ETL integration processes.

• Monitor performance and advise on any necessary infrastructure changes.

• Analyze and recommend the optimal approach for obtaining data from diverse source systems.

• Work closely with the data architects who maintain the data models, including data dictionaries and the metadata registry.

• Interface with business stakeholders to understand requirements and offer solutions. 


Required Skills: 

• Proficient understanding of distributed computing principles and hands-on experience in Big Data analytics and development.

• Good knowledge of the Hadoop and Spark ecosystems, including HDFS, Hive, Spark, YARN, MapReduce, and Sqoop.

• Experience designing and developing Spark applications in Scala that work with different file formats such as Text, Sequence, XML, Parquet, and Avro.

• Experience with build tools such as Ant, SBT, and Maven.

• Experience with Databricks.

• Strong SQL coding skills; understanding of SQL and NoSQL statement optimization and tuning.

• Ability to lead the design and implementation of ETL data pipelines.

• Experience developing data quality checks and reporting to verify ETL rules and identify data anomalies.

• Experience with AWS development using big data technologies.

• Knowledge of techniques for testing ETL data pipelines, either manually or with tools.

• AWS Cloud certification, CMS experience, and Databricks and Snowflake experience are a plus.


Education/Experience Level:
 

• Bachelor’s degree with 5 years of experience, or 10+ years of experience in the software development field.

• 5+ years of Big Data ETL development experience.

• 4+ years of AWS big data experience.

• 3+ years of experience developing data validation checks and quality reporting.

• 4+ years of experience tuning Spark/Java code, SQL, and NoSQL.

             
