Job Description:

Job Title: ETL Developer

Work Location(s): Titusville, NJ / Franklin Park, NJ

Duration: Long-term

Project Description:

  • This project will revamp/re-design the existing process to achieve the following:
  • A new, flexible framework with better performance.
  • Integration of multiple datasets into one data model.
  • Availability of the project's most recent data to all business users and downstream applications.

Future Phase: Design and integrate a rule-engine-based architecture (in ADAL, the Authoritative Data Access Layer) into the client's new architecture framework, enabling end users/business users to access rule-based client data through a self-service portal.

Job Duties:

  • Capture business rules and requirements, design the associated conceptual, logical, and physical models, and create the documentation required to support, communicate, and validate the data models.
  • Develop and maintain data lineage for all entities used by business users.
  • Migrate the existing environment (currently running on Informatica, Oracle, and Linux) to a Hadoop ecosystem built on Amazon Web Services (AWS) cloud stack components.
  • Develop scripts that store source system files in the Hadoop Distributed File System (HDFS).
  • Perform modeling of the Hadoop environment and establish data standards for the data ingestion, data processing, and data visualization layers of various big data analytics projects.
  • Maintain Hadoop environment code in distributed version control tools such as GitHub and Bitbucket.
  • Develop and execute PySpark (Python on Spark) scripts for massively parallel processing (MPP) of large data volumes.
  • Migrate (for specific projects) Apache Hive tables to Redshift and develop views to be used by business teams for analytics.
  • Analyze and enhance batch jobs that run on Talend to improve performance.
  • Develop technical specification documents based on business and functional requirements.
  • Perform source and target data analysis and provide gap analysis reports to be used in integration and enhancement projects.
  • Develop Python scripts to pull data from open sources through APIs or AWS Data Exchange.
  • Migrate all batch jobs that run on Tidal to Control-M.
  • Develop unit test cases and integration test cases to be used by technical and business teams to validate the appropriateness of project deliverables.
  • Identify and implement performance improvements for ETL and business processes running in production by tuning HQL (Hive Query Language) queries, Spark SQL, Spark submit parameters, SQL queries, and Redshift views.
  • Spin up EMR clusters to process data loads, install all required libraries, archive the data in S3, and terminate the clusters after all loads complete successfully.
  • Develop automation to scale EC2 instances up during peak processing times and scale them down after processing completes.
  • Transfer bulk data between Apache Hadoop and relational databases using Sqoop.
  • Provide estimates to the project management team for project deliverables.
  • Develop project deliverable work tasks and delegate them to offshore developers.
  • Manage the offshore development team's daily work activities and deliverables.
  • Conduct technical walkthrough sessions and monitor the overall progress of the team's deliverables.

Degree Requirement: Bachelor's degree in computer science, computer information systems, or information technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor's degree in one of the aforementioned subjects.
