TECHNOGEN, Inc. is a proven leader in providing full IT services, software development, and solutions, with 15 years in the business.
TECHNOGEN is a Small & Woman-Owned Minority Business with GSA Advantage certification. We have offices in VA and MD, and offshore development centers in India. We have successfully executed 100+ projects for clients ranging from small businesses and non-profits to Fortune 50 companies and federal, state, and local agencies.
Job Title: ETL Developer
Work Location(s): Titusville, NJ / Franklin Park, NJ
Duration: Long term
Project Description:
- This project will revamp/re-design the existing process to achieve the following:
- A new, flexible framework with better performance.
- Integration of multiple datasets into one data model.
- Availability of the most recent project data to all business users and downstream applications.
Future Phase: Design and integrate a rule-engine-based architecture (in ADAL, the Authoritative Data Access Layer) into the client's new architecture framework, enabling end users/business users to access rule-based client data through a self-service portal.
Job Duties:
- Capture business rules and requirements, design the associated conceptual, logical, and physical models and create the required documentation to support, communicate and validate the data models.
- Develop and maintain data lineage for all entities used by business users.
- Migrate the existing environment (currently running on Informatica, Oracle, and Linux) to the Hadoop ecosystem on Amazon Web Services (AWS) cloud stack components.
- Develop scripts that store source system files in the Hadoop Distributed File System (HDFS).
- Model the Hadoop environment and build data standards for the data ingestion, data processing, and data visualization layers of various big data analytics projects.
- Maintain Hadoop environment code in distributed version control tools such as GitHub and Bitbucket.
- Develop and execute PySpark (Python Spark) scripts for massively parallel processing (MPP) of large data volumes.
- Migrate (for specific projects) Apache Hive tables into Amazon Redshift and develop views to be used by business teams for business analytics.
- Analyze and enhance batch jobs that run on Talend for performance improvements.
- Develop technical specification documents based on business and functional requirements.
- Perform source and target data analysis and provide gap analysis reports to be used for integration and enhancement projects.
- Develop Python scripts to pull data from open sources through APIs or AWS Data Exchange.
- Migrate complete batch jobs that run on Tidal into Control-M.
- Develop unit test cases and integration test cases to be used by technical and business teams to ensure the appropriateness of deliverables.
- Identify and develop performance improvements for ETL and business processes that run in production by tuning HQL (Hive Query Language) queries, Spark SQL queries, spark-submit configurations, SQL queries, and Redshift views.
- Spin up an EMR cluster to process data loads, install all required libraries, archive the data in S3 before terminating, and shut the cluster down after successful completion of all loads.
- Develop automation to scale EC2 instances up during peak processing times and back down after processes complete.
- Transfer bulk data between Apache Hadoop and relational databases using Sqoop.
- Provide estimates on project deliverables to the project management team.
- Develop project deliverable work tasks and delegate them to offshore developers.
- Manage the offshore development team's daily work activities and deliverables.
- Conduct technical walkthrough sessions and monitor overall progress of deliverables as a team.
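The ingestion duties above follow a common pattern: read delimited source files, validate rows, and bucket records by a partition key before landing them in HDFS. A minimal, local sketch of that pattern in plain Python (the pipe delimiter, field names, and `load_date` partition key are illustrative assumptions, not taken from the client's spec):

```python
import csv
import io
from collections import defaultdict

def partition_records(source_text, delimiter="|", partition_field="load_date"):
    """Parse delimited source rows and bucket them by a partition key.

    Rows missing the partition field are collected separately so a
    rejects file can be written alongside the good partitions.
    """
    reader = csv.DictReader(io.StringIO(source_text), delimiter=delimiter)
    partitions = defaultdict(list)
    rejects = []
    for row in reader:
        key = (row.get(partition_field) or "").strip()
        if key:
            partitions[key].append(row)
        else:
            rejects.append(row)
    return dict(partitions), rejects

# Hypothetical sample: two rows in different partitions, one reject with no date.
sample = "id|load_date|amount\n1|2024-01-01|10\n2|2024-01-02|20\n3||30\n"
parts, bad = partition_records(sample)
```

In a production PySpark job this grouping would typically be expressed as a partitioned write rather than an in-memory dict, but the validation-then-partition flow is the same.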
Degree Requirement: Bachelor's degree in computer science, computer information systems, information technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor's degree in one of the aforementioned subjects.