Data Engineer

Columbus, OH 43085

Date : Mar-02-20

Work Authorization

US Citizen
GC
H1B
GC EAD, TN EAD

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
1099-Contract
Contract to Hire

Job Details

Experience : Midlevel

Rate/Salary ($) : Market

Duration : Long term

Sp. Area : Data Warehousing/ETL

Sp. Skills

x-Other

Consulting / Contract

Required Skills :

Spark, Hive, AWS, Cloud Computing, Github, Hadoop, Python, Scala, SQL, Agile, Apache, Cluster, DB2, Eclipse, GUI, Hbase, HDFS, JAVA, Jenkins, JIRA, Li

Preferred Skills :

Domain : IT/Software

Compunnel
Plainsboro, NJ

Job Description :

Role: Data Engineer

Location: Columbus, Ohio

Duration: Long term

Key Responsibilities

Apply all phases of the Software Development Life Cycle (analysis, design, development, testing, and maintenance) using Waterfall and Agile methodologies
Proficient in working with Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, and Oozie on AWS EC2
Expertise in using Hive to create tables and distribute data through partitioning and bucketing; capable of developing, tuning, and optimizing HQL queries
Proficient in importing and exporting data with Sqoop between HDFS and relational database systems
Expert in Spark SQL and Spark DataFrames using Scala for Distributed Data Processing
Develop DataFrames and RDDs (Resilient Distributed Datasets) to apply unified transformations to loaded data
Expertise in scripting languages such as Linux/Unix shell and Python
Develop, schedule, and monitor Oozie workflows for parallel execution of jobs
Experience working with cloud environments: AWS (EMR, EC2, S3, and Athena) and GCP BigQuery
Transfer data from different platforms into the AWS platform
Diverse experience working with a variety of databases such as SQL Server, MySQL, IBM DB2, and Netezza
Manage the source code in GitHub
Track and deliver requirements in Jira
Expertise in using IDEs and tools such as Eclipse, GitHub, Jenkins, Maven, and IntelliJ
Optimize Spark applications to improve performance and reduce run time on the Hadoop cluster
Proficient in executing Hive queries using the Hive CLI, the Hue web GUI, and Impala to read, write, and query data
Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time
Create metrics and apply business logic using Spark, Scala, R, Python, and/or Java
Model, design, develop, code, test, debug, document, and deploy applications to production through standard processes
Harmonize, transform, and move data from a raw format to consumable and curated views
Apply strong Data Governance principles, standards, and frameworks to promote data consistency and quality while effectively managing and protecting the integrity of corporate data