Job Description:
Big Data Engineer

Top Skills Details
1) 4+ years of hands-on experience in a big data environment:
- Kafka, Hive, Impala, Kudu, Hue, Spark, Python
2) Experience with tools such as StreamSets or Kafka (or similar), change data capture (CDC), or ETL
3) Hands-on experience building DataOps/data pipelines

Description
Looking for a heads-down Hadoop developer to assist primarily with developing data pipelines, with some platform engineering work as well. This person should have 4+ years of Hadoop engineering experience in a big data environment: Spark, Hive, Impala.

The client has been investing heavily in modernizing its data platform. This role sits within an enterprise data group, helping build data pipelines for the finance group's management reporting, pulling data from the Hadoop platform.

Will build data pipelines using either StreamSets or Kafka (depending on what the specific project calls for), and will also work with change data capture (CDC) and ETL, so hands-on experience with DataOps/data pipelines is required.
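As a rough illustration of the CDC side of this work (the event shape and field names here are hypothetical, not from the posting), applying a stream of change events to a target table can be sketched in Python:

```python
# Minimal, hypothetical sketch of change-data-capture (CDC) apply logic:
# each event carries an operation, a primary key, and (for upserts) a row image.
def apply_cdc_events(table, events):
    """Apply insert/update/delete events to an in-memory 'table' keyed by id."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            table[key] = event["row"]    # upsert the new row image
        elif op == "delete":
            table.pop(key, None)         # remove the row if present
    return table

events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "amount": 100}},
    {"op": "update", "key": 1, "row": {"id": 1, "amount": 150}},
    {"op": "insert", "key": 2, "row": {"id": 2, "amount": 75}},
    {"op": "delete", "key": 2},
]
print(apply_cdc_events({}, events))  # {1: {'id': 1, 'amount': 150}}
```

In a real pipeline the events would arrive from a CDC tool via Kafka and the target would be a Hive/Kudu table rather than a dict, but the upsert/delete semantics are the same.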
Environment:
- On-prem Cloudera (leveraging phData as the managed service provider); they are currently transitioning from the build phase to the managed service phase and are now moving into use-case development on the platform.
- StreamSets for ingestion (near-real-time change data capture, plus cleansing and minor transformation of data as it flows into Kafka)
- Hive, Impala, Spark (all open-source tools), Kafka
- WhereScape for data modeling and automation of the data warehouse for BI
- Data Vault methodology
- AtScale, used as the semantic layer for BI
- The data science teams leverage Cloudera Data Science Workbench, which provides scientists an environment for prototyping and notebook work (long term, they hope to institute a new process for repeatable, production-ready models)
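The cleansing and minor transformation mentioned in the ingestion step above might look roughly like this in Python; the field names and rules are invented for illustration, and in practice this logic would live in a StreamSets processor rather than standalone code:

```python
def cleanse_record(record):
    """Hypothetical cleansing step: trim strings, normalize null markers,
    and cast a numeric field before the record is produced to Kafka."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()            # trim stray whitespace
        if value in ("", "NULL", None):      # normalize empty markers to None
            value = None
        cleaned[field] = value
    if cleaned.get("amount") is not None:    # minor transformation: cast to float
        cleaned["amount"] = float(cleaned["amount"])
    return cleaned

raw = {"id": " 42 ", "name": "NULL", "amount": "19.99"}
print(cleanse_record(raw))  # {'id': '42', 'name': None, 'amount': 19.99}
```

Keeping transformations this light at ingestion, and leaving heavier modeling to the warehouse layer, matches the near-real-time CDC flow described above.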