Job Description:
Job Role: Data Engineer with Delta Lake Experience

Job Duration: Long term

Job Location: Cambridge, MA

Core Requirements:

Databricks combines an advanced Spark cluster with a recent data lake framework called "Delta Lake".

Alternative Requirements:

If not Databricks, we would expect candidates with advanced Spark processing capabilities and experience building Delta Lakes with large datasets on AWS cloud infrastructure.

Required Skills:

* Must have a minimum of 10 years of overall IT experience, with at least 6 years of Big Data (Hadoop, Spark, AWS) administration experience (developers: 5+ years).
* Must have experience with Big Data toolsets: Hadoop, Spark, Databricks, cloud (AWS).
* Must have 2+ years of Spark experience.
* Previous Databricks experience is required.
* Good to have: knowledge of Mathematica, MATLAB, and Stella.
* Hands-on experience with installation, configuration, and support of Hadoop, Spark, and cloud integrations (AWS).
* Experience with data science tools: R, Jupyter.
* Experience with UNIX, including scripting.
* Must have experience with at least one graph technology, such as Neo4j or TigerGraph.
* Hands-on experience with cloud computing: Amazon Web Services, Azure.
* Knowledge of AWS monitoring and auditing tools such as CloudWatch and CloudTrail.
* Proficient in working with AWS security and access management.
* Good analytical and problem-solving skills.
* Strong communication skills and the ability to work independently as well as with a team.

Job Responsibilities:

Coordinate with support teams, project teams, CloudOps, the infrastructure team, and application vendors to plan and implement maintenance outages.

Provide subject matter expertise in Databricks, Spark, and the AWS Cloud platform.

Manage AWS cloud accounts and services: Glue, S3, CloudWatch, CloudTrail.

Manage data transfers from on-premises systems to the AWS Cloud.

Support graph use cases: loading data into graph databases and analysis.

Support data science toolsets such as R and Jupyter.

Manage all aspects of data-related activities: lifecycle, security, access, auditing, monitoring, etc.

Troubleshoot issues related to datasets, job failures, integration bugs, and performance bottlenecks across all incidents, working with cross-functional teams (DBA, functional, CloudOps, infrastructure) to drive problems to resolution.

Proactively monitor the overall system (executing daily health-check reports: job execution stats, volume processed, execution times, I/O times, etc.) and take appropriate actions to improve the overall health of the system.

Troubleshoot performance and configuration issues, including connectivity issues caused by firewalls, load balancers, etc.

Perform day-to-day administration activities: managing users, groups, privileges, roles, and permissions; folder creation and management; domain management tasks (including backup and restore); configuring DB connections; etc.

Develop disaster recovery and failover strategies for the data integration environment.

Responsible for code deployment and migration activities.

Client: Bio Gen