Job Description:
Job Title: Data Lake/Big Data AWS Resource/Hadoop
Location: Cambridge (Boston), MA
Duration: 6+ months

Data Lake/Hadoop Resource
Implementation and administration of the on-prem data lake environment
Monitoring and managing Hadoop services on three clusters
Adding new hosts (head nodes, compute nodes, and worker nodes) to the existing cluster and decommissioning hosts from the cluster
Maintenance and monitoring of jobs in the Production, UAT, and Development environments
Deploying code changes and updated code to the UAT and Production environments
Deploying code changes on the R Shiny and RStudio servers on user request
Implementation and monitoring of Oozie scheduled jobs
Performing patching activities and applying fixes provided by Hortonworks to the data lake environment
Troubleshooting job failures, mostly in Hive and Spark jobs, across the data lake environment
Onboarding new users to the Hadoop data lake environment
Gathering requirements for creating databases in Hive and providing policy-based access management through Ranger for new proofs of concept (POCs) such as Veeva Insights
Supporting developers in running ad hoc jobs in Hive environments for existing POCs such as the enrollment forecaster
Managing and enforcing access-based policies in Ranger for HDFS home directories and at the Hive schema, table, and column levels
Implementation of security and management of Active Directory-based Kerberos authentication across the data lake clusters
Implementation of SSL for Ambari and other HDP services in the Hortonworks environment across the data lake clusters
Management of encryption and decryption of user data using Ranger KMS across the data lake clusters
Installation and upgrade of JupyterHub and Python packages to support developers implementing code in on-prem environments
Working with the HPC team on hardware issues and allocation of physical resources for the data lake environment
Hail-Spark implementation and analysis of UK Biobank genotype and phenotype datasets
Installation of the latest versions of Spark and Hail and optimization of resources for processing very large datasets
Working with the Hortonworks team on the planned HDP upgrade from version 2.6 to 3.0
Support and maintenance of MongoDB servers in the data lake
Source code repository maintenance in Bitbucket

In addition to the above tasks, the resource will also perform the following AWS activities:
Support of the Cloudbreak server in AWS for the Hortonworks Cloudbreak (CB) deployment
Support of software upgrades for Cloudbreak and HDP package installation in the AWS cluster
Supporting data scientists with technical issues during execution of Spark-Hail jobs in the Cloudbreak AWS cluster
Setup of the latest versions of Spark and Hail in the AWS Spark cluster

