Job Description:
Summary
Our client, a data lake company, is an award-winning provider of enterprise data lake management solutions. Their software enables customers to gain a competitive advantage through organized, actionable big data lakes.

Responsibilities:
Responsible for the implementation and ongoing administration of Hadoop infrastructure.
Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
Working with data delivery teams to set up new Hadoop users. This includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the sketch after this list).
Cluster maintenance, including the creation and removal of nodes, using tools such as Ganglia, Nagios, Cloudera Manager Enterprise, HDP Ambari, Pivotal PHD, and/or MapR.
Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
Screening Hadoop cluster job performance and capacity planning.
Monitoring Hadoop cluster connectivity and security.
Managing and reviewing Hadoop log files.
File system management and monitoring.
HDFS support and maintenance.
Partnering diligently with the infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Serving as the point of contact for vendor escalations.
Desirable: Ability to troubleshoot Hive, HBase, Pig, and Spark/Scala scripts to isolate and fix issues when they are infrastructure related.
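
For illustration only (not a requirement of this posting): a minimal Java sketch of how the user-onboarding duty above might smoke-test a newly provisioned user's HDFS access. The principal name, keytab path, and home directory are hypothetical placeholders, and the cluster's Kerberos setup is assumed to already be in place.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class NewUserHdfsCheck {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Hypothetical principal and keytab for the newly provisioned user.
            UserGroupInformation.loginUserFromKeytab(
                    "newuser@EXAMPLE.COM", "/etc/security/keytabs/newuser.keytab");

            try (FileSystem fs = FileSystem.get(conf)) {
                Path home = new Path("/user/newuser"); // assumed home directory
                System.out.println("home directory exists: " + fs.exists(home));

                // Round-trip a probe file to confirm read/write permissions.
                Path probe = new Path(home, "_access_probe");
                fs.create(probe).close();
                fs.delete(probe, false);
                System.out.println("read/write access confirmed for newuser");
            }
        }
    }

A comparable Hive, Pig, or MapReduce probe would follow the same pattern: authenticate as the new user, then run the smallest operation that exercises the relevant service.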
Skills Required / Desirable:
A minimum of 8 years of experience is required.
General operational expertise: good Linux troubleshooting skills and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networks.
Hadoop ecosystem skills such as HBase, Hive, Pig, Sqoop, and Spark.
Should be able to deploy a Hadoop cluster, add and remove nodes, keep track of jobs, monitor critical parts of the cluster, configure NameNode high availability, schedule and configure the cluster, and take backups (see the monitoring sketch after this list).
Good knowledge of Linux.
Familiarity with open source configuration management and deployment tools such as Puppet or Chef, and with Linux scripting.
Knowledge of troubleshooting core Java applications is a plus.
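
For illustration only: a small Java sketch of the monitoring and capacity-planning side of the role, polling aggregate HDFS utilization through the same Hadoop API as above. The 80% alert threshold is an arbitrary example, not a figure from this posting.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class HdfsCapacityCheck {
        public static void main(String[] args) throws Exception {
            try (FileSystem fs = FileSystem.get(new Configuration())) {
                FsStatus status = fs.getStatus(); // aggregate DataNode capacity
                double usedPct = 100.0 * status.getUsed() / status.getCapacity();
                System.out.printf("HDFS usage: %.1f%% of %d GiB%n",
                        usedPct, status.getCapacity() >> 30);
                // Example threshold only; a real deployment would alert via Nagios or Ganglia.
                if (usedPct > 80.0) {
                    System.out.println("WARNING: consider adding DataNodes or reclaiming space");
                }
            }
        }
    }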

Qualifications:
BS or MS in Computer Science or a related discipline such as Information Technology.
(A degree in another discipline can be considered if the candidate has other relevant experience.)

Preferred:
Cloudera, Hortonworks, and/or MapR Hadoop administration certification.