Job Description:
Senior Hadoop MapR Administrator / Architect
Location – Dallas, TX
Duration – 6 months +

We are looking for an experienced Hadoop administrator to help manage a rapidly growing MapR distribution platform in a multi-tenant environment. This role will work closely with the engineering team and will be responsible for capacity planning, service configuration, cluster expansion, monitoring, tuning, and overall ongoing support of the cluster environment. Responsibilities also include researching and recommending methods and technologies to improve cluster operation and user experience.


Job Responsibilities
Manage the Hadoop distribution on Linux instances, including configuration, capacity planning, expansion, performance tuning and monitoring
Work with data engineering team to support development and deployment of Spark and Hadoop jobs
Work with end users to troubleshoot and resolve incidents with data accessibility
Contribute to the architecture design of the cluster to support growing demands and requirements
Contribute to planning and implementation of software and hardware upgrades
Recommend and implement standards and best practices related to cluster administration
Research and recommend automated approaches to cluster administration
Perform capacity planning and infrastructure planning based on current workloads and future requirements
Install Cloudera Hadoop for different environments (Dev, Model, Production, Disaster Recovery)
Install and configure Kafka to facilitate real-time streaming applications.
Provide support and maintenance for the Hadoop platform and its ecosystem, including HDFS, YARN, Hive, Impala, Spark, Kafka, HBase, MapR Workbench, Splunk, and Power BI.
Work with delivery teams to provision users in Hadoop.
Implement Hadoop security controls such as LDAP, Kerberos, Sentry, and Key Trustee Server and Key Management Systems.
Enable Sentry for role-based access control (RBAC) to enforce privilege-level access to data in HDFS per security policies.
Enable data encryption at rest and in transit with TLS/SSL to meet security standards.
Support OS patching of cluster nodes.
Design and implement a backup and disaster recovery strategy based on the BDR utility for batch applications and Kafka MirrorMaker for real-time streaming applications.
Enable consumers to access data in Hive tables from Power BI Desktop, as required.
Establish connectivity between external clients and Impala to enable consumer groups to migrate easily to Hadoop query engines.
Work with the Control-M enterprise scheduler to run jobs in Hadoop.
Align with development and architecture teams to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
Perform capacity planning for the Hadoop cluster.
Optimize and tune cluster performance by adjusting parameters based on benchmarking results such as TeraGen/TeraSort.
Implement Git version control based on an NFS shared drive for Hadoop and integrate it with the Eclipse IDE.
Enable Subversion (SVN) as a version control system.

Qualifications
A minimum of a bachelor's degree in computer science or equivalent.
5+ years' experience administering Linux production environments
3+ years' experience managing a full-stack Hadoop distribution (preferably MapR), including monitoring
3+ years' experience implementing and managing Hadoop-related security in Linux environments (Kerberos, SSL, Sentry, encryption)
Strong knowledge of YARN configuration in a multi-tenant environment. Candidate should have experience with the YARN capacity scheduler.
Strong working knowledge of disaster recovery related to Hadoop platforms
Working knowledge of automation tools
3+ years administration experience with HBase, Spark, Hive, Impala
Strong written and verbal communication skills
Excellent analytical and problem-solving skills
Ability to identify complex problems, review related information to develop and evaluate options, and implement solutions.