Job Description:
Role: Lead Hadoop Developer
Location: NYC, NY
Duration: 12+ months
Job description:
· 10+ years of total experience in BI & DW, including at least 4-6 years in Big Data implementations
· Understand business requirements and convert them into solution designs
· Architect, design, and develop the Big Data / Data Lake platform.
· Understand the functional and non-functional requirements in the solution and mentor the team with technological expertise and decisions.
· Hands-on experience working with Hadoop distribution platforms such as Hortonworks, Cloudera, MapR and others.
· Strong hands-on experience in working with Hive, Spark (Java / Scala / Python)
· Experience in designing and building Hadoop data lake
· Produce a detailed functional design document to match customer requirements.
· Responsible for preparing, reviewing, and owning technical design documentation.
· Review and prepare documents for Big Data applications according to system standards.
· Conducts peer reviews to ensure consistency, completeness and accuracy of the delivery.
· Detect, analyse, and remediate performance problems.
· Evaluates and recommends software and hardware solutions to meet user needs.
· Responsible for project support, support mentoring, and training for transition to the support team.
· Share best practices and be consultative to clients throughout the duration of the project.
· Take end-to-end responsibility of the Hadoop Life Cycle in the organization
· Be the bridge between data scientists, engineers and the organizational needs.
· Perform in-depth requirements analysis and select the work platform.
· Full knowledge of Hadoop Architecture and HDFS is a must
· Working knowledge of MapReduce, HBase, Pig, MongoDB, Cassandra, Impala, Oozie, Mahout, Flume, ZooKeeper, Sqoop and Hive
· In addition to the above technologies, an understanding of major programming/scripting languages and environments such as Java, Linux shell, PHP, Ruby, Python and/or R
· Experience designing solutions for multiple large data warehouses, with a good understanding of cluster and parallel architecture as well as high-scale or distributed RDBMS and/or knowledge of NoSQL platforms
· Must have at least 3 years of hands-on experience in one of the Big Data technologies (e.g. Apache Hadoop, HDP, Cloudera, MapR)
· Experience with MapReduce, HDFS, Hive, HBase, Impala, Pig, Tez, Oozie, Sqoop
· Hands-on experience in designing and developing BI applications
· Excellent knowledge of relational, NoSQL, and document databases, data lakes and cloud storage
· Expertise in various connectors and pipelines for batch and real-time data collection/delivery
· Experience in integrating with on-premises and public/private cloud platforms
· Good knowledge of handling and implementing secure data collection/processing/delivery
· Desirable: knowledge of Hadoop ecosystem components such as Kafka, Spark, Solr, Atlas
· Desirable: knowledge of at least one open-source data ingestion tool such as Talend, Pentaho, Apache NiFi, Spark, Kafka
· Desirable: knowledge of at least one open-source reporting tool such as BIRT, Pentaho, JasperReports, KNIME, Google Chart API, D3