Job Description:

A Hadoop Developer III is responsible for the design, development, and operation of systems that store and manage large amounts of data. Most Hadoop developers have a software background and hold a degree in information systems, software engineering, computer science, or mathematics. We are looking for a Big Data Engineer who will collect, store, process, and analyze very large data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.


· Understanding the requirements for input-to-output data transformations.

· Selecting and integrating the Big Data tools and frameworks required to provide the requested capabilities.

· Integrating data from multiple sources and implementing ETL processes using Apache NiFi.

· Monitoring performance and advising on any necessary infrastructure changes.

· Defining data retention policies.

· Managing the Hadoop cluster and all included services, such as Hive, HBase, MapReduce, and Sqoop.

· Cleaning data per business requirements using streaming APIs or user-defined functions.

· Building distributed, reliable, and scalable data pipelines to ingest and process data in real time, and defining Hadoop job flows.

· Managing Hadoop jobs using a scheduler.

· Applying HDFS file formats and structures such as Parquet and Avro to speed up analytics.

· Working with various Hadoop ecosystem tools, such as Hive, Pig, HBase, and Spark.

· Experience with Spark, SparkR, Python, and Scala.

· Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.

· Reviewing and managing Hadoop log files.

· Assessing the quality of datasets for a Hadoop data lake.

· Fine-tuning Hadoop applications for high performance and throughput.

· Troubleshooting and debugging any Hadoop ecosystem runtime issues.

· Taking part in proof-of-concept (POC) efforts to help build new Hadoop clusters.