Job Description:
Should have good knowledge of the Hadoop ecosystem - HDFS, Hive, Oozie, Sqoop, Kafka, Storm, Spark, and Scala
Should have good experience in Spark, including Spark Streaming, Spark Core, and Spark SQL
Should have good skills in the Scala programming language (a short Spark-in-Scala sketch follows below)
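To illustrate the kind of Spark and Scala fluency these items describe, here is a minimal sketch, assuming Spark 2.x or later; the input file events.txt, the app name, and local mode are hypothetical choices for illustration, not part of the role's actual environment.

import org.apache.spark.sql.SparkSession

object SparkSkillsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-skills-sketch")
      .master("local[*]")                       // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    // Spark Core: classic RDD word count
    val counts = spark.sparkContext
      .textFile("events.txt")                   // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Spark SQL: the same aggregation through the DataFrame/SQL API
    counts.toDF("word", "count").createOrReplaceTempView("word_counts")
    spark.sql("SELECT word, count FROM word_counts ORDER BY count DESC LIMIT 10").show()

    // Spark Streaming (Structured Streaming): built-in rate source, console sink
    val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
    val query = stream.writeStream.format("console").start()
    query.awaitTermination(5000)                // run briefly for the demo, then stop
    query.stop()
    spark.stop()
  }
}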
Should be strong in Hive concepts including:
o Optimization best practices – partitioning, bucketing, and query optimization
o Different types of joins – map-side join, bucket join, sort-merge-bucket (SMB) join, etc. (see the sketch after this list)
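To make the Hive items concrete, here is a minimal sketch of partitioning and bucketing DDL issued through a Hive-enabled SparkSession; the sales and customers tables, their columns, and the bucket count of 32 are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-tuning-sketch")
      .enableHiveSupport()                      // assumes a Hive metastore is configured
      .getOrCreate()

    // Partitioning: each sale_date lands in its own directory, so a filter on the
    // partition column prunes whole directories instead of scanning the full table
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (order_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (sale_date STRING)
        |STORED AS ORC""".stripMargin)

    // Bucketing: rows are clustered and sorted by the join key; two tables bucketed
    // the same way on that key qualify for a sort-merge-bucket (SMB) join
    spark.sql(
      """CREATE TABLE IF NOT EXISTS customers (customer_id BIGINT, name STRING)
        |USING ORC
        |CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS""".stripMargin)

    // Query optimization: the partition filter limits the scan to one day's data
    spark.sql("SELECT SUM(amount) FROM sales WHERE sale_date = '2024-01-01'").show()

    spark.stop()
  }
}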
Should have a good understanding of machine learning concepts
Should be well versed in basic UNIX commands for troubleshooting an unresponsive process - checking CPU/memory usage and taking a thread dump (e.g., ps, top, jstack); a short sketch follows below
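To illustrate this troubleshooting workflow, here is a minimal sketch that drives the standard ps and jstack tools from Scala via scala.sys.process; the process id is hypothetical, and in practice these commands would typically be run directly at the shell.

import scala.sys.process._

object UnixTriageSketch {
  def main(args: Array[String]): Unit = {
    val pid = "12345"                           // hypothetical pid of an unresponsive JVM

    // CPU and memory usage of the process (standard ps options)
    println(Seq("ps", "-p", pid, "-o", "pid,%cpu,%mem,cmd").!!)

    // Thread dump of the JVM via the JDK's jstack tool
    println(Seq("jstack", pid).!!)
  }
}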
Should be well versed in SDLC phases and in release and change management processes
Should have good analytical and problem-solving skills