Job Description:
Responsibilities:
Design and implement distributed data processing pipelines using Spark, Hive, Pig, Python, Scala, and other tools and languages prevalent in the Hadoop ecosystem.
Design and implement end-to-end solutions, including in AWS.
Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
Research, evaluate, and utilize new technologies/tools/frameworks centered on Hadoop and other elements of the Big Data space.
Define and build data acquisition and consumption strategies
Work with teams to resolve operational and performance issues
Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and engineering best practices are defined and adhered to.

Qualifications:
MS/BS degree in computer science or a related discipline
6+ years’ experience in large-scale software development
1+ year of experience with Hadoop
Strong programming skills in Java, shell scripting, Python, and SQL
Strong development skills with AWS, Hadoop, EMR, MapReduce, and Pig
Strong understanding of Hadoop internals
Good understanding of file formats including JSON, Parquet, Avro, and others
Experience with databases like Oracle
Experience with performance/scalability tuning, algorithms and computational complexity
Experience (at least familiarity) with data warehousing, dimensional modeling and ETL development
Ability to understand ERDs and relational database schemas
Proven ability to work with cross-functional teams to deliver appropriate resolutions

Nice to have:
Experience with AWS components and services
Experience with NoSQL technologies such as HBase, DynamoDB, and Cassandra
Experience with messaging and complex event processing systems such as Kafka and Storm
Scala
Machine learning frameworks
Statistical analysis with Python, R, or similar