Job Description:

Candidates should have strong knowledge of, and interest in, big data technologies and a background in data engineering.

• Build data pipeline frameworks to automate high-volume and real-time data delivery for our Spark and streaming data hub
• Transform complex analytical models into scalable, production-ready solutions
• Provide support and enhancements for an advanced anomaly detection machine learning platform
• Continuously integrate and ship code into our cloud production environments
• Develop cloud-based applications from the ground up using a modern technology stack
• Work directly with Product Owners and customers to deliver data products in a collaborative and agile environment

• At least 4 years of experience with the following Big Data areas: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS
• At least 4 years of experience developing applications using monitoring, build tools, version control, unit testing, TDD, and change management to support DevOps
• At least 2 years of experience with SQL and shell scripting
• Experience designing, building, and deploying production-level data pipelines using tools from the Hadoop ecosystem (HDFS, Hive, Spark, HBase, Kafka, NiFi, Oozie, Apache Beam, Apache Airflow, etc.).
• Experience with Spark programming (PySpark, Scala, or Java).
• Experience troubleshooting JVM-related issues.
• Experience with strategies for handling mutable data in Hadoop.
• Familiarity with Spark Structured Streaming and/or Kafka Streams.
• Familiarity with implementing machine learning using PySpark.
• Experience with data visualization tools such as Cognos, Arcadia, and Tableau.
• Experience with Ab Initio technologies, including but not limited to Ab Initio graph development, EME, Co-Op, BRE, and Continuous Flows.