Education: Minimum Bachelor’s degree in Computer Science, Engineering, Business Information Systems, or a related field. A Master’s degree in Computing with a focus on PySpark and distributed computing is a major plus
Key Responsibilities:
· Develop Big Data applications using PySpark on Hadoop, Hive, and/or Kafka, HBase, MongoDB
· Build Machine Learning models (a brief PySpark sketch follows this list)
· Deploy on Cloud platforms
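For illustration, a minimal sketch of the kind of application this role involves, assuming Spark with Hive support and MLlib; the app, table, and column names are hypothetical placeholders, not a prescribed implementation:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    # Hypothetical app name; enableHiveSupport() lets Spark read Hive tables
    spark = (SparkSession.builder
             .appName("churn-model-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Ingest features from a hypothetical Hive table
    df = spark.table("analytics.customer_features")

    # Assemble hypothetical numeric columns into an MLlib feature vector
    assembler = VectorAssembler(
        inputCols=["tenure_months", "monthly_spend", "support_calls"],
        outputCol="features")
    train = assembler.transform(df).select("features", "label")

    # Fit a baseline classifier (MLlib default featuresCol is "features")
    model = LogisticRegression(labelCol="label").fit(train)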
Experience & Skillset
MUST-HAVE
· Minimum 4 years of total IT / development experience in Big Data
· Experience developing Big Data applications with PySpark on Hadoop, Hive, and/or Kafka, HBase, MongoDB
· Deep knowledge of PySpark and its libraries to develop and debug solutions to complex data engineering challenges
· Experience developing sustainable, data-driven solutions with current-generation data technologies to drive our business and technology strategies
· Exposure to deploying on Cloud platforms
· Experience designing and developing Data Pipelines for Data Ingestion and Transformation using PySpark (an illustrative sketch follows this list)
· Development experience with the following Big Data framework components: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS integration
· Experience developing applications in Agile environments with monitoring, build tools, version control, unit testing, TDD, CI/CD, and change management to support DevOps
· Development experience with SQL and Shell Scripting
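As an illustration of the pipeline item above, a minimal PySpark ingestion-and-transformation sketch, assuming a Spark session with HDFS access; the paths, view name, and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-transform-sketch").getOrCreate()

    # Ingestion: read raw delimited files from a hypothetical HDFS path
    raw = spark.read.option("header", True).csv("hdfs:///data/raw/transactions/")

    # Transformation: cleanse and aggregate via Spark SQL
    raw.createOrReplaceTempView("transactions")
    daily = spark.sql("""
        SELECT account_id,
               to_date(txn_ts)             AS txn_date,
               SUM(CAST(amount AS DOUBLE)) AS total_amount
        FROM transactions
        WHERE amount IS NOT NULL
        GROUP BY account_id, to_date(txn_ts)
    """)

    # Persist in a columnar format (Parquet here; Avro/ORC are analogous),
    # partitioned for efficient downstream reads
    (daily.write
          .mode("overwrite")
          .partitionBy("txn_date")
          .parquet("hdfs:///data/curated/daily_totals/"))

Writing Parquet partitioned by date is just one common choice among the columnar formats listed above; the same write call takes "orc" or Avro (via the spark-avro package) instead.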
GOOD-TO-HAVE
· Banking domain knowledge
· Hands-on experience migrating SAS toolset / statistical models to Machine Learning models
· Experience with Machine Learning models and use cases for Digital Marketing
· ETL / Data Warehousing and Data Modelling experience prior to Big Data experience
· Deep knowledge of the AWS stack for Big Data and Machine Learning (see the brief sketch below)
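As a loose sketch of the AWS item above, assuming PySpark running with S3 access (for example on EMR, where s3:// paths resolve via EMRFS); the bucket name and prefixes are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-sketch").getOrCreate()

    # Read curated data from a hypothetical S3 bucket
    events = spark.read.parquet("s3://example-bucket/curated/events/")

    # Filter and write a feature set back to S3
    (events.filter("event_type = 'click'")
           .write.mode("overwrite")
           .parquet("s3://example-bucket/features/clicks/"))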