Job Title: Big Data Consultant (PySpark or Spark-Scala)
Location: Dallas
Type: Permanent
Experience: 4-7 years (relevant)
Notice: Immediate
Job Description:
Must Have: Spark, Spark Core, PySpark, Python, Scala, HDFS, Hadoop, Kafka, Data Ingestion, Data Quality
Education: Minimum Bachelor's degree in Computer Science, Engineering, Business Information Systems, or a related field. A Master's degree related to scalable and distributed computing is a major plus.
Key Responsibilities:
- Develop Big Data applications using PySpark or Spark-Scala on Hadoop, Hive and/or Kafka, HBase, MongoDB
- Build feature engineering pipelines and scoring / machine learning models
- Deploy applications on cloud platforms
Experience & Skillset: MUST-HAVE
- Total IT / development experience of 7+ years
- Experience developing Big Data applications with PySpark or Spark-Scala on Hadoop, Hive and/or Kafka, HBase, MongoDB
- Experience with technical design and onsite-offshore coordination
- Deep knowledge of Spark libraries in Python or Scala to develop and debug solutions to complex data engineering challenges
- Experience developing sustainable, data-driven solutions with new-generation data technologies to drive business and technology strategies
- Exposure to deploying on cloud platforms
- At least 3 years of experience designing and developing data pipelines for data ingestion or transformation using PySpark or Spark-Scala
- At least 4 years of development experience with Big Data frameworks: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS
- At least 4 years developing applications in Agile environments with monitoring, build tools, version control, shell scripting, unit testing, TDD, CI/CD, and change management to support DevOps
- Prior experience with ETL, SQL, or other data technologies
GOOD-TO-HAVE
- Banking domain knowledge
- Hands-on experience migrating SAS-based statistical models to machine learning models
- Experience with machine learning models and use cases for digital marketing
- ETL / data warehousing and data modelling experience prior to Big Data experience
- Deep knowledge of the AWS stack for big data and machine learning