Job Description:
Machine Learning Programmer

Saint Louis, MO 

6+ Months Contract

Phone + Skype


This person will be responsible for developing a modeling pipeline in sparklyr/PySpark that runs on a Hadoop framework and uses its distributed infrastructure. Additionally, the candidate will support the Data Science team in developing and deploying new predictive models.


Responsibilities:

Refactor the existing modeling pipeline (in R and Python) to run on Apache Spark
Share developed code with the team lead for review, and rework it based on feedback as required
Tune Spark queries and optimize performance
Help deploy models developed in R/Python using Apache Spark


Qualifications:

Degree in Computer Science, Statistics, Applied Mathematics, Data Science, or a related field
Work experience with Spark and HDFS
Hands-on experience with sparklyr (R, RStudio, and Spark) and PySpark (Python and Spark)
Experience using sparklyr to package R code for a Spark cluster
Hands-on experience with Java, Hadoop, Hive, Scala, and MLlib