Machine Learning Lead

Stamford, CT 06927

Date : Sep-04-20

Work Authorization

US Citizen
GC
H1B
EAD (OPT/CPT/GC/H4)

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
1099-Contract
Contract to Hire

Job Details

Experience

Architect

Rate/Salary ($)

Market

Duration

Long term

Sp. Area

AI, ML, NLP, Data Science

Sp. Skills

[ML] Machine Learning

Consulting / Contract

Required Skills :

Machine Learning, Python, Agile, Hadoop, SQL, ACCESS, Big Data, Data Science, DynamoDB, Google Cloud Platform, Java, Kafka, Scala, Shell Scripting

Preferred Skills :

PySpark, Python, Big Data, Spark

Domain :

VLink Inc
South Windsor, CT

Job Description :

Primary Skills: MLOps, PySpark

Responsibilities:
- Code and architect end-to-end applications on a modern data processing technology stack: Hadoop, Hive, AWS Cloud, and Spark ecosystem technologies
- Embed with the Data Science COE / Product to ensure that new algorithms and models being built can be supported
- Review design, code, data, and feature implementations performed by other data engineers to maintain data engineering standards
- Work with model developers to improve efficiency and make modeling tradeoffs
- Build continuous integration/continuous delivery, test-driven development, and production deployment frameworks
- Troubleshoot complex data, cleaning, tagging, feature, and rule issues, and perform root cause analysis to proactively resolve product and operational issues
- Productionize the full pipeline, including distributed machine learning models (e.g., training/test pipeline, data layer, feature layer, etc.)
- Connect business context and perspective to define model objective functions, features, business rules, prioritization, measurement, etc.
- Enforce effective cost optimization techniques in cloud, on-prem, and edge environments to minimize the total cost of ownership of the machine learning product services
- Work on analytics application infrastructure analysis and requirements (e.g., configuration, access, tools, services, compute capacity, etc.)
Qualifications:
- 10+ years of experience in software development in an agile environment
- 5+ years of Big Data and Spark experience building and running complex pipelines/applications at scale in production
- 3+ years of experience in Machine Learning, AWS Cloud, EMR, Hadoop, Spark, Kafka, Kinesis, Hive, Redshift, DynamoDB, Lambda
- 3+ years of experience with PySpark, Python, HQL, Shell Scripting, SQL, Java / Scala
- In-depth knowledge of Python data processing and machine learning libraries
- Experience with and understanding of Python ML frameworks such as scikit-learn, TensorFlow, and PyTorch
- Experience with API design and development
- Proficient in Spark, Airflow / Oozie, NoSQL, Git / CodeCommit
- Passion for writing well-structured, testable code with a focus on readability and maintainability
- Experience implementing ML models and building highly scalable, high-availability systems
- Experience operating in distributed environments, including cloud (AWS, GCP, etc.)
- Experience building, launching, and maintaining machine learning pipelines in production
- Experience working in an agile, sprint-based style
- Experience working side by side with product owners and translating business needs into analytics solutions
- Proven ability to successfully balance near-term results (e.g., the ability to design and execute on an "MVP" model) with long-term goals