Job Title: Data Scientist

Duration: 6 months+

Location: Remote

This position can be based in the Bay Area or fully remote.

Job Description

We are building a data science team whose mission is to discover the insights hidden in vast amounts of data and to help us make smarter predictive and prescriptive analytical decisions about the business problems at hand. Your primary focus will range from understanding business problems and performing statistical analysis to experimenting with the latest machine learning techniques (from classical data mining to deep learning), with the aim of building high-quality prediction systems integrated with our products and solutions.

Examples of the types of tasks this role may involve:

  • Develop data pipeline stages such as cleaning, validation, and wrangling for a variety of data types, including text, images, and categorical or custom data
  • Design and develop metrics/scoring pipelines using machine learning techniques
  • Develop feature extraction pipelines from raw data stored in a variety of formats
  • Design and develop feature representations backed by a variety of data stores, such as SQL databases, key-value or object storage, and knowledge graphs, for better predictions
  • Work with standard libraries such as scikit-learn, NumPy, and pandas to implement models for classification and regression tasks
  • Work with TensorFlow, Keras, PyTorch, etc. to implement custom and pre-built neural network models such as RNNs and CNNs
  • Develop internal A/B testing and multi-armed bandit or ensemble models and pipelines
  • Work in mixed programming/scripting language environments (e.g., Python, Java, C++) as application requirements dictate
  • Work within state-of-the-art MLOps/CI/CD/DevOps platforms built on Spark, Kubernetes, and Kafka for batch, streaming/real-time, or transactional distributed architectures that host model training, test, and inference pipelines


Responsibilities

  • Select features and build and optimize classifiers using machine learning techniques
  • Perform data mining and experimental analysis using state-of-the-art methods
  • Process, cleanse, and verify the integrity of data used for analysis, training, and inference
  • Collect and understand business requirements of varying degrees of clarity
  • Define and design data science techniques and pipelines that address specific business problems
  • Work with datasets of varying size and complexity, including both structured and unstructured data
  • Develop pipelines to process massive data streams in distributed computing environments such as Spark and Kubernetes/Docker microservices
  • Develop proprietary algorithms to build customized solutions that go beyond standard industry tools and lead to innovative solutions
  • Develop sophisticated visualizations of analysis output for business users
  • Provide controls/analytics for all output produced to monitor and ensure that established indicators/targets are met, both during initial development and on an ongoing basis
  • Identify opportunities for continuous improvement of the algorithms, solutions, and methodologies currently employed
  • Proactively collaborate with business partners to monitor solution health and changing requirements, and develop actionable plans to address them while optimizing for quality, usability, cost, and time-to-market, among other variables
     

Requirements

  • Bachelor's degree in Statistics, Computer Science, Mathematics, Machine Learning, Econometrics, Physics, Biostatistics, or a related quantitative discipline, plus 3 or more years of experience in an enterprise data science organization
  • Graduate degree preferred
  • Must have experience performing exploratory data analysis and visualization using state-of-the-art Python libraries such as pandas, NumPy, Matplotlib, seaborn, Plotly, and Streamlit
  • Must have experience building models/algorithms for training and inference workloads using libraries such as scikit-learn, TensorFlow, and PyTorch
  • Must have advanced expertise in Python as well as JSON and SQL; experience with other programming languages such as R, Java, and C++, and expertise in GraphQL, is preferred
  • Must have experience working with enterprise data warehouses, data marts, databases, data lakes, or other distributed or cloud-based data storage systems
  • Must have experience working in cross-functional teams and the ability to communicate results to non-technical audiences
  • Familiarity with synchronous/event-based system/data/orchestration architectures for batch, streaming/real-time, and/or transactional workloads that employ one or more of the following technologies: message queues, Kafka, RESTful microservices, Spark, Kubernetes/Docker
  • Experience with cloud platforms and SaaS environments and tools such as Azure, AWS, and GCP preferred
  • Familiarity with CI/CD/DevOps tools such as Bitbucket, Bamboo, Jira, and Confluence required
  • Experience with test-driven development, standard logging practices, and debugging techniques required
  • Work experience in Agile (Scrum) development teams required
     
             
