Job Description:
Title: Data Scientist

Location: Columbia, SC

Duration: 12 Months


Daily Duties / Responsibilities:

The Data Scientist will help discover information in vast amounts of data in order to make more valuable, data-based decisions to deliver the best solutions for Medicaid stakeholders.

The Data Scientist will apply data mining techniques, perform statistical analysis, and build prediction systems that will be integrated with both existing and future products and systems.

We are looking for candidates who are highly organized, can work independently in a fast-paced environment, and can develop and manage data environments.

Essential Responsibilities

1. Work alongside appropriate staff, teams, stakeholders and other points of contact (POCs) as required, to understand the goals and objectives of complex information systems

2. Enhance data collection procedures to include information that is relevant for building analytic systems

3. Select features, build and optimize classifiers using machine learning techniques

4. Process, cleanse and verify the integrity of data used for analysis

5. Mine data using state-of-the-art methods

6. Create automated anomaly detection systems

Program Experience:

Experience must include well-documented success in applying statistical skills such as distributions, statistical testing, and regression.

Experience with conducting ad-hoc and regularly scheduled analysis and presenting results in a clear manner is ideal.

Experience with ontologies, semantic web modeling, and data modeling would be considered desirable for this position.

Technical Experience:

Experience with any or all of the following technologies is desirable for this position:

Medicaid Management Information Systems (or other Health Information Technologies)

Data science and visualization using technology such as R, RStudio, QlikView, and Tableau

AI/Neural Nets

UML and architectural modeling using tools such as Rational and Sparx

Big Data and NoSQL technologies such as MongoDB, MarkLogic, Cassandra, and Hadoop

Fluency in a scripting language such as Python or R

The ideal candidate has experience in all of the following product categories with at least one of the corresponding vendor technologies:

Product Category: Vendor Technology

Data Science & Viz: R, RStudio, SPSS Modeler, SPSS, QlikView, MDX, Tableau, Anaconda Spyder, SAS, BIRT, SSAS, SAS EM

AI/Neural Nets: TensorFlow, Keras, Word2vec, Doc2vec, CNNs, ANNs, LSTM, RNNs, GANs, Theano, Torch, Bidirectional LSTM

Ontology, Semantic Web Modeling: MagicDraw Visual Ontology Modeler, Smartlogic, RDF, RDFa, Turtle, SKOS, OWL, OWL2, SPARQL, Linked Data, Neo4j, Open World Lexicography Assumptions, Ontology Frameworks, Revelytix, Protege

Architectural Modeling: MagicDraw, Mega, Troux, IBM Rational, Sparx

UML Modeling: MagicDraw, Zachman and TOGAF, RUP, ArgoUML, RSA

Glossary, Models: BG, ACORD Framework, Automotive All Divisions; Healthcare; Utility; Gas & Oil Process; GRC; Enterprise, Universal

Big Data, NoSQL: Hortonworks, Cloudera, Intel, Sqrrl, Cassandra, MarkLogic, Couchbase, Cloudant, Alpine Labs, DataStax, MongoDB

Pivotal Platform: HAWQ, Gemfire, Spring XD, MADlib, PivotalR, Greenplum, PostgreSQL, PL/R, Pythonu, plpy

Data Modeling: IBM IDA, UML, ERwin, Sandhill, Rational Data Architect, Star Schema, Snowflake, PowerDesigner, Navigator, James Martin, IEF, IEW, 3rd Normal Form, ER/Studio, SA, RDA, ADRM, Big Data Modeling Techniques

ETL, ELT, ETML, EAI: Sqoop, Pig, Composite, Information Server, Custom, Informatica, ETI, Data Stage EE, SAS ETL, SSIS 2008, Talend

Data Profiling/Quality: Exeros/CA ERwin Data Profiler (now IBM Optim), Evoke AXIO (now Informatica Data Explorer), SAS, Profile Stage, Information Analyzer, Talend Open Profiler; BODS Data Profiler, EIM, Information Steward; Trillium

Linguistic Algorithms: R tm, NLTK, Gensim, SpaCy, Sense2vec, Triplets, Linguistics Analysis Services, NLP, CL, Collocation Analysis, Generative Patterns, Dependency Grammars, SLING, DRAGNN, SyntaxNet, Sonnet

Metadata: MITI MIMB, Unicorn, IBM Metadata Workbench, MetaStage, Ron Ross, Platinum Repository, Global IDS

Database: Pivotal HAWQ, Hortonworks, Kudu, z/OS DB2, UDB DB2, SQL Server, Netezza, Oracle, MySQL, PostgreSQL

Graph Databases/Layers: AllegroGraph, GraphLab/Dato, Giraph, GraphX, Neo4j

EDW/DM Methodologies: Kimball Conformed Dimensions, Chris Adamson, Inmon CIF

ML, Data Mining: R tm, NLTK, SAS EM, IBM Intelligent Miner, SQL Server Data Miner, Predixion

Programming Languages: C, C++, C#, Python, SQL, Scala, Julia, J2EE, Perl, Bash, JMS, Ruby on Rails, Clojure, JavaScript

General Duties and Responsibilities:

1. Research and develop statistical learning models for data analysis

2. Implement new methodologies as needed for specific models or analysis

3. Conduct data collection, preprocessing and analysis

4. Collaborate with agency leadership, business partners and other parties/stakeholders to understand agency needs and provide recommendations and possible solutions

Required Skills (Rank In Order Of Importance):

1. 5+ years of practical experience with data processing, data visualization and data analytics

2. 5+ years of experience coordinating complex data architecture to align with business needs

3. 5+ years of quantitative analysis experience

4. 5+ years of debugging experience

Preferred Skills (Rank In Order Of Importance):

1. Prior experience working with query languages, probability tools, data analytics tools and business intelligence tools

2. Prior Health Information Technology and/or Program experience

3. Prior experience with Medicaid, Social Services, or similar public benefit programs

Required Education/Certifications:

1. College Degree or equivalent work experience required. Preference will be given to, in no particular order:
a. BS degree in Computer Science, Applied Math or similar discipline.

If interested, please provide the following information:

Full Name:

Email ID:

Availability for Interview:

Visa Status:

Visa Expiry Date (MM/DD/YYYY):

SSN Last 4 Digits: