Job Description:
Candidates will need to be on-site at the Quincy or Boston locations.
Set up and support data ingestion into HBase/Hadoop databases using Apache Spark, and perform data movement and development with Spark up to and through data publication.
The role is also responsible for filtering, tagging, joining, parsing, and normalizing data sets throughout the end-to-end process.
Datasets will typically originate from large (multi-terabyte) RDBMS and unstructured data sources, and the work requires developing highly efficient, reusable code.
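By way of illustration only, the sketch below shows the general shape of such an ingestion job in PySpark: read a table from an RDBMS over JDBC, apply simple filtering, tagging, and normalization, and publish the result as Parquet. Every connection detail, table, and column name here is a hypothetical placeholder, and Parquet is used as the landing format because Spark-to-HBase connectors vary by distribution.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: ingest a claims table from an RDBMS and publish it
    # as Parquet on HDFS. Connection details and schema are placeholders.
    spark = SparkSession.builder.appName("claims-ingest").getOrCreate()

    raw = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/claims")  # placeholder URL
        .option("dbtable", "public.claims")                      # placeholder table
        .option("user", "etl_user")                              # placeholder credentials
        .option("password", "***")
        .option("numPartitions", 16)            # parallelize the read
        .option("partitionColumn", "claim_id")  # numeric column to split on
        .option("lowerBound", 0)
        .option("upperBound", 100000000)
        .load()
    )

    # Filter, tag, and normalize before publication.
    cleaned = (
        raw.filter(F.col("claim_status").isNotNull())
        .withColumn("member_id", F.trim(F.col("member_id")))  # normalize keys
        .withColumn("ingest_date", F.current_date())          # tag the batch
    )

    cleaned.write.mode("overwrite").parquet("/data/published/claims")
    spark.stop()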

Accountable for following leading practices in secure code development to keep the platform free of the most common coding vulnerabilities. Participate in and perform peer code reviews. Create automated unit-test scripts and run them as part of a continuous integration development process. Document the code during development to ensure maintainability, document changes to the code for traceability, and update the traceability matrix or other requirements-tracking tool. Fix any defects and performance problems discovered during testing.
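As an example only, a CI-friendly unit test for a Spark transformation might look like the following pytest sketch; the normalize_members function and its one-column schema are hypothetical.

    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F


    def normalize_members(df):
        # Hypothetical transformation under test: trim and upper-case member IDs.
        return df.withColumn("member_id", F.upper(F.trim(F.col("member_id"))))


    @pytest.fixture(scope="module")
    def spark():
        # Local-mode session so the test runs on any CI agent without a cluster.
        session = (
            SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()
        )
        yield session
        session.stop()


    def test_normalize_members_trims_and_uppercases(spark):
        df = spark.createDataFrame([(" ab123 ",), ("cd456",)], ["member_id"])
        result = sorted(row.member_id for row in normalize_members(df).collect())
        assert result == ["AB123", "CD456"]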


Broad knowledge of Hadoop tools such as Kafka, Flume, Sqoop, and Oozie. Knowledge of data formats and of ETL and ELT processes in a Hadoop environment, including Hive, Parquet, MapReduce, YARN, HBase, and other NoSQL databases. Experience dealing with structured, semi-structured, and unstructured data in batch and real-time environments. Experience working in AWS environments leveraging EC2, S3, Lambda, RDS, etc. General knowledge of how data science tools such as RStudio, Anaconda, H2O, and Tableau leverage data throughout the big data ecosystem.

Top 5 qualifications:

A quick list of the top five skill sets and experience follows. Health insurance domain knowledge and experience is very helpful. AWS experience is helpful, but less important than the other five.

1. Experienced in Cloudera, particularly Hue, Impala, HDFS, Hive, HBase, Oozie, Spark, and YARN (very much in this order).
2. Experienced in JupyterHub and Jupyter Notebook.
3. Experienced in SQL and large data joins (see the sketch after this list).
4. Python as the programming language; R is nice to have.
5. Broad knowledge of tools in the analytics and ML space, such as Tableau, Exploratory, RStudio, etc.
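For the SQL and large-join point above, a representative pattern in Spark SQL might look like the sketch below. The claims and plans tables and their columns are illustrative and assumed to be registered in the Hive metastore, and the broadcast hint assumes the plans side is small enough to fit in executor memory.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("join-example").enableHiveSupport().getOrCreate()
    )

    # Illustrative only: join a large fact table to a small dimension table.
    # Broadcasting the small side avoids shuffling the large table.
    result = spark.sql("""
        SELECT /*+ BROADCAST(p) */
               c.claim_id,
               c.member_id,
               p.plan_name,
               c.paid_amount
        FROM claims c
        JOIN plans p
          ON c.plan_id = p.plan_id
        WHERE c.service_year = 2023
    """)

    result.write.mode("overwrite").parquet("/data/published/claims_with_plans")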