Job Description:
Apache Spark and Scala are mandatory.

The Sr. Data Engineer must have a broad and deep data skill set as well as strong analytical capabilities. In addition to being a hands-on individual contributor, the ideal candidate is a productive team player and a mentor to junior data engineers. Additionally, we are looking for strong technical experts.

Actively participate in team technical discussions in all things data
Identify and address issues with data sets from multiple vendors
Identify and address code and data quality issues
Actively participate in code reviews and grooming sessions
Actively participate in technology architecture discussions for product development
Translate business requirements into strategy
Advocate for software best practices within your team as well as across engineering
Be ultra-responsive and capable of making instant decisions, always kicking the ball forward
Work on unique and interesting data challenges around architecting, building and managing pipelines that securely process hundreds of terabytes of data
Work closely with analysts and statisticians to ensure the validity of our processes
Our engineers are expected to wear a number of hats and have the opportunity to touch all parts of the stack. Our stack includes Apache Spark, Scala, Redshift, and an ever-growing list of other cool technologies.

Skillful user of Apache Spark
Experience wrangling terabytes of big, complicated, imperfect data
Experience with AWS products (Redshift, EMR, S3, IAM, RDS, etc.)
Deep understanding of scalable systems and large-scale engineering experience in an Agile development environment
Bachelor's degree in Computer Science or a related field (or 4 additional years of relevant work experience)
A strong understanding of data structures, algorithms, and effective software design
Significant development experience with a major modern language (e.g. Java, Scala, Python, Ruby, C/C++)
Significant experience working with structured and unstructured data at scale and comfort with a variety of different stores (key-value, document, columnar, etc.) as well as traditional RDBMSes and data warehouses
Experience with or interest in AWS Glue, Redshift Spectrum and any other tools that enable data querying at scale
Experience writing unit, functional and integration tests
Comfort with version control systems (e.g. Git, SVN)
Excellent verbal and written communication skills; must work well in an agile, collaborative team environment

Preferred Qualifications

Master's in Computer Science or a related field
Practical experience with supervised machine learning techniques
Strong background with test-driven development
Basic understanding of statistics and experience with statistical packages such as R, MATLAB, SPSS, etc.