Job Description:
Title: Data Architect / Engineer
Location: New York City, NY
Employment Duration: Full Time

Data Engineer

The Data Platform team is looking for an awesome Data Engineer with hands-on experience developing and maintaining data pipelines in a fast-paced business environment.

The Data Platform is responsible for ensuring the right people have the right data at the right level of granularity across brands such as Nickelodeon, MTV, BET, and Comedy Central. We enable the teams driving video, marketing, and advertising to be data-driven, and we are applying practical machine learning approaches to enhance the work people do here. We take an internal open-source approach to our work and believe our code needs to be top quality in order to succeed. Be a part of this growing, critical team in a challenging but fun working environment.

Responsibilities:
Building, maintaining, and optimizing data pipelines and ETL jobs
Collaborating across the team to shape the Data Platform technology stack
Creating technical documentation
Improving automation and test coverage (unit, integration, and user acceptance tests, etc.)
Resolving complex technical problems
Keeping up to date with modern data engineering technologies

Required Skills:
5+ years of software engineering experience, including 2+ years in data engineering, plus either a Master’s degree in Computer Science (or a related field) or a Bachelor’s degree combined with equivalent professional experience
Extensive experience with Python and experience with either Java or Scala
Experience with Apache Spark and/or Apache Hadoop
Understanding of Apache Kafka and its use cases
Experience with build and dependency management tools (Gradle, sbt, Maven, npm)
Ability to build cloud native apps on Amazon Web Services
Solid experience with agile development processes such as Scrum and Kanban
A drive to automate everything (testing, continuous integration, continuous delivery)
Experience with traditional business intelligence (e.g., star schemas, OLAP cubes) preferred
Experience working on the command line in a Unix/Linux environment

Nice to have:
Experience with one or more NoSQL databases such as Cassandra, DynamoDB, MongoDB, Neo4J, or RDF triple stores
Experience with data pipeline tools such as Apache Airflow, Luigi, AWS Data Pipeline
Experience scaling machine learning pipelines, particularly with SparkML

Client: Avance Consulting