Job Description:
We are looking for a Sr. Data Engineer to join one of our e-commerce clients in their Dumbo office in Brooklyn. This role requires someone with experience working with or building platforms for data processing at scale, and collaborating with the teams that use those platforms. This person will need experience building applications; experience with one of the major cloud providers is a bonus but not required. The team writes primarily in Java, Scala, and SQL, and uses tools like Hadoop, Kafka, Airflow, and Avro/Thrift, along with comparable GCP tools like Dataproc, Dataflow, and BigQuery (BQ).
The members of the Data Engineering department build infrastructure for collecting, storing, and analyzing huge sets of data in batch and streaming pipelines. They also use this infrastructure to create and support high-quality datasets. The work of this team powers the rest of the company and enables new product development, machine learning and personalization, marketing campaigns, and financial analysis.

Build highly performant systems that are maintainable and easy to understand by selecting and integrating with the best of current technologies.
The team is responsible for developing and monitoring our batch and streaming environments, and for improving or fixing them over time.
Write ETL code and advise other teams on how to improve theirs.
Build many APIs and libraries in Java, Scala, or Python.
Take responsibility for the quality and consistent availability of our core business data.
You are willing to work with and improve code you did not originally write.
You are generous with your time and experience, and can mentor other engineers.
You can take on unconstrained problems and know when to seek help.
Core Capabilities:
o Understanding the advantages and limitations of distributed systems
o Using or maintaining batch data processing environments like Hadoop or Dataproc, and stream processing systems like Kafka Streams, Spark, or Dataflow
o Experience writing and scheduling ETL pipelines
o Experience writing SQL queries for exploration and analysis
o Integrating data from multiple sources