Job Description:
Tech Stack:
5+ years of experience owning and building data pipelines.
Extensive knowledge of data engineering tools, technologies and approaches
Ability to absorb business problems and understand how to service required data needs
Design and operation of robust distributed systems
Proven experience building data platforms from scratch for data consumption across a wide variety of use cases (e.g., data science, ML, scalability, etc.)
Demonstrated ability to build complex, scalable systems with high quality
Experience with specific AWS technologies (such as S3, Redshift, EMR, and Kinesis)
Experience with multiple data technologies and concepts such as Airflow, Kafka, Hadoop, Hive, Spark, MapReduce, SQL, NoSQL, and columnar databases is a plus.
Experience in one or more of Java, Scala, Python, and Bash.

Expected Outcomes:
Design and implement the data infrastructure and processing workflows required to build a data lake in AWS to support data science, machine learning, BI, and reporting, also in AWS
Build robust, efficient, and reliable data pipelines that draw on diverse data sources
Design and develop real-time streaming and batch processing pipeline solutions
Own the data expertise and data quality for the pipelines
Drive the collection of new data and refinement of existing data sources
Identify shared data needs across
Build data stores for feature variables required for machine learning

Client: Impetus