Job Description:

Position: Data Engineer

Location: Philadelphia, PA (remote initially due to COVID-19; onsite once conditions return to normal)

Duration: 12+ months

Job Description:

Candidate needs to have a total of 10 years of experience, with a minimum of 5 years as a Data Engineer.

Years of Experience: 10+

Education Required: Bachelor's Degree or Equivalent Work Experience

Top MUST HAVE Skills:

  • Python using Pandas
  • Spark (AWS EMR, Databricks)
  • Cloud Computing (AWS Lambda, EC2, ECS)

Top PREFERRED skills:

  • Scala
  • AWS Neptune
  • QuickSight, Tableau

Soft Skills:

  • Should have previously led data engineering projects and have experience with data architecture.
  • 10+ years working as a software engineer.
  • 5+ years working within an enterprise data lake/warehouse environment or big data architecture.
  • Can-do attitude.
  • Not afraid of unfamiliar technologies; able to learn new things quickly and find a path forward.

Project Description:

This person will be the technical lead for the project, which builds a graph representation of the access network. As part of this project, they will be responsible for extracting data from various disparate data sources through ETL scripts, storing it in S3, and eventually importing it into an AWS Neptune database. They will also be responsible for developing APIs that expose and consume data from the Neptune DB. They will be responsible for designing and architecting the end-to-end (E2E) solution and for providing guidance to junior data engineers to get the work done. They should be very proficient in Python, especially with libraries such as Pandas, have experience working with Databricks, and be very familiar with AWS services.
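
For orientation, a minimal sketch of the extract-stage-load flow described above (ETL with Pandas, staging in S3, bulk load into Neptune) might look roughly like the following. This is an illustration only, not the project's actual code: the bucket name, Neptune loader endpoint, IAM role ARN, and source column names (node_id, node_type, name) are placeholder assumptions.

```python
import boto3
import pandas as pd
import requests

# Placeholder values for illustration; real names would come from project configuration.
S3_BUCKET = "example-access-network-staging"
NEPTUNE_LOADER_URL = "https://example-neptune-cluster:8182/loader"
LOADER_IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole"
AWS_REGION = "us-east-1"


def extract_and_stage(source_df: pd.DataFrame, key: str) -> str:
    """Reshape a source extract into Neptune's Gremlin CSV vertex format
    and stage it in S3 for bulk loading. Column names are assumptions."""
    vertices = pd.DataFrame({
        "~id": source_df["node_id"].astype(str),
        "~label": source_df["node_type"],
        "name:String": source_df["name"],
    })
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=S3_BUCKET,
        Key=key,
        Body=vertices.to_csv(index=False).encode("utf-8"),
    )
    return f"s3://{S3_BUCKET}/{key}"


def start_neptune_bulk_load(s3_uri: str) -> dict:
    """Kick off a Neptune bulk load job for the staged CSV file."""
    response = requests.post(
        NEPTUNE_LOADER_URL,
        json={
            "source": s3_uri,
            "format": "csv",
            "iamRoleArn": LOADER_IAM_ROLE_ARN,
            "region": AWS_REGION,
            "failOnError": "TRUE",
        },
    )
    response.raise_for_status()
    # The response includes a loadId that can be polled for load status.
    return response.json()
```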

Responsibilities:

  • Developing large-scale data pipelines that expose data sources within the company to our team of data analysts and data scientists.
  • Developing REST APIs utilizing AWS Lambda and API Gateway (see the handler sketch after this list).
  • Developing Spark streaming and batch jobs to clean and transform data.
  • Writing build automation to deploy and manage cloud resources.
  • Writing unit and integration tests.
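
As a rough sketch of the REST API responsibility above, an AWS Lambda function behind API Gateway could read a node's neighbours from Neptune via Gremlin along the lines below (using the gremlinpython package). This is an assumed illustration only: the environment variable name, endpoint URL, and route shape (a nodeId path parameter) are not specified in the posting.

```python
import json
import os

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Placeholder endpoint; in practice this would come from deployment configuration.
NEPTUNE_GREMLIN_URL = os.environ.get(
    "NEPTUNE_GREMLIN_URL", "wss://example-neptune-cluster:8182/gremlin"
)


def lambda_handler(event, context):
    """API Gateway (Lambda proxy) handler: return the neighbours of a
    network node stored in Neptune, identified by a nodeId path parameter."""
    node_id = (event.get("pathParameters") or {}).get("nodeId")
    if not node_id:
        return {"statusCode": 400, "body": json.dumps({"error": "nodeId is required"})}

    connection = DriverRemoteConnection(NEPTUNE_GREMLIN_URL, "g")
    try:
        g = traversal().withRemote(connection)
        # Fetch property maps of nodes adjacent to the requested node.
        neighbours = g.V(node_id).both().limit(50).valueMap().toList()
    finally:
        connection.close()

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"nodeId": node_id, "neighbours": neighbours}, default=str),
    }
```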

Some of the specific technologies we use:

  • Programming Languages (Python, Scala, Golang, Node.js)
  • Build Environment: GitHub Enterprise, Concourse CI, Jira, Serverless, SAM
  • Cloud Computing (AWS Lambda, EC2, ECS)
  • Spark (AWS EMR, Databricks)
  • Stream Data Platforms: Kinesis, Kafka
  • Databases: S3, MySQL, Oracle, MongoDB, DynamoDB
  • Caching Frameworks (ElastiCache/Redis)
