Job Description:

Please send candidates who can clear a one-hour coding interview with the manager and three of his direct reports. The technologies they will code in are Python, PySpark, Spark, and SQL.

They also need to be able to hold an in-depth 30-minute conversation with the manager about the data pipelines they have built themselves on their own projects, explaining how the data moved through each pipeline. In addition, they need to clear a brief 30-minute technical screening on PySpark/Python and SQL.

I just filled one of these roles a few weeks ago, and the manager is looking to hire three more candidates!

Python/PySpark Data Engineers
Location: 100% Remote
Contract Length: 1 year contract to hire
Able to use or sponsor visas: no (can do pass-through)

Visa Type: Any visa is fine as long as the candidate will work in the United States. They will also work East Coast hours.

1) Top requirements:

a. Azure stack (Databricks, Data Lake, DevOps, Functions)
b. SQL, Python, PySpark, Spark
c. End-to-end data pipeline build
d. Dimensional modeling and data warehousing
e. Are any of them flexible? (see the Q&A notes below)

Required:

  • Strong verbal and written communication skills to effectively articulate messages to internal and external teams

  • Hands-on experience with the Azure data platform stack (Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Azure DevOps, Azure Functions)
  • Experience with dimensional modeling and data warehousing concepts
  • Experience designing, building, optimizing, and troubleshooting end-to-end big data pipelines using structured (relational and file-based) and semi-structured data (see the first sketch after this list)
  • Experience building metadata-driven data processing frameworks (see the second sketch after this list)
  • Strong experience in Spark, SQL, Python, PySpark, and shell scripting
  • Ability to work in an agile team
  • Ability to take ownership of a request from initial requirements through design, development, and production deployment
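
As a rough illustration of the "end-to-end pipeline" and "dimensional modeling" bullets above, here is a minimal PySpark sketch of the kind of build a candidate should be able to walk through. All paths, column names, and schemas are hypothetical, not the client's actual pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

    # Raw layer: semi-structured JSON landed in the data lake.
    raw = spark.read.json("/lake/raw/orders/")

    # Cleansed layer: de-duplicate and conform types.
    orders = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts"))
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    )

    # Presentation layer: a simple star schema - one dimension, one fact.
    dim_customer = (
        orders.select("customer_id", "customer_name", "region")
              .dropDuplicates(["customer_id"])
    )
    fact_orders = orders.select("order_id", "customer_id", "order_date", "amount")

    # On Databricks these would normally be Delta tables; plain Parquet
    # keeps the sketch runnable anywhere.
    dim_customer.write.mode("overwrite").parquet("/lake/gold/dim_customer")
    fact_orders.write.mode("overwrite").parquet("/lake/gold/fact_orders")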
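
The "metadata-driven" bullet refers to frameworks where table-level metadata drives one generic load routine instead of per-table code. A minimal sketch, with hypothetical config entries and paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("metadata-driven-ingest").getOrCreate()

    # In a real framework this metadata would live in a control table or
    # config file; a hard-coded list keeps the sketch self-contained.
    pipeline_config = [
        {"source": "/lake/raw/customers", "format": "json", "target": "/lake/silver/customers"},
        {"source": "/lake/raw/orders", "format": "parquet", "target": "/lake/silver/orders"},
    ]

    def run_entry(entry):
        # One generic routine handles every table the metadata describes.
        df = spark.read.format(entry["format"]).load(entry["source"])
        df.write.mode("overwrite").parquet(entry["target"])

    for entry in pipeline_config:
        run_entry(entry)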

Nice to have:

PowerShell, Scala, Java
Apache Airflow
Azure Event Hubs
Apache Kafka
Streaming data
Cosmos DB/NoSQL databases

Q&A -------------------------------------------------------------------------------------------------
Common data model
Workflow - talk to business partners, gather requirements
The Azure stack is used to complete this work
Current common data model is a dimensional data warehouse
Data Factory and Functions - not strictly required
Azure DevOps - comparable tooling such as GitHub is acceptable
All other Azure requirements are needed
Dimensional modeling
SQL - coding
Spark - questions on optimization (see the sketch below)
PySpark - coding
Python - coding
Shell scripting - if it is on the resume, be ready to speak to it
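
For the Spark optimization questions, here is a minimal sketch of two techniques that commonly come up: broadcasting a small dimension table and tuning shuffle partitions. Table paths are hypothetical, carried over from the pipeline sketch above.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("optimization-demo").getOrCreate()

    # Lowering shuffle partitions from the 200 default suits smaller data.
    spark.conf.set("spark.sql.shuffle.partitions", "64")

    fact = spark.read.parquet("/lake/gold/fact_orders")
    dim = spark.read.parquet("/lake/gold/dim_customer")

    # Broadcasting the small dimension turns a shuffle join into a
    # map-side broadcast hash join, avoiding a shuffle of the fact table.
    enriched = fact.join(broadcast(dim), "customer_id")
    enriched.explain()  # look for BroadcastHashJoin in the physical plan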
