Job Description:
Candidate must have Azure Databricks expertise with Python / Spark (Scala would be a plus)
Strong understanding of Databricks' underlying architecture and advanced concepts such as security and productionization
Ability to operationalize the application on Databricks and apply CI/CD (Azure DevOps) to it
Strong Python / PySpark knowledge and a strong overall grasp of coding principles (data structures, etc.). Should have strong experience using Python for data ingestion rather than analytics.
Preference for DataFrame APIs over Pandas / SQL
Strong knowledge of pipeline development, including when to use Scala for mature use cases that require stable pipelines handling very large data volumes.
Must have a good understanding of provisioning Databricks and Azure resources
Knowledge of CI/CD and Azure ML productionization is nice to have
Candidate must be familiar with the Azure security model and Storage Accounts
Candidate should have working knowledge of Confluence, Jira, and GitHub
Good understanding of coding standards (code modularization, refactoring) and testing automation
The client uses Airflow for orchestration; familiarity with Airflow will be an added advantage.
Candidate must have at least 3-4 years of experience building productionized applications
Candidate should be communicative and willing to dig into the why, the value, and the underlying business question.
Most importantly, the candidate should be able to take business requirements and independently design and deliver the pipelines, working with little to no help from the client once deployed.