Role Overview:
We are seeking a skilled and proactive Databricks Production Support Analyst to join the DIA (Data, Insights & Automation) track, which supports enterprise-wide data platforms. The role is responsible for ensuring the reliability, performance, and availability of Databricks environments within a broader data ecosystem.
Key Responsibilities:
Provide L2/L3 production support for Databricks notebooks, clusters, jobs, and pipelines.
Monitor job executions and system performance, ensuring SLAs are met and issues are resolved promptly.
Manage incidents and service requests, and perform root cause analysis for recurring issues.
Support data pipelines built with Spark, Python, and other integrated tools.
Collaborate with data engineering teams to optimize performance and ensure data quality.
Assist with minor enhancements, version upgrades, migrations, and regression testing.
Implement SOP-based bots and AI/ML tools for proactive issue detection and resolution.
Maintain documentation, contribute to the knowledge base, and support onboarding of new team members.
Key Skills & Technologies:
Hands-on experience with Databricks, Spark, and Python in a production support environment.
Strong understanding of data lake architectures and cloud-based data platforms.
Experience in monitoring and managing cluster performance and pipeline health.
Familiarity with Cloudera Hadoop, Tableau, Postgres, and Cloudera Data Science Workbench is a plus.
Exposure to incident management tools such as ServiceNow.
Ability to analyze and resolve data quality, access, and performance issues.
Willingness to work rotational shifts to provide 24x7 support coverage as required.
Nice to Have:
Experience with AI/ML-driven incident management and automation tools.
Understanding of low-code platforms such as Mendix or Appian and their integration with Databricks.