The Kubernetes Operations Lead Specialist Engineer will oversee and manage enterprise-level Kubernetes clusters, ensuring optimal performance, scalability, security, and reliability of containerized platforms. This role involves leading a team responsible for day-to-day Kubernetes operations, automation, infrastructure improvements, and production support across hybrid and multi-cloud environments. The candidate should have senior-level hands-on expertise in Kubernetes, DevOps tooling, CI/CD pipelines, cloud services, and container management.
- Lead Kubernetes operations, including cluster deployment, configuration, upgrades, scaling, monitoring, and performance tuning.
- Manage and maintain Kubernetes clusters across on-prem environments and cloud platforms such as AWS, Azure, or GCP.
- Define and enforce best practices around container orchestration, security, network policies, resource optimization, and workload management.
- Implement observability solutions using logging, monitoring, and tracing tools such as Prometheus, Grafana, and ELK.
- Oversee incident response, root cause analysis, and post-incident reviews to ensure platform reliability.
- Design and automate infrastructure operations using Infrastructure-as-Code and automation tools such as Terraform, Helm, and Ansible.
- Collaborate with development, security, and operations teams to support DevOps workflows and CI/CD pipeline integration.
- Lead capacity planning, resource forecasting, performance assessments, and upgrade planning.
- Document standards, deployment procedures, runbooks, and operational best practices.
- Mentor junior engineers and function as a technical escalation point.
- 12+ years of overall IT experience, including at least 6 years in Kubernetes and container orchestration.
- Hands-on experience administering and operating large-scale Kubernetes clusters.
- Strong background in cloud platforms such as AWS, Azure, or Google Cloud.
- Expertise in Docker, Helm charts, Terraform, YAML configurations, and service mesh technologies (e.g., Istio, Linkerd).
- Proficiency with CI/CD tools such as Jenkins, GitLab CI, Argo CD, or Tekton.
- Strong knowledge of Linux systems, networking concepts, load balancers, and DNS.
- Experience with observability and monitoring frameworks, including Prometheus, Grafana, ELK, and OpenTelemetry.
- Familiarity with Kubernetes security standards, RBAC, policies, certificate management, and image scanning.
- Solid scripting experience with Bash, Python, or Go.
- Strong troubleshooting and performance optimization skills.
- Kubernetes certifications such as CKA, CKAD, or CKS.
- Experience with multi-cluster, hybrid-cloud, or on-prem deployments using managed services or distributions such as EKS, AKS, GKE, OpenShift, or Rancher.
- Knowledge of GitOps methodologies and tools like Argo CD or Flux.
- Experience with disaster recovery, backup solutions, and high availability architecture.
- Previous team leadership or technical architect experience.
- Familiarity with site reliability engineering concepts and automation frameworks.