Job Description:
Position: Site Reliability Engineer
Location: RTP, NC
Type: Full Time
Duration: Long Term

Job Description:
We are looking for a Senior Site Reliability Engineer with extensive experience scaling and operating cloud-native platforms. You’ll work with other SRE team members to operate Cisco IT’s multi-tenant OpenShift and dedicated Kubernetes environments.

Collaborate with other core services team members to define the roadmap, write clear user stories with well-defined acceptance criteria, and design and build solutions
Develop and deliver the automation software needed to build and improve the functionality, reliability, availability, and manageability of applications and cloud platforms
Design, architect, and build self-service, self-healing synthetic monitoring and alerting platforms and tools
Automate the development and deployment of infrastructure using Docker, Kubernetes, and other orchestration technologies in a hybrid-cloud environment
Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
Identify performance bottlenecks and anomalous system behavior, and determine the root cause of incidents
Engage in capacity planning and demand forecasting, and scale the environment accordingly
Manage seamless upgrades of infrastructure and services through automation

Who you are:
You are an excellent engineer with Platform as a Service (PaaS) design, architecture, and development experience, including building cloud platforms and deploying cloud-based microservice applications. You have a solid background in and understanding of software systems, and the ability to work closely with the rest of the engineering team from the early stages of design all the way through identifying and resolving production issues. You’re passionate about this role and believe that automation is key to operating large-scale systems. You’re flexible, willing to learn new things, and eager to mentor others.

Required Skills and Experience:
10+ years of solid hands-on experience building, maintaining, and scaling PaaS and container-hosting platforms
Programming experience in one or more of the following languages: Python, Golang, or Java
A proven track record with Docker containers and a deep understanding of the current container ecosystem
Demonstrable experience with running containers (Docker/LXC) in a production environment (Kubernetes, Docker Swarm, Rancher, Mesos)
Deep understanding of Kubernetes fundamentals, including scaling for production workloads
Expert skills with Linux (network, OS, process level), networking (network layers, DNS, load balancing), storage, and virtualization
Experience running multi-cluster environments and a solid grasp of multi-tenancy and its security implications
Experience with build automation and configuration management systems (e.g., Jenkins, Ansible)
Knowledge of continuous integration (CI) and continuous delivery/deployment (CD) pipelines
Previous experience supporting large-scale production environments
Ability to analyze and debug complex software and infrastructure issues, and develop tools/systems for task automation
Experience working in an agile development environment
Strong analytical and problem-solving skills
Good communication and teamwork skills
Bachelor’s degree in CS/CE/EE or equivalent is required.