Job Description :

Job Title :  DevOps/SRE (Site Reliability Engineer)

Location : Cupertino, CA - Remote

Experience : 10-16 years - L1 level

Key Qualifications

  •  At least 5 years in a Site Reliability Engineering or DevOps-focused role
  • Experience in Ansible
  •  Experience in scripting languages such as Python and Bash
  •  Experience in implementing and coordinating telemetry via monitoring tools such as Splunk/Grafana/Prometheus at various levels (API, runtime, infrastructure, log analysis, etc.)
  • Experience in container and container orchestration technologies such as Kubernetes, EKS, Docker
  •  Experience in systems built with open-source storage and search technologies such as Cassandra, Postgres, Redis, Elasticsearch
  •  Experience with scale testing, disaster recovery, and capacity planning
  •  Experience designing, building and maintaining infrastructure with a cloud provider such as AWS
  •  Strong Linux system administration and networking knowledge
  • Strong sense of ownership. At the same time, you're a great teammate who communicates clearly and transparently
  • Self-motivated, inquisitive, and always looking to learn more

Description

  • As an SRE/DevOps engineer on the Reliability Software team, you will:
  •  Be challenged with high level problem statements and be expected to take ownership and drive solutions
  •  Implement solutions that operate at scale to improve the reliability of the team's data warehouse platform
  • Develop, operate, monitor, and automate team infrastructure tools and services, both on-prem and in AWS
  •  Pioneer and implement monitoring tools for a complete telemetry system
  • Actively participate in capacity planning, scale testing and disaster recovery exercises
             
