Job Description :

Job Title : DevOps/SRE (Site Reliability Engineer)

Location : Cupertino, CA - Remote

Experience : 10-16 years - L1 level

Key Qualifications

  • 5+ years in a Site Reliability Engineering or DevOps-focused role
  • Experience in Ansible
  • Experience in scripting languages such as Python and Bash
  • Experience in implementing and coordinating telemetry via monitoring tools such as Splunk/Grafana/Prometheus at various levels (API, runtime, infrastructure, log analysis, etc.)
  • Experience in container and container orchestration technologies such as Kubernetes, EKS, Docker
  • Experience in systems built with open-source storage and search technologies such as Cassandra, Postgres, Redis, Elasticsearch
  • Experience with scale testing, disaster recovery, and capacity planning
  • Experience designing, building and maintaining infrastructure with a cloud provider such as AWS
  • Strong Linux system administration and networking knowledge
  • Strong sense of ownership. At the same time, you're a great teammate who communicates clearly and transparently
  • Self-motivated, inquisitive and always looking to learn more

Description

As an SRE/DevOps engineer for the Reliability Software team, you will:

  • Be challenged with high-level problem statements and be expected to take ownership and drive solutions
  • Implement solutions that operate at scale to improve the reliability of the team's data warehouse platform
  • Develop, operate, monitor, and automate team infrastructure tools and services, both on-prem and in AWS
  • Pioneer and implement monitoring tools for a complete telemetry system
  • Actively participate in capacity planning, scale testing and disaster recovery exercises