Job Description:
Site Reliability Engineer
Fremont, CA (West coast candidates are preferred)
6+ month contract
Phone and Skype

Our client is looking for a seasoned Site Reliability Engineer (SRE) to join the Platform Engineering team and build the platform and tools that help keep the organization agile, flexible, and stable while accelerating the advent of sustainable transport.

Responsibilities:

Author technical documentation for workflows, processes, and best practices.
Manage our on-prem and off-prem Kubernetes clusters to support our growing workloads.
Take part in a 24x7 on-call rotation.
Influence architectural decisions with a focus on security, scalability, and high performance.
Set up and maintain monitoring, metrics, and reporting systems for fine-grained observability and actionable alerting.
Set the technical direction for our engineering teams.


Requirements:

5+ years of managing services in a distributed, internet-scale *nix environment.
Ability to prioritize tasks and work independently
Advanced or expert-level Linux administration.
Track record of practical problem solving under pressure.
Excellent communication and documentation skills.
BS or MS degree in Computer Science or Engineering, or equivalent experience.
Advanced experience with configuration management and infrastructure-as-code tools such as Ansible, Puppet, or Terraform.
Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols.
Experience with AWS, or other cloud infrastructure providers.
Experience managing container-based workloads, using Kubernetes or other orchestration software
Proficiency in a high-level language such as Python, Go, Ruby, or Java
Excellent communication skills to collaborate with teams globally
Ability to manage competing priorities and work well under pressure
Self-driven, with an analytical mind and a bias for action
Knowledge of big data platforms such as Hadoop is a plus
             
