Job Description:
Site Reliability Engineer
San Jose, CA
Long Term

Engage in and improve the complete service lifecycle, from design and development through deployment, operation and enhancement.
Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems through automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and thorough postmortems.


Minimum qualifications:
Experience with containers such as Docker and with container orchestration tools (Kubernetes is a plus).
Experience with detailed analysis and system design of at least one messaging system: Kafka (preferred), ActiveMQ or another JMS system.
Experience in one or more of the following languages: Java, Python, Go.
Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to debug and automate routine tasks.