Job Description:
Site Reliability Engineer - Core
Pleasanton, CA

W2 Contract

Bill Rate: $100

Job Description

Work with teams across the organization to build and maintain monitorable, performant, reliable, and highly scalable software systems
Design, build, run, and monitor production infrastructure
Self-starter who enjoys working in high-pressure environments and understands the pressure and pride of maintaining a world-class 24/7 production environment
Desire and ability to specify goals and constraints, propose alternative solutions to issues, consider risks, and evaluate and choose the best course of action
Partner with the Architecture team on best practices for availability and resilience
Identify and automate manual processes
Continuously evolve our monitoring tools and platform
Promote and apply best practices for building scalable and reliable services across engineering
Analyze system- and application-level metrics for peak capacity planning and troubleshooting

Responsibilities:
Maintain 99.999% uptime for the Gap, Inc. family of e-commerce websites
Analyze failures, mitigate them on the spot, and work proactively to prevent them in the future
Quarterback high-severity (Sev) issues as required
Oversee the root cause analysis (RCA) process
Analyze and recommend the best solutions for high availability across all teams

Requirements:
Senior-level Linux system administration and storage experience
Networking experience
Experience supporting end-to-end systems and software
Scripting experience (Python, Bash, etc.)
Experience supporting high-scale web environments
Experience supporting Tomcat, JBoss, and other application servers
Experience with cloud infrastructure
Experience with monitoring and metrics tools such as Splunk, Nagios, New Relic, or similar
Knowledge of CI/CD principles
             
