Job Description:
Java Developer with Production Support Experience

Software Engineer, Site Reliability Engineer
Title: Site Reliability Engineer
Engineering Excellence, Reliability Engineering

Description:
Site Reliability Engineering (SRE) is an engineering discipline that combines software engineering and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. An SRE within the Engineering Excellence team will focus on expanding our tooling and automation and improving the availability of our systems.

Responsibilities
Build tools to quickly triage issues and discover failures across hardware, software, applications and network
Perform in-depth analysis of service trends and implement adjustments to mitigate risk and prevent issue recurrence
Maintain production systems by measuring and monitoring availability, latency and overall system health
Provide guidance to software engineers on failure-resistant design patterns
Support 24x7 on-call response to critical operational issues

Basic Qualifications
Strong technical knowledge of the full digital stack, including Mobile, Web, APIs, Messaging, Databases, Networks and their interactions
Knowledge and understanding of SDLC principles and key controls
Experience working with and contributing to open source code or frameworks using Git version control
Strong knowledge of AWS Cloud solutions and product offerings
Experience with container technologies (e.g. Docker, Kubernetes)
Strong understanding of monitoring methodologies and proactive monitoring using APM solutions (e.g. AppDynamics, New Relic) or other monitoring and instrumentation technologies
Knowledge and understanding of technical architecture, application systems design and integration in a large heterogeneous enterprise environment, with hands-on experience in SOA, Angular/Node, Java/J2EE, and Oracle or MySQL/MariaDB
Experience working in an Agile environment (e.g. Scrum, Kanban)

Preferred Qualifications
3+ years of programming in one or more of: Java, Node, Python, Perl or C
2+ years of UNIX systems knowledge and/or a systems administration background
Interest in designing, analyzing and troubleshooting large-scale distributed systems
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
Experience debugging and optimizing code and automating routine tasks
             
