Job Description:

For one of our ongoing remote, multiyear projects (5+ years), we are looking for a Chaos Engineer with experience in Chaos Monkey, Gremlin, and Simian Army.

 

This is a remote position throughout.

The candidate will be part of the SRE team in a lead technical role, determining the reliability and chaos engineering needs of mission-critical systems and business processes. The candidate will assess high-level architecture and design issues relating to the platform, enterprise software, interactions with other systems, and the application development, infrastructure, database, and middleware teams to ensure the stability and reliability of the system. Chaos engineering will be used to proactively detect issues within the applications, platform, network, and databases in a controlled way, using chaos tools such as Chaos Monkey, Gremlin, and Simian Army. The candidate should be familiar with Internet protocols such as HTTP, DNS, TCP, and UDP, comfortable in a Linux development environment, and well versed in DevOps. The candidate will identify anti-patterns and optimizations and support the development of self-healing capabilities.

Responsibilities:

·         Create operational tooling for monitoring, self-healing infrastructures, and chaos testing

·         Design and create controlled chaos in production systems

·         Work across teams to identify and fix issues that affect system reliability and performance

·         Guide architectural decisions and direct solutions that will enhance our client’s product reliability

·         Dive into system and latent reliability issues, service performance, and capacity modeling of distributed systems at scale

·         Partner with the development team to identify anti-patterns and optimization strategies, create fallback options, and help develop self-healing capabilities across the enterprise in a sustainable manner

 

Skills and Experience Requirements:

·         A passion for creating reliable applications and a systematic problem-solving approach, coupled with a strong sense of ownership and drive

·         7+ years of hands-on experience with cloud-based technologies and tools in configuration management, deployment, monitoring and operations

·         Experience with chaos engineering tools such as Chaos Monkey, Gremlin, and Simian Army, and familiarity with Internet protocols such as HTTP, DNS, TCP, and UDP and with the Linux development environment

·         Experience with Application Performance Management/Real User Monitoring, infrastructure monitoring, and log analysis tools such as Dynatrace, Nagios, Sensu, and Splunk

·         5+ years of experience with DevOps and Continuous Delivery

·         Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals

·         Experience in an Agile delivery environment

·         Experience as a hands-on software engineer so you understand the core principles of the engineering work

·         Experience resolving customer problems and communicating with customers

·         Experience or interest in speaking/presenting at tech conferences.

·         Experience in communication and organization in large, distributed teams

             
