Job Description:
Key Responsibilities:
Play a key role in the client’s digital transformation. Drive the company’s Reliability Engineering initiative and be part of a DevOps culture, working with global platform and development teams.
o Reliability Engineering
o Platform Engineering
o Tools and automation
o DevOps mindset
o Cloud computing
o Networking
Own continuous delivery framework and focus on automation.
Automate different pieces of the environment through Ansible, Chef or Puppet
Ensure stability and integrity of high-performance, high-availability cloud-based systems for the organization.
Design, develop and test automation workflows.
Experience building and managing in a cloud environment, preferably AWS or Azure
Act as a key driver of continuous improvement in systems operations through tools and automation
Report on overall health and optimization of cloud services to management
Participate in the definition of the roadmap for cloud services in collaboration with the Platform Engineering and architecture teams.
Drive root cause analyses, and coach others on doing them, in collaboration with software development teams
Analyze and adjust designs to assist in predicting and improving system stability.
Determine areas initially needing testing and develop a plan to obtain standard data for troubleshooting
Review engineering specifications and drawings, proposing design modifications to improve reliability within cost and other performance requirements
Evaluate on-premises and off-premises environments for factors such as the number and causes of unit failures
Monitor failure data generated by customers using the product to identify potential requirements for product improvement
Lead or assist in incident and problem management of cloud platform services
Work closely with our external partners and industry leaders to ensure we have a keen understanding of where industry trends and technologies are going, and factor those into our strategies and roadmaps
Previous experience with ServiceNow and proficiency in creating workflows
Ensure the solutions the Reliability Engineering team designs and delivers meet defined business requirements and will stand the test of time from an operational excellence perspective
Assist in creating the roadmap and work towards a DevOps model of service engineering
Work closely with IT Security to ensure the solutions we’re designing and delivering meet data security and compliance requirements
Provide regular communication to peers on areas for improvement, progress, milestones, and areas of success
Be available for a 24/7 on-call rotation to respond to and resolve issues
Ensure documentation and processes are well defined
Experience with Enterprise Systems management tools
Experience maintaining the health of the environment by keeping the systems current with upgrades and patches and by troubleshooting and resolving issues with tools

Key Skills: Minimum Requirements/Qualifications:
In-depth understanding of and experience with AWS and a cloud-first/cloud-only initiative
Must have 5-7+ years of Linux experience
Must have 3-5 years of experience in Cloud Solutions Delivery and Cloud architecture, especially public cloud platforms such as AWS, Azure, and GCP. Strong preference for AWS experience
AWS proficiency (EC2, S3, RDS, Route 53, Lambda, IAM, VPC, Security Groups) and other services
Scripting languages: Python and overall Linux shell scripting skills
Ability to identify performance bottlenecks using data from the environment
Experience in Unix administration/engineering
Working knowledge of Docker
Strong understanding of fundamental distributed system principles
Experience in building highly available platforms running in production and handling zero-downtime rollouts
Bachelor’s degree in Information Systems, Computer Science or other technical field is desired
o 5 years of additional direct and applicable professional experience in the IT field may be substituted
Strong interpersonal and excellent documentation skills are a must
Effective problem-solving techniques, such as root cause analysis, to resolve issues
Takes ownership of work assignments and manages them to completion
Ability to explain and champion technical concepts to a broad technical audience
Excellent customer service and communication skills required
Ability to problem solve and work well in complex, ambiguous situations
Ability to present technical problems and solutions to management in a manner that can be consumed and translated into appropriate program improvement requests
Highly technical and analytical with background in systems implementations and migrations.
Hands-on experience in installing, configuring and troubleshooting UNIX/Linux-based environments.
Provide organizational support for relationship development to foster teamwork, build relationships, and promote collaboration to cultivate and strengthen a network for the exchange of ideas
Demonstrated proficiency in physical and virtual server hardware maintenance
Experience in AWS Artificial Intelligence/Machine Learning (AI/ML) implementation/configuration/analysis is a bonus.
             
