Job Description:
Title: Service Reliability Engineer
Location: Remote – East Coast preferred
Duration: 6 months

Job summary
The client’s IT Enterprise Application Integration (IT-EAI) team is looking for a Service Reliability Engineer (SRE) to join us in Raleigh, NC. The IT-EAI team hosts multiple services supporting the client’s critical business applications, including ERP, CRM, HCM, and more. We will be undertaking a data center migration effort and are looking for an SRE to lead our team’s efforts. You’ll be responsible for all aspects of application deployment to a new OpenShift cluster, including project creation, application deployment, network load balancing, performance management, and more. As an SRE, you’ll need to be able to work in a complicated and fast-paced environment while quickly learning new skills. In addition, you’ll create ways to consistently maintain existing service-level agreements (SLAs) and keep the regionally distributed, cloud-based, and containerized services always available. Candidates with prior remote employment experience may also be considered.

Primary job responsibilities
Plan and execute service migration to a new data center, deprecating use of the existing data center
Update Jenkins pipelines and OpenShift templates to make use of the new environment
Ensure monitoring tools are actively gathering and analyzing data for the new deployment
Assess and mitigate the impact of increased latency between services and back-end data stores
Manage multi-site active network and load balancing configurations
Create and manage alerts detecting service anomalies, and work with the team to resolve defects
Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment

Required skills
7+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider like Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure
3+ years of experience with enterprise system monitoring; knowledge of New Relic, Splunk, Grafana, or Prometheus is a plus
3+ years of experience deploying and monitoring container-based applications; knowledge of Red Hat OpenShift and/or Kubernetes is highly desired
Experience managing multi-site, highly available applications or services
Demonstrated ability to quickly and accurately troubleshoot system issues
Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
Solid communication skills; experience working directly with and presenting to customers
             
