Job Description:

Development + SRE

We need developers who can transition into SRE positions: someone who understands the engineering and development process well and can write scripts.

The role will be a mix of development and SRE work.

Location: Onsite in Riverwoods, IL

Experience: 8-12 years

As an SRE, you will ensure the reliability, scalability, and performance of mission-critical applications and infrastructure. This role requires deep technical expertise, strong coding skills, and collaboration with development and DevOps teams to enhance observability, automation, and production stability in hybrid environments.

Required Qualifications

  • 6+ years in Site Reliability Engineering or related roles.
  • Strong experience with non-functional testing (NFT) frameworks and performance testing tools.
  • Hands-on expertise in Red Hat OpenShift and container orchestration.
  • Solid understanding of Linux systems administration and hybrid cloud integration.
  • Proficiency in monitoring tools (Prometheus, Grafana, ELK stack) and incident management.
  • Experience with CI/CD tools (Jenkins, GitLab CI) and DevOps practices.
  • Advanced coding skills in Java, Python, and one additional language (Go, Scala, or Shell).
  • Deep knowledge of distributed systems and data platforms (Hadoop ecosystem, Kafka, NiFi).
  • Excellent problem-solving and communication skills.

Key Responsibilities

  • Non-Functional Testing (NFT): Design and execute NFT strategies to validate system performance, resilience, and scalability before production deployments.
  • Production Deployment Management: Oversee and coordinate production releases, ensuring zero-downtime deployments and rollback strategies.
  • OpenShift Expertise: Manage and optimize workloads on Red Hat OpenShift Container Platform (OCP), including cluster configuration and troubleshooting.
  • Hybrid Integration: Integrate on-premises Linux VMs with cloud services, ensuring secure and seamless connectivity in hybrid environments.
  • Monitoring & Incident Response: Implement advanced monitoring solutions, proactively detect anomalies, and lead incident response and root cause analysis.
  • Preventive Measures: Develop automation and guardrails to prevent outages and improve system health.
  • Collaboration: Partner with DevOps and development teams to enhance CI/CD pipelines, improve observability, and embed reliability practices into the SDLC.
  • Distributed Data Processing: Build and maintain data pipelines using Hadoop, HBase, Kafka, and NiFi for large-scale data processing and streaming.
  • Programming & Automation: Write robust, maintainable code in Java, Python, and at least one additional programming language (e.g., Go, Scala, or Shell scripting) for automation, tooling, and system integrations.

If interested, please share your resume at
