Job Description:

Hi,

Hope you are doing well!

I have an urgent position. Please review the job description below and let me know whether this role would interest you.

Title: Site Reliability Engineer (Hybrid)

Duration: 6 months

Location: San Jose, CA

About the job

Responsibilities & Required Skills/Experience:

  • NVIDIA DGX A100/H100/H200
  • Cisco UCS-C885A
  • Docker
  • NVIDIA-certified professionals preferred
  • Infrastructure knowledge of the above technologies
  • DevOps Automation
  • CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
  • Terraform, Ansible, Jenkins
  • Python
  • Enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos

The AI Infrastructure SRE will be responsible for:

  • Bringing technical knowledge of high-performance compute, NVIDIA DGX/GPUs and/or Cisco Unified Computing System (UCS).
  • Managing the availability, latency, scalability, and efficiency of NVIDIA and Cisco UCS infrastructure by building reliability engineering into the development life cycle, with a focus on fault-tolerant approaches.
  • Driving capacity planning, performance analysis, instrumentation, and other non-functional system requirements.
  • Automating operational capabilities using Python, Ansible, Terraform, Go, etc.
  • Delivering automation through CI/CD pipelines, chatbots, etc.
  • Implementing metrics-driven processes to ensure service quality targets are met.

If you are interested, please share your updated resume along with the best number and time to reach you.

Himanshu Gupta
US IT RECRUITER, DMS VISIONS INC

Ext. 104

4645 Avon Lane, Suite 210, Frisco, TX 75033
