Job Description:
Monitoring, Ticketing, Alert Response, Break/Fix:
Monitor infrastructure using monitoring systems
Monitor the ticketing queue and work tickets to resolution
Create trouble tickets for issues not captured by automation, and work them to resolution
Coordinate hardware break/fix activities with local staff (DC Techs, Field IT, and/or Smart Hands, where applicable)
Create vendor service requests (SRs) and collect diagnostic information as needed
Operations, including but not limited to:
Configure datastores and filesystems; mount volumes
Configure servers, clusters, and networking
Deploy software and firmware updates
Work change tickets and execute them during change windows
Perform logical decommissioning of clusters and servers
Escalate and communicate per standard operating procedures
Maintain documentation and update it as needed:
Work with the team to keep documentation for the above areas up to date
Work with the engineering team to perform a monthly audit verifying that documentation is current
Perform project work, including but not limited to:
Server/cluster deployments with complex and/or custom requirements
Testing, validation, and deployment of new server hardware, firmware, and hypervisor software
Performance Monitoring and Analysis:
Performance analysis of the overall server virtualization environment
Troubleshooting and Break/Fix:
Troubleshoot and resolve issues escalated by the L1 team
Escalate to the engineering team and/or vendor if required
Identify repeat issues and trends, and work with engineering to scope monitoring and/or automation that prevents or auto-remediates problems
Vendor Service Request lifecycle management:
Follow up daily on all open service requests
Work with vendor(s) to resolve service requests
Provide a weekly summary of all service requests opened, ongoing, or closed during the period
Participate in the weekly on-call rotation:
Respond to critical alerts and alarms outside regular office hours
Triage and resolve issues
Escalate to engineering or other teams per escalation processes