Job Description:
Linux and HPC: Manager-Systems Operations

Memphis TN - 37544

This technical management position is responsible for the overall administration, operational standards, and security of institutional Linux operating systems and associated server hardware, including High Performance Computing (HPC) infrastructure. The position leads a team of experienced systems and database administrators, and is responsible for the overall architecture of Linux and HPC system deployment, configuration management, automation, lifecycle management, and capacity planning. This position provides a unique opportunity to directly support the world-changing mission of St. Jude Children’s Research Hospital and its growing research computing infrastructure needs. This position reports to the Director of Operations and Research Computing in Information Services.

Resources include a new state-of-the-art data center and network, a robust virtualization infrastructure, and partnership with a research services support division within Information Services to effectively engage and support the St. Jude research community.

The ideal candidate will have a demonstrated capability to deliver a high standard of system performance and availability, 24 x 7 operational support, systems management and upgrades, security, and customer service. This role requires experience with enterprise production operations environments and the use of industry best practices in technology and processes to ensure St. Jude Children's Research Hospital systems are architected and supported for peak efficiency and availability.

Minimum Experience:

Eight (8) years of experience in job-specific skills, including six (6) years in systems, network, or operations management and two (2) years of supervisory or leadership experience, is required.
Progressive systems administration and architect experience with Red Hat Enterprise Linux, management tools, and enterprise-scale deployment is required.
Experience with designing and managing HPC cluster infrastructure with thousands of cores and parallel filesystems is required.
Experience with the Platform LSF workload management system, Spectrum Scale (GPFS), and InfiniBand fabrics in an HPC context is preferred.
Experience with the use and integration of private and public cloud services, Linux container technology, and big data infrastructure and platforms is preferred.
Experience with deployment and management of a configuration management system such as Ansible, Puppet, or Chef at an enterprise scale is preferred.
Experience in team development and project management is preferred.
Experience in a multi-vendor information technology environment is preferred.
Experience in a health care, biomedical research, or academic environment is preferred.

Minimum Education:

Bachelor's Degree is required. Master's Degree preferred.

Client: St. Jude Children's Research Hospital