Job Description/Skills needed:
Client's Software and Services Group (SSG) High Performance Data Division (HPDD) is looking for an experienced, highly motivated, and talented Software Stress Testing Engineer with System Administration knowledge. In this position, you will be responsible for conducting Stress and Stability testing on High Performance Computing (HPC) parallel file system software and applications in clustered computing environments, which you will also help maintain.
You will work as a member of a virtual team, tuning parameters and configurations and analyzing, comparing, and presenting Stress and Stability testing results across multiple releases in a variety of formats.
You will participate in engineering requirements and design reviews, representing the needs of Stress and Stability testing.
This position requires you to utilize independent judgment in designing and executing manual and automated tests to verify Stress and Stability requirements and to promote overall quality improvements.
The ideal candidate has strong written and verbal communication skills, thrives in virtual, cross-geo, and cross-functional groups, and is a self-starter who enjoys working in a fast-paced, dynamic environment.

Minimum Qualifications:
- Must possess a minimum of a Bachelor of Science degree in Electrical Engineering, Computer Science, or a related engineering field, with 3+ years of relevant experience in a development and/or testing role
- Familiarity with testing and maintaining High Performance Computing cluster environments
- 1+ years’ experience in testing Linux-based file systems
- 1+ years’ experience with one or more languages such as C, Python, or Bash
- 1+ years’ experience with Stress / Stability testing, data analysis, and presenting data in a variety of formats for evaluation
- 1+ years’ experience in Systems Administration of HPC clusters, including Ethernet/InfiniBand networks and multipath and failover storage configurations
- 1+ years’ experience in software build and packaging (Makefile, automake, rpmbuild)
- 1+ years’ experience working with CI and code review tools such as Jenkins, Buildbot, GitHub, or Gerrit
- 1+ years’ experience working with a source code management system such as git, SVN, or CVS

Additional Qualifications Include:
- Experience and understanding of parallel file systems and benchmarks including IOR, mdtest, and xdd
- Experience and understanding of the networking layer
- Linux kernel knowledge and experience
- Familiarity with Lustre, and debugging tools associated with Lustre
- Experience configuring and troubleshooting ZFS
- Experience working with or administering a resource manager such as Slurm, LSF, or PBS/TORQUE
- Experience working with configuration management tools such as Chef, Puppet, Ansible, or SaltStack
- Understanding of the Open Source community
- Ability to work with, and technically influence, virtual, multidisciplinary, cross-geo teams
             
