In support of the High Performance Computing Modernization Program (HPCMP) Integrated Technical Services (HITS) task order contract, Team SAIC provides the program management and technical support necessary to advance the services, capabilities, infrastructure, and technologies in the HPCMP Department of Defense (DoD) Supercomputing Resource Centers (DSRCs). Team SAIC is looking for a Storage System Administrator who will assist the user community with the effective and efficient utilization of High Performance Computing (HPC) assets. The successful candidate will work alongside leading HPC staff and researchers working to solve DoD’s most critical mission challenges.
Experienced Senior Linux Systems Administrator to develop detailed analysis and system architecture requirements for a petabyte-scale digital science data asset management and repository system in a high performance computing environment. The candidate will also be responsible for the daily administration of the storage environment across a diverse set of applications utilizing hierarchical storage manager (HSM) software. This includes general storage administration tasks, storage software and hardware support, storage device configuration, support of daily backups, monitoring of storage devices, and storage performance tuning and troubleshooting. The candidate will develop disaster recovery plans that include redundancy and syncing of data across data centers, participate in development and test evaluations of storage environment changes, and assist with the development and evaluation of training plans and training related to storage administration.
Candidate must have hands-on experience with and working knowledge of Red Hat Linux and/or Oracle/Sun Solaris operating systems; a mixed and growing storage environment consisting of IBM Elastic Storage Server (ESS), Seagate, LSI, or Oracle storage systems; Oracle large-scale tape libraries and enterprise-class tape drives; and Brocade Fibre Channel (FC) switches. This position will work with the existing system administration/storage team to manage, maintain, deploy, troubleshoot, and support customers with access to the storage environments. This position will assist with implementing new storage and servers, and provide input for developing local operating procedures and policies as needed or requested. This position will need to be adept at monitoring the ongoing operation of the storage systems and providing utilization reports to identify problems and corrective actions as needed. The position is responsible for the design, installation, configuration, security, and maintenance of highly available enterprise data storage and operating systems. Additional duties include, but are not limited to:
Monitor performance, errors, and warnings on FC network and attached storage systems
Maintain storage and FC network policies and standards
Troubleshoot performance problems and propose short term fixes and long term solutions
Interface with HPC vendors as necessary
Qualified candidate will support mass storage archive needs in an environment consisting of standalone, virtual machine, and large-scale HPC cluster-based systems. Solid knowledge of Linux and general Unix operating system concepts, as well as extensive systems and/or storage administration experience, is a plus. Candidate should also have the vision and demonstrated ability to analyze complex technical problems and develop innovative solutions that leverage resources and meet customer needs effectively. Excellent customer interaction skills, including written and verbal communication, will be necessary to successfully address the assignments for this position. Candidate must be team oriented, capable of multitasking in a dynamic, demanding environment, and a self-starter able to plan, schedule, and coordinate assignments with other team members.
Mandatory Skills and Abilities
Experience working in complex HPC cluster architectures and storage implementations.
Possess excellent oral and written communication skills and ability to effectively interact with a highly skilled technical HPC support staff and users, internal staff, and management.
Organizational skills to balance and prioritize work in a dynamic work environment, and persistence to follow-through on tasks in the face of obstacles
Ability to work as part of a multi-faceted team, and leadership skills to guide and mentor the work of less experienced personnel
Solid knowledge of Linux and general Unix operating system concepts, and extensive systems administration experience.
Security implementations using multi-factor authentication, PKI, or Kerberos and Unix OS hardening to DoD STIG standards.
Must provide clear and extensive documentation on system administration procedures for routine and complex tasks.
Fluency in SAN Fabric Administration including zoning, LUN security, and HBA configuration and troubleshooting
Experience with and understanding of enterprise backup and data protection technologies
NAS (file) and SAN (block) storage experience
Tape library and drive storage experience
Experience working with tape backup or long-term archive software; examples include Oracle HSM, IBM HPSS, HPE DMF, and Quantum StorNext
Experience in supporting multiple petabytes of storage a plus
Knowledge of Information Assurance (IA) controls, STIG Viewer, Radix, and the Risk Management Framework (RMF)
Strong troubleshooting and conflict resolution skills
Solaris skills, as that OS hosts the SAM-QFS filesystem, and Linux skills (most likely Red Hat), as Linux systems will host the follow-on to Solaris and SAM-QFS. Operating system knowledge and experience are necessary in order to understand user issues.
Knowledge of the IBM GPFS storage system, including its support, administration, and maintenance
Education and Experience
Active Interim DoD Top Secret clearance required before start (at a minimum, an active/current Secret clearance is required to start, with the ability to obtain a Top Secret clearance)
Storage Architectures: SAN, SAS, FC, SATA, Bandwidth, Performance
DoD 8570.01-M IAT Level II (Security+) certification and associated CE certificate
Prior DoD working environment experience a plus