Senior Linux HPC System Administrator
**Date:** Aug 22, 2021
**Location:** Corning, NY, US, 14831
Requisition Number: 46191
Corning is one of the world’s leading innovators in materials science. For more than 160 years, Corning has applied its unparalleled expertise in specialty glass, ceramics, and optical physics to develop products that have created new industries and transformed people’s lives.
At Corning, our growth is fueled by a commitment to innovation. We succeed through sustained investment in research & development, a unique combination of material and process innovation, and close collaboration with customers to solve tough technology challenges. We are a four-time National Medal of Technology winner thanks to our technology leadership and R&D environment, which attract and enable the best scientific minds in the world. This pipeline of talent has brought life-changing innovation to your fingertips for more than 160 years.
SCOPE/PURPOSE OF POSITION:
As a member of the Scientific Computing team, you will lead and participate in the deployment, management, and optimization of systems, software, and processes in support of Corning’s Scientific Research High Performance Computing environment. You will work closely with other HPC System Engineers and Administrators, and with Corning’s Modeling and Machine Learning community to identify and provide solutions and technical support that enable Modeling and Scientific Computing objectives to be met.
ROLES AND RESPONSIBILITIES:
• Configures, installs, maintains and upgrades HPC clusters (compute, storage, and network) and applications in support of research computing environments
• Leads and collaborates on projects to enhance functionality in areas such as systems monitoring, scheduling and resource management, configuration management, and backups.
• Recommends and implements improvements to existing HPC system management tools and processes
• Diagnoses, isolates, and resolves complex application and system technical problems
• Provides technical expertise to improve HPC cluster performance and resiliency
• Develops scripts and automation to enhance operational services and service quality
• Builds, installs, and supports scientific software (Commercial and Open Source)
• Supports compute, storage, and network technology evaluations and assessments
• Mentors and trains less experienced members of the HPC Operations team
• Develops, implements, and documents system architectures, new capabilities, and operational standards
• Provides support and training to the user community
• Develops and maintains technical documentation for use by the modeling community
• Interacts with hardware and software vendors and Corning’s Global Sourcing Management team to execute purchases, renewals, and service contracts.
• Builds relationships that foster collaboration and partnerships to drive better services for the technology community
• Bachelor’s degree (B.A./B.S.) in Computer Science, Engineering, or related course of study, or equivalent combination of education and relevant experience
• Minimum of 7 years of Linux (RHEL, CentOS) System Administration experience in a large distributed computing environment
• Experience providing support for Linux HPC clusters used for scientific research is preferred.
• Extensive understanding of infrastructure technologies including server, storage, network, database, and virtualization
• Experience configuring, managing, and optimizing large Linux clusters and servers
• Experience configuring, managing, and optimizing distributed and parallel file systems such as Lustre, GPFS, NFS, Ceph.
• Familiarity with high-performance networks such as InfiniBand, and with network management
• Strong scripting/programming capabilities with Python, Bash, Perl
• Extensive knowledge of CentOS, RedHat, Ubuntu and experience maintaining, upgrading, and tuning the Linux kernel
• Experience with installation and use of system configuration management and orchestration tools such as Puppet, Ansible, Chef, Cobbler
• Experience with installation and configuration of system management, monitoring/alerting tools (e.g. Ganglia, Nagios, Prometheus, Zabbix)
• Experience building applications from source and ability to troubleshoot compilation issues.
• Demonstrated ability to quantify, analyze and resolve complex system issues, determine root cause, and develop preventive actions
• Demonstrated ability to perform complex performance analysis including system processes, I/O subsystems, networks and other related components.
• Ability to work independently as well as collaboratively within a team, to include the ability to lead moderately complex projects or small project teams
• Excellent written and oral communication skills for interacting with customers, team members, and management
• Proactive and innovative, with ability to foresee and prevent potential problems
• Organizational and time management skills, exceptional follow-through, and ability to manage multiple priorities
• Passion for providing excellent customer service
• Experience integrating systems or designing solutions for HPC workloads
• Experience installing, configuring, and maintaining job management tools (such as PBS, SLURM, Moab, TORQUE, etc.)
• Experience with performance benchmarking using profilers and debuggers to recommend code improvements for scalability and performance.
• Experience configuring, installing and troubleshooting MPI and OpenMP preferred.
• Experience managing virtualization platforms (VMware, KVM, oVirt)
• Knowledge of containerization platforms and technologies such as Singularity and Kubernetes
• Experience with configuration and management of high-performance networks such as InfiniBand or Omni-Path.
• Experience with on-prem and public cloud technologies (AWS, Azure, GCP), OpenStack
We prohibit discrimination on the basis of race, color, gender, age, religion, national origin, sexual orientation, gender identity or expression, disability, veteran status or any other legally protected status.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
**Nearest Major Market:** Corning