Company Overview
A leading US medical devices organization with a presence in more than 150 countries. Fortune 100 organization.
This job has been closed. The job description is kept below as a reminder. It is no longer possible to apply.
Extensive experience managing HPC networking at an expert or near-expert level, involving technologies and fabrics based on InfiniBand, Omni-Path, etc.
This includes, but is not limited to, the common tools, monitoring, sampling, telemetry data, and best practices around these types of fabrics.
Specific knowledge of technologies used in HPC clusters, for example RDMA and NVMe-oF.
Knowledge of HPC network topologies and why they are used for different types and sizes of clusters (fat tree, 3D torus, hypercube, etc.).
Knowledge of and experience with clustered and distributed filesystems and scale-out storage technologies, for example Lustre, Ceph, Isilon, etc.
Previous experience in how to monitor HPC infrastructure (the observer problem).
Must understand sample rates and how to tap telemetry data that supports our mission without negatively affecting client workloads.
Extensive experience managing and troubleshooting HPC environments from an infrastructure perspective: everything from local node problems, blade chassis, and intra- and interconnect traffic in the topology to process and job management, and the process of tracing backwards to understand which workload is affecting the cluster abnormally.
Should have previous experience in job scheduling with, for example, IBM Spectrum LSF (used by the client), Slurm, MOAB, or equivalent. Even though this is out of scope from a solution perspective, it makes a great difference overall when it comes to managing the HPC infrastructure well. This experience and knowledge is much preferred so that the Capgemini HPC team can engage and work together with the client's HPC application team.
At least a medium level of computer science and computer architecture knowledge.
Bonus if the person has previous experience working directly with MPI developer teams or has done actual programming in the HPC area.
Bonus if the person has HPE cluster experience, considering most of the client's clusters are delivered by HPE.
Extra bonus if the person has experience with hybrid HPC clusters or GPU-based CUDA clusters, given the client's direction (autonomous cars, smart traffic systems).