About Brian Smith

So far Brian Smith has created 5 blog entries.

Double, double packet trouble: too many NICs, too many problems in three Acts

Act 1 - The Problem. Here be known our Complicating Incident. Look at that fancy new server just installed in the rack with 8x HDR 200Gb InfiniBand HCAs on it. WOW. That is going to make your data scientist stakeholders very happy. Linux is installed and it's just dying to consume input. This particular node [...]

August 3rd, 2023 | HPC Blog

“ib_srp REJ reason 0x3” and you

Most HPC administrators have had to cut their teeth on SCSI RDMA Protocol (SRP) at some point. And most of them have torn their hair out getting SRP to work. SRP is a fine remote block storage protocol that is one of the rocks HPC was built upon -- if a little dated. Troubleshooting [...]

June 13th, 2023 | HPC Blog

AcuSolve and Scale-Out Computing on AWS (SOCA) – Price and Performance

AWS provides an incredible array of instance type and storage options for HPC workloads. Our previous post discussed the AWS options that System Fabric Works (SFW) chose for Altair AcuSolve™ workloads on Scale-Out Computing on AWS. This entry presents the job turn-around time and price-per-run results found by the SFW study. SFW decided to use [...]

April 20th, 2020 | HPC Blog

AcuSolve and Scale-Out Computing on AWS (SOCA)

Our previous post discussed Scale-Out Computing on AWS, a turn-key cluster solution for executing HPC workloads. This entry reports on a System Fabric Works (SFW) study that compares job turn-around time (performance) and job price-per-run for a real-world HPC workload on SOCA. In determining a real-world HPC workload for our SOCA cluster, SFW turned to [...]

April 20th, 2020 | HPC Blog

Scale-Out Computing on AWS (SOCA)

AWS recently announced its Scale-Out Computing on AWS (SOCA) platform, an open source, cloud-based cluster and workload management solution that brings cost-effective, high-performance computing (HPC) capabilities to virtually anyone. System Fabric Works (SFW) recently employed SOCA to execute HPC workloads on AWS and would like to share our experience with the community. The SOCA platform [...]

April 17th, 2020 | HPC Blog