AWS recently announced its Scale-Out Computing on AWS (SOCA) platform, an open-source, cloud-based cluster and workload management solution that brings cost-effective, high-performance computing (HPC) capabilities to virtually anyone. System Fabric Works (SFW) has used SOCA to run HPC workloads on AWS, and we would like to share our experience with the community.
The SOCA platform creates a standalone, turn-key HPC cluster that is fully integrated with AWS. Two key SOCA features are its access to AWS’ vast pool of resources and the fact that those resources are instantiated on demand. Put another way, the SOCA cluster has “unlimited” hosts and “unlimited” storage that incur charges only while allocated to a workload. This results in much shorter queue wait times and a lower TCO than most on-prem solutions can achieve. All AWS lanes are open, no waiting.
In addition to its pay-for-what-you-use pricing and vast capacity, SOCA provides the open-source PBS workload manager, a web-based GUI for user and remote desktop management, on-demand remote desktops with optional GPUs, storage management (EFS, EBS and FSx for Lustre) and job accounting in Elasticsearch/Kibana.
The currently supported cluster node operating systems are Amazon Linux 2, Red Hat Enterprise Linux 7 and CentOS 7. Instance types run the gamut from the sprightly t2.micro (1 vCPU, 1 GB RAM) to the mighty x1e.32xlarge (128 vCPUs on 64 physical cores, 3,904 GB RAM).
The SOCA platform itself is free of charge. The base cost for a SOCA cluster is about $75 per month, which is the charge for an always-on m5.large head node. That overhead can be dramatically reduced by shutting down the head node when it is not in use. Of course, launching jobs on the cluster will incur additional EC2 and other AWS charges.
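To make the head node arithmetic concrete, here is a rough sketch of the compute portion of that monthly charge and how it shrinks when the node is stopped part of the time. The hourly rate is an assumption (an on-demand Linux price for m5.large that varies by region and over time), and the function and constant names are hypothetical. Storage and other charges, which continue while an instance is stopped, are not modeled, so the result will not match a real bill exactly.

```python
AVG_HOURS_PER_MONTH = 730.0  # 8,760 hours per year / 12 months


def monthly_cost(hourly_rate_usd: float, fraction_on: float = 1.0) -> float:
    """Estimate the monthly compute charge for one instance.

    fraction_on is the share of the month the instance is running (0 to 1).
    """
    if not 0.0 <= fraction_on <= 1.0:
        raise ValueError("fraction_on must be between 0 and 1")
    return hourly_rate_usd * AVG_HOURS_PER_MONTH * fraction_on


# Assumption: USD per hour for an on-demand m5.large; check current AWS pricing.
ASSUMED_M5_LARGE_RATE = 0.096

always_on = monthly_cost(ASSUMED_M5_LARGE_RATE)
# Head node running only 40 of the 168 hours in a week.
working_hours = monthly_cost(ASSUMED_M5_LARGE_RATE, fraction_on=40 / 168)

print(f"always on:   ${always_on:.2f} per month")
print(f"40 h / week: ${working_hours:.2f} per month")
```

Under these assumptions, running the head node only during working hours cuts its compute charge to roughly a quarter of the always-on figure.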
These capabilities provide a cost-effective, highly scalable, state-of-the-art HPC cluster that takes full advantage of the capacity and services of AWS. While SOCA does configure the cluster and services out of the box, the administrator needs a good working knowledge of AWS fundamentals. Normal cluster users need no AWS experience and interact with the system through ssh, PBS and/or the optional remote desktops.
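For readers who have not used PBS, the following is a minimal sketch of what a user-side job script looks like, generated here from Python. It uses only standard PBS directives (job name, queue, resource selection, walltime). The helper name, the queue name "normal", the job name and the command are hypothetical placeholders rather than SOCA defaults we are asserting, and any SOCA-specific resource options are deliberately left out.

```python
def render_pbs_script(job_name: str, queue: str, ncpus: int,
                      walltime: str, command: str) -> str:
    """Return the text of a simple single-node PBS job script."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",              # job name shown in qstat
        f"#PBS -q {queue}",                 # target queue
        f"#PBS -l select=1:ncpus={ncpus}",  # one chunk with ncpus cores
        f"#PBS -l walltime={walltime}",     # maximum run time, HH:MM:SS
        "cd $PBS_O_WORKDIR",                # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
script = render_pbs_script(
    job_name="demo",
    queue="normal",
    ncpus=4,
    walltime="01:00:00",
    command="./run_simulation.sh",
)
print(script)
```

A user would save the generated text to a file on the cluster and submit it with qsub, then follow its progress with qstat.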
All in all, SOCA should be on the radar for anyone looking to extend their organization’s HPC into the cloud, or even to begin an HPC investigation without the up-front cost of on-prem solutions. System Fabric Works has extensive experience with SOCA and can help you on the path to HPC in the cloud.