In 2013, Novartis successfully completed a cancer drug project on AWS. The pharma giant leased 10,000 EC2 instances with about 87,000 compute cores for 9 hours at a disclosed cost of approximately $4,200. They estimated that buying the equivalent hardware on-prem, plus the associated expenses of completing the same tasks, would have cost approximately $40M.

Clearly, High Performance Computing (HPC) in the cloud is a game changer. It reduces capex and computing time, and it levels the playing field: you don't have to make a huge investment in infrastructure. Yet, after all these years, cloud HPC hasn't taken off as widely as one would expect. There are many reasons for this, but one big deterrent is storage.

Currently available AWS and Azure services have throughput, capacity, pricing, or cross-platform compatibility issues that make them less than adequate for cloud HPC workloads. For instance, AWS EFS requires a large minimum file system size to deliver adequate throughput for HPC workloads, because its baseline throughput scales with the amount of data stored. AWS EBS is a raw block device with a 16TB limit, and it requires an EC2 instance in front of it. Amazon FSx for Lustre and FSx for Windows File Server have issues similar to those of EBS and EFS.
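To make the EFS sizing point concrete, the sketch below estimates how much data an EFS file system must hold before its baseline throughput reaches a given target. It assumes the bursting-mode baseline of roughly 50 MiB/s per TiB stored; that rate is an assumption to check against current AWS documentation, since throughput modes and rates change. The function names are illustrative.

```python
# Assumed EFS bursting-mode baseline (verify against current AWS docs):
# roughly 50 MiB/s of baseline throughput per TiB of data stored.
BASELINE_MIB_S_PER_TIB = 50.0


def mbps_to_mib_s(mb_per_s: float) -> float:
    """Convert decimal megabytes/s (MBps) to mebibytes/s (MiB/s)."""
    return mb_per_s * 1_000_000 / (1024 * 1024)


def min_efs_size_tib(target_mb_per_s: float,
                     rate_per_tib: float = BASELINE_MIB_S_PER_TIB) -> float:
    """Smallest stored-data size (TiB) whose baseline throughput meets the target."""
    return mbps_to_mib_s(target_mb_per_s) / rate_per_tib


if __name__ == "__main__":
    for target in (500, 800):  # the write and read targets from the PoC below
        print(f"{target} MBps -> ~{min_efs_size_tib(target):.1f} TiB stored")
```

Under that assumption, the 500 MBps write target used in the PoC works out to roughly 9.5 TiB of stored data, regardless of how much capacity the workload actually needs.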

Azure Ultra SSD is still in preview. It currently supports only Windows Server and RHEL, and it is likely to be expensive. Azure Premium Files, also in preview, caps share capacity at 100TB, which could be restrictive for some HPC workloads. Still, Microsoft promises up to 5 GiB/s of throughput and burstable IOPS of up to 100,000 per share.


Making Cloud HPC storage work

Effective High Performance Computing in the cloud requires predictable performance. Every component of the solution (compute, network, and storage) has to be the fastest available to optimize the workload and exploit the massive parallel processing power of the cloud. Burstable storage is not suitable: withdrawing any of those resources partway through a run will cause the process to fail.
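The toy model below illustrates why burstable storage is a poor fit: a steady HPC job that demands more than the baseline runs at full speed only until its banked credits are spent, then drops to the baseline for the rest of the run. It is a simplified, token-bucket-style sketch, not a model of any particular AWS or Azure tier; the class, rates, and credit sizes are hypothetical, and credit refill is ignored.

```python
from dataclasses import dataclass


@dataclass
class BurstableVolume:
    """Toy burst-credit model of a burstable storage tier (all figures hypothetical)."""
    baseline_mb_s: float  # throughput the tier can sustain indefinitely
    burst_mb_s: float     # peak throughput while burst credits remain
    credits_mb: float     # banked credit, in MB that may be served above baseline

    def serve(self, demand_mb_s: float, seconds: int) -> list[float]:
        """Delivered MB/s for each second of a steady demand (credit refill ignored)."""
        delivered = []
        for _ in range(seconds):
            wanted = min(demand_mb_s, self.burst_mb_s)
            # Only the portion above baseline is paid for with credits.
            extra = max(0.0, wanted - self.baseline_mb_s)
            extra = min(extra, self.credits_mb)
            self.credits_mb -= extra
            delivered.append(min(wanted, self.baseline_mb_s) + extra)
        return delivered


if __name__ == "__main__":
    vol = BurstableVolume(baseline_mb_s=100, burst_mb_s=800, credits_mb=7000)
    trace = vol.serve(demand_mb_s=800, seconds=20)
    # Full burst at first, then baseline once the credits run out.
    print(trace[0], trace[-1])
```

With the example figures, the job runs at 800 MB/s for ten seconds and then falls to 100 MB/s, an eightfold drop in the middle of the run.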

SoftNAS Cloud makes dedicated resources with predictable, reliable performance available in a single comprehensive solution. There's no need to purchase, integrate, and configure separate software, so the solution can be deployed rapidly from the marketplace: you can have SoftNAS up and running in an hour.

The completeness of the solution also makes it easy to scale. As a business, you can select the compute and storage your NAS needs and scale up the entire cloud NAS as your needs increase.

You can further tailor the solution to the specific needs of your business by selecting the drive type and choosing between CIFS and NFS sharing, with high availability.


HPC in the cloud – A use case

SoftNAS has worked with clients to implement cloud HPC. In one case, a leading oil and gas corporation commissioned us to identify the fastest throughput performance achievable with a single SoftNAS instance in Azure, in order to facilitate migration of their internal E&P application suite.

The suite was running on-prem on NetApp SAN storage and current-generation HP ProLiant blade servers, and remote customers connected to Hyper-V clusters running GPU-enabled virtual desktops.

Our team established the following performance requirements for HPC in the cloud:

  • Sustained write speeds of 500 MBps to a single CIFS share
  • Sustained read speeds of 800 MBps from a single CIFS share
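A simple way to check a share against the sustained-write target is to time a large sequential write to it. The sketch below is a minimal, hypothetical probe; it is not necessarily how the PoC figures were produced, and the function and probe file names are illustrative. It measures writes only, because a read test needs a file larger than client memory (or a cache bypass) to avoid reporting cached speeds.

```python
import os
import time


def sequential_write_mbps(directory: str, total_mb: int = 1024, block_mb: int = 4) -> float:
    """Write about total_mb of data to directory in block_mb chunks; return decimal MB/s.

    The probe file name is hypothetical; the file is deleted afterwards.
    """
    block = os.urandom(block_mb * 1_000_000)  # incompressible data, 1 MB = 10**6 bytes
    blocks = max(1, total_mb // block_mb)
    path = os.path.join(directory, "throughput_probe.bin")
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush buffered data before stopping the clock
    elapsed = time.perf_counter() - start
    os.remove(path)
    return blocks * block_mb / elapsed
```

Point `directory` at the mounted share (a mapped drive on Windows or a mount point on Linux) and use a `total_mb` several times larger than the client's RAM, so that caching doesn't inflate the result.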

We started the POC with an Azure E64s_v3 VM and five P30 Premium disks in a RAID 0 pool configuration, with Azure Accelerated Networking enabled. The initial test workstation was an NV6s_v2 VM (GPU-enabled).
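A RAID 0 stripe adds up the per-disk limits, but the VM's own disk throughput and IOPS caps bound the total. The sketch below estimates that best-case ceiling for the five-disk P30 pool. The 200 MB/s and 5,000 IOPS per P30 disk are Azure's published Premium SSD figures (worth re-checking, since they can change); the VM caps in the example are placeholders, not the documented E64s_v3 limits, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskSpec:
    throughput_mb_s: float
    iops: int


# Per-disk P30 limits as published by Azure; verify before relying on them.
P30 = DiskSpec(throughput_mb_s=200, iops=5000)


def raid0_ceiling(disk: DiskSpec, count: int,
                  vm_disk_cap_mb_s: float, vm_iops_cap: int) -> tuple[float, int]:
    """Best-case throughput and IOPS of a RAID 0 stripe of identical disks.

    Striping sums the per-disk limits, but the VM's own disk caps bound the total.
    """
    return (min(disk.throughput_mb_s * count, vm_disk_cap_mb_s),
            min(disk.iops * count, vm_iops_cap))


if __name__ == "__main__":
    # Placeholder VM caps: substitute the documented limits for the chosen VM size.
    print(raid0_ceiling(P30, 5, vm_disk_cap_mb_s=1500, vm_iops_cap=60_000))
```

For five P30 disks the stripe tops out at 1,000 MB/s and 25,000 IOPS before any VM cap applies. That is only an upper bound: network limits, file system overhead, and the SMB client all sit between it and the throughput a CIFS client actually sees.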

The top speeds achieved in this configuration were:
[Image: speed test results for the E64s_v3 configuration]

Because we did not achieve the desired write throughput, we began testing faster instance types. We achieved the fastest performance on an LS64s_v2 Storage Optimized VM:
[Image: LS64s_v2 VM size specifications]

Test results for the LS64s_v2:
[Image: speed test results for the LS64s_v2]


HPC in the Cloud PoC – our learnings

  • Although the throughput criteria were met, the NVMe disks bundled with the LS64s_v2 are ephemeral, not persistent. In addition, the pool can be expanded only with SSDs, not with additional NVMe disks. These factors eliminate this instance type from consideration.
  • Enabling Accelerated Networking on every VM in an Azure solution is critical to achieving the fastest possible performance.
  • Azure Ultra SSDs may well be the fastest storage product in any cloud. At the time of publishing, they are available only in beta in a single Azure region/AZ and cannot be tested with Marketplace VMs. As part of the Ultra SSD preview program, we achieved 1.4 GBps write throughput on a Windows Server 2016 DS_v3 VM.
  • When testing SoftNAS performance with client machines, make sure the test machines have network throughput capacity equal to or greater than that of the SoftNAS VM, and that Accelerated Networking is enabled on them.
  • On pools composed of NVMe disks, adding a ZIL or read cache built from mirrored Premium SSDs actually reduces performance.
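The point about test clients follows from a simple bottleneck rule: a single client's throughput to a share can't exceed the slowest hop between it and the disks. A minimal sketch, with hypothetical function names and example bandwidth figures rather than the published limits of any particular VM size:

```python
def gbps_to_mb_s(gbps: float) -> float:
    """Convert a NIC rating in gigabits/s to decimal megabytes/s."""
    return gbps * 1000 / 8


def effective_share_throughput(client_net_mb_s: float, server_net_mb_s: float,
                               pool_mb_s: float) -> tuple[float, str]:
    """Return the best-case single-client throughput and the hop that limits it."""
    hops = {
        "client network": client_net_mb_s,
        "SoftNAS VM network": server_net_mb_s,
        "storage pool": pool_mb_s,
    }
    bottleneck = min(hops, key=hops.get)
    return hops[bottleneck], bottleneck


if __name__ == "__main__":
    # Example figures only: a 4 Gbps client, a 30 Gbps server, a 1,000 MB/s pool.
    print(effective_share_throughput(gbps_to_mb_s(4), gbps_to_mb_s(30), 1000))
```

In the example, the 4 Gbps client (500 MB/s) caps the measurement well below the pool's 1,000 MB/s, so the test would understate what the SoftNAS VM can deliver.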


Achieving Cloud HPC Success

SoftNAS is committed to leading the market as the provider of the fastest cloud storage platform available. To meet this goal, our team has a game plan:

  • Testing and benchmarking the fastest EC2 instances and Azure VMs (e.g., i3.16xlarge, i3.metal) with the fastest disks.
  • Rapidly adopting new cloud storage technologies (e.g., Azure Ultra SSD).
  • Measuring throughput and IOPS for every POC, production deployment, or internal test of SoftNAS, and documenting the instance and pool configurations. This information needs to be accessible to our team so we can match configurations to required performance.
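Keeping measured results in a form the team can search could be as simple as a list of records plus a filter. A minimal sketch; the record fields and function name are hypothetical, not an existing SoftNAS tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """One measured SoftNAS configuration (field names are hypothetical)."""
    cloud: str
    instance_type: str
    pool: str
    write_mb_s: float
    read_mb_s: float
    iops: int
    persistent: bool  # False for ephemeral local disks, such as the LS64s_v2 NVMe


def matching_configs(records: list[BenchmarkRecord], min_write: float,
                     min_read: float, min_iops: int = 0) -> list[BenchmarkRecord]:
    """Persistent configurations that met every requirement, fastest write first."""
    hits = [r for r in records
            if r.persistent
            and r.write_mb_s >= min_write
            and r.read_mb_s >= min_read
            and r.iops >= min_iops]
    return sorted(hits, key=lambda r: r.write_mb_s, reverse=True)
```

Called with `min_write=500` and `min_read=800`, it would return the recorded configurations that meet the oil and gas PoC targets while excluding ephemeral-disk setups.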