Graviton3 performance
Since AWS announced Graviton3 at the end of last year, we have been eagerly looking forward to test-driving it. After three weeks, our conclusion is that Graviton3 takes the Arm microarchitecture in the cloud to a new level of HPC performance and can now compete mano a mano with x86_64 processors.
Before discussing some of our preliminary results, a few introductory remarks about the processor itself: Graviton3 uses the Arm Neoverse V1 microarchitecture and implements the ARMv8 ISA (not ARMv9, as speculated in some discussion groups). It is built with 5 nm technology and runs at 2.6 GHz, which is a very slight increase versus Graviton2 (2.5 GHz). It uses DDR5 memory and PCI-Express 5.0, and it doubles the number of double-precision instructions per cycle (DP-IPC) versus its predecessor, which in theory could boost performance for HPC apps by 100%. Further technical details are discussed at https://perspectives.mvdirona.com/2022/05/graviton3-ec2-c7g-general-availability/ by a member of the AWS team.
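As a back-of-the-envelope illustration of that theoretical doubling, the sketch below estimates peak double-precision throughput as cores x clock x DP FLOPs per cycle. The clock speeds come from the text above. The 64-core count and the per-core figures of 8 and 16 DP FLOPs per cycle are assumptions chosen only to illustrate a doubling; they are not measured values.

```python
def peak_dp_gflops(cores: int, clock_ghz: float, dp_flops_per_cycle: int) -> float:
    """Theoretical peak double-precision throughput in GFLOP/s."""
    return cores * clock_ghz * dp_flops_per_cycle


# Assumed values: 64 cores per chip, 8 vs. 16 DP FLOPs per cycle per core.
graviton2_peak = peak_dp_gflops(cores=64, clock_ghz=2.5, dp_flops_per_cycle=8)
graviton3_peak = peak_dp_gflops(cores=64, clock_ghz=2.6, dp_flops_per_cycle=16)

# Ratio of theoretical peaks: the doubled per-cycle rate times the clock bump.
peak_ratio = graviton3_peak / graviton2_peak

print(f"Graviton2 peak: {graviton2_peak:.1f} GFLOP/s")
print(f"Graviton3 peak: {graviton3_peak:.1f} GFLOP/s")
print(f"Ratio: {peak_ratio:.2f}x")
```

Under these assumptions the ratio comes out slightly above 2x, because the small clock increase compounds with the doubled per-cycle rate. Real applications, particularly memory-bound ones, will typically see less than this theoretical ceiling.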
The following graphs reflect performance on a c7g.16xlarge instance, which is similar to the widely tested c6g.16xlarge instance but with the upgraded processor. The results also include measurements with hpc6a.48xlarge (AMD EPYC3) and c6i.32xlarge (Intel Ice Lake) instances for comparison purposes. The first measurements focus on memory bandwidth and the HPCG benchmark, which is predominantly memory bound. Then, the discussion turns to app performance.
Memory performance
WRF & CMAQ performance
Air quality modeling is now available at AWS and Azure
The latest versions of CMAQ, WRF-CMAQ and CAMx are now available on images from the AWS and Azure Marketplaces. These apps come precompiled and optimized for different processors and architectures (x86_64 and AArch64). The images are available to any organization or person with a valid AWS or Azure account. CMAQ includes the ISAM and DDM3D models in addition to the standard compilation. The images also incorporate several pre- and postprocessing tools, such as NCL, SMOKE, IDV and VERDI, among others.
We have performed several benchmarks with the images to evaluate performance with different public cloud hardware. For CMAQ, the first benchmark is the standard U.S. Southeast (2016) benchmark with a domain size of 100x80x35 and 218 tracked species. The figure shows wall (computational) times from the benchmark versus the number of cores for different AWS/Azure options and the EPA cluster.
The results with the new AMD EPYC3 processor (codenamed “Milan”) are particularly impressive as wall times are below 200 s for single instances.
In addition to the stand-alone CMAQ, we have also benchmarked a similar case using WRF-CMAQ with short-wave feedback. The computational times are approximately five times the previous measurements as the figure shows:
Compared with the EPA cluster, the gains for WRF-CMAQ with cloud hardware are also very significant. Another CMAQ benchmark that is sometimes used to assess large systems covers the continental U.S. (CONUS1) with a 499x299x35 grid and 219 tracked species. The figure shows the results for the CONUS1 benchmark with AWS IaaS.
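To give a sense of how much larger CONUS1 is than the U.S. Southeast case, the short sketch below compares the two problem sizes using only the grid dimensions and species counts quoted above. Multiplying cells by species is a crude size proxy assumed here for illustration; it is not a model of CMAQ's actual cost.

```python
def grid_cells(nx: int, ny: int, nz: int) -> int:
    """Total number of cells in an nx x ny x nz grid."""
    return nx * ny * nz


# Grid dimensions and tracked species as quoted for each benchmark.
southeast_cells = grid_cells(100, 80, 35)
conus1_cells = grid_cells(499, 299, 35)

# Crude proxy for problem size: one value per cell per tracked species.
southeast_size = southeast_cells * 218
conus1_size = conus1_cells * 219

print(f"Southeast cells: {southeast_cells:,}")
print(f"CONUS1 cells:    {conus1_cells:,}")
print(f"Cell ratio:      {conus1_cells / southeast_cells:.1f}x")
print(f"Size ratio:      {conus1_size / southeast_size:.1f}x")
```

By this measure CONUS1 is between 18 and 19 times larger than the Southeast benchmark, which is why it is the more demanding test for multi-instance configurations.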
Updated WRF benchmarks
AWS introduced Graviton3 at re:Invent 2021
During re:Invent 2021 in Las Vegas, AWS CEO Adam Selipsky announced the future availability of AWS's new Graviton3 processor. The announcement came two years after the launch of Graviton2 and in the middle of the battle for supremacy in server processors, a market traditionally dominated by the x86_64 architecture. Graviton3 is based on the AArch64 (Arm) architecture and will power the first family of 7th-generation EC2 instances. The new processor promises to improve almost every aspect of performance versus Graviton2 and to position itself as the strongest competitor to the new AMD and Intel server processors, particularly as a more economical option.
As usual, AWS has been tight-lipped about this project, and there is only so much that we know about the new processor. Like Graviton2, it has been developed by Annapurna Labs. It uses 5 nm technology and comes in a single socket with 64 cores running at 2.6 GHz, a very slight increase versus Graviton2 (2.5 GHz). For HPC apps, critical upgrades include doubling the number of double-precision operations per cycle, using PCI-Express 5.0, and being the first processor to include DDR5 memory, which has about 50 percent more bandwidth than the DDR4 memory commonly used by the older generation of server processors. Other technical enhancements seem to be the upgrade to the ARMv8.5 ISA and a doubling of the L3 cache. Some initial estimates put the performance improvement in double-precision operations close to 60%, which bodes well for HPC apps.
At the time of this newsletter, AWS has still not told us when our preview window will be, but we will provide an update as soon as we have tested the new instances.
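The "about 50 percent" figure can be sanity-checked with a simple peak-bandwidth estimate: channels x transfer rate x bytes per transfer. The sketch below assumes eight 64-bit memory channels on both generations, with DDR4-3200 and DDR5-4800 as the transfer rates. These are illustrative assumptions, not figures taken from AWS.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


# Assumed configurations: 8 channels each, DDR4-3200 vs. DDR5-4800.
ddr4_bandwidth = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
ddr5_bandwidth = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=4800)

# Relative gain of the DDR5 configuration over the DDR4 one.
bandwidth_gain = ddr5_bandwidth / ddr4_bandwidth - 1

print(f"DDR4 peak: {ddr4_bandwidth:.1f} GB/s")
print(f"DDR5 peak: {ddr5_bandwidth:.1f} GB/s")
print(f"Gain: {bandwidth_gain:.0%}")
```

With these assumed transfer rates, the theoretical gain matches the roughly 50 percent quoted above. Sustained bandwidth in benchmarks is normally lower than the theoretical peak.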