HPC Vega Architecture
The tables below summarize the type and quantity of the major hardware components in the proposed solution for the Vega system:
Computing
GPU partition
| Category | Component | Quantity | Description |
|---|---|---|---|
| Infrastructure | Rack | 2 | XH2000 DLC rack with PSUs, HYC and IB HDR switches |
| Compute | GPU node | 60 | 4x NVIDIA A100, 2x AMD Rome 7H12, 512 GB RAM, 2x HDR dual-port mezzanine, 1x 1.92TB M.2 SSD |
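The per-node figures in the GPU partition table multiply out to partition-wide totals. The following is a minimal sketch of that arithmetic. It assumes the 7H12 has 64 cores per socket, which is the figure the CPU partition table gives for the same part; the constant and function names are illustrative only.

```python
# Per-node figures taken from the GPU partition table.
GPU_NODES = 60
GPUS_PER_NODE = 4        # 4x NVIDIA A100
SOCKETS_PER_NODE = 2     # 2x AMD Rome 7H12
CORES_PER_SOCKET = 64    # assumption: 64 cores, as listed for the 7H12 in the CPU partition
RAM_GB_PER_NODE = 512


def gpu_partition_totals() -> dict:
    """Multiply the per-node specs out to partition-wide totals."""
    return {
        "gpus": GPU_NODES * GPUS_PER_NODE,
        "cpu_cores": GPU_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "ram_gb": GPU_NODES * RAM_GB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in gpu_partition_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions, the partition has 240 A100 GPUs, 7,680 host CPU cores and 30,720 GB of host RAM.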
CPU partition
| Category | Component | Quantity | Description |
|---|---|---|---|
| Infrastructure | Rack | 10 | XH2000 DLC rack with PSUs, HYC and IB HDR switches |
| Compute | CPU node Standard | 768 | 256 blades of 3 compute nodes each (per node: 2x AMD Rome 7H12 (64c, 2.6GHz, 280W), 256GB RAM, 1x HDR100 single-port mezzanine, 1x 1.92TB M.2 SSD) |
| Compute | CPU node Large Memory | 192 | 64 blades of 3 compute nodes each (per node: 2x AMD Rome (64c, 2.6GHz, 280W), 1TB RAM, 1x HDR100 single-port mezzanine, 1x 1.92TB M.2 SSD) |
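The node counts in the CPU partition follow from the blade counts, because each blade holds three nodes. The sketch below cross-checks those counts and derives core and memory totals. It assumes that "1TB" means 1024 GB. The dictionary layout and names are hypothetical.

```python
NODES_PER_BLADE = 3
CORES_PER_NODE = 2 * 64  # 2x AMD Rome, 64 cores each

# node type -> (blade count, RAM per node in GB)
# Assumption: "1TB" is counted as 1024 GB for the large-memory nodes.
CPU_NODE_TYPES = {
    "standard": (256, 256),
    "large_memory": (64, 1024),
}


def cpu_partition_summary() -> dict:
    """Derive node, core and memory totals per node type and overall."""
    summary = {}
    for name, (blades, ram_gb) in CPU_NODE_TYPES.items():
        nodes = blades * NODES_PER_BLADE
        summary[name] = {
            "nodes": nodes,
            "cores": nodes * CORES_PER_NODE,
            "ram_gb": nodes * ram_gb,
        }
    # Sum each metric over the node types defined above.
    summary["total"] = {
        key: sum(summary[name][key] for name in CPU_NODE_TYPES)
        for key in ("nodes", "cores", "ram_gb")
    }
    return summary


if __name__ == "__main__":
    print(cpu_partition_summary()["total"])
```

The totals come to 960 CPU nodes and 122,880 cores. With the 1024 GB convention, the two node types each contribute 196,608 GB of RAM.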
Storage
| Category | Component | Quantity | Description |
|---|---|---|---|
| Storage | Flash-based building block | 10 | 2U ES400NVX (per device: 23x 6.4TB NVMe, 8x InfiniBand HDR100 ports, 4 embedded Lustre VMs, one OST and one MDT per VM) |
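The following sketch aggregates the flash tier. Capacities are raw, computed in decimal GB to keep the arithmetic exact. Usable capacity depends on the RAID or declustering layout, and the table does not state that layout. The names are illustrative.

```python
DEVICES = 10
NVME_PER_DEVICE = 23
NVME_GB = 6400            # 6.4 TB per NVMe drive, decimal units
IB_PORTS_PER_DEVICE = 8   # HDR100
VMS_PER_DEVICE = 4        # each VM hosts one OST and one MDT


def flash_tier_summary() -> dict:
    """Raw (pre-RAID) capacity and Lustre target counts for the flash tier."""
    vms = DEVICES * VMS_PER_DEVICE
    return {
        "raw_tb": DEVICES * NVME_PER_DEVICE * NVME_GB // 1000,
        "lustre_vms": vms,
        "osts": vms,
        "mdts": vms,
        "ib_hdr100_ports": DEVICES * IB_PORTS_PER_DEVICE,
    }


if __name__ == "__main__":
    print(flash_tier_summary())
```

This gives roughly 1.47 PB of raw NVMe, 40 OSTs and 40 MDTs, with 80 HDR100 ports facing the interconnect.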
LCST - Large Capacity Storage Tier
| Category | Component | Quantity | Description |
|---|---|---|---|
| Storage | Storage node | 61 | Supermicro SuperStorage 6029P-E1CR24L with 2x Intel Xeon Silver 4214R (12c, 2.4GHz, 100W), 256GB DDR4 RDIMM 2933MT/s, 1x 240GB SSD, 2x 6.4TB NVMe, 24x 16TB HDD, 2x dual-port 25GbE Mellanox ConnectX-4, 1x 1GbE IPMI |
| Internal Ceph Network | Ethernet switch | 8 | Mellanox SN2010 (per switch: 18x 25GbE + 4x 100GbE ports) |
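The sketch below totals the raw LCST capacity and the Ceph switch ports. The table does not state the Ceph pool layout, so `replicated_usable_tb` is a hypothetical helper. Its default of three replicas is an illustrative assumption and does not describe Vega's actual configuration.

```python
STORAGE_NODES = 61
HDD_PER_NODE = 24
HDD_TB = 16
NVME_PER_NODE = 2
NVME_GB = 6400            # 6.4 TB, decimal units
CEPH_SWITCHES = 8
PORTS_25G_PER_SWITCH = 18
PORTS_100G_PER_SWITCH = 4


def lcst_summary() -> dict:
    """Raw capacity and switch port totals for the Large Capacity Storage Tier."""
    return {
        "hdd_raw_tb": STORAGE_NODES * HDD_PER_NODE * HDD_TB,
        "nvme_raw_gb": STORAGE_NODES * NVME_PER_NODE * NVME_GB,
        "switch_25g_ports": CEPH_SWITCHES * PORTS_25G_PER_SWITCH,
        "switch_100g_ports": CEPH_SWITCHES * PORTS_100G_PER_SWITCH,
    }


def replicated_usable_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity of an N-way replicated pool (hypothetical layout).

    Ignores metadata overhead and free-space headroom.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return raw_tb / replicas


if __name__ == "__main__":
    summary = lcst_summary()
    print(summary)
    print("usable TB at 3x replication:", replicated_usable_tb(summary["hdd_raw_tb"]))
```

The raw HDD capacity is 23,424 TB. An erasure-coded pool would have a different ratio. In that case, substitute k/(k+m) for 1/replicas.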
Login and Virtualization
| Category | Component | Quantity | Description |
|---|---|---|---|
| CPU login | Login nodes | 4 | Atos BullSequana X430-A5 with 2x AMD EPYC 7H12, 256GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x dual-port 100GbE ConnectX-5, 1x single-port 100Gb IB HDR ConnectX-6 |
| GPU login | Login nodes | 4 | Atos BullSequana X430-A5 with 1x NVIDIA Ampere A100 PCIe GPU and 2x AMD EPYC 7452 (32c, 2.35GHz, 155W), 256GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x dual-port 100GbE ConnectX-5, 1x single-port 100Gb IB HDR ConnectX-6 |
| Service | Virtualization/Service nodes | 30 | Atos BullSequana X430-A5 with 2x AMD EPYC 7502 (32c, 2.5GHz, 180W), 512GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x dual-port 100GbE ConnectX-5, 1x single-port 100Gb IB HDR ConnectX-6 |
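The next sketch totals the login and service nodes and counts one InfiniBand port per node, based on the single-port ConnectX-6 listed for each node. The CPU login row gives no core count for the EPYC 7H12, so 64 cores per socket is an assumption taken from the CPU partition table. The names are illustrative.

```python
# role -> (node count, sockets, cores per socket, RAM per node in GB)
LOGIN_AND_SERVICE = {
    "cpu_login": (4, 2, 64, 256),  # 2x EPYC 7H12; 64 cores assumed (not listed in this row)
    "gpu_login": (4, 2, 32, 256),  # 2x EPYC 7452, 32c
    "service": (30, 2, 32, 512),   # 2x EPYC 7502, 32c
}
IB_PORTS_PER_NODE = 1  # one single-port ConnectX-6 HDR adapter per node


def login_service_totals() -> dict:
    """Sum nodes, cores, RAM and IB ports across login and service roles."""
    totals = {"nodes": 0, "cores": 0, "ram_gb": 0}
    for count, sockets, cores, ram_gb in LOGIN_AND_SERVICE.values():
        totals["nodes"] += count
        totals["cores"] += count * sockets * cores
        totals["ram_gb"] += count * ram_gb
    totals["ib_ports"] = totals["nodes"] * IB_PORTS_PER_NODE
    return totals


if __name__ == "__main__":
    print(login_service_totals())
```

The resulting 38 IB ports match the "8 Login, 30 Virtualization" share of the interconnect port count.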
Network and Interconnect Infrastructure
| Category | Component | Quantity | Description |
|---|---|---|---|
| Interconnect Network | IB switch | 68 | 40-port Mellanox HDR switch, Dragonfly+ topology |
| Interconnect Connections | IB HDR100/200 ports on IB card | 1230 | 960 compute, 60 (x2) GPU, 8 login, 30 virtualization, 10 (x8) HCST and 4 (x8) Skyway gateway ports, with Mellanox ConnectX-6 (single or dual port) |
| IPoIB Gateway | IB/Ethernet Data Gateway | 4 | Mellanox Skyway IB-to-Ethernet gateway appliance (per gateway: 8x IB and 8x 100GbE ports) |
| Ethernet Data Network | Top-Level Switches | 2 | Cisco Nexus N3K-C3408-S, 192 activated 100GbE ports |
| WAN Connectivity | IP Routers | 2 | Cisco Nexus N3K-C3636C-R, 5x 100GbE to WAN (provided end of 2021) |
| Top Management Network | 10GbE switch | 2 | Mellanox 2410 switches (per switch: 48x 10GbE ports) |
| In/Out of Band Management Network | 1GbE switch | 4 | Mellanox 4610 switches (per switch: 48x 1GbE + 2x 10GbE ports) |
| Rack Management Network | WELB switch | 24 | Two integrated WELB (sWitch Ethernet Leaf Board) switches per rack, each with three 24-port Ethernet switch instances and one Ethernet Management Controller (EMC) |
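The interconnect port count of 1230 can be reproduced from the endpoint groups. The Skyway entry uses the four gateways with eight IB ports each, as listed in the IPoIB Gateway row. The same data also shows that 24 WELB switches at two per rack cover the 2 GPU racks and 10 CPU racks. This is a minimal consistency check, and the names are illustrative.

```python
# endpoint group -> (units, IB ports per unit)
IB_ENDPOINTS = {
    "cpu_compute": (960, 1),     # HDR100 single-port mezzanine
    "gpu": (60, 2),              # per the Interconnect Connections row
    "login": (8, 1),
    "virtualization": (30, 1),
    "flash_storage": (10, 8),    # ES400NVX, 8x HDR100 each
    "skyway_gateways": (4, 8),   # 4 gateways x 8 IB ports each
}

IB_SWITCHES = 68
PORTS_PER_IB_SWITCH = 40
WELB_SWITCHES = 24
WELB_PER_RACK = 2


def ib_endpoint_ports() -> int:
    """Total IB ports on endpoint adapters and gateways."""
    return sum(units * ports for units, ports in IB_ENDPOINTS.values())


def racks_with_welb() -> int:
    """Number of racks implied by the WELB switch count."""
    return WELB_SWITCHES // WELB_PER_RACK


if __name__ == "__main__":
    print("IB endpoint ports:", ib_endpoint_ports())
    # Total switch-side ports; not all of them face endpoints.
    print("IB switch ports:", IB_SWITCHES * PORTS_PER_IB_SWITCH)
    print("Racks with WELB:", racks_with_welb())
```

Any mismatch between `ib_endpoint_ports()` and the 1230 in the table would point to an inconsistent row.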
