NVIDIA HGX Platform

Accelerating advanced AI in every data center.

Purpose-Built for AI and High-Performance Computing

AI, complex simulations, and massive datasets require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA NVLink™, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights for every data center.

Unmatched End-to-End Accelerated Computing Platform

The NVIDIA HGX B300 NVL16 integrates NVIDIA Blackwell Ultra GPUs with high-speed interconnects to propel the data center into a new era of accelerated computing and generative AI. As premier accelerated scale-up platforms delivering up to 11x more inference performance than the previous generation, NVIDIA Blackwell-based HGX systems are designed for the most demanding generative AI, data analytics, and HPC workloads.

NVIDIA HGX includes advanced networking options, at speeds up to 800 gigabits per second (Gb/s), using NVIDIA Quantum-X800 InfiniBand and NVIDIA Spectrum-X™ Ethernet for the highest AI performance. HGX also includes NVIDIA BlueField®-3 data processing units (DPUs) to enable cloud networking, composable storage, zero-trust security, and GPU compute elasticity in hyperscale AI clouds.

AI Reasoning Inference: Performance and Versatility

Projected performance subject to change. Token-to-token latency (TTL) = 20 ms real time, first token latency (FTL) = 5 s, input sequence length = 32,768, output sequence length = 1,028; 8x eight-way HGX H100 (air-cooled) vs. 1x HGX B300 NVL16 (air-cooled), per-GPU performance comparison; served using disaggregated inference.
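
For context, these settings imply a per-request latency budget that can be computed directly. The additive model below is a simplification, since real disaggregated serving overlaps prefill and decode across many concurrent streams:

```python
# Illustrative arithmetic only: latency budget implied by the quoted
# benchmark settings, assuming the first token arrives at FTL and each
# subsequent token at one TTL interval (a simplification of real serving).
FTL_S = 5.0    # first token latency, seconds
TTL_S = 0.020  # token-to-token latency, seconds (20 ms)
OSL = 1_028    # output sequence length, tokens

total_latency_s = FTL_S + (OSL - 1) * TTL_S
print(f"End-to-end latency per request: {total_latency_s:.2f} s")  # ~25.54 s
print(f"Decode rate per stream: {1 / TTL_S:.0f} tokens/s")         # 50 tokens/s
```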

Real-Time Large Language Model Inference​

HGX B300 NVL16 achieves up to 11x higher inference performance over the previous NVIDIA Hopper™ generation for models such as Llama 3.1 405B. The second-generation Transformer Engine uses custom Blackwell Tensor Core technology combined with TensorRT™-LLM innovations to accelerate inference for large language models (LLMs).
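
As an illustration of the software side, recent TensorRT-LLM releases expose a high-level Python LLM API. A minimal sketch assuming that API follows; the model ID and sampling values are illustrative, not the benchmark configuration above:

```python
# Minimal TensorRT-LLM sketch using the high-level LLM API available in
# recent releases. Model ID and sampling settings are illustrative only.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-405B-Instruct")  # HF ID or local checkpoint
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(
    ["Summarize the benefits of NVLink in one sentence."], params
)
for out in outputs:
    print(out.outputs[0].text)
```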

AI Training: Performance and Scalability

Projected performance subject to change. 8x eight-way HGX H100 vs. 1x HGX B300 NVL16, per-GPU performance comparison.

Next-Level Training Performance

The second-generation Transformer Engine, featuring 8-bit floating point (FP8) and new precisions, enables a remarkable 4x faster training for large language models like Llama 3.1 405B. This breakthrough is complemented by fifth-generation NVLink with 1.8 TB/s of GPU-to-GPU interconnect bandwidth, InfiniBand networking, and NVIDIA Magnum IO™ software. Together, these technologies ensure efficient scalability for enterprises and large GPU computing clusters.
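
To make the FP8 piece concrete, here is a minimal sketch using NVIDIA's Transformer Engine PyTorch API, assuming a recent version and an FP8-capable GPU. The layer size and DelayedScaling recipe are illustrative; real training wraps full transformer blocks rather than a single linear layer:

```python
# Minimal FP8 training sketch with Transformer Engine's PyTorch API
# (assumes a recent version and an FP8-capable GPU such as Hopper or
# Blackwell; sizes are illustrative toy values).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(4096, 4096, bias=True)
inp = torch.randn(32, 4096, device="cuda")

# HYBRID = E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# GEMMs inside this context run in FP8; master weights remain in
# higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.sum().backward()
```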

Accelerating HGX With NVIDIA Networking

The data center is the new unit of computing, and networking plays an integral role in scaling application performance across it. Paired with NVIDIA Quantum InfiniBand, HGX delivers world-class performance and efficiency, ensuring full utilization of computing resources.

For AI cloud data centers that deploy Ethernet, HGX is best used with the NVIDIA Spectrum-X networking platform, which powers the highest AI performance over Ethernet. It features Spectrum-X switches and NVIDIA BlueField-3 SuperNICs for optimal resource utilization and performance isolation, delivering consistent, predictable outcomes for thousands of simultaneous AI jobs at every scale. Spectrum-X enables advanced cloud multi-tenancy and zero-trust security. As a reference design, NVIDIA built Israel-1, a hyperscale generative AI supercomputer with Dell PowerEdge XE9680 servers based on the NVIDIA HGX 8-GPU platform, BlueField-3 SuperNICs, and Spectrum-4 switches.

NVIDIA HGX Specifications

NVIDIA HGX is available in single baseboards with four or eight Hopper GPUs, eight NVIDIA Blackwell GPUs, or sixteen NVIDIA Blackwell Ultra GPUs. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.

                                HGX B300 NVL16                    HGX B200
Form Factor                     16x NVIDIA Blackwell Ultra GPU    8x NVIDIA Blackwell GPU
FP4 Tensor Core**               144 PFLOPS | 105 PFLOPS           144 PFLOPS | 72 PFLOPS
FP8/FP6 Tensor Core*            72 PFLOPS                         72 PFLOPS
INT8 Tensor Core*               2 POPS                            72 POPS
FP16/BF16 Tensor Core*          36 PFLOPS                         36 PFLOPS
TF32 Tensor Core*               18 PFLOPS                         18 PFLOPS
FP32                            600 TFLOPS                        600 TFLOPS
FP64/FP64 Tensor Core           10 TFLOPS                         296 TFLOPS
Total Memory                    Up to 2.3 TB                      1.4 TB
NVLink                          Fifth generation                  Fifth generation
NVIDIA NVSwitch™                NVLink 5 Switch                   NVLink 5 Switch
NVSwitch GPU-to-GPU Bandwidth   1.8 TB/s                          1.8 TB/s
Total NVLink Bandwidth          14.4 TB/s                         14.4 TB/s
                                HGX H200 4-GPU        HGX H200 8-GPU
Form Factor                     4x NVIDIA H200 SXM    8x NVIDIA H200 SXM
FP8 Tensor Core*                16 PFLOPS             32 PFLOPS
INT8 Tensor Core*               16 POPS               32 POPS
FP16/BF16 Tensor Core*          8 PFLOPS              16 PFLOPS
TF32 Tensor Core*               4 PFLOPS              8 PFLOPS
FP32                            270 TFLOPS            540 TFLOPS
FP64                            140 TFLOPS            270 TFLOPS
FP64 Tensor Core                270 TFLOPS            540 TFLOPS
Total Memory                    564 GB HBM3e          1.1 TB HBM3e
Aggregate Memory Bandwidth      19 TB/s               38 TB/s
NVLink                          Fourth generation     Fourth generation
NVSwitch                        N/A                   NVLink 4 Switch
NVSwitch GPU-to-GPU Bandwidth   N/A                   900 GB/s
Total NVLink Bandwidth          3.6 TB/s              7.2 TB/s
                                HGX H100 4-GPU        HGX H100 8-GPU
Form Factor                     4x NVIDIA H100 SXM    8x NVIDIA H100 SXM
FP8 Tensor Core*                16 PFLOPS             32 PFLOPS
INT8 Tensor Core*               16 POPS               32 POPS
FP16/BF16 Tensor Core*          8 PFLOPS              16 PFLOPS
TF32 Tensor Core*               4 PFLOPS              8 PFLOPS
FP32                            270 TFLOPS            540 TFLOPS
FP64                            140 TFLOPS            270 TFLOPS
FP64 Tensor Core                270 TFLOPS            540 TFLOPS
Total Memory                    320 GB HBM3           640 GB HBM3
Aggregate Memory Bandwidth      13 TB/s               27 TB/s
NVLink                          Fourth generation     Fourth generation
NVSwitch                        N/A                   NVLink 4 Switch
NVSwitch GPU-to-GPU Bandwidth   N/A                   900 GB/s
Total NVLink Bandwidth          3.6 TB/s              7.2 TB/s
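
* With sparsity.
** With sparsity | without sparsity.

The aggregate memory figures follow directly from the published per-GPU specifications. As a quick sanity check, here is a short sketch using per-GPU values from NVIDIA's public H200 and H100 SXM datasheets (141 GB HBM3e at 4.8 TB/s and 80 GB HBM3 at 3.35 TB/s, respectively):

```python
# Sanity-check the aggregate memory rows above from public per-GPU specs.
PER_GPU = {
    "H200 SXM": {"mem_gb": 141, "bw_tb_s": 4.8},   # HBM3e
    "H100 SXM": {"mem_gb": 80,  "bw_tb_s": 3.35},  # HBM3
}

for gpu, spec in PER_GPU.items():
    for n in (4, 8):
        print(f"{n}x {gpu}: {n * spec['mem_gb']} GB total memory, "
              f"{n * spec['bw_tb_s']:.1f} TB/s aggregate bandwidth")

# 4x H200: 564 GB, 19.2 TB/s    8x H200: 1128 GB (~1.1 TB), 38.4 TB/s
# 4x H100: 320 GB, 13.4 TB/s    8x H100: 640 GB, 26.8 TB/s
# These round to the 19/38 TB/s and 13/27 TB/s entries in the tables above.
```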

Learn more about the NVIDIA Blackwell architecture.