A Brief Comparison of NVIDIA A100, H100, L40S, and H200

NADDOD | Gavin, InfiniBand Network Engineer | Feb 19, 2024

In 2024, NVIDIA GPUs remain in high demand and short supply in the data center market. The A100, H100, and L40S, along with the upcoming H200, are all highly sought after.

 

In 2020, NVIDIA launched the A100 based on the Ampere architecture. In 2022, they released the H100 based on the Hopper architecture, followed by the L40S in 2023.

 

NVIDIA is set to release the H200 in 2024. Although it has not been officially launched yet, some specifications are already public, so they are included in the comparison table below.

| Specification | A100 | H100 | L40S | H200 |
|---|---|---|---|---|
| Architecture | Ampere | Hopper | Ada Lovelace | Hopper |
| Release year | 2020 | 2022 | 2023 | 2024 |
| FP64 | 9.7 TFLOPS | 34 TFLOPS | None | 34 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 67 TFLOPS | None | 67 TFLOPS |
| FP32 | 19.5 TFLOPS | 67 TFLOPS | 91.6 TFLOPS | 67 TFLOPS |
| TF32 Tensor Core | 312 TFLOPS | 989 TFLOPS | 183 / 366* TFLOPS | 989 TFLOPS* |
| BFLOAT16 Tensor Core | 624 TFLOPS | 1979 TFLOPS | 362.05 / 733* TFLOPS | 1979 TFLOPS* |
| FP16 Tensor Core | 624 TFLOPS | 1979 TFLOPS | 362.05 / 733* TFLOPS | 1979 TFLOPS* |
| FP8 Tensor Core | Not applicable | 3958 TFLOPS | 733 / 1466* TFLOPS | 3958 TFLOPS |
| INT8 Tensor Core | 1248 TOPS | 3958 TOPS | 733 / 1466* TOPS | 3958 TOPS |
| INT4 Tensor Core | None | None | 733 / 1466* TOPS | Data not available |
| GPU memory | 80 GB HBM2e | 80 GB HBM3 | 48 GB GDDR6 with ECC | 141 GB HBM3e |
| GPU memory bandwidth | 2039 GB/s | 3.35 TB/s | 864 GB/s | 4.8 TB/s |
| Decoders | Not applicable | 7 NVDEC, 7 JPEG | Not applicable | 7 NVDEC, 7 JPEG |
| Max TDP | 400 W | 700 W | 350 W | 700 W |
| Multi-Instance GPU (MIG) | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB | None | Up to 7 MIGs @ 16.5 GB |
| Form factor | SXM | SXM | 4.4" (H) x 10.5" (L), dual slot | SXM** |
| Interconnect | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | PCIe Gen4 x16: 64 GB/s bidirectional | NVIDIA NVLink®: 900 GB/s; PCIe Gen5: 128 GB/s |
| Server platform options | NVIDIA HGX™ A100 and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | NVIDIA HGX H100 and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | None | NVIDIA HGX™ H200 and NVIDIA-Certified Systems with 4 or 8 GPUs |
| NVIDIA AI Enterprise | Included | Add-on | None | Add-on |
| CUDA cores | 6912 | 16896 | 18176 | Not yet published |

A100

The A100, introduced in 2020, was the first GPU built on the Ampere architecture and brought significant performance improvements.

 

Before the release of the H100, the A100 outperformed all other GPUs. Its gains came from enhanced Tensor Cores, a higher CUDA core count, improved memory, and memory bandwidth of roughly 2 TB/s, the fastest available at the time.

 

A100: Up to 3x Higher AI Training on the Largest Models

The A100 supports Multi-Instance GPU (MIG) functionality, which allows a single A100 GPU to be partitioned into multiple independent smaller GPUs, greatly improving resource allocation efficiency in cloud and data center environments.
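As a rough illustration of how MIG partitioning is typically driven in practice, the hypothetical Python sketch below wraps the standard nvidia-smi MIG commands. The profile IDs, required privileges, and whether a GPU reset is needed all depend on the driver and GPU model, so treat it as a starting point rather than a recipe.

```python
# Hypothetical sketch: partitioning an A100 into MIG instances via nvidia-smi.
# Assumes a MIG-capable driver and root/admin privileges; profile IDs vary by
# GPU model, so list them first and adjust the -cgi argument accordingly.
import subprocess

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Enable MIG mode on GPU 0 (may require a GPU reset to take effect).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# 2. List the GPU-instance profiles this GPU supports (IDs differ per model).
print(run(["nvidia-smi", "mig", "-lgip"]))

# 3. Create two GPU instances from an assumed profile ID (9 here) together
#    with their matching compute instances (-C).
run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"])

# 4. Verify the resulting MIG devices.
print(run(["nvidia-smi", "-L"]))
```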

 

Although surpassed by newer models, the A100 remains an excellent choice for training complex neural networks and other deep learning and AI workloads; its Tensor Cores and high throughput still deliver strong performance in these domains.

 

The A100 also excels at AI inference, with advantages in applications such as speech recognition, image classification, recommendation systems, data analysis, big data processing, and scientific computing, as well as high-performance computing scenarios such as genomic sequencing and drug discovery.

H100

The H100 is capable of handling the most challenging AI workloads and large-scale data processing tasks.

 

The H100 features upgraded Tensor cores, resulting in a significant improvement in the speed of AI training and inference. It supports computations in double precision (FP64), single precision (FP32), half precision (FP16), and integer (INT8) formats.
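As an illustration of how these precisions are commonly combined, here is a minimal PyTorch training-step sketch (with hypothetical layer sizes): FP32 master weights, FP16 matrix multiplications on the Tensor Cores via autocast, and FP64 kept in reserve for numerically sensitive checks.

```python
# Minimal mixed-precision training step (hypothetical model and shapes).
import torch

model = torch.nn.Linear(4096, 4096).cuda()          # weights kept in FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                 # loss scaling for FP16

x = torch.randn(64, 4096, device="cuda")
target = torch.randn(64, 4096, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    # Matmuls run in FP16 on the Tensor Cores inside this context.
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# FP64 remains available for numerically sensitive verification work.
ref = model.weight.double() @ x.double().T
```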

 

H100: Up to 4x Higher AI Training on GPT-3

Compared to the A100, the H100 offers a six-fold increase in FP8 computation speed, reaching up to 4 petaflops. It also provides a 50% increase in memory capacity, using HBM3 high-bandwidth memory with up to 3 TB/s of bandwidth and nearly 5 TB/s of external connectivity. In addition, the new Transformer Engine speeds up transformer model training by up to six times.
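The FP8 path is normally reached through NVIDIA's Transformer Engine library rather than plain PyTorch. The hedged sketch below assumes the transformer_engine Python package and hypothetical layer sizes; exact API details can differ between versions, and FP8 execution requires Hopper-class hardware.

```python
# Hedged sketch: running a Transformer Engine layer in FP8 on an H100.
# Assumes the transformer_engine package is installed; API may vary by version.
import torch
import transformer_engine.pytorch as te

layer = te.Linear(4096, 4096, bias=True).cuda()   # TE layer with FP8 support
x = torch.randn(64, 4096, device="cuda")

# Inside fp8_autocast, supported TE modules run their matrix multiplications
# in FP8 on the Hopper Tensor Cores while keeping higher-precision accumulation.
with te.fp8_autocast(enabled=True):
    y = layer(x)

y.float().sum().backward()
```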

 

H100 vs. A100: Relative Performance

While the H100 and A100 share similarities in usage scenarios and performance characteristics, the H100 outperforms the A100 in handling large-scale AI models and more complex scientific simulations. The H100 is a superior choice for real-time responsive AI applications such as advanced conversational AI and real-time translation.

 

In summary, the H100 offers significant performance improvements compared to the A100 in terms of AI training and inference speed, memory capacity and bandwidth, as well as processing large and complex AI models. It is suitable for AI and scientific simulation tasks that demand higher performance.

L40S

The L40S is designed to handle next-generation data center workloads, including generative AI, large-scale language model (LLM) inference and training, 3D graphics rendering, scientific simulations, and more.

 

Compared with previous-generation GPUs such as the A100, the L40S offers up to a 5x improvement in inference performance and a 2x improvement in real-time ray tracing (RT) performance.

 

In terms of memory, it is equipped with 48GB of GDDR6 memory and includes support for ECC, which is crucial for maintaining data integrity in high-performance computing environments.

 

The L40S features over 18,000 CUDA cores, which are parallel processors essential for handling complex computational tasks.

 

While the H100 focuses more on decoding, the L40S places greater emphasis on visualization and encoding. Although slower than the H100, the L40S is generally easier to obtain and more affordably priced.

 

In conclusion, the L40S offers significant advantages in handling complex and high-performance computing tasks, particularly in the fields of generative AI and large-scale language model training. Its efficient inference performance and real-time ray tracing capabilities make it a compelling option for data centers.

H200

The H200 will be the latest addition to the NVIDIA GPU series and is expected to start shipping in the second quarter of 2024.

 

H200: Up to 2x the LLM Inference Performance

The H200 is the first GPU to offer 141 GB of HBM3e memory and a bandwidth of 4.8 Tbps, which is nearly twice the memory capacity and 1.4 times the bandwidth of the H100.
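Those ratios follow directly from the figures in the table above; a quick sanity check:

```python
# Ratios of H200 to H100 memory capacity and bandwidth, using the table's figures.
h100_mem_gb, h200_mem_gb = 80, 141
h100_bw_tb_s, h200_bw_tb_s = 3.35, 4.8

print(f"Memory capacity: {h200_mem_gb / h100_mem_gb:.2f}x")     # ~1.76x, "nearly twice"
print(f"Memory bandwidth: {h200_bw_tb_s / h100_bw_tb_s:.2f}x")  # ~1.43x, "1.4 times"
```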

 

HGX H200 4-GPU vs. Dual x86 CPU: Relative Performance

In high-performance computing, the H200 achieves up to 110x acceleration compared to CPUs, delivering results significantly faster.

 

When handling Llama2 70B inference tasks, the H200 demonstrates twice the inference speed of the H100 GPU.
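A rough, back-of-the-envelope estimate (weights only, ignoring the KV cache and activations) shows why the larger memory matters for a model of this size:

```python
# Back-of-the-envelope: memory needed just for Llama2 70B weights in half precision.
params = 70e9                # 70 billion parameters
bytes_per_param = 2          # FP16/BF16
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~140 GB, close to the H200's 141 GB of HBM3e
```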

 

The H200 will play a key role in edge computing and Internet of Things (IoT) applications, specifically in the domain of Artificial Intelligence of Things (AIoT).

 

The H200 is expected to deliver the highest GPU performance for applications such as training and inference of the largest models (exceeding 175 billion parameters), generative AI, and high-performance computing.

 

In summary, the H200 will provide unprecedented performance in the fields of AI and high-performance computing, particularly in handling large-scale models and complex tasks. Its high memory capacity and bandwidth, along with exceptional inference speed, make it an ideal choice for processing cutting-edge AI workloads.

NADDOD: Leading Provider of High-Quality NVIDIA GPU Interconnect Solutions

 

NADDOD InfiniBand NDR Product

NADDOD is a leading provider dedicated to delivering high-quality interconnect solutions for NVIDIA GPUs. We specialize in high-performance, high-speed, and reliable interconnect solutions that keep pace with growing computational demands.

 

Our product line covers InfiniBand and Ethernet (800G/400G/200G/100G) optical modules, as well as AOC and DAC cables. These advanced interconnect products enable NVIDIA GPUs to achieve faster and more reliable data transfer, giving users exceptional performance and flexibility.

 

Whether in data centers, high-performance computing, or other fields, NADDOD's interconnect solutions cater to the needs of our customers. We are committed to continuous innovation and technological advancement, constantly improving the performance and quality of our products to ensure the best user experience.

 

By choosing NADDOD, you gain access to high-quality interconnect solutions for NVIDIA GPUs and professional technical support. We work closely with you to provide customized solutions that meet your specific requirements.

 

In addition to offering third-party high-quality optical modules, we also stock a wide range of original NVIDIA products, providing you with more options. Contact us now to learn more details!

 

Naddod - Your trusted supplier of optical modules and high-speed cables!