A Brief Comparison of NVIDIA A100, H100, L40S, and H200

NADDOD | Gavin, InfiniBand Network Engineer | Feb 19, 2024

In 2024, NVIDIA GPUs remain in high demand and short supply in the data center market. The A100, H100, and L40S, along with the upcoming H200, are all highly sought after.

 

In 2020, NVIDIA launched the A100 based on the Ampere architecture. In 2022, they released the H100 based on the Hopper architecture, followed by the L40S in 2023.

 

NVIDIA is set to release the H200 in 2024. Although it has not been officially launched yet, some specifications are already public, so they are included in the comparison table below.

| Specification | A100 | H100 | L40S | H200 |
|---|---|---|---|---|
| Architecture | Ampere | Hopper | Ada Lovelace | Hopper |
| Release year | 2020 | 2022 | 2023 | 2024 |
| FP64 | 9.7 TFLOPS | 34 TFLOPS | None | 34 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 67 TFLOPS | None | 67 TFLOPS |
| FP32 | 19.5 TFLOPS | 67 TFLOPS | 91.6 TFLOPS | 67 TFLOPS |
| TF32 Tensor Core | 312 TFLOPS | 989 TFLOPS | 183 / 366* TFLOPS | 989 TFLOPS* |
| BFLOAT16 Tensor Core | 624 TFLOPS | 1979 TFLOPS | 362.05 / 733* TFLOPS | 1979 TFLOPS* |
| FP16 Tensor Core | 624 TFLOPS | 1979 TFLOPS | 362.05 / 733* TFLOPS | 1979 TFLOPS* |
| FP8 Tensor Core | Not applicable | 3958 TFLOPS | 733 / 1466* TFLOPS | 3958 TFLOPS |
| INT8 Tensor Core | 1248 TOPS | 3958 TOPS | 733 / 1466* TOPS | 3958 TOPS |
| INT4 Tensor Core | None | None | 733 / 1466* TOPS | Data not available |
| GPU memory | 80 GB HBM2e | 80 GB HBM3 | 48 GB GDDR6 with ECC | 141 GB HBM3e |
| GPU memory bandwidth | 2039 GB/s | 3.35 TB/s | 864 GB/s | 4.8 TB/s |
| Decoders | Not applicable | 7 NVDEC, 7 JPEG | Not applicable | 7 NVDEC, 7 JPEG |
| Max TDP | 400 W | 700 W | 350 W | 700 W |
| Multi-Instance GPU (MIG) | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB | None | Up to 7 MIGs @ 16.5 GB |
| Form factor | SXM | SXM | 4.4" (H) x 10.5" (L), dual slot | SXM** |
| Interconnect | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | PCIe Gen4 x16: 64 GB/s bidirectional | NVIDIA NVLink®: 900 GB/s; PCIe Gen5: 128 GB/s |
| Server platform options | NVIDIA HGX™ A100 and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | NVIDIA HGX H100 and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | None | NVIDIA HGX™ H200 and NVIDIA-Certified Systems with 4 or 8 GPUs |
| NVIDIA AI Enterprise | Included | Add-on | None | Add-on |
| CUDA cores | 6912 | 16896 | 18176 | Not yet published |

A100

The A100, introduced in 2020, was the first GPU built on the Ampere architecture and brought significant performance improvements.

 

Before the release of the H100, the A100 outperformed all other GPUs. Its gains came from enhanced Tensor Cores, a higher CUDA core count, improved memory, and memory bandwidth of roughly 2 TB/s, the fastest available at the time.

 

A100: Up to 3x Higher AI Training on the Largest Models

The A100 supports Multi-Instance GPU (MIG) functionality, which allows a single A100 GPU to be partitioned into multiple independent smaller GPUs, greatly improving resource allocation efficiency in cloud and data center environments.
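As a rough illustration of how MIG partitioning is typically driven in practice, the hypothetical Python sketch below wraps the standard nvidia-smi MIG commands. The profile IDs, required privileges, and whether a GPU reset is needed all depend on the driver and GPU model, so treat it as a starting point rather than a recipe.

```python
# Hypothetical sketch: partitioning an A100 into MIG instances via nvidia-smi.
# Assumes a MIG-capable driver and root/admin privileges; profile IDs vary by
# GPU model, so list them first and adjust the -cgi argument accordingly.
import subprocess

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Enable MIG mode on GPU 0 (may require a GPU reset to take effect).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# 2. List the GPU-instance profiles this GPU supports (IDs differ per model).
print(run(["nvidia-smi", "mig", "-lgip"]))

# 3. Create two GPU instances from an assumed profile ID (9 here) together
#    with their matching compute instances (-C).
run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"])

# 4. Verify the resulting MIG devices.
print(run(["nvidia-smi", "-L"]))
```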

 

Although surpassed by newer models, the A100 remains an excellent choice for training complex neural networks and other deep learning and AI workloads; its Tensor Cores and high throughput still deliver strong performance in these domains.

 

The A100 also excels at AI inference, with advantages in applications such as speech recognition, image classification, recommendation systems, data analysis, big data processing, and scientific computing, as well as high-performance computing scenarios such as genomic sequencing and drug discovery.

H100

The H100 is capable of handling the most challenging AI workloads and large-scale data processing tasks.

 

The H100 features upgraded Tensor cores, resulting in a significant improvement in the speed of AI training and inference. It supports computations in double precision (FP64), single precision (FP32), half precision (FP16), and integer (INT8) formats.
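As an illustration of how these precisions are commonly combined, here is a minimal PyTorch training-step sketch (with hypothetical layer sizes): FP32 master weights, FP16 matrix multiplications on the Tensor Cores via autocast, and FP64 kept in reserve for numerically sensitive checks.

```python
# Minimal mixed-precision training step (hypothetical model and shapes).
import torch

model = torch.nn.Linear(4096, 4096).cuda()          # weights kept in FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                 # loss scaling for FP16

x = torch.randn(64, 4096, device="cuda")
target = torch.randn(64, 4096, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    # Matmuls run in FP16 on the Tensor Cores inside this context.
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# FP64 remains available for numerically sensitive verification work.
ref = model.weight.double() @ x.double().T
```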

 

H100: Up to 4x Higher AI Training on GPT-3

Compared to the A100, the H100 offers a six-fold increase in FP8 computation speed, reaching up to 4 petaflops. It also provides a 50% increase in memory capacity, using HBM3 high-bandwidth memory with up to 3 TB/s of bandwidth and nearly 5 TB/s of external connectivity. In addition, the new Transformer Engine speeds up transformer model training by up to six times.
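The FP8 path is normally reached through NVIDIA's Transformer Engine library rather than plain PyTorch. The hedged sketch below assumes the transformer_engine Python package and hypothetical layer sizes; exact API details can differ between versions, and FP8 execution requires Hopper-class hardware.

```python
# Hedged sketch: running a Transformer Engine layer in FP8 on an H100.
# Assumes the transformer_engine package is installed; API may vary by version.
import torch
import transformer_engine.pytorch as te

layer = te.Linear(4096, 4096, bias=True).cuda()   # TE layer with FP8 support
x = torch.randn(64, 4096, device="cuda")

# Inside fp8_autocast, supported TE modules run their matrix multiplications
# in FP8 on the Hopper Tensor Cores while keeping higher-precision accumulation.
with te.fp8_autocast(enabled=True):
    y = layer(x)

y.float().sum().backward()
```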

 

H100 vs. A100: Relative Performance

While the H100 and A100 share similarities in usage scenarios and performance characteristics, the H100 outperforms the A100 in handling large-scale AI models and more complex scientific simulations. The H100 is a superior choice for real-time responsive AI applications such as advanced conversational AI and real-time translation.

 

In summary, the H100 offers significant performance improvements compared to the A100 in terms of AI training and inference speed, memory capacity and bandwidth, as well as processing large and complex AI models. It is suitable for AI and scientific simulation tasks that demand higher performance.

L40S

The L40S is designed to handle next-generation data center workloads, including generative AI, large-scale language model (LLM) inference and training, 3D graphics rendering, scientific simulations, and more.

 

Compared with previous-generation GPUs such as the A100, the L40S offers up to a 5x improvement in inference performance and a 2x improvement in real-time ray tracing (RT) performance.

 

In terms of memory, it is equipped with 48GB of GDDR6 memory and includes support for ECC, which is crucial for maintaining data integrity in high-performance computing environments.

 

The L40S features over 18,000 CUDA cores, which are parallel processors essential for handling complex computational tasks.

 

While the H100 focuses more on decoding, the L40S places greater emphasis on visualization and encoding. Although slower than the H100, the L40S is generally easier to obtain and more affordably priced.

 

In conclusion, the L40S offers significant advantages in handling complex and high-performance computing tasks, particularly in the fields of generative AI and large-scale language model training. Its efficient inference performance and real-time ray tracing capabilities make it a compelling option for data centers.

H200

The H200 will be the latest addition to the NVIDIA GPU series and is expected to start shipping in the second quarter of 2024.

 

H200: Up to 2x the LLM Inference Performance

The H200 is the first GPU to offer 141 GB of HBM3e memory and a bandwidth of 4.8 Tbps, which is nearly twice the memory capacity and 1.4 times the bandwidth of the H100.
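Those ratios follow directly from the figures in the table above; a quick sanity check:

```python
# Ratios of H200 to H100 memory capacity and bandwidth, using the table's figures.
h100_mem_gb, h200_mem_gb = 80, 141
h100_bw_tb_s, h200_bw_tb_s = 3.35, 4.8

print(f"Memory capacity: {h200_mem_gb / h100_mem_gb:.2f}x")     # ~1.76x, "nearly twice"
print(f"Memory bandwidth: {h200_bw_tb_s / h100_bw_tb_s:.2f}x")  # ~1.43x, "1.4 times"
```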

 

HGX H200 4-GPU vs. Dual x86 CPU: Relative Performance

In high-performance computing, the H200 achieves up to 110x acceleration compared to CPUs, delivering results significantly faster.

 

When handling Llama2 70B inference tasks, the H200 demonstrates twice the inference speed of the H100 GPU.
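A rough, back-of-the-envelope estimate (weights only, ignoring the KV cache and activations) shows why the larger memory matters for a model of this size:

```python
# Back-of-the-envelope: memory needed just for Llama2 70B weights in half precision.
params = 70e9                # 70 billion parameters
bytes_per_param = 2          # FP16/BF16
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~140 GB, close to the H200's 141 GB of HBM3e
```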

 

The H200 will play a key role in edge computing and Internet of Things (IoT) applications, specifically in the domain of Artificial Intelligence of Things (AIoT).

 

The H200 is expected to deliver the highest GPU performance for applications such as training and inference of the largest models (exceeding 175 billion parameters), generative AI, and high-performance computing.

 

In summary, the H200 will provide unprecedented performance in the fields of AI and high-performance computing, particularly in handling large-scale models and complex tasks. Its high memory capacity and bandwidth, along with exceptional inference speed, make it an ideal choice for processing cutting-edge AI workloads.

NADDOD: Leading Provider of High-Quality NVIDIA GPU Interconnect Solutions

 

NADDOD InfiniBand NDR Product

NADDOD is a leading provider dedicated to delivering high-quality interconnect solutions for NVIDIA GPUs. We specialize in high-performance, high-speed, and reliable interconnect solutions that keep pace with growing computational demands.

 

Our product line covers InfiniBand and Ethernet (800G/400G/200G/100G) optical modules, as well as AOC and DAC cables. These advanced interconnect products enable NVIDIA GPUs to achieve faster and more reliable data transfer, giving users exceptional performance and flexibility.

 

Whether in data centers, high-performance computing, or other fields, NADDOD's interconnect solutions cater to the needs of our customers. We are committed to continuous innovation and technological advancement, constantly improving the performance and quality of our products to ensure the best user experience.

 

By choosing NADDOD, you gain access to high-quality interconnect solutions for NVIDIA GPUs and professional technical support. We work closely with you to provide customized solutions that meet your specific requirements.

 

In addition to offering third-party high-quality optical modules, we also stock a wide range of original NVIDIA products, providing you with more options. Contact us now to learn more details!

 

Naddod - Your trusted supplier of optical modules and high-speed cables!