Compute Architecture
NVIDIA H100 Architecture


The NVIDIA H100 GPU, based on the Hopper architecture, builds on the A100 with several key advancements:

- Fourth-generation Tensor Cores: add support for FP8 precision, which halves memory usage per value relative to FP16 and increases training and inference throughput while maintaining accuracy for large language models (LLMs).
- Transformer Engine: a specialized engine that accelerates transformer training and inference by dynamically choosing between FP8 and FP16 precision for each layer, yielding significant performance gains on LLM workloads.
- Enhanced sparsity support: delivers double the math throughput compared to the A100 on structurally sparse networks. This allows more aggressive pruning of neural network weights, reducing the number of computations the GPU must perform.
- HBM3 memory: offers higher bandwidth and capacity than the A100's HBM2e, enabling faster data movement. This is crucial for handling the ever-growing size of AI models and datasets.
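FP8's savings come from packing each value into a single byte. In the E4M3 variant (4 exponent bits, 3 mantissa bits, bias 7, maximum normal value 448), only 8 rounding steps exist per power of two. The following sketch rounds a float to its nearest E4M3 value using those published format parameters; the function name is an illustrative assumption, not a real library API:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (sketch: saturates on overflow, ignores NaN)."""
    MAX_NORMAL = 448.0      # largest E4M3 normal: 1.75 * 2^8
    MIN_NORMAL_EXP = -6     # smallest normal exponent with bias 7
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    if mag > MAX_NORMAL:
        return sign * MAX_NORMAL          # saturate instead of overflowing
    exp = max(math.floor(math.log2(mag)), MIN_NORMAL_EXP)
    step = 2.0 ** (exp - 3)               # 3 mantissa bits -> 8 steps per octave
    return sign * round(mag / step) * step

print(quantize_e4m3(0.3))     # -> 0.3125 (nearest representable value)
print(quantize_e4m3(1000.0))  # -> 448.0 (saturated)
```

The coarse step size illustrates the trade-off: far fewer representable values than FP16, but half the memory traffic per weight.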
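The sparsity feature targets the 2:4 structured pattern: in every contiguous group of four weights, at least two are zero, which the sparse Tensor Cores exploit to skip half the multiplications. A pruning pass that produces this pattern could be sketched as follows (the function name and magnitude-based selection are illustrative assumptions):

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude weights in each group of four (2:4 pattern)."""
    assert len(weights) % 4 == 0, "sketch assumes length is a multiple of 4"
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # the two largest magnitudes survive; the other two become zero
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned

print(prune_2_of_4([0.9, -0.1, 0.05, -0.7]))  # -> [0.9, 0.0, 0.0, -0.7]
```

Because the zeros fall in a fixed, predictable pattern, the hardware can store the compressed matrix plus a small index and still feed its math units at full rate.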