Article
The NVIDIA Blackwell Chip Behind Next-Gen AI Computing
The new NVIDIA Blackwell AI chip, introduced during the company's GTC 2024 keynote earlier this year, represents a monumental leap in AI hardware capabilities. Developed to meet the intensive computational requirements of modern artificial intelligence applications, Blackwell is more than just a chip; it is a next-generation GPU (Graphics Processing Unit) architecture tailored to support large language models (LLMs) and data-heavy AI workloads.
“Over the past three decades, we have pursued accelerated computing to enable transformative breakthroughs in AI and deep learning,” said Jensen Huang, founder and CEO of NVIDIA, announcing Blackwell. “Generative AI is the defining technology of our time. Blackwell is the engine for this new industrial revolution, enabling AI innovation across every industry.”
This architecture advances NVIDIA’s capabilities beyond its previous Hopper architecture, with notable improvements in efficiency, power, and adaptability for AI-driven environments.
Blackwell’s revolutionary design and capabilities
The Blackwell architecture introduces a complete redesign aimed at boosting AI workload performance. It is a sophisticated GPU-based system-on-chip (SoC) crafted to process demanding AI tasks such as deep learning, natural language processing (NLP), and real-time image recognition. Blackwell’s flexibility makes it suitable for both hyperscale data centres and edge applications. The architecture’s robust design includes:
- High Bandwidth Memory (HBM): This enhanced memory subsystem supports rapid data transfer between the processor and memory, which is crucial for large AI models that need fast access to vast datasets, improving both training and inference speeds.
- Advanced Interconnect Network: Blackwell’s high-speed interconnect network optimises data flow within the chip, significantly reducing latency and addressing bottlenecks for high-demand AI tasks.
- Optimised Tensor Cores: Blackwell's specialised tensor cores handle the matrix operations at the heart of deep learning more efficiently, making the architecture ideal for machine learning applications (see the sketch after this list).
- Power Management: Given the substantial energy demands of AI, Blackwell incorporates advanced power management features to reduce power consumption, enabling sustainable operation in data centres and edge devices.
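To make the tensor core point concrete, here is a minimal sketch of how frameworks typically engage them, assuming a CUDA-capable NVIDIA GPU and PyTorch (tools the article does not prescribe). Developers rarely program tensor cores directly; libraries dispatch eligible matrix multiplies to them under mixed precision.

```python
import torch

# Minimal sketch: under mixed precision, the underlying GPU libraries route
# large matrix multiplies to tensor cores when shapes and dtypes allow.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b  # executed on tensor cores on recent NVIDIA GPUs

print(c.dtype)  # torch.float16, produced inside the autocast region
```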
Cutting-edge features and technologies driving AI innovation
The Blackwell architecture introduces six key innovations that support accelerated AI training and real-time LLM inference for models with up to 10 trillion parameters:
- Custom TSMC 4NP process: The Blackwell GPU architecture is built on a custom TSMC 4NP process, incorporating 208 billion transistors connected via a 10TB/s chip-to-chip link. The two integrated GPU dies (the individual silicon chips within the package) operate as a single unified GPU, which NVIDIA describes as the world's most powerful AI chip.
- Second-generation transformer engine: This engine, equipped with micro-tensor scaling support and dynamic-range management algorithms, doubles Blackwell's compute capability for transformer-based models, which are frequently used in NLP and large-scale machine learning (a toy illustration of the scaling idea follows this list).
- Fifth-generation NVLink®: Blackwell's latest NVLink® offers 1.8TB/s of bidirectional throughput per GPU, enabling efficient data transfer across up to 576 GPUs, a capability crucial for managing large language models and other extensive AI applications.
- RAS (Reliability, Availability, and Serviceability) Engine: Blackwell’s RAS engine supports continuous operation through AI-based preventative maintenance. This diagnostic feature enhances system reliability and reduces downtime, making it essential for large-scale AI deployments.
- Confidential computing: To address privacy concerns, Blackwell incorporates advanced security features that safeguard AI models and sensitive data, making it particularly useful in privacy-sensitive sectors like healthcare and finance.
- Dedicated decompression engine: This feature accelerates data processing for data science and analytics applications, speeding up database queries and workflows in data-intensive industries.
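As a rough illustration of the dynamic-range idea behind micro-tensor scaling, the toy NumPy sketch below quantises a tensor block by block, giving each block its own scale factor. The block size, bit width, and rounding scheme here are arbitrary assumptions for illustration, not NVIDIA's implementation.

```python
import numpy as np

def blockwise_quantize(x, block=32, bits=8):
    """Toy block-wise quantisation: each block of `block` values gets its own
    scale factor, preserving local dynamic range rather than using one
    global scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    x = x.reshape(-1, block)
    scales = np.abs(x).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(x / scales).astype(np.int8)
    return q, scales

def blockwise_dequantize(q, scales):
    return q.astype(np.float32) * scales

x = np.random.randn(4, 1024).astype(np.float32)
q, s = blockwise_quantize(x.ravel())
x_hat = blockwise_dequantize(q, s).reshape(x.shape)
print("max abs error:", np.abs(x - x_hat).max())
```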
Industry adoption and future prospects
Microsoft is the first cloud provider to integrate NVIDIA's Blackwell AI chip into its Azure AI infrastructure, enhancing capabilities for large-scale language models and real-time AI applications. With advanced cooling systems and high-speed InfiniBand networking, Azure sets a new standard for performance and efficiency in cloud-based AI infrastructure.
Given their reliance on AI-driven applications, leading cloud providers like Amazon Web Services (AWS), Google Cloud, and Meta are likely to consider Blackwell for their high-performance computing and AI needs. Blackwell's efficiency and computational power make it an attractive option for enterprises looking to cut energy costs while improving AI performance.
The increasing adoption of AI raises environmental concerns because of its energy footprint. Blackwell addresses this challenge through power-efficient design features, including HBM and advanced power management, which allow organisations to reduce energy consumption and lower operational costs in data centres.
Expanding the lineup with advanced superchips
The Blackwell architecture is not limited to standalone GPUs; NVIDIA’s lineup includes a series of advanced Superchips designed for the most demanding AI tasks. These Superchips combine Blackwell GPUs with NVIDIA’s latest CPUs, as well as high-speed interconnects, to create solutions capable of handling vast datasets and massive language models with greater efficiency and scalability. At the forefront of this lineup is the GB200 Grace Blackwell Superchip, engineered for peak performance in data-intensive applications and hyperscale AI deployments.
The GB200 Grace Blackwell Superchip: powering advanced AI
The NVIDIA GB200 Grace Blackwell Superchip combines two Blackwell GPUs with the Grace CPU, linked by a 900GB/s NVLink-C2C connection for rapid data transfer. With network speeds of up to 800Gb/s, the Superchip supports complex AI workloads and data-intensive applications.
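Taking the quoted figures at face value, a quick back-of-envelope calculation shows what those bandwidths mean in practice; the 180GB payload is a made-up example, not a published workload.

```python
# Illustrative arithmetic using the figures quoted above.
link_bandwidth_gb_s = 900   # Grace-to-Blackwell chip link, GB/s
network_gb_s = 800 / 8      # 800Gb/s network, converted to GB/s
payload_gb = 180            # hypothetical model-weight snapshot

print(f"over the chip link: {payload_gb / link_bandwidth_gb_s:.2f} s")
print(f"over the network:   {payload_gb / network_gb_s:.2f} s")
```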
GB200 NVL72 rack-scale system for large AI models
NVIDIA's GB200 NVL72 system combines 36 Grace Blackwell Superchips, delivering exaflop-level performance with 30TB of memory. NVIDIA states that the system delivers up to 30 times the LLM inference performance of its H100 GPUs at up to 25 times lower energy consumption and operational cost, making it an ideal choice for hyperscale AI inference workloads. For smaller deployments, the HGX B200 server board supports up to eight GPUs at network speeds of up to 400Gb/s, making it suitable for a range of high-performance AI applications.
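A little arithmetic on the article's own figures puts the rack-scale numbers in perspective; the per-GPU average is illustrative only, since memory in the real system is not evenly pooled.

```python
# Illustrative arithmetic from the figures quoted above.
superchips = 36
gpus = superchips * 2       # two Blackwell GPUs per GB200, hence "NVL72"
total_memory_tb = 30

print(f"{gpus} GPUs in the rack")
print(f"~{total_memory_tb / gpus * 1000:.0f} GB of combined memory per GPU on average")
```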
Distilled
NVIDIA’s Blackwell architecture is poised to be critical in advancing AI research and applications. From powering large language models to enabling real-time AI functionalities, Blackwell delivers the performance, efficiency, and scalability needed for next-generation AI.