At CES 2026, Nvidia CEO Jensen Huang revealed the company's most advanced AI platform to date. Vera Rubin represents a fundamental shift in how artificial intelligence infrastructure operates at scale: six new chips, designed for mass production in the second half of 2026, target the next generation of AI workloads, including reasoning models, mixture-of-experts architectures, and always-on AI factories.
The Rubin platform marks AI's entry into an industrial phase, where systems continuously convert power, silicon, and data into intelligence at scale. This article breaks down everything you need to know about Nvidia's Vera Rubin platform, from its six-chip architecture to its impact on the AI industry.
What Makes Vera Rubin Different
The Vera Rubin platform breaks from traditional server designs. Instead of treating individual components separately, Nvidia engineered the entire rack as a single coherent system. GPUs, CPUs, networking, security, power delivery, and cooling are architected together through extreme co-design rather than optimized in isolation.
This approach targets a specific problem: modern AI workloads don't just need raw computing power. They need sustained performance across compute, memory, and communication phases. Reasoning models execute multi-step inference over extremely long contexts. Mixture-of-experts models route tokens dynamically across different neural network paths. These workloads create bottlenecks that single-component upgrades can't solve.
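To make the routing behavior concrete, here is a minimal, hypothetical top-k routing sketch in PyTorch. The route_tokens helper, tensor shapes, and expert count are invented for illustration and are not part of Nvidia's stack, but they show why every routed token can turn into cross-GPU traffic.

```python
# Minimal top-k mixture-of-experts routing sketch (illustrative only, not
# Nvidia's implementation). Each token is dispatched to the k experts its
# router selects; in a real cluster those experts typically live on
# different GPUs, so routing decisions drive interconnect traffic.
import torch

def route_tokens(tokens, router_weights, k=2):
    # tokens: [num_tokens, hidden], router_weights: [hidden, num_experts]
    logits = tokens @ router_weights                      # [num_tokens, num_experts]
    gate_probs = torch.softmax(logits, dim=-1)
    topk_probs, topk_experts = gate_probs.topk(k, dim=-1)
    # In a distributed system, this is where an all-to-all exchange would
    # ship each token to the devices hosting its selected experts.
    return topk_experts, topk_probs

tokens = torch.randn(8, 1024)          # 8 tokens, hidden size 1024 (arbitrary)
router = torch.randn(1024, 64)         # 64 experts (arbitrary)
experts, weights = route_tokens(tokens, router)
print(experts)   # per-token expert assignments that become cross-GPU traffic
```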
The Six Chips Powering Vera Rubin
Rubin GPU: The Compute Engine
The Rubin GPU features two reticle-sized dies and delivers 50 petaFLOPS of NVFP4 inference performance and 35 petaFLOPS of NVFP4 training performance. This represents a 5x improvement for inference and 3.5x improvement for training compared to Blackwell.
| Specification | Blackwell GPU | Rubin GPU |
|---|---|---|
| Compute Dies | 2 | 2 |
| NVFP4 Inference | 10 PFLOPS | 50 PFLOPS |
| FP8 Training | 5 PFLOPS | 17.5 PFLOPS |
| Memory Type | HBM3e | HBM4 |
| Memory Capacity | 192 GB | 288 GB |
| Memory Bandwidth | 8 TB/s | 22 TB/s |
Each Rubin GPU package includes eight stacks of HBM4 memory delivering 288GB of capacity and 22 TB/s of bandwidth. The nearly 3x increase in memory bandwidth directly addresses the decode and long-context challenges that bottleneck current AI systems.
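A rough back-of-the-envelope bound shows why decode is so bandwidth-sensitive: each generated token requires streaming the active weights (plus KV cache) from memory, so per-GPU decode throughput is capped by bandwidth divided by bytes touched per token. The model size, NVFP4 byte count, and resulting numbers below are illustrative assumptions, not published Rubin benchmarks.

```python
# Back-of-the-envelope decode bound: tokens/s <= bandwidth / bytes read per token.
# Model size and precision assumptions are illustrative only.
def decode_tokens_per_sec(bandwidth_tb_s, active_params_billion, bytes_per_param=0.5):
    # NVFP4 weights are roughly 0.5 bytes per parameter (ignoring KV cache and overheads)
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B-active-parameter model, single GPU, batch size 1
print(f"8 TB/s (Blackwell-class):  {decode_tokens_per_sec(8, 70):,.0f} tokens/s upper bound")
print(f"22 TB/s (Rubin-class):     {decode_tokens_per_sec(22, 70):,.0f} tokens/s upper bound")
```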
The Rubin GPU also introduces a third-generation Transformer Engine with hardware-accelerated adaptive compression. This technology maintains model accuracy while boosting computational density, enabling the chip to process more operations per watt.
Vera CPU: Data Movement Orchestrator
The Vera CPU features 88 custom Olympus ARM cores with spatial multi-threading technology that enables 176 threads. Nvidia designed these cores specifically for AI factory workloads rather than using off-the-shelf ARM designs.
| Feature | Grace CPU | Vera CPU |
|---|---|---|
| Cores | 72 Neoverse V2 | 88 Custom Olympus |
| Threads | 72 | 176 |
| Memory Bandwidth | 512 GB/s | 1.2 TB/s |
| Memory Capacity | 480 GB | 1.5 TB |
| NVLink-C2C | 900 GB/s | 1.8 TB/s |
The Vera CPU provides 1.8 TB/s of NVLink-C2C coherent memory interconnect, allowing CPUs and GPUs to share a unified address space. Applications can treat LPDDR5X system memory and HBM4 GPU memory as a single pool, reducing data movement overhead.
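On today's Grace-based systems this shared-pool model is exposed through CUDA managed memory, and the programming pattern carries over. The CuPy sketch below is a minimal illustration of that general CUDA mechanism, not a Rubin-specific API; the array size is arbitrary.

```python
# Sketch of CUDA managed (unified) memory from Python via CuPy: allocations made
# this way are addressable from both CPU and GPU, with the driver migrating pages
# on demand. Illustrates the general shared-address-space programming model, not
# a Rubin-specific API.
import cupy as cp

# Route CuPy allocations through cudaMallocManaged (unified memory)
cp.cuda.set_allocator(cp.cuda.malloc_managed)

x = cp.arange(1 << 20, dtype=cp.float32)   # allocated in the unified address space
y = cp.sqrt(x) + 1.0                       # GPU kernel operates on it in place

print(float(y[0]), float(y[-1]))           # host access; pages migrate as needed
```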
The CPU functions as a data engine that keeps GPUs productive. It handles orchestration, data staging, scheduling, and agentic workflows without creating bottlenecks. This design delivers 2x the data processing performance compared to Grace.
NVLink 6 Switch: Rack-Scale Fabric
NVLink 6 delivers 3.6 TB/s of bidirectional bandwidth per GPU, doubling scale-up bandwidth over the previous generation. Each Vera Rubin NVL72 rack contains nine NVLink 6 switches providing 260 TB/s of total scale-up bandwidth.
The switches create an all-to-all topology where any GPU can communicate with any other GPU in the rack with consistent latency. This uniform connectivity eliminates hierarchical bottlenecks that plague traditional server designs.
NVLink 6 also integrates SHARP in-network compute technology. Each NVLink 6 switch tray delivers 14.4 TFLOPS of FP8 in-network compute, enabling collective operations to execute directly inside the fabric. This offload can reduce communication traffic by up to 50% for certain workloads.
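The savings come from where the reduction happens: in a GPU-side ring or tree all-reduce each GPU sends and receives roughly twice its buffer, while an in-switch reduction lets each GPU send its contribution once and receive the summed result once. To application code the collective is unchanged, as in this hedged PyTorch sketch; the launch command and buffer size are assumptions, and NCCL decides at runtime whether any in-network offload is used.

```python
# All-reduce as applications see it; whether the reduction runs on the GPUs
# (ring/tree) or inside the switches (SHARP offload) is a transport-level
# decision made by NCCL, invisible to this code.
# Hypothetical single-node launch: torchrun --nproc_per_node=8 allreduce_demo.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")          # NCCL picks the transport
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    grad = torch.full((1 << 20,), float(rank), device="cuda")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)      # sum of all ranks' buffers

    if rank == 0:
        # With world size N, each element should equal 0 + 1 + ... + (N - 1)
        print(grad[0].item(), "expected", sum(range(dist.get_world_size())))
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```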
ConnectX-9 SuperNIC: Intelligent Endpoints
Each compute tray contains four ConnectX-9 SuperNIC boards delivering 1.6 Tb/s of network bandwidth per Rubin GPU. These network interface cards don't just move data—they enforce programmable congestion control, traffic shaping, and packet scheduling at the endpoint.
AI workloads generate highly correlated traffic patterns. When training mixture-of-experts models, large numbers of GPUs often inject data simultaneously, creating congestion spikes. ConnectX-9 prevents these spikes by shaping traffic before it enters the network.
The cards also provide hardware-accelerated encryption for IPsec and Platform Security Protocol, enabling secure GPU-to-GPU communication without performance penalties.
BlueField-4 DPU: Infrastructure Processor
The BlueField-4 DPU integrates a 64-core Grace CPU with ConnectX-9 networking, creating a dedicated processor for operating the AI factory itself. This chip handles networking, storage, telemetry, and security services independently of the main compute processors.
| Feature | BlueField-3 | BlueField-4 |
|---|---|---|
| Bandwidth | 400 Gb/s | 800 Gb/s |
| Compute Cores | 16 ARM A78 | 64 ARM Neoverse V2 |
| Memory Bandwidth | 75 GB/s | 250 GB/s |
| Memory Capacity | 32 GB | 128 GB |
By offloading infrastructure tasks to dedicated hardware, BlueField-4 ensures that CPUs and GPUs remain focused on AI execution. The DPU also enables the Inference Context Memory Storage platform, a new infrastructure tier for efficiently storing and retrieving key-value cache data across inference requests.
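Conceptually, this tier lets an inference stack persist the key-value cache computed during prefill for a long prompt and restore it when a later request shares that prefix, instead of recomputing it. The sketch below is a deliberately simplified illustration of that idea; the KVCacheStore class, its dictionary-backed storage tier, and the tensor shapes are invented for this example and are not Nvidia's implementation.

```python
# Toy sketch of KV-cache offload and reuse across inference requests: stash the
# key/value tensors for a prompt prefix in a slower storage tier and restore
# them on a prefix hit, skipping prefill recompute. Invented for illustration;
# not Nvidia's Inference Context Memory Storage implementation.
import hashlib
import torch

class KVCacheStore:
    def __init__(self):
        self._tier = {}                      # stand-in for a DPU-managed storage tier

    @staticmethod
    def _key(prompt_prefix: str) -> str:
        return hashlib.sha256(prompt_prefix.encode()).hexdigest()

    def save(self, prompt_prefix: str, kv):
        # Offload to the slower tier, freeing HBM for active requests
        self._tier[self._key(prompt_prefix)] = tuple(t.to("cpu") for t in kv)

    def load(self, prompt_prefix: str, device="cpu"):
        kv = self._tier.get(self._key(prompt_prefix))
        return None if kv is None else tuple(t.to(device) for t in kv)

store = KVCacheStore()
keys, values = torch.randn(32, 128, 64), torch.randn(32, 128, 64)   # toy KV tensors
store.save("You are a helpful assistant...", (keys, values))
hit = store.load("You are a helpful assistant...")
print("prefix hit, prefill skipped" if hit is not None else "miss, recompute prefill")
```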
Spectrum-6 Ethernet Switch: Scale-Out Networking
Spectrum-6 delivers 102.4 Tb/s of total bandwidth through 512 x 200 Gb/s Ethernet ports using co-packaged optics. Unlike traditional Ethernet switches that use pluggable transceivers, Spectrum-6 integrates silicon photonics directly with the switching silicon.
This co-packaged optics approach delivers approximately 5x better power efficiency along with dramatically improved signal integrity. Optical loss drops from roughly 22 dB to roughly 4 dB, an 18 dB improvement in link budget (a power ratio of about 63x) that underpins Nvidia's claim of up to 64x better signal integrity.
The switch also implements advanced congestion control and adaptive routing specifically designed for AI traffic patterns, maintaining high effective bandwidth under synchronized, bursty loads.
Vera Rubin NVL72: The Complete System
Each Vera Rubin NVL72 rack offers 3.6 exaFLOPS of NVFP4 inference performance, 2.5 exaFLOPS of NVFP4 training performance, 54 TB of LPDDR5X memory, and 20.7 TB of HBM4. The system connects 72 Rubin GPUs and 36 Vera CPUs through NVLink 6 switches.
| Specification | Value |
|---|---|
| Rubin GPUs | 72 (144 reticle-sized dies) |
| Vera CPUs | 36 |
| NVFP4 Inference | 3.6 exaFLOPS |
| NVFP4 Training | 2.5 exaFLOPS |
| HBM4 Memory | 20.7 TB |
| LPDDR5X Memory | 54 TB |
| HBM4 Bandwidth | 1.6 PB/s |
| Scale-up Bandwidth | 260 TB/s |
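These rack totals are straightforward multiples of the per-chip figures quoted earlier, with minor rounding; a quick arithmetic check:

```python
# Sanity-check the NVL72 rack totals against the per-component specs quoted above.
gpus, cpus = 72, 36

print(f"NVFP4 inference: {gpus * 50 / 1000:.1f} exaFLOPS")    # 72 x 50 PFLOPS = 3.6
print(f"NVFP4 training:  {gpus * 35 / 1000:.2f} exaFLOPS")    # 72 x 35 PFLOPS = 2.52, ~2.5
print(f"HBM4 capacity:   {gpus * 288 / 1000:.1f} TB")         # 72 x 288 GB    = 20.7
print(f"HBM4 bandwidth:  {gpus * 22 / 1000:.2f} PB/s")        # 72 x 22 TB/s   = 1.58, ~1.6
print(f"LPDDR5X:         {cpus * 1.5:.0f} TB")                # 36 x 1.5 TB    = 54
```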
The rack uses warm-water, single-phase direct liquid cooling with a 45-degree Celsius supply temperature. This approach eliminates traditional air cooling, dramatically reducing energy consumption for thermal management.
Nvidia redesigned the internal architecture for serviceability. The cable-free modular tray design enables up to 18x faster assembly compared to previous generation architectures. Components can be serviced without draining the entire rack.
Performance Improvements Over Blackwell
The Vera Rubin platform delivers substantial generational improvements across multiple dimensions:
| Metric | vs. Blackwell | Impact |
|---|---|---|
| Inference Performance | 5x higher | Lower cost per token |
| Training Performance | 3.5x higher | Faster model development |
| Memory Bandwidth | 2.8x higher | Better long-context handling |
| Scale-up Bandwidth | 2x higher | Improved MoE efficiency |
| CPU Performance | 2x higher | Better orchestration |
Nvidia claims Vera Rubin delivers a 10x reduction in inference token cost and a 4x reduction in the number of GPUs needed to train mixture-of-experts models compared to the Blackwell GB200.
These improvements target specific AI workload characteristics. The 5x inference performance gain addresses the shift toward reasoning models that generate more tokens per query. The 2x increase in scale-up bandwidth tackles the communication bottlenecks in mixture-of-experts architectures.
Confidential Computing at Rack Scale
Vera Rubin NVL72 extends confidential computing beyond individual devices to create a unified, rack-scale trusted execution environment spanning CPUs, GPUs, and interconnects. This is Nvidia's third generation of confidential computing.
The platform encrypts all data in motion across:
- CPU-to-GPU communication via NVLink-C2C
- GPU-to-GPU communication via NVLink
- Device I/O using PCIe IDE and TDISP protocols
Organizations can cryptographically verify system integrity through Nvidia's remote attestation services. This capability enables secure operation of proprietary models and sensitive data in shared or cloud environments without trusting the infrastructure provider.
Energy Efficiency and Power Management
Approximately 30% of power in AI factories is lost to conversion, distribution, and cooling before reaching the GPUs. Vera Rubin addresses this through multiple innovations.
The rack incorporates approximately 6x more local energy buffering than Blackwell Ultra. This storage absorbs rapid power transients directly at the source, smoothing synchronized workload power swings.
Warm-water direct liquid cooling captures heat far more efficiently than air cooling. Higher operating temperatures reduce chiller energy consumption and enable dry-cooler operation with minimal water usage.
Rack-level power smoothing works with software-defined controls to maintain stable power delivery. Controlled ramps, enforced limits, and local energy storage reduce peak demand without throttling performance.
Software Stack and Developer Experience
The Vera Rubin platform maintains full CUDA backward compatibility. Existing models, frameworks, and workflows run seamlessly while automatically benefiting from hardware improvements.
Nvidia provides optimized libraries including cuDNN, CUTLASS, FlashInfer, and the new Transformer Engine. These components tightly couple with Rubin's Tensor Cores, HBM4 memory, and NVLink 6 interconnect.
The NeMo Framework offers end-to-end workflows for building, training, aligning, and deploying large models. Megatron Core supplies the underlying distributed training engine with advanced parallelism strategies.
For inference, the platform integrates with SGLang, TensorRT-LLM, vLLM, and Dynamo. The software stack includes NVLink-enabled communication, disaggregated inference, and KV-cache offloading to storage.
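Because these frameworks target CUDA, serving code written today carries forward to new GPU generations unchanged. As a minimal example, a vLLM offline-inference script looks the same whether it runs on Hopper, Blackwell, or Rubin-class hardware; the model name and sampling parameters below are arbitrary choices for illustration.

```python
# Minimal vLLM offline inference example; the same script runs unchanged across
# GPU generations, with generation-specific kernels selected under the hood.
# The model name is an arbitrary example, not a Rubin-specific requirement.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
for out in outputs:
    print(out.outputs[0].text)
```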
Mission Control software handles cluster-level operations including validation, diagnostics, telemetry, autonomous recovery, and workload management.
The Rubin Ultra Roadmap
Rubin Ultra, targeted for 2027, will feature four reticle-sized chips offering up to 100 PFLOPS of FP4 performance and 1 TB of HBM4e memory across 16 HBM sites. That is double the compute of the Rubin GPU and more than a 3x increase in memory capacity.
The Rubin Ultra platform will use a new Kyber rack architecture capable of handling 600 kilowatts of power. The NVL576 configuration will feature 576 Rubin Ultra GPUs delivering 15 exaFLOPS of FP4 inference and 5 exaFLOPS of FP8 training.
Market Impact and Availability
Nvidia says Rubin is in full production, with Rubin-based products available from partners in the second half of 2026. Major cloud providers, including Oracle and CoreWeave, have committed to deploying Vera Rubin systems.
The platform directly responds to increasing competition from AMD's Helios rack systems and other AI accelerator providers. AMD's Helios promises floating point performance roughly equivalent to Vera Rubin NVL72, creating pressure on both companies to deliver superior total cost of ownership.
Wall Street analysts remain divided on the AI infrastructure cycle. Wedbush's Dan Ives sees a path to $6 trillion market capitalization for Nvidia, while DA Davidson's Gil Luria warns the data center market may be approaching a peak.
The success of Vera Rubin depends partly on continued AI infrastructure investment. If enterprises and cloud providers maintain spending momentum, the platform's 5-10x performance improvements could drive significant GPU refreshes throughout 2026-2027.
Key Takeaways
The Nvidia Vera Rubin platform represents a comprehensive rethinking of AI infrastructure. Six new chips work together as a unified system rather than independent components. The rack-scale architecture eliminates bottlenecks that limit performance in current systems.
Performance improvements target specific AI workload characteristics: 5x better inference for reasoning models, 2x higher interconnect bandwidth for mixture-of-experts, and 2.8x more memory bandwidth for long-context processing.
Advances in energy efficiency, from warm-water cooling and rack-level power management to co-packaged optics, reduce the parasitic losses that waste roughly 30% of data center electricity.
Full-stack confidential computing and extensive reliability features enable secure, always-on operation at unprecedented scale.
The platform launches in the second half of 2026, with the more powerful Rubin Ultra variant following in 2027. Whether it maintains Nvidia's market dominance depends on continued AI infrastructure investment and execution against increasing competition.
