NVIDIA Blackwell: Here’s everything you need to know
Ever since NVIDIA ventured into ray tracing with the Turing architecture, the company has consistently pushed the boundaries of GPU design. The pace of innovation accelerated with Ampere and reached new heights under Ada Lovelace, which introduced technologies such as advanced RT Cores, more capable Tensor Cores, and the headline feature—DLSS 3, with its AI-driven frame generation. Now, NVIDIA is moving further along its trajectory with the unveiling of its Blackwell architecture, anchored by the RTX 50 series of GPUs. Blackwell is poised to redefine how gaming, professional applications, and AI-driven tasks are performed, largely through improvements in manufacturing processes, chiplet-based designs, and expansive neural rendering features.
As with any leap in GPU architecture, the transition from Ada Lovelace to Blackwell is far from a trivial upgrade. From a structural standpoint, Blackwell is built on TSMC’s refined 4NP node, a deliberate choice that targets both higher transistor density and improved power efficiency. This new approach is partly motivated by the increasingly demanding workload of modern games, which often combine complex ray-traced elements with advanced artificial intelligence (AI) routines for everything from in-game physics to facial animations. Creative professionals also benefit from more GPU horsepower, given the growing reliance on GPU acceleration in tasks such as 3D modelling, video editing, and AI-based content creation. Consequently, Blackwell’s enhancements are not merely incremental but are intended to address the broadening landscape of GPU applications.
One of the most notable aspects of Blackwell is its chiplet-based design—a departure from the monolithic approach that dominated previous GPU generations. This design leverages multiple reticle-limited dies linked by an ultra-high-speed interconnect, enabling more efficient scaling. The architecture also introduces new AI-driven capabilities, particularly in the realm of neural rendering, which includes the debut of RTX Neural Shaders. These developments are closely intertwined with the upgraded Tensor and RT Cores, culminating in a GPU that can handle a wider range of tasks at higher speeds.
A long way since Turing
Turing introduced dedicated hardware for ray tracing (RT Cores) and AI-focused acceleration (Tensor Cores). Ampere refined this model with second-generation RT Cores and third-generation Tensor Cores, offering improvements in raw throughput and efficiency. Ada Lovelace then built upon these foundations to deliver further computational density and features such as DLSS 3, which underscored NVIDIA’s ongoing focus on AI-driven rendering.
However, GPU architectures have had to broaden their scope as user needs evolve. Games are increasingly detailed, featuring higher-resolution assets and more complex lighting. Professional applications rely on GPUs for rendering, scientific simulations, and machine learning training, among other tasks. As such, each new architecture aims to serve a dual purpose: improving gaming performance whilst catering to data centres, research institutions, and content creators who demand robust floating-point and AI throughput.
With Blackwell, NVIDIA has taken a multi-pronged approach that aims to tackle these various domains through a single, cohesive framework. While gaming remains the most public-facing aspect of GPU performance, the architecture must also excel in machine learning, high-resolution video encoding/decoding, and 3D content creation. Balancing these requirements is no small feat, particularly in an era when GPU manufacturers are reaching the physical limits of transistor scaling. That is where TSMC’s new 4NP process and chiplet-based designs enter the picture, promising more efficient use of silicon and better yields despite increasing complexities.
Better Manufacturing Process
TSMC’s 4NP Node
Central to Blackwell’s design is TSMC’s 4NP node, a customised refinement over the 4N process used by Ada Lovelace. While the naming convention may suggest only a minor iteration, it actually represents a set of targeted improvements that serve NVIDIA’s specific goals. Higher transistor density means more hardware units can be packed into the same die area, leading to faster throughput for both gaming and compute tasks. Moreover, the improved process node addresses some of the thermal challenges that often accompany additional cores. By optimising voltage and clock rates, Blackwell aims to run cooler without sacrificing performance.
Energy efficiency is an increasingly important factor in GPU architecture, and 4NP offers enhancements that resonate with broader industry sustainability targets. While the jump from 4N to 4NP might not drastically alter overall TDP figures at the high end, it provides incremental gains in performance-per-watt—an especially crucial metric for data centres, where power efficiency translates to significant operational cost savings. In gaming setups, improved energy efficiency is beneficial, as it can reduce the heat output and power draw, improving system stability and user comfort.
Chiplet-Based Design
Perhaps the most significant change in Blackwell’s manufacturing strategy is its shift to a chiplet-based architecture. Traditionally, GPUs have been designed around a single, monolithic die. However, as dies become larger and more complex, yield rates can suffer. A single defect can render an entire large die unusable, driving up production costs. By contrast, splitting the GPU into multiple reticle-limited dies connected by a high-speed interconnect allows NVIDIA to produce each chiplet more reliably. Furthermore, it enables the company to combine these chiplets to form more powerful GPUs, without incurring the same risk of manufacturing defects that a monolithic approach would face.
NVIDIA’s design for Blackwell involves two reticle-limited dies, interconnected by a 10 TB/s chip-to-chip link. This is a remarkable number when compared to previous-generation solutions. The high bandwidth ensures that splitting a GPU into multiple dies does not introduce significant latency penalties, allowing for near-seamless communication across the GPU. In turn, this design can be scaled up or down for different product segments, whether for a mid-range card aimed at gamers or a top-tier workstation GPU intended for large-scale simulations. We didn’t get detailed information regarding die sizes; those will be revealed closer to the review dates.
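The yield argument can be made concrete with a simple Poisson defect model, in which the chance that a die is defect-free falls exponentially with its area. This is a minimal sketch: the defect density and die areas below are hypothetical values chosen for illustration, not figures from NVIDIA or TSMC, and the model ignores packaging and interconnect costs.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Hypothetical inputs, chosen only to illustrate the trend.
DEFECT_DENSITY = 0.001            # defects per mm^2 (assumed)
BIG_DIE_AREA = 800.0              # one monolithic die, mm^2 (assumed)
HALF_DIE_AREA = BIG_DIE_AREA / 2  # each of two chiplets, mm^2

big_yield = poisson_yield(BIG_DIE_AREA, DEFECT_DENSITY)
half_yield = poisson_yield(HALF_DIE_AREA, DEFECT_DENSITY)

# Silicon that must be fabricated, on average, per good product.
# Chiplets are tested individually, so a defect discards only one half.
silicon_per_good_monolithic = BIG_DIE_AREA / big_yield
silicon_per_good_chiplet_pair = 2 * HALF_DIE_AREA / half_yield

print(f"monolithic yield:   {big_yield:.1%}")
print(f"per-chiplet yield:  {half_yield:.1%}")
print(f"silicon per good monolithic GPU:  {silicon_per_good_monolithic:.0f} mm^2")
print(f"silicon per good two-chiplet GPU: {silicon_per_good_chiplet_pair:.0f} mm^2")
```

With these assumed inputs, roughly 45% of the large dies come out clean against roughly 67% of the half-size dies, so less silicon is discarded per working GPU.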
Transistor Budget and Implications
Due to the optimisations from TSMC’s 4NP node and the dual-die design, the full Blackwell GPU houses 208 billion transistors. That figure describes the two-die data-centre part; the flagship RTX 5090 is built on a single GB202 die with roughly 92 billion transistors, still a sizeable step up from the 76 billion found in the RTX 4090 under Ada Lovelace. This expanded transistor budget translates to a greater number of CUDA cores, RT Cores, Tensor Cores, and other specialised units, all working in concert. Such an increase allows for new algorithmic possibilities, whether in AI, gaming, or mixed reality applications.
The additional transistor headroom also makes it feasible to implement deeper caches, advanced memory controllers, and more robust logic blocks for ray tracing. These architectural changes should not be viewed solely through the lens of raw framerate boosts, though they undoubtedly will have a bearing on gaming performance. They also open up new avenues for complex simulations, real-time path tracing, and neural rendering, tasks that demand a substantial computational foundation.
Neural Rendering and DLSS 4
Neural Rendering
Neural rendering is one of the stand-out features that NVIDIA has been promoting since the earlier days of DLSS. In essence, it uses AI to enhance and accelerate traditional rendering pipelines. With Blackwell, NVIDIA is not merely refining the approach but expanding its scope. The new RTX Neural Shaders are designed to integrate small neural networks directly into the GPU’s pipeline, effectively allowing AI-based models to handle a range of tasks, from texture compression to radiance caching. The outcome is a more intelligent rendering process that can predict and fill in details without relying solely on brute-force computations.
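To give a feel for what a "small neural network" inside a shader looks like, here is a toy multilayer perceptron that maps a texture coordinate to a colour. It is only a sketch: the layer sizes and weights are made up for illustration, and it is not NVIDIA's RTX Neural Shaders API, which the article does not describe.

```python
def relu(value: float) -> float:
    """Rectified linear unit, a common hidden-layer activation."""
    return max(0.0, value)


def mlp_forward(inputs, layers):
    """Run a tiny MLP. `layers` is a list of (weights, biases) pairs.

    ReLU is applied after every layer except the last, which stays linear.
    """
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + bias
            for row, bias in zip(weights, biases)
        ]
        if index < len(layers) - 1:
            outputs = [relu(v) for v in outputs]
        activations = outputs
    return activations


# Hypothetical weights: 2 inputs (u, v) -> 2 hidden units -> 3 outputs (r, g, b).
TOY_LAYERS = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
]

rgb = mlp_forward((0.25, 0.5), TOY_LAYERS)
print(rgb)
```

In a real neural shader the weights would be trained so that the network reproduces a texture or material response, letting a handful of parameters stand in for a much larger table of data.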
This shift towards neural rendering is particularly valuable for realistic materials and lighting, where physically accurate models can be computationally expensive. Using neural networks to approximate certain aspects of the pipeline frees up the GPU’s core shading units for other tasks, thereby enhancing overall performance. Beyond gaming, these neural shaders are expected to be valuable in professional applications such as real-time architectural visualisation, film production, and engineering simulations, where photorealism is crucial and time is often limited.
DLSS 4 and Multi-Frame Generation
Building on the success of DLSS (Deep Learning Super Sampling), Blackwell introduces DLSS 4, whose headline addition is Multi-Frame Generation. While DLSS 3 generated one additional frame per rendered frame based on AI predictions, the new version is capable of generating up to three supplementary frames for every fully rendered frame. NVIDIA claims this can yield as much as an 8x performance increase over conventional rendering in scenarios where GPU-bound tasks would otherwise limit framerates.
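The frame-multiplication arithmetic is simple enough to sketch. Three generated frames per rendered frame quadruple the displayed frame rate; reaching a figure like 8x additionally requires the rendered frames themselves to become cheaper, for example through upscaling. The 2x upscaling speed-up used below is an assumption chosen to show how the numbers can combine, not a breakdown published by NVIDIA.

```python
def displayed_fps(rendered_fps: float, generated_per_rendered: int) -> float:
    """Frames shown per second when each rendered frame is followed by
    `generated_per_rendered` AI-generated frames."""
    return rendered_fps * (1 + generated_per_rendered)


def overall_speedup(upscaling_speedup: float, generated_per_rendered: int) -> float:
    """Combined multiplier over native rendering (upscaling x frame generation)."""
    return upscaling_speedup * (1 + generated_per_rendered)


# One generated frame per rendered frame versus three.
print(displayed_fps(30, 1))   # 60
print(displayed_fps(30, 3))   # 120

# Assumed 2x speed-up from rendering at a lower internal resolution.
print(overall_speedup(2.0, 3))  # 8.0
```

Note that generated frames raise the displayed frame rate but do not sample new player input, which is why the rendered frame rate still matters for responsiveness.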
DLSS 4 also incorporates advanced motion vectors and transformer-based AI models. The algorithm analyses multiple rendered frames alongside in-game data to create new frames that appear more cohesive and consistent, even during rapid movement. This approach ensures not only higher frame rates but also improved motion clarity. By using AI to interpolate frames, DLSS 4 eases the burden on the GPU, allowing it to devote more cycles to tasks like ray tracing calculations or advanced physics simulations. Of course, any AI-based approach may introduce its share of artefacts, but NVIDIA asserts that improvements in neural network training and on-the-fly corrections significantly minimise such anomalies.
Neural Faces and Characters
One of the more eye-catching capabilities within Blackwell’s suite of features is RTX Neural Faces. By employing generative AI models trained on a wide array of facial structures and expressions, RTX Neural Faces can dynamically produce facial animations that respond more naturally to in-game events or user input. This could be transformative for character-driven narrative games, where emotional nuance is essential for immersion. Additionally, the architecture leverages linear-swept spheres for hair simulation, aiming to render hair strands that react realistically to movement and environmental factors like wind or rain.
The technology opens up exciting possibilities for developers and digital artists alike. With the ability to generate expressions and hair simulations on the fly, game studios can reduce the time spent on manually fine-tuning animations, allowing them to focus on storytelling and game mechanics. For 3D animators working in film or television, neural faces could streamline the production pipeline, especially for secondary characters who need convincing facial expressions without requiring expensive motion capture or manual keyframing.
Advanced Tensor and RT Cores
5th Generation Tensor Cores
Underpinning Blackwell’s ambitious AI-related features are its fifth-generation Tensor Cores, which deliver up to 2.5 times the AI processing performance of their predecessors. These cores are optimised for matrix multiplication and other operations integral to neural networks. Tensor Cores first appeared in the Volta architecture, primarily for scientific computing, and have since evolved into integral parts of mainstream GPUs, largely thanks to the popularity of DLSS and other AI-driven processes.
In Blackwell, the improvement is not just about raw throughput. NVIDIA has refined how Tensor Cores handle sparsity—a technique where zero or near-zero values in matrices are skipped to accelerate computation. The new architecture better recognises which values can be pruned, allowing the GPU to focus on significant calculations and thereby improve efficiency. This is crucial in tasks like generative AI, large language models, and real-time inference, areas in which the GPU is frequently operating on enormous datasets with many redundant values.
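One well-known form of this idea is 2:4 structured sparsity, introduced with earlier NVIDIA Tensor Cores, where only the two largest-magnitude weights in every group of four are kept. The sketch below illustrates that pattern in plain Python; whether Blackwell uses exactly this scheme is not stated in the article, so treat it as an illustration of the principle rather than a description of the hardware.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def sparse_dot(weights, activations):
    """Dot product that skips zero weights and counts the multiplies done."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w != 0.0:
            total += w * a
            multiplies += 1
    return total, multiplies


weights = [0.1, -2.0, 0.05, 3.0, 0.0, 0.5, -0.4, 0.01]
pruned = prune_2_of_4(weights)
result, work = sparse_dot(pruned, [1.0] * len(pruned))
print(pruned)
print(result, work)
```

After pruning, half of the multiplications are skipped, at the cost of dropping the smallest contributions; in practice networks are fine-tuned after pruning to recover accuracy.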
4th Generation RT Cores
Ray tracing has become an essential aspect of modern graphics, and Blackwell’s fourth-generation RT Cores take it a step closer to real-time path tracing. Although Ada Lovelace already showcased advanced ray tracing capabilities, the Blackwell generation refines them, increasing ray-triangle intersection rates and improving how the system handles complex scenes. A noteworthy feature is Mega Geometry, which accelerates updates to the bounding volume hierarchy (BVH). Large open-world or cluster-based scenes stand to benefit from this advancement, as they usually require frequent BVH updates. There is also better interoperability with industry-leading technologies such as Unreal Engine’s Nanite.
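To see why BVH updates matter, consider the cheapest kind of update, a refit: when geometry moves, the bounding boxes are recomputed from the leaves upward without rebuilding the tree. The sketch below is a generic textbook refit with a hypothetical `Node` type; it is not NVIDIA's Mega Geometry implementation, whose internals the article does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3 = (0.0, 0.0, 0.0)             # minimum corner of the box
    hi: Vec3 = (0.0, 0.0, 0.0)             # maximum corner of the box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    triangle: Optional[List[Vec3]] = None  # set on leaf nodes only


def refit(node: Node) -> None:
    """Recompute bounds bottom-up, keeping the tree topology unchanged.

    Assumes every internal node has both a left and a right child.
    """
    if node.triangle is not None:
        xs, ys, zs = zip(*node.triangle)
        node.lo = (min(xs), min(ys), min(zs))
        node.hi = (max(xs), max(ys), max(zs))
        return
    refit(node.left)
    refit(node.right)
    node.lo = tuple(min(a, b) for a, b in zip(node.left.lo, node.right.lo))
    node.hi = tuple(max(a, b) for a, b in zip(node.left.hi, node.right.hi))


leaf_a = Node(triangle=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
leaf_b = Node(triangle=[(2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 1.0)])
root = Node(left=leaf_a, right=leaf_b)

refit(root)
bounds_before = (root.lo, root.hi)

# Animate one vertex, then refit instead of rebuilding the whole tree.
leaf_b.triangle[2] = (2.0, 5.0, 1.0)
refit(root)
bounds_after = (root.lo, root.hi)
print(bounds_before, bounds_after)
```

Even this simple pass touches every node, which is why scenes with millions of animated or streamed triangles put pressure on BVH maintenance and why hardware help for it is attractive.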
With these new RT Cores, developers have more leeway to incorporate detailed reflections, refractions, and global illumination in real-time contexts. Coupled with the AI-powered insights from Tensor Cores, the GPU can deliver better denoising and faster convergence for ray-traced images. This synergy is especially beneficial in professional workflows such as architectural design visualisations or VFX productions, where physically accurate lighting is a must. Moreover, for gamers who revel in visually striking experiences, these enhancements help ensure that advanced ray tracing features do not cripple performance, even at higher resolutions.
Memory Enhancements
Transition to GDDR7
Memory bandwidth often proves to be a bottleneck in GPU performance, particularly at ultra-high resolutions with demanding game assets. Recognising this, NVIDIA has moved from GDDR6 and GDDR6X memory to GDDR7 in Blackwell. GDDR7 offers data rates of up to 32 Gbps per pin, a substantial jump over previous generations. This means data moves more swiftly between the GPU cores and VRAM, translating into higher frame rates and smoother multitasking.
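A per-pin data rate only becomes a bandwidth figure once it is multiplied by the width of the memory bus. The sketch below shows that conversion; the bus widths are hypothetical examples, since the article does not list them for specific cards.

```python
def memory_bandwidth_gb_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s.

    Each pin moves `data_rate_gbps_per_pin` gigabits per second, there is one
    pin per bus bit, and dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps_per_pin * bus_width_bits / 8


# Hypothetical bus widths at the 32 Gbps rate quoted for GDDR7.
for bus_width in (128, 256, 384):
    print(bus_width, "bit:", memory_bandwidth_gb_s(32, bus_width), "GB/s")
```

At 32 Gbps per pin, a 256-bit bus would peak at 1,024 GB/s, which shows how both the memory speed and the bus width determine what a given card can actually move.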
Power Savings with PAM3 Signalling
A significant innovation in GDDR7 is the use of PAM3 (Pulse-Amplitude Modulation with three levels) signalling, which allows for more efficient data transmission than standard two-level signalling schemes. Each PAM3 symbol takes one of three voltage levels, so two symbols can carry three bits (1.5 bits per symbol) compared with one bit per symbol for two-level signalling. Moving more data per transition allows GPUs to achieve higher data rates while keeping power draw within manageable limits. This is crucial not just for the gaming enthusiast but also for data centres, where power considerations can significantly impact operational costs.
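The efficiency of three-level signalling comes from counting: two symbols with three levels each give nine combinations, enough to carry three bits (eight values). The sketch below encodes bits that way. The particular mapping from bit patterns to symbol pairs is invented for illustration and is not the mapping defined by the GDDR7 standard.

```python
from itertools import product

LEVELS = (-1, 0, 1)                             # three signal levels
SYMBOL_PAIRS = list(product(LEVELS, repeat=2))  # 9 possible pairs

# Illustrative mapping: 3-bit value -> pair of symbols (one pair goes unused).
ENCODE = {value: SYMBOL_PAIRS[value] for value in range(8)}
DECODE = {pair: value for value, pair in ENCODE.items()}


def encode_bits(bits: str):
    """Turn a bit string (length divisible by 3) into PAM3 symbols."""
    if len(bits) % 3 != 0:
        raise ValueError("bit string length must be a multiple of 3")
    symbols = []
    for start in range(0, len(bits), 3):
        symbols.extend(ENCODE[int(bits[start:start + 3], 2)])
    return symbols


def decode_symbols(symbols):
    """Inverse of encode_bits."""
    chunks = []
    for start in range(0, len(symbols), 2):
        pair = (symbols[start], symbols[start + 1])
        chunks.append(format(DECODE[pair], "03b"))
    return "".join(chunks)


payload = "101110001"
symbols = encode_bits(payload)
print(len(payload), "bits sent as", len(symbols), "symbols")
print(decode_symbols(symbols) == payload)
```

Nine bits travel as six symbols here, where two-level signalling would need nine, so the same data rate can be reached with fewer symbol transitions per second.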
Practical Effects on Performance
For gamers, the immediate benefit of GDDR7 is the reduction in memory bottlenecks, which can be most noticeable in high-resolution or high-refresh-rate gaming. Texture streaming becomes more seamless, and performance dips due to VRAM swaps become less frequent. In content creation, GDDR7’s higher bandwidth shortens load times for large 3D scenes or detailed video editing tasks. Combined with the advanced cache hierarchy in Blackwell, the improved memory subsystem ensures that the GPU’s computational prowess is not hindered by slow data access.
Blackwell in Action: Gaming and Content Creation
Gaming Performance
4K 240Hz Gaming
One of the marquee goals for the Blackwell architecture is to facilitate ultra-high-definition gaming at refresh rates once considered unattainable. The RTX 5090, in particular, targets 4K gaming at up to 240Hz when leveraging DLSS 4 and Multi-Frame Generation. While hitting these figures may depend on the game’s engine and settings, it highlights NVIDIA’s confidence in Blackwell’s capacity to handle demanding graphical workloads. Gamers looking to upgrade to higher refresh-rate 4K monitors could find that Blackwell pairs well with their ambitions, reducing reliance on older resolution standards.
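It helps to translate the 4K 240Hz target into the budgets it implies. The sketch below works out the time available per displayed frame, the pixel throughput, and how many frames per second would have to be fully rendered if the rest were generated. The helper names are illustrative, and the assumption of three generated frames per rendered frame is the maximum quoted for Multi-Frame Generation, not a measured configuration.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available for each displayed frame, in milliseconds."""
    return 1000.0 / refresh_hz


def pixels_per_second(width: int, height: int, refresh_hz: int) -> int:
    """Pixels that must reach the display every second."""
    return width * height * refresh_hz


def rendered_fps_needed(target_fps: float, generated_per_rendered: int) -> float:
    """Fully rendered frames per second required to hit a displayed target."""
    return target_fps / (1 + generated_per_rendered)


print(f"{frame_budget_ms(240):.2f} ms per displayed frame")
print(f"{pixels_per_second(3840, 2160, 240):,} pixels per second")
print(rendered_fps_needed(240, 3), "rendered fps with three generated frames each")
```

Under that assumption the GPU needs to fully render about 60 frames per second, which is why the 240Hz figure is tied so closely to DLSS 4 rather than to raw rendering speed.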
Latency Reduction with Reflex 2
Competitive gaming has always been about more than raw frames per second; system latency is equally crucial. NVIDIA Reflex made headlines for its ability to minimise the delay between input and on-screen response. Blackwell advances this with Reflex 2 and its Frame Warp technology, which NVIDIA claims can reduce system latency by up to 75%. These improvements benefit fast-paced esports titles, where every millisecond can make a difference. Gamers who are already hardware enthusiasts may see Reflex 2 as a key reason to choose a new GPU, since it improves the feeling of responsiveness, particularly in shooters and fighting games that hinge upon split-second reactions.
Enhanced Visuals through DLSS Ray Reconstruction and Neural Shaders
DLSS Ray Reconstruction uses AI to reconstruct ray-traced data more accurately, enabling better shadow and reflection fidelity without the usual performance cost. Meanwhile, Neural Shaders handle tasks such as radiance caching, texture manipulation, and advanced material rendering. For games implementing these features, the combined effect is a visual experience that comes closer to realism, whether through detailed surfaces or life-like lighting transitions. These enhancements do not merely boost performance; they also contribute to an elevated aesthetic appeal, offering a compelling case for enthusiasts to upgrade to the RTX 50 series.
Content Creation
9th Generation NVENC/NVDEC
The enhancements to NVIDIA’s dedicated encoders (NVENC) and decoders (NVDEC) in Blackwell are particularly noteworthy for content creators and professionals who rely on quick video processing. The ninth-generation NVENC, together with the updated NVDEC, adds support for 4:2:2 encoding and decoding as well as an AV1 Ultra High Quality (UHQ) mode. This is beneficial for video editors who work with 4:2:2 footage, which retains more colour information than the 4:2:0 format used for most delivery video, and who need to minimise colour loss. Additionally, the AV1 UHQ mode offers better compression efficiency at a given bitrate compared to older codecs, leading to improved video quality for livestreams, broadcasts, or cloud gaming.
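The chroma subsampling notation describes how much colour data is stored relative to brightness data, and the difference is easy to quantify. The sketch below counts the samples in a single frame for the common schemes; it is a back-of-the-envelope illustration of the formats, not a model of how NVENC processes them.

```python
# Fraction of full-resolution samples kept for each of the two chroma planes.
CHROMA_FRACTION = {
    "4:4:4": 1.0,   # full colour resolution
    "4:2:2": 0.5,   # half horizontal colour resolution
    "4:2:0": 0.25,  # half horizontal and half vertical colour resolution
}


def samples_per_frame(width: int, height: int, scheme: str) -> int:
    """Total luma plus chroma samples in one frame."""
    luma = width * height
    chroma = 2 * luma * CHROMA_FRACTION[scheme]  # two chroma planes
    return int(luma + chroma)


for scheme in CHROMA_FRACTION:
    print(scheme, f"{samples_per_frame(3840, 2160, scheme):,}")
```

A 4K frame in 4:2:2 carries twice the colour samples of 4:2:0, which is why it holds up better under colour grading and keying, and also why hardware support for it matters to editors.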
Generative AI Tools
Beyond gaming, the Blackwell architecture is specifically tuned for AI-driven creative tasks, reflecting a growing industry trend. Tools that generate textures, 3D models, or even animate sequences can benefit from the heightened performance of the fifth-generation Tensor Cores. For instance, artists can feed rough sketches into generative AI tools to create detailed concept art in real-time. Video editors can use AI-based upscaling and denoising features to improve low-quality footage. Musicians working with AI-driven composition tools may also notice faster workflows, as the GPU can handle large neural networks without undue strain. Essentially, the expanded AI capabilities in Blackwell lower the barrier for creators who want to integrate machine learning into their processes but do not necessarily have an extensive data centre or HPC environment.
Benchmarks and Real-World Testing
On the basis of the figures shared so far, Blackwell’s improvements are evident in synthetic benchmarks that measure raw computation, memory throughput, and AI processing capabilities. In tests specifically designed to evaluate DLSS 4 performance, the RTX 5090 outperforms the RTX 4090 by up to 1.7x. These gains derive from both the larger transistor budget and the optimisations made possible by the refined 4NP node. In machine learning tasks such as text generation, the RTX 5080 can achieve a token generation rate of 245 tokens per second, a 40% increase over comparable Ada Lovelace GPUs. These figures suggest that real-world applications ranging from gaming to AI research will see tangible benefits from Blackwell’s design. For independently measured numbers, you will have to wait until the reviews are live.
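Percentage claims are easier to judge once the implied baseline is worked out. The short sketch below derives the Ada Lovelace token rate implied by the quoted figures; it only rearranges the numbers given above and adds no new measurements.

```python
def implied_baseline(new_value: float, percent_increase: float) -> float:
    """Baseline that, raised by `percent_increase` percent, gives `new_value`."""
    return new_value / (1 + percent_increase / 100)


baseline_tokens_per_s = implied_baseline(245, 40)
print(f"implied Ada Lovelace rate: {baseline_tokens_per_s:.0f} tokens per second")
```

In other words, the quoted 40% uplift corresponds to a comparison GPU generating about 175 tokens per second in the same test.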
What truly distinguishes Blackwell, however, is how deeply AI has been woven into every corner of the architecture. From DLSS 4’s Multi-Frame Generation to neural shading and generative tools for content creation, the RTX 50 series exemplifies a broader industry trend: the convergence of graphics and AI. As developers innovate and exploit these capabilities, the lines between rendering and inference will blur, eventually unlocking experiences that are more visually stunning and interactive than we have seen before.
In the end, the RTX 50 series is not just about raw power. It is about how power, efficiency, and AI can be balanced to fuel next-generation experiences—both in entertainment and in myriad professional fields. As the technology matures and developers unleash Blackwell’s full potential, it is likely we will witness a fresh wave of software and games that make even bolder use of real-time ray tracing, AI-driven character animations, and perhaps breakthroughs in VR and AR. From that perspective, Blackwell is more than a mere step in the GPU roadmap; it is a springboard that could define the direction of real-time graphics and AI for years to come.
Mithun Mohandas
Mithun Mohandas is an Indian technology journalist with 10 years of experience covering consumer technology. He is currently employed at Digit in the capacity of a Managing Editor. Mithun has a background in Computer Engineering and was an active member of the IEEE during his college days. He has a penchant for digging deep into unravelling what makes a device tick. If there's a transistor in it, Mithun's probably going to rip it apart till he finds it. At Digit, he covers processors, graphics cards, storage media, displays and networking devices aside from anything developer related. As an avid PC gamer, he prefers RTS and FPS titles, and can be quite competitive in a race to the finish line. He only gets consoles for the exclusives. He can be seen playing Valorant, World of Tanks, HITMAN and the occasional Age of Empires or being the voice behind hundreds of Digit videos.