The next-generation Nvidia Blackwell GPU architecture and RTX 50-series GPUs are coming, right on schedule. While Nvidia hasn’t officially provided any timeframe for when the consumer parts will be announced, there have been plenty of rumors and supposed leaks of data. We spoke with some people earlier this year, and the expectation was that we’d see at least the RTX 5090 and RTX 5080 by the time the holiday season kicks off in October or November, but more recent rumors plus the delay of Blackwell B200 may have pushed things back. Whenever they launch, we expect the Blackwell GPUs will join the ranks of the best graphics cards.
Nvidia provided many of the core details for its data center Blackwell B200 GPU. While the AI and data center variants will inevitably differ from consumer parts, there are some shared aspects between past consumer and data center Nvidia GPUs, and we expect that to continue. That means that we at least have some good indications of certain aspects of the future RTX 50-series GPUs.
There are still a lot of unknowns, with leaks that appear more like people throwing darts at the wall instead of having actual inside information. We’ll cover the main rumors along with other details, including the release date, potential specifications, and other technology. Over the coming months, we can expect additional details to come out, and we’ll be updating this article as information becomes available. Here’s everything we know about Nvidia Blackwell and the RTX 50-series GPUs.
Of all the unknowns, the release date (at least for the first Blackwell GPUs) may be the easiest to pin down. Based on what we’ve personally heard, the RTX 50-series could launch by the end of the year, meaning the fall of 2024. Nvidia tends to be good at timing new GPU releases, and getting the top RTX 5090 and 5080 out before the November and December holiday shopping period makes the most sense.
There’s plenty of historical precedent here as well. The Ada Lovelace RTX 40-series GPUs first appeared in October 2022. The Ampere RTX 30-series GPUs first appeared in September 2020. Prior to that, RTX 20-series launched two years earlier in September 2018, and the GTX 10-series was in May/June 2016, with the GTX 900-series arriving in September 2014. That’s a full decade of new Nvidia GPU architectures arriving approximately every two years, and we see no reason for Nvidia to change tactics now.
It’s not just about the two-year consumer GPU cadence, either. Nvidia first revealed core details of the Hopper H100 architecture in March 2022 at its annual GPU Technology Conference (GTC), with Ada Lovelace arriving in October 2022. And in May 2020, it first revealed its Ampere A100 architecture, followed by the consumer variants a few months later. The same thing happened in 2018 as well, with Volta V100 and Turing, and in 2016 there was the Tesla P100 and Pascal. So, in the past four generations, we’ve learned first about the data center and AI GPUs, with the consumer GPUs revealed and launched later in the same year. Now that Nvidia has revealed the Blackwell B200 architecture, again at GTC, it’s a reasonably safe bet we’ll hear about the consumer variants this fall.
Except Blackwell B200 has been pushed back into 2025, according to the latest news. With that change, it’s entirely possible everything else has been pushed back as well. Renowned leaker kopite7kimi thinks consumer cards will be announced at CES 2025 in January. That would be a delay compared to earlier expectations as well as historical precedent, and Nvidia has never launched a new GPU architecture at CES as far as we can recall. However, with little competition for the RTX 4090 right now and a bigger push to get the data center parts out the door, a late 2024 launch certainly isn’t set in stone.
Another factor continues to be AI workloads, and we could see professional cards using the same GPUs as the consumer models arrive first. Nvidia’s current RTX Ada Generation professional GPUs typically cost three to four times as much as consumer cards using the same chips, with double the memory. It’s not difficult to imagine a scenario where Nvidia opts to prioritize AI and data center models over consumer cards, considering the R&D costs associated with creating a new architecture.
We don’t know the exact names or models Nvidia plans for the next generation Blackwell parts. We’re confident we’ll have RTX 5090, RTX 5080, RTX 5070, and RTX 5060 cards, and probably some combination of Ti and/or Super variants. Some of those variants will undoubtedly come out during a mid-cycle refresh about one year after the initial salvo. We’re also curious about whether or not Nvidia will have an RTX 5050 GPU — it skipped that level on desktops with the 40-series and 20-series, though the latter had the GTX 1660 and 1650 class GPUs.
Given the past patterns, we expect the top-tier RTX 5090 and 5080 to arrive first, either late this year or in early 2025. Then we’ll see a 5070-class card (maybe with a Ti or Super suffix), followed by the 5060-class about six months after the first GPUs. Whenever the first Blackwell GPUs arrive, we can expect to see the typical staggered release schedule.
One of the surprising announcements at GTC 2024 was that Blackwell B200 will use the TSMC 4NP node — “4nm Nvidia Performance,” or basically a tuned/tweaked variation of the N4P node. While it’s certainly true that process names have largely become detached from physical characteristics, many expected Nvidia to move to a refined variant of TSMC’s cutting-edge N3 process technology. Instead, it opted for a refinement of the existing 4N node that has already been used with Hopper and Ada Lovelace GPUs for the past two years.
Going this route certainly offers some cost savings, though TSMC doesn’t disclose the contract pricing agreements with its various partners. Blackwell B200 also uses a dual-chip solution, with the two identical chips linked via a 10 TB/s NV-HBI (Nvidia High Bandwidth Interface) connection. Perhaps Nvidia just didn’t think it needed to move to a 3nm-class node for this generation.
And yet, that opens the door for AMD and even Intel to potentially shift to a newer and more advanced process node, cramming more efficient transistors into a smaller chip. Nvidia took a similar approach with the RTX 30-series, using a less expensive Samsung 8N process instead of the newer and better TSMC N7. It will be interesting to see if this has any major impact on how the various next-generation GPUs stack up.
Of course, it’s also possible that Blackwell B200 variants will use TSMC 4NP while consumer chips use a different node. Much of that depends on how much of the core architecture gets shared between the data center and consumer variants and whether Nvidia thinks it’s beneficial to diversify. There’s precedent here for having different nodes and even manufacturers, as Ampere A100 used TSMC N7 while the RTX 30-series chips used Samsung 8N. GTX 10-series Pascal GP107 and GP108 were also made on Samsung’s 14LPP, while GP102, GP104, and GP106 were made on TSMC 16FF.
It’s long been expected that the consumer and professional (i.e., not strictly data center) Blackwell GPUs will move to GDDR7 memory. All indications from GTC 2024 are that GDDR7 will be ready in time for the next generation of GPUs before the end of the year. In fact, Samsung and SK hynix showed off GDDR7 chips at GTC, and Micron confirmed that GDDR7 is also in production.
The current generation RTX 40-series GPUs use GDDR6X and GDDR6 memory, clocked at anywhere from 17Gbps to 23Gbps. GDDR7 has target speeds of up to 36Gbps, 50% higher than GDDR6X and 80% higher than vanilla GDDR6. SK hynix says it will even have 40Gbps chips, though the exact timeline for when those might be available wasn’t detailed. Regardless, this will provide a much-needed boost to memory bandwidth at all levels.
We don’t know if Nvidia will actually ship cards with memory clocked at 36Gbps. In the past, it used 24Gbps GDDR6X chips but clocked them at 22.4Gbps or 23Gbps — and some 24Gbps Micron chips were apparently down-binned to 21Gbps in the various RTX 4090 graphics cards that we tested. So, Nvidia could take 36Gbps memory but only run it at 32Gbps. That’s still a healthy bump to bandwidth.
At 36Gbps, a 384-bit GDDR7 memory interface can provide 1728 GB/s of bandwidth. That’s 71% higher than what we currently get on the RTX 4090. A 256-bit interface would deliver 1152 GB/s, compared to the 4080 Super’s 736 GB/s — a 57% increase. 192-bit cards would have 864 GB/s, and even 128-bit cards would get up to 576 GB/s of raw bandwidth. Nvidia might even go so far as to create a 96-bit interface with 432 GB/s of bandwidth.
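Those bandwidth figures all come from the same relation: per-pin data rate multiplied by bus width, divided by eight bits per byte. Here is a minimal Python sketch of that arithmetic. The function names are our own, the 36Gbps speed is the rumored GDDR7 target rather than a confirmed spec, and the RTX 4090 baseline assumes its known 21Gbps GDDR6X on a 384-bit bus.

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbps) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def percent_uplift(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100


GDDR7_GBPS = 36  # assumption: rumored target speed, not a confirmed shipping clock

for width in (384, 256, 192, 128, 96):
    bandwidth = memory_bandwidth_gbs(GDDR7_GBPS, width)
    print(f"{width}-bit @ {GDDR7_GBPS}Gbps: {bandwidth:.0f} GB/s")

# Baselines: RTX 4090 (21Gbps, 384-bit) and RTX 4080 Super (23Gbps, 256-bit).
rtx_4090 = memory_bandwidth_gbs(21, 384)
rtx_4080_super = memory_bandwidth_gbs(23, 256)
print(f"vs RTX 4090: +{percent_uplift(memory_bandwidth_gbs(36, 384), rtx_4090):.0f}%")
print(f"vs RTX 4080 Super: +{percent_uplift(memory_bandwidth_gbs(36, 256), rtx_4080_super):.0f}%")
```

Swapping in 32Gbps for `GDDR7_GBPS` shows what a more conservative memory clock would do to each tier.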
We also expect that Nvidia will keep using a large L2 cache with Blackwell. This will provide even more effective memory bandwidth — every cache hit means a memory access that doesn’t need to happen. With a 50% cache hit rate as an example, that would double the effective memory bandwidth, though note that hit rates vary by game and settings, with higher resolutions in particular reducing the hit rate.
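The doubling at a 50% hit rate follows from a deliberately simple model: if a fraction of requests is served from L2, only the remainder has to go out to VRAM. The sketch below captures just that idea. It assumes cache hits cost no DRAM bandwidth at all and ignores latency, write traffic, and everything else a real GPU does, so treat it as an illustration rather than a performance model.

```python
def effective_bandwidth_gbs(raw_gbs: float, hit_rate: float) -> float:
    """Effective bandwidth when a fraction `hit_rate` of accesses never reaches VRAM.

    Simplified model (our assumption): effective = raw / (1 - hit_rate).
    """
    if not 0 <= hit_rate < 1:
        raise ValueError("hit_rate must be in the range [0, 1)")
    return raw_gbs / (1 - hit_rate)


# Hypothetical 1,000 GB/s card at a few illustrative hit rates.
for hit_rate in (0.0, 0.25, 0.5):
    print(f"hit rate {hit_rate:.0%}: {effective_bandwidth_gbs(1000, hit_rate):.0f} GB/s effective")
```

The curve is non-linear, which is why a drop in hit rate at higher resolutions can hurt more than the raw percentage suggests.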
GDDR7 also potentially addresses the issue of memory capacity versus interface width. At GTC, we were told that 16Gb chips (2GB) are in production, but 24Gb (3GB) chips are also coming. The larger chips with non-power-of-two capacity probably won’t be ready until 2025, but those will be more important for lower-tier parts. That’s another point in favor of an early 2025 announcement, incidentally, because it means the top models could come with 50% more VRAM capacity.
Still, there’s no pressing need for consumer graphics cards to have more than 24GB of memory, though we could see a 32GB RTX 5090 (with a 512-bit interface). Even 16GB is generally sufficient for gaming, with a 256-bit interface. Professional GPUs on the other hand are often used for large 3D models as well as AI workloads where having more VRAM would be a major boon. A 512-bit interface with 3GB chips on both sides of the PCB could yield a professional RTX 6000 Blackwell Generation as an example with 96GB of memory.
More importantly, the availability of 24Gb chips means Nvidia (along with AMD and Intel) could put 18GB of VRAM on a 192-bit interface, 12GB on a 128-bit interface, and 9GB on a 96-bit interface, all with the VRAM on one side of the PCB. We could even see 24GB cards with a 256-bit interface, and 36GB on a 384-bit interface — and double that capacity for professional cards. Pricing will certainly be a factor for VRAM capacity, but it’s more likely a case of “when” rather than “if” we’ll see 24Gb GDDR7 memory chips on consumer GPUs.
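The capacity figures in this section can be reproduced with a little arithmetic, given that each GDDR memory chip connects over a 32-bit interface. The sketch below is our own illustration: the function name is hypothetical, and "clamshell" is the usual term for mounting chips on both sides of the PCB so that two chips share each 32-bit channel.

```python
BITS_PER_CHIP = 32  # each GDDR chip uses a 32-bit interface


def vram_capacity_gb(bus_width_bits: int, chip_gb: int, clamshell: bool = False) -> int:
    """Total VRAM for a given bus width and per-chip capacity.

    clamshell=True doubles the chip count (chips on both sides of the PCB).
    """
    if bus_width_bits % BITS_PER_CHIP != 0:
        raise ValueError("bus width must be a multiple of 32 bits")
    chips = bus_width_bits // BITS_PER_CHIP
    if clamshell:
        chips *= 2
    return chips * chip_gb


# 24Gb (3GB) chips on one side of the PCB.
for width in (96, 128, 192, 256, 384):
    print(f"{width}-bit with 3GB chips: {vram_capacity_gb(width, 3)}GB")

# 512-bit examples: 2GB chips single-sided, and 3GB chips in clamshell mode.
print(f"512-bit with 2GB chips: {vram_capacity_gb(512, 2)}GB")
print(f"512-bit clamshell with 3GB chips: {vram_capacity_gb(512, 3, clamshell=True)}GB")
```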
The Blackwell architecture will almost certainly contain various updates and enhancements over the previous generation Ada Lovelace architecture, but right now what we know for certain can be summed up in two words: not much. Still, every generation of Nvidia GPUs has contained at least a few architectural upgrades, and we can expect the same this round.
Nvidia has increased the potential ray tracing performance in every RTX generation, and Blackwell seems likely to continue that trend. With more games like Alan Wake 2 and Cyberpunk 2077 pushing full path tracing (not to mention the potential for modders to use RTX Remix to enhance older DX10-era games with full path tracing), there’s even more need for higher ray tracing throughput. There will probably be other RT-centric updates as well, just like Ada offered SER (Shader Execution Reordering), OMM (Opacity Micro-Maps), and DMM (Displaced Micro-Meshes). But what those changes might be is as yet unknown.
What we do know is that the data center Blackwell B200 GPU has reworked the tensor cores yet again, offering native support for FP4 and FP6 numerical formats. Those will be primarily useful for AI inference, and considering the consumer GPUs will do double duty with the professional cards, it’s a safe bet that all Blackwell chips will support FP4 and FP6 as well. (Ada added the same FP8 support as Hopper to its tensor cores, as a related example.)
What other architectural changes might Blackwell bring? If we’re correct that Nvidia is sticking with TSMC 4NP for the consumer parts, we wouldn’t anticipate massive alterations. There will still be a large L2 cache, and the enhanced OFA (Optical Flow Accelerator) used for DLSS 3 frame generation will of course stick around. It might even get some tweaks to improve it, though we’ll have to wait and see.
One potential hint at what could happen with the fastest solutions comes from the Blackwell B200. Nvidia created NV-HBI to link two identical chips together into one massive GPU. This isn’t SLI but rather a chiplet-style approach with massive inter-chip bandwidth so that the two chips functionally behave as a single GPU. Could NV-HBI show up on consumer GPUs as well? We think that’s a reasonable possibility — probably not on the lower-spec chips but certainly on the largest chip and highest tier models.
Raw compute, for both graphics and more general workloads, will almost certainly increase by a decent amount, though probably more along the lines of a 30% boost rather than a 50% increase. RTX 4080 offers 40 TeraFLOPS of FP32 compute compared to the 3080’s 30 TeraFLOPS, for example — a 33% increase — while the 4090 offers 83 TeraFLOPS compared to the 3090’s 40 TeraFLOPS — a much larger 107% increase. Perhaps Nvidia will “go big” on the RTX 5090 as well while making smaller improvements elsewhere, but we’ll have to wait and see.
How much will the RTX 50-series GPUs cost? Frankly, considering the current market conditions, there’s little reason to expect Nvidia to reduce prices relative to the current RTX 40-series GPUs. Nvidia will price the cards as high as it feels the market will accept. With potentially higher AI performance and the increased demand from the non-gaming sector, we might be lucky if the next generation carries the same pricing structure as the current generation.
At the same time, we hope that generational pricing won’t increase. $1,000 for the “step down” RTX 4080 Super means that particular level of GPU now costs 43% more than it did in the RTX 2080 Super days. Of course, we also had the “$699” RTX 3080 10GB and “$1,199” RTX 3080 Ti in between, when prices were all kinds of messed up thanks to the prevalence of GPU cryptomining coupled with the effects of Covid-19. Thankfully, while it’s currently technically profitable to mine certain cryptocurrencies with a GPU, WhatToMine puts the estimated income at far less than $1 per day for an RTX 4090 — meaning it would take over ten years to break even at current rates and prices. (No one should be doing that, as the GPU is more likely to die before breaking even.)
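Both the generational price jump and the mining break-even estimate are simple arithmetic, sketched below in Python. The launch prices used ($999 for the RTX 4080 Super, $699 for the RTX 2080 Super, $1,599 for the RTX 4090) are the official MSRPs, but the $0.40 daily income is purely a hypothetical stand-in for "far less than $1 per day," and the calculation ignores power costs and resale value.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100


def break_even_years(card_price_usd: float, daily_income_usd: float) -> float:
    """Years of mining needed to recoup the card price at a constant daily income."""
    if daily_income_usd <= 0:
        raise ValueError("daily income must be positive to ever break even")
    return card_price_usd / daily_income_usd / 365


# RTX 4080 Super vs RTX 2080 Super launch pricing.
print(f"Price increase: {percent_increase(999, 699):.0f}%")

# RTX 4090 at a hypothetical $0.40 per day (an assumption, not a WhatToMine figure).
print(f"Break-even: {break_even_years(1599, 0.40):.1f} years")
```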
The budget GPU sector has also basically died off. Integrated graphics have reached the point where they’re “fast enough” for most common workloads, even including modest gaming — that’s particularly true for mobile processors, with desktop options typically being far less potent. The last new GPUs to truly target the budget sector were AMD’s rather unimpressive RX 6500 XT and RX 6400 — Nvidia hasn’t made a new sub-$200 GPU since the GTX 1650 Super launched in 2019 (unless you want to count the travesty that was the GTX 1630).
That means for dedicated desktop graphics cards we’re now living in a world where “budget” means around $300, “mainstream” means $400–$600, “high-end” is for GPUs costing close to $1,000, and the “enthusiast” segment targets $1,500 or more. Or at least, that appears to be Nvidia’s take on the situation. AMD’s GPUs tend to be a bit more affordable, particularly when looking at street prices, but Nvidia has maintained a higher pricing structure for at least the past four years.
How good or bad will prices be when Blackwell GPUs arrive? Don’t be surprised if everything costs more than the prior generation, particularly for custom AIB partner models that come with a factory overclock. Whether prices remain high will likely depend in large part on whether the AI bubble bursts.
Given everything we’ve said so far, it should hopefully be clear that there’s very little official information on Blackwell currently available. The Nvidia hack in 2022 gave us the Blackwell name and some potential codenames, but that was over two years ago, and a lot can change in that time. Plus, the details on Blackwell were pretty thin in the first place.
However, as with every major GPU architecture update, plenty of rumors and supposed leaks are floating around. Some suggest they have inside knowledge, others appear to be guesses. Just to cite a few recent examples, one ‘leak’ in November 2023 said we should expect Blackwell GB202 to have a 384-bit memory interface, while a more recent leak in March 2024 says Blackwell GB202 will have a 512-bit interface. The 512-bit interface has recently firmed up as the most likely solution, based on other ‘leaks,’ but some of that might be wishful thinking rather than factual.
Something else to chew on is the NV-HBI dual-chip solution for the Blackwell B200 that we mentioned earlier. Perhaps the top-tier Blackwell GB202 will take the same approach and have two GB203 chips linked via NV-HBI. That would allow Nvidia to keep the actual die size of the fastest chips in check while simultaneously providing for much higher levels of performance.
We’ll include both potential variants of GB202 in our speculative specs table for now, along with estimated names and specs elsewhere. The large number of question marks should make it clear that we do not have any hard information at present.
Again, take the above information with a massive helping of salt — seriously, just dump out the whole salt shaker! We’ve basically plugged in some numbers that seem plausible and stuffed them into the usual Nvidia formula with a given number of SMs, which then gives the CUDA, RT, and tensor core counts based on the usual 128 CUDA, 1 RT, and 4 tensor cores per SM. There are also (traditionally) four TMUs (Texture Mapping Units) per SM. Nvidia can tweak the enabled SM counts quite easily, so final specs may not be nailed down until a few months before launch.
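For anyone who wants to plug in their own guesses, the per-SM formula described above is easy to sketch in Python. The ratios are the ones quoted in the text (128 CUDA cores, 1 RT core, 4 tensor cores, and 4 TMUs per SM). The function and class names are ours, and there is no guarantee Blackwell keeps the same ratios.

```python
from dataclasses import dataclass

# Per-SM ratios quoted in the text; assumed to carry over to Blackwell.
CUDA_PER_SM = 128
RT_PER_SM = 1
TENSOR_PER_SM = 4
TMUS_PER_SM = 4


@dataclass(frozen=True)
class CoreCounts:
    sms: int
    cuda_cores: int
    rt_cores: int
    tensor_cores: int
    tmus: int


def core_counts(sms: int) -> CoreCounts:
    """Derive the headline unit counts from the number of enabled SMs."""
    if sms <= 0:
        raise ValueError("SM count must be positive")
    return CoreCounts(
        sms=sms,
        cuda_cores=sms * CUDA_PER_SM,
        rt_cores=sms * RT_PER_SM,
        tensor_cores=sms * TENSOR_PER_SM,
        tmus=sms * TMUS_PER_SM,
    )


# Sanity check against a shipping part: the RTX 4090 has 128 enabled SMs.
print(core_counts(128))
```

Running it with 128 SMs reproduces the RTX 4090's published 16,384 CUDA cores, which is a reasonable check that the ratios are applied correctly.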
A lot of the potential specs come from recent rumors that could be mere guesses. While the massive GB202 die seems to still be a thing, it’s interesting that it has more than double the SM count of the supposed GB203. That’s a very big gap, almost too big to be true. I’m still partial to the idea of GB202 using two fused-together GB203 dies, given what we’ve seen with Blackwell B200. Other aspects are basically placeholders using whatever Nvidia currently has with the RTX 40-series cards. This mostly applies to L2 cache size, power requirements, and pricing, for example. We make no claims to have insider knowledge of the actual specs right now, and as far as we’re aware, no one reputable has leaked any truly official core counts either.
For the time being, clock speed estimates are a static 2.5 GHz on the GPU clock and 36Gbps on the GDDR7 clock, with 20Gbps on the apparently still GDDR6 GB207 die. That’s according to recent ‘leaks’ as well. We’re really hoping to see 3GB chips on all the GPUs with a 192-bit or narrower memory interface, to provide a boost in VRAM capacity. #fingers-crossed
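A GPU clock estimate combines with an SM count to give theoretical FP32 throughput: CUDA cores times two operations per clock (a fused multiply-add counts as two) times clock speed. The Python sketch below shows that calculation. The 192-SM configuration is purely hypothetical, chosen only to illustrate the math, and the 128 cores per SM ratio is assumed to carry over from current RTX GPUs.

```python
CUDA_PER_SM = 128         # per-SM ratio on current RTX GPUs, assumed unchanged
FLOPS_PER_CORE_CLOCK = 2  # one fused multiply-add counts as two FP32 operations


def fp32_tflops(sms: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    cores = sms * CUDA_PER_SM
    return cores * FLOPS_PER_CORE_CLOCK * clock_ghz / 1000


# Reference point: RTX 4090 with 128 SMs at its 2.52 GHz boost clock.
print(f"RTX 4090: {fp32_tflops(128, 2.52):.1f} TFLOPS")

# Hypothetical 192-SM part at the placeholder 2.5 GHz clock (not a real spec).
print(f"192 SMs @ 2.5 GHz: {fp32_tflops(192, 2.5):.1f} TFLOPS")
```

Real-world clocks typically differ from any static estimate, so these numbers are ceilings on paper rather than performance predictions.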
We’ll update the above table over the coming months and even years as the rumors develop. Eventually, we’ll have official part names and specifications. We’ll almost certainly end up with far more than five different graphics cards as well, but there’s no sense in guesstimating where those might land at present. Just note that there are ten different RTX 40-series desktop GPUs and twelve different RTX 30-series desktop variants (counting the 3060 12GB / 8GB and 3050 8GB / 6GB as different models).
After the 16-pin meltdown fiasco that plagued the first wave of RTX 4090 cards, many people probably want Nvidia to abandon the new PCI-SIG standard. We’ll bet our proverbial GPU hats that it doesn’t happen, though the change to the modified ATX 12V-2×6 connector has hopefully put any potential problems to rest.
What’s interesting is that the RTX 40-series wasn’t the first generation of GPUs to come with a 16-pin connector. The RTX 30-series used 12-pin adapters (without the extra four sense pins of 12VHPWR) starting clear back in 2020. We didn’t hear a bunch of stories about melting 3090 and 3080 adapters, but then most of those cards had TGPs well under 400W. The RTX 3090 Ti GPUs were the first to use the newer 16-pin connector, but again with no rash of reported meltdowns. With RTX 40-series making widespread use of 16-pin, that means Blackwell will be the third generation of Nvidia GPUs to at least partially adopt the standard.
One of the key elements with the 4090 melting problems seems to be pulling 450W or more through a single relatively compact connector. We can’t help but wonder how high Nvidia might push power requirements with Blackwell, but it’s difficult to imagine anything over 600W. Even so, splitting that load across two 16-pin connectors at 300W each would be a more sensible approach in our book than pushing it all through a single connector. We’ll have to see what happens.
There have long been rumors of a new Titan-class card, first for Ada and now for Blackwell. Such a GPU might be the first Nvidia-made card to come with dual 16-pin connectors, and perhaps a quad-slot cooler as well. And if you don’t have an ATX 3.0 power supply, you’ll still have to use the chonky and ugly 8-pin to 16-pin adapters.