Saturday, February 6, 2021, 09:39 PM - Tech and Security
Posted by Norbert
The RTX 3090 is an amazing graphics card for gaming and video rendering, but an issue has cropped up within the cryptocurrency mining community since the release of pretty much all 30-series GPU models. The GDDR6X memory, while lightning-fast, runs hot. VERY hot. When the card is overclocked for mining, the VRAM chips easily reach the 110c limit and the card begins to throttle its core clock, repeatedly dropping what should be 120 MH/s down to 80 MH/s.
I had been unable to maintain the full 120 MH/s this card should be capable of without running the GPU and case fans at 90% constantly. One percent lower and it would begin to throttle my core clock, even though the displayed temperature for the card was a cool 45c. After some forum diving, it became obvious to the community at large that the VRAM was overheating, and that the graphics card was using a previously unreported "GPU Memory Junction Temperature" to throttle itself even though the reported temperatures seemed fine. I verified this immediately by feeling the GPU's backplate. It was too hot to touch, and I clocked it at 70c with a temp sensor.
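To make the throttle cycle easier to picture, here is a toy model of it in Python. It is only a sketch: the 110c limit and the 120 and 80 MH/s figures come from this post, and the 108c resume point is the one I observed in HWiNFO64 (mentioned further down), but the heating and cooling rates, the starting temperature, and every name in the code are made up for illustration. It is not how the card's firmware actually works.

```python
def simulate_throttle(steps, start_c=100.0, heat_per_step=0.5, cool_per_step=0.5,
                      limit_c=110.0, resume_c=108.0,
                      full_mhs=120.0, throttled_mhs=80.0):
    """Toy hysteresis loop: heat up at full speed, cool down while throttled.

    heat_per_step, cool_per_step and start_c are illustrative assumptions,
    not measurements. Returns a list of (temperature, hashrate) samples.
    """
    temp = start_c
    throttled = False
    history = []
    for _ in range(steps):
        # Record the state at the start of this step.
        rate = throttled_mhs if throttled else full_mhs
        history.append((temp, rate))
        # VRAM cools while throttled and heats up otherwise.
        temp += -cool_per_step if throttled else heat_per_step
        # Throttle at the limit, resume once back down at the resume point.
        if not throttled and temp >= limit_c:
            throttled = True
        elif throttled and temp <= resume_c:
            throttled = False
    return history


if __name__ == "__main__":
    samples = simulate_throttle(100)
    average = sum(rate for _, rate in samples) / len(samples)
    print(f"average hashrate over {len(samples)} steps: {average:.1f} MH/s")
```

The point of the model is just that once the memory reaches the limit, the card never settles. It bounces between the two thresholds, and the average hashrate lands somewhere between the full and throttled figures depending on how fast the memory heats and cools.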
Apparently, the VRAM on the backside of the PCB contacts the backplate through factory-applied thermal pads and uses it for passive cooling. Unfortunately, this doesn't help much, because the backplate is made of glossy, smooth metal and doesn't have enough surface area to dissipate the heat that's created. What's worse is that the VRMs are also padded and use the backplate as a heatspreader, effectively sharing heat with the VRAM modules. When gaming or in most production environments there are no issues with this setup, but when mining it's a pretty bad failure, with VRAM temps virtually unmitigated by this "solution" at all. It's not a matter of if, but of how many seconds until it WILL throttle, because of what can only be described as a thermal feedback loop between the VRMs and the VRAM modules sharing the same heatspreader.
After much googling, I found an image of the backside of my particular PCB (below) and plotted where the VRAM chips and the VRMs were. This is the Asus TUF 3090, but most layouts should be similar.
I didn't want to take it apart myself and possibly damage the existing thermal pads, nor did I trust myself to reassemble the card as some teardown resources I had come across showed the majority of the cooling solution was bolted to the backplate. The entire thing would need to be disassembled and then put back together, so a simpler solution was needed.
I had to create far more surface area to present to the passing air, so I bought 3 packages of 8 tiny copper heatsinks for a total of 24. The way they're cut, the surface area they add is many times the surface area they cover. They come with pre-applied thermal adhesive, and all I had to do was peel the stickers off the back and stick them on. While there are many other heatsinks for sale made of all manner of materials, copper will be the most effective as it has a very high thermal conductivity: 231 BTU/(hr·ft·°F), which is roughly 1.7 times aluminum's 136. Copper is heavier than aluminum though, so if you aren't already, you'll want to use a GPU support to prevent any sag that may damage your motherboard's PCI-E socket.
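For anyone who prefers metric units, here is a quick Python sketch that converts the two conductivity figures above to W/(m·K) and works out the copper-to-aluminum ratio. The conversion factor is the standard one for BTU/(hr·ft·°F); the variable and function names are just mine for illustration.

```python
# 1 BTU/(hr·ft·°F) is approximately 1.730735 W/(m·K)
BTU_HR_FT_F_TO_W_M_K = 1.730735

# Conductivity figures quoted in the post, in BTU/(hr·ft·°F)
COPPER_BTU = 231.0
ALUMINUM_BTU = 136.0


def to_w_per_m_k(k_btu):
    """Convert thermal conductivity from BTU/(hr·ft·°F) to W/(m·K)."""
    return k_btu * BTU_HR_FT_F_TO_W_M_K


copper_si = to_w_per_m_k(COPPER_BTU)
aluminum_si = to_w_per_m_k(ALUMINUM_BTU)
ratio = COPPER_BTU / ALUMINUM_BTU  # unitless, so the same in either unit system

if __name__ == "__main__":
    print(f"copper:   {copper_si:.0f} W/(m·K)")
    print(f"aluminum: {aluminum_si:.0f} W/(m·K)")
    print(f"copper conducts about {ratio:.2f}x as well as aluminum")
```

That works out to roughly 400 W/(m·K) for copper against about 235 W/(m·K) for aluminum.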
Also, the type of heatsink you use is important. I chose the short heatsinks I did because they didn't have fins. Fins require directional air movement, and if you hit them with air from the wrong angle, like from the side, you're losing a lot of heat transfer. When the air isn't blowing between the fins, they end up just trapping heat instead. The design I chose works from any angle, and with a fan positioned directly over them, it's difficult to tell exactly where the air will go, so it's best to make any direction viable.
After some fretting and much time spent with a flashlight between the PCB and the backplate so I could see the actual locations of the relevant components, I was able to strategically position the heatsinks directly over the VRAM modules and the strips of VRMs to the left and right of the center die. They heated up to an uncomfortable temperature almost immediately, while I was still tweaking and adjusting their positions, so the heat transfer is clearly very efficient. I'd recommend working quickly, or with the PC/miner turned off.
You also want a high CFM fan to move air over the heatsinks. In a push configuration, blowing air down onto the heatsinks, leave a large enough gap between the fan and the heatsinks for the hot air to escape effectively. If the fan is too close, all it will do is create turbulence and trap the heat. In a pull configuration, however, pulling air up and away from the backplate, you want the fan as close to the heatsinks as possible, if not resting directly on top of them. This way you're drawing air through them rather than from directly above them, and then exhausting it straight up and out of the case. The heatsinks will never get hot enough to damage the plastic of the fan housing.
In testing, a pull configuration is far more efficient for the rest of the PC as well. Blasting air directly onto the heatsinks creates a 360-degree heat exhaust that heats up the M.2 drives, the chipset, and even the side wall of the case. Those hotspots build up and begin to radiate right back at the GPU. I'd also say a good percentage of the heat is then recycled right back into the GPU's own cooling fans, heating up the PCB even further. Up and away seems to be the best option.
Good case exhaust fans are a must as well, preferably out the top, so the heat you're removing from the backplate isn't recycled. Here is the final setup for this solution in a push configuration.
As said earlier, in pull, the gap would be much smaller and the fan would be right on the heatsinks. This is the setup I would recommend for a closed case.
As for the results, I was able to lower my GPU fan speed from 90% to 80%, lower my case fan speed from 90% to 60%, and raise my memory overclock from 1000 to 1200. The biggest change, though, was reducing the overall racket my case fans had to make to keep this card from combusting. The 3090's fans at 90% can become high pitched and quite loud, and just a reduction to 80% made a very noticeable difference. My original temperatures for the "GPU Memory Junction Temperature", as reported by the newest version of HWiNFO64 (v6.42), were sitting at 110c, and the GPU would throttle until it hit 108, then heat up again and repeat. But now with this heatsink/fan mod, the temps hover around 98c. That's a huge difference by any measurement, but it becomes more so when you take into account the fan speed reduction and the increased overclock from 1000 to 1200. With the original fan speeds and overclock applied, the VRAM temps sit at 90c with the new backplate cooling, and I'd say a 20c drop is quite the achievement from what's essentially a $30 fix with the additional fan included.
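Here's the same before-and-after arithmetic as a small Python sketch, using only the numbers quoted above. The names are mine, and the memory overclock is left in whatever units the overclocking tool reports, since all that matters here is the relative change.

```python
# Memory junction temperatures from the post, in degrees C
BASELINE_C = 110            # stock backplate, fans at 90%, overclock at 1000
MOD_QUIET_C = 98            # heatsink/fan mod, lower fan speeds, overclock at 1200
MOD_ORIGINAL_SETTINGS_C = 90  # heatsink/fan mod, original fan speeds and overclock


def drop(before, after):
    """Absolute reduction, in the same units as the inputs."""
    return before - after


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) * 100 / before


if __name__ == "__main__":
    print("VRAM drop, quieter settings: ", drop(BASELINE_C, MOD_QUIET_C), "c")
    print("VRAM drop, original settings:", drop(BASELINE_C, MOD_ORIGINAL_SETTINGS_C), "c")
    print("GPU fan reduction: ", drop(90, 80), "percentage points")
    print("Case fan reduction:", drop(90, 60), "percentage points")
    print("Memory overclock change:", percent_change(1000, 1200), "%")
```

So the like-for-like comparison (same fans, same overclock) is the 20c drop, while the 12c drop at 98c comes with quieter fans and a 20% higher memory overclock on top.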
As you can see, after mining for almost 12 hours with the new overclock and fan settings, the maximum temperature reached was just 100c, with the average being several degrees lower. The T-sensor reading is from the underside of the backplate.
This may not solve your overheating issues, but it's certainly an easily implemented idea for anyone else who has a 3090 and sees it thermal throttling at room temperature. This can also be done without voiding your warranty, and as expensive and rare as GPUs have become, you really want to keep that intact.
This article has been featured in several YouTube videos and referenced in forums across the web in the past few months. Thank you, everyone! Here are a few videos featuring this method.
Exploring the Nvidia RTX 3090 VRAM Overheating Issue by Kevin Muldoon
Keeping Your RTX 3000 Series Graphics Card VRAM Temps Cool While GPU Mining by Commando Brown