GPU overheating after recent updates

in Account & Technical Support

Posted by: fluffdragon.1523

The game has been crashing my system for about 7 days now. In particular, I went from having no atypical performance on Jan 31st to the game completely crashing my system within 10-15 minutes of regular gameplay on Feb 1st and thereafter (original failure was around midnight).

The crashes are not accompanied by a BSOD, pop-up to alert to a C++ memory or access violation, nor any kind of Windows or software notification of a driver, disk, or other failure.

I’ve run enough diagnostics and repairs — and effectively reinstalled GW2 twice, now — to ascertain that it is absolutely not Windows mucking things up. All files are repaired, all data checks out, and all drivers are up to date with the latest stable versions. In fact, the Nvidia 332.21 drivers were working flawlessly for about five days prior.

Suspecting the GPU or the game itself, I finally got around to doing an actual hardware test this morning. The results were concerning:

Idling on the hill above the Ascalon Settlement landmark of Gendarran Fields, my frame rate at max settings (limit of 60 with VSync) stabilized at approximately 50fps. Meanwhile, the GPU temperature plateaued at 96°C, while ambient case temps and the CPU remained no greater than 50°C throughout.

I am currently running what should be a base model EVGA NVidia GTX 660 card, which lists its maximum safe operating temperature as 97°C [ Source ].

During the aforementioned test, the card eventually read 100°C, and the system seized within a second or two. I suspect an internal panic occurred that shut down the card and simultaneously took out the system.

I would like to specify the following:

  • No settings have been changed since I resumed playing in early January.
  • I am using the Frame Limiter set to 60fps, have enabled VSync, and have the adaptive power saving mode enabled in the NVidia control panel — none of these options have been altered.
  • I had been playing using these same settings throughout the month with no problems nor indications of degraded performance.
  • There is no indication of card failure, nor associated graphical artifacts, flaws, or glitches that would accompany prolonged overheating or failure.
  • No other game seems able to cause my GPU to act this way.

Serious considerations:

  • The case has been cleaned about 3 times during the days affected. Before this morning’s test I even aired out the GPU itself.
  • The GPU heatsink fins are visibly clear, and the fan shows no accumulated residue after cleaning.
  • Fan activity appears normal, and GPU temperature rises roughly linearly with the framerate: ~30fps corresponds to ~65°C, while 50fps corresponds to 94°C or more.
  • There are no load or disk failures, no driver conflicts, and the crashes only occur once inside the game; while the character selection menu (or loading screen) runs at 50fps, GPU temp and activity rarely exceed 50-60°C.
  • No other game I’ve tested so far is capable of reproducing these conditions.
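
The two readings in the list above (about 30fps at 65°C and 50fps at 94°C) are enough for a rough straight-line extrapolation. The sketch below assumes the relationship stays linear all the way up to the 60fps frame cap, which is only an approximation; the function name is hypothetical and used purely for illustration.

```python
def extrapolate_temp(p1, p2, fps):
    """Estimate GPU temperature (°C) at `fps` from two (fps, temp) readings,
    assuming temperature is a straight-line function of frame rate."""
    (f1, t1), (f2, t2) = p1, p2
    slope = (t2 - t1) / (f2 - f1)  # °C gained per extra frame per second
    return t1 + slope * (fps - f1)


SAFE_LIMIT_C = 97.0  # maximum temperature quoted in the post for the GTX 660
readings = ((30, 65.0), (50, 94.0))  # (fps, °C) pairs reported in the post

temp_at_cap = extrapolate_temp(*readings, fps=60)
print(f"Estimated temperature at 60fps: {temp_at_cap:.1f}°C")
print("Above the rated limit" if temp_at_cap > SAFE_LIMIT_C else "Within the rated limit")
```

Under that assumption the slope is about 1.45°C per frame, so the card would need to shed heat it clearly cannot shed long before the 60fps limiter is reached.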

This is the part that I do not understand: in less than 24 hours I went from a perfectly working game (if under heavy load) to one that can’t run for more than 10 to 15 minutes without overheating, even while idling in a relatively empty area.

As of now, I have downloaded the EVGA Precision X tool and given it a more aggressive fan rate curve. I will be testing to see if there are issues with the physical hardware.

Posted by: dodgycookies.4562

It could be your PSU’s self-protection shutdown, which could be caused by Nvidia’s GPU Boost tech. Either the temp target was set too high, or it is being ignored in favor of the power target? Or perhaps you are using an fps target and the card is trying to meet it despite a CPU bottleneck? Crashes that aren’t accompanied by anything else sound like your card, while trying to overclock itself, is overloading your PSU, which then shuts itself down.

Nvidia cards should drop into a safe mode when they hit a thermal limit, so even if your card is badly overheating it should never just stop working.

First I would uninstall all the overclocking software you have (including saved settings), then reinstall the newest beta driver in case the current install got corrupted.

Also post your system specs (especially PSU) and CPU-Z/GPU-Z pics?

[ICoa] Blackgate

Posted by: fluffdragon.1523

Update #1:

Additional testing has been done. Looks like neither Precision X nor Afterburner is capable of overriding the hard cap of 74% fan speed. The first screen capture shows the system after forcing the card to a 50% power target; even after only a few minutes, GPU temps reached 80°C.

The second screenshot illustrates the absurd heat gain when running at standard specs of 100% power target. After less than one-third of the time, temperatures exceeded 94°C.

Once again, I would like to specify that the maximum safe operating temperature of the GPU is 97°C.

These findings are the same as those found while monitoring using the independent SpeedFan utility.

Attached is a renewed copy of the dxdiag output for review. In clarification to a previous poster’s query, I’ve built this system on a 700W PSU that has experienced no problems in operation nor running the game before the Feb 1st failure. Furthermore, no overclocking has been used at any time.
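
To illustrate why a more aggressive fan curve cannot help once the card refuses to go past 74%, here is a minimal sketch of a piecewise-linear fan curve with a hard cap. The control points are made up for illustration, and this is not how Precision X or Afterburner is actually implemented; only the 74% cap comes from the post above.

```python
from bisect import bisect_right

FAN_CAP_PERCENT = 74  # hard cap reported in the post above

# Hypothetical (temperature °C, fan %) control points for an "aggressive" curve.
CURVE = [(30, 30), (60, 50), (75, 74), (97, 100)]


def fan_speed(temp_c, curve=CURVE, cap=FAN_CAP_PERCENT):
    """Interpolate the requested fan speed for temp_c, then clamp it to the cap."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        requested = curve[0][1]
    elif temp_c >= temps[-1]:
        requested = curve[-1][1]
    else:
        i = bisect_right(temps, temp_c)  # index of the first point above temp_c
        (t0, s0), (t1, s1) = curve[i - 1], curve[i]
        requested = s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
    return min(requested, cap)


for temp in (45, 75, 90, 100):
    print(f"{temp}°C -> {fan_speed(temp):.0f}% fan")
```

With these example points, everything from 75°C upward is clamped to the same 74%, so reshaping the curve only changes how early the fan reaches its ceiling, not how much cooling is available at the top end.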

Posted by: SolarNova.1052

This is likely a simple hardware issue.

Since it’s EVGA you need not worry about warranty; they have THE BEST warranty and RMA service of all computer component manufacturers.
So…

Take the card out, remove the shroud, then remove the heatsink and fan; basically, completely disassemble the cooling system. Then clean it out (there may be hidden dust, especially if it’s a blower-type cooler, not so much if it’s EVGA’s ACX cooler).
Then remove the TIM (thermal interface material, i.e. thermal paste) from the GPU and clean the GPU itself, apply some new TIM, and reattach the cooler, making sure to plug the fan control cable back into the card.
Test it out; if it still overheats, contact EVGA for an RMA.

As to ‘why’ it started recently: likely either it has always been running hot and only recently degraded to the point where it’s hitting a hard shutdown temperature (100°C), or it was running perfectly fine and cool before and has recently developed a fault. Whatever the reason, just remember that no game can directly cause a card to overheat; it can only do so indirectly via GPU load, and even then it still points to a hardware fault, as all graphics cards are designed to keep within operating temperatures at full load. The only software that can directly cause overheating is fan control and voltage control software.

3930k 4.6ghz | NH-D14 Cooler | P9x79 Pro MB | 16gb 1866mhz G.Skill | 128gb SSD + 2×500gb HDD
EVGA GTX 780 Classified w/ EK block | XSPC D5 Photon 270 Res/Pump | NexXxos Monsta 240 Rad
CM Storm Stryker case | Seasonic 1000W PSU | Asux Xonar D2X & Logitech Z5500 Sound system |

(edited by SolarNova.1052)

Posted by: dodgycookies.4562

When your power limit is set to 50%, the power usage is still 77%. The max power usage on that MSI chart says 115%, which will overheat your card, especially if it’s a reference version with the blower-style fan.

I still think it’s a problem with the GPU Boost tech, which could be fixed with a clean driver install, but if not, RMA it to EVGA.

Also, don’t be so sure it’s not a PSU problem either. PSUs do degrade, and not all 700W units are of the same quality and reliability.

[ICoa] Blackgate

Posted by: fluffdragon.1523

Update #2

Did a test with the latest NVidia Beta drivers (Rev 334.67), which have the same problems. It’s definitely not the drivers.

As for removing the enclosure on the GPU — and then dismantling the GPU itself — wouldn’t that void the warranty, assuming one is still active after some 2+ years?

I mean, I’m totally willing to make sure that there isn’t any dust or broken heat-pipes in there, but I’d rather not be out the $200 for a replacement if I could still RMA the thing.

dodgycookies.4562 wrote: “When your power limit is set to 50%, the power usage is still 77%. The max power usage on that MSI chart says 115%, which will overheat your card, especially if it’s a reference version with the blower-style fan.

I still think it’s a problem with the GPU Boost tech, which could be fixed with a clean driver install, but if not, RMA it to EVGA.

Also, don’t be so sure it’s not a PSU problem either. PSUs do degrade, and not all 700W units are of the same quality and reliability.”

You’ll also notice that the chart’s maximum fan speed is arbitrarily shown as 100%, which the GPU is incapable of reaching. The graphs are for reference, not absolutes.

Posted by: dodgycookies.4562

At this point I would just open a ticket and ask for an RMA; it’s a fairly painless process with EVGA. Most EVGA cards should have 3 if not 5 years of warranty coverage, so you should be good there.

Removing the cooler and fan doesn’t void the warranty as long as you don’t damage anything. The only condition is that the product must be sent back in “out of box” condition. Anything you do to the card in between that doesn’t damage it is OK (like removing the cooler for a waterblock), but for an RMA you must reinstall the cooler before you send it back.

If you bought the advanced RMA plan, you won’t even have downtime.

http://www.evga.com/support/warranty/

[ICoa] Blackgate

Posted by: OGDeadHead.8326

fluffdragon.1523 wrote: “No other game I’ve tested so far is capable of reproducing these conditions.”

It’s a hardware issue for sure; the most common cause would be dust. Also check your voltage.

Download and run OCCT and stress that card to see how it behaves outside games – http://www.ocbase.com/
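
If you want a log to go with the stress test, something like the sketch below can record temperature, fan speed, and power draw while it runs. It assumes the `nvidia-smi` utility that ships with the Nvidia driver is installed and on your PATH (on Windows it often is not by default), and older GeForce cards may report some fields as unsupported. The function names are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "temperature.gpu,fan.speed,power.draw"
NAMES = ("temp_c", "fan_percent", "power_w")


def parse_sample(line):
    """Parse one CSV line such as '94, 74, 118.20' into a dict.
    Fields the card does not report (e.g. '[Not Supported]') become None."""
    def to_float(text):
        try:
            return float(text)
        except ValueError:
            return None

    fields = [part.strip() for part in line.split(",")]
    return dict(zip(NAMES, map(to_float, fields)))


def read_gpu():
    """Take one reading from the first GPU by running nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])
```

Calling `read_gpu()` every few seconds in a loop while OCCT or a game is running gives you a temperature/fan/power trace you can post alongside the screenshots.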

Win10 pro | Xeon 5650 @ 4 GHz | R9 280x toxic | 24 Gig Ram | Process Lasso user

Posted by: fluffdragon.1523

Update #3 (from phone):

Heatsink and copper heat pipe are clear. Fins undamaged. Fan is free of dust and only negligible accumulation inside. Compressed air cleaned that easily.

Attached photo is of opened GPU. Looks like the thermal paste was bad to begin with and doesn’t fully contact the chip. Will try to replace.

RE: stress testing

Ran the Furmark benchmark a while earlier while simulating both prior scenarios (50% v. 100% power). Behaved almost exactly as GW2 does.

Idle temp averages 38°C on the Windows desktop.

Posted by: OGDeadHead.8326

Any pictures of your entire case with the card installed?

Also, any difference if you run with your case open?

Win10 pro | Xeon 5650 @ 4 GHz | R9 280x toxic | 24 Gig Ram | Process Lasso user

Posted by: OGDeadHead.8326

Apparently, 74% fan speed max is normal for these cards.

You could try undervolting the card as well.

Win10 pro | Xeon 5650 @ 4 GHz | R9 280x toxic | 24 Gig Ram | Process Lasso user

(edited by OGDeadHead.8326)

Posted by: SolarNova.1052

Does the card have a single off-centre fan? If so, it’s likely an Nvidia default cooler, meaning it’s a blower style; you won’t be able to see where the dust actually builds up unless you remove the plastic shroud and can see ALL of the heatsink. As someone else pointed out, it won’t void the warranty. EVGA are very good with this: you can do whatever you like with the card so long as it is returned the way it was (minus the fault). You could even liquid-nitrogen cool this thing, bench the hell out of it, ruin the GPU from overstressing, send it back, and get an RMA (at least you can with their top cards, the Classified and KingPin editions). EVGA rock.

The picture you gave is a little blurry, but if the part number is 02G-P4-2662-KR, which I think I can make out, then it is indeed a blower style and the shroud would likely be concealing dust.
See this: http://www.youtube.com/watch?v=74DLrJSE3BI

If dust isn’t the issue and replacing the TIM doesn’t help… RMA.

3930k 4.6ghz | NH-D14 Cooler | P9x79 Pro MB | 16gb 1866mhz G.Skill | 128gb SSD + 2×500gb HDD
EVGA GTX 780 Classified w/ EK block | XSPC D5 Photon 270 Res/Pump | NexXxos Monsta 240 Rad
CM Storm Stryker case | Seasonic 1000W PSU | Asux Xonar D2X & Logitech Z5500 Sound system |

(edited by SolarNova.1052)

Posted by: fluffdragon.1523

Update #4:

Reapplied new thermal paste and gave it a test run. Results are still nearly identical. What I find interesting is in the graph provided: even when the card kicks in the fan and attempts to throttle back to decrease heat, it almost immediately ramps back up again. Why this is, I don’t know, but the fan certainly isn’t idling back.

Each dip comes with a brief decrease in framerate, which makes sense if the controller is telling it to decrease power to lower temperature. The interesting part is that frame rate almost immediately reaches the limiter thereafter, and temperature readings spike back up.

This latest test was done with the Frame Limiter set to 30fps.

I’d also like to point out that it doesn’t seem that the frame limiter is being honored in the character selection screen…

SolarNova.1052 wrote: “Does the card have a single off-centre fan? If so, it’s likely an Nvidia default cooler, meaning it’s a blower style; you won’t be able to see where the dust actually builds up unless you remove the plastic shroud and can see ALL of the heatsink. (…)”

Yeah, that’s the one. I took the whole thing apart as suggested previously, and the interior was almost perfectly clean. Took some compressed air to it to blow out what little bits of dust had collected.

I’m thinking you guys might be right and I’ll need to RMA the thing. I’ve also noted that the fans aren’t kicking in fully until temperatures reach 85°C or more, which quite honestly is insane when that’s only about 10°C below the safe limit.

(edited by fluffdragon.1523)

Posted by: TinkTinkPOOF.9201

SolarNova.1052 wrote: “If dust isn’t the issue and replacing the TIM doesn’t help… RMA.”

This.

Overheating is always going to be a hardware issue, assuming no overclocking is involved and ambient temps aren’t really high. Also, as stated, EVGA is very good about mods; all they care about is that the part be returned for RMA as it was sold. When I first started watercooling my cards, I called their support and asked whether removing the heatsink and putting a waterblock on the card would void the warranty. They said no, it would not; so long as the card was returned with the OEM heatsink, there would be no issue.

6700k@5GHz | 32GB RAM | 1TB 850 SSD | GTX980Ti | 27" 144Hz Gsync

Posted by: dodgycookies.4562

Next time you buy a card, unless you plan to SLI, I would suggest a non-reference cooler with multiple fans that blow into the case (e.g. ACX, WindForce, Twin Frozr). You pay a slight premium, but IMO it’s worth it for the lower temps as long as your case has good airflow.

I still think it’s a problem with GPU Boost (which auto-overclocks a lot, e.g. 200+ MHz in some cases), maybe in the card BIOS or voltage regulator, but eh, *shrug*, in the end it’s still an RMA.

[ICoa] Blackgate

Posted by: Muhandias.7453

I am fairly certain that this is not a hardware issue specific to one person, as I have also noticed a spike in temperature on my GPU since the Edge of the Mists patch. It is tied only to this game; no other activity on my computer even comes close to pushing the graphics card past 90°C.

I am using an AMD Radeon HD 5850 with the fan manually set to 100% in the Catalyst Control Center and yet the overheating is constant while playing Guild Wars 2.

(edited by Muhandias.7453)

Posted by: GOSTARG.5760

Hey there, I had the same problem as you. After the update the game crashed my system, and it was saying that the power supply was overheating or something. It rebooted my PC and gave me a startup screen that I’ve never seen before on my PC build.
I’m not a whiz kid when it comes to deep system specs. It started right after the update.

So what I did: I started the game at its lowest settings and ran it with no crashes.
While in game, after 10 minutes, I set the video settings to their highest, closed the game, rebooted my PC, and started the game again.

It has never happened since. I hope this helps you.
PS: the game crashed 4 times after the update before I did what I described. Greetz, Mark
Holland ;(

(edited by GOSTARG.5760)

Posted by: GOSTARG.5760

Oh, and I bought the game last Sunday and it played perfectly. After the update it crashed my PC 3 times; I did the thing above and that solved it for me. AMD 6800 Cool Edition and an i7 3770, custom-built PC.

Posted by: SolarNova.1052

Again, software like games will NOT ‘directly’ cause hardware to overheat, no matter how many patches. The game accesses the GPU through the API* (DirectX 9 in this case); it does not ‘directly’ control the card in any way. The only way it can cause higher heat is through higher usage, in which case it is NOT the game at fault if the hardware starts overheating. That’s a hardware issue: the card itself cannot handle running at high load, which it should be able to do if it were in full working order. Either that or you have VERY bad case airflow.

*application programming interface

3930k 4.6ghz | NH-D14 Cooler | P9x79 Pro MB | 16gb 1866mhz G.Skill | 128gb SSD + 2×500gb HDD
EVGA GTX 780 Classified w/ EK block | XSPC D5 Photon 270 Res/Pump | NexXxos Monsta 240 Rad
CM Storm Stryker case | Seasonic 1000W PSU | Asux Xonar D2X & Logitech Z5500 Sound system |

Posted by: fluffdragon.1523

Hey guys, RMA is in progress and I’m waiting on the RMA number and shipping information as I type this.

I got around to putting Skyrim back on my machine for testing, and yeah, within a minute the GPU was bursting up to dangerous levels. It’s absolutely a hardware fault at this point. It’s possible the heat-pipe or something is damaged in a place I just plain can’t see, especially considering the thorough application of new thermal paste added the night before and the fact that the heatsink itself is otherwise clean and clear of obstructions. That, or one of the controller chips is haywire (which is equally likely, considering weird voltage behavior and fan operation).

SolarNova.1052 wrote: “Again, software like games will NOT ‘directly’ cause hardware to overheat, no matter how many patches. The game accesses the GPU through the API* (DirectX 9 in this case); it does not ‘directly’ control the card in any way. The only way it can cause higher heat is through higher usage, in which case it is NOT the game at fault if the hardware starts overheating. That’s a hardware issue: the card itself cannot handle running at high load, which it should be able to do if it were in full working order. Either that or you have VERY bad case airflow.

*application programming interface”

The above is true, no doubt here. It also explains why, all of a sudden, only games that normally cause a heavy operating load were bringing the GPU to dangerous levels: the hardware had simply failed in the meantime.

It’s really the only plausible source of the problem in my case, especially considering that everything else is in working order and had been working just fine up until the hardware failed.

Anyway, thanks for all the help, everyone!