I’ve spent weeks searching for an answer and trying different fixes, but at best I’ve reduced how often it happens, and I’m dubious even of that, since it seems so random.
- journalctl has nothing at all from when it happens, except for one time when it managed to log that the kernel lost contact with the GPU in the seconds before the system went down. Since I undervolted and underclocked the GPU, that message hasn’t appeared again.
- There’s no crash log from it either.
- memtest found no problems with the RAM.
- I’ve been watching sysinfo and corectrl like a hawk: CPU, RAM, and VRAM usage are all well within normal levels when it happens, and temperatures are low across the board.
- The same system has been 100% stable running under heavier load for hours at a time in Windows.
- I’ve followed AMD’s instructions for making sure the GPU drivers are what they should be for this, and the kernel is a version that’s supposed to be correct and stable for those drivers as well.
- Specific compatibility settings that other people found to fix literally this exact problem may have, at most, reduced the frequency of the crashes, but again, they’re so erratic that it’s almost impossible to determine cause and effect here.
- I’ve tried disabling the integrated graphics both in the BIOS and through settings, because that can apparently cause instability, but that hasn’t helped.
I don’t know what else to look at or try at this point.
It’s one of the most annoying things I’ve ever dealt with, purely because of how badly it’s documented.