> since if the "main" kernel crashes or is supposed to get upgraded then you have to hand hardware back to it.

Isn't that similar to resuming from hibernate to disk? Basically all of your peripherals were powered off, so they probably cannot keep their state.

Also, you can actually stop a disk (a member of a RAID device), remove the PCIe-SATA HBA card it is attached to, replace it with a different one, and connect everything back together without any user-space application noticing.

I trust hardware to mostly be reasonable when starting from off. But we're discussing the case where it's on and stays on while being handed from one kernel to another, and I don't trust it nearly as much in that case. I think the right comparison is kexec rather than hibernate, and while kexec often works, it can leave hardware misbehaving.

Many peripherals have a mechanism to reset the device, to get it back to a known good state. Generally device drivers will do this when they receive a message they don't understand from the device, or a command sent to the device times out without response.

Here's my graphics chip getting reset:

  [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
  amdgpu 0000:c6:00.0: amdgpu: MODE2 reset
  amdgpu 0000:c6:00.0: amdgpu: GPU reset succeeded, trying to resume
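
For anyone who wants the timeout-then-reset pattern spelled out, here is a rough sketch in Python. It is a toy model, not how amdgpu or any real driver is written: `FakeDevice`, `send_with_recovery`, and the `ack:` reply format are all made up for illustration, and a missing reply stands in for a command timeout.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a peripheral; not a real driver API."""
    wedged: bool = False  # True means the device has stopped answering
    resets: int = 0       # how many times reset() has been called

    def submit(self, command):
        # A wedged device never answers; the caller treats None as a timeout.
        if self.wedged:
            return None
        return f"ack:{command}"

    def reset(self):
        # Going back to a known good state clears whatever wedged the device.
        self.wedged = False
        self.resets += 1


def send_with_recovery(dev, command, max_resets=1):
    """Send a command; on a timeout or an unexpected reply, reset and retry."""
    for attempt in range(max_resets + 1):
        reply = dev.submit(command)
        if reply == f"ack:{command}":
            return reply
        # No reply or a reply we don't understand: reset if we still may.
        if attempt < max_resets:
            dev.reset()
    raise RuntimeError(f"device did not recover after {max_resets} reset(s)")


if __name__ == "__main__":
    dev = FakeDevice(wedged=True)
    print(send_with_recovery(dev, "unmap_queue"), "resets:", dev.resets)
```

The point of bounding the number of resets is that a device that stays broken after a reset should surface as an error instead of looping forever.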