Chips age and eventually fail. Look up hot-carrier injection, bias-temperature instability and electromigration; they are the main aging mechanisms. All of these are a linear function of time but an exponential function of temperature. The 90-100C these chips run at is really tough, so they are likely to fail at a rate somewhere between a couple of percent and 10% over 2-3 years, depending on the margins in the design.
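To put a rough number on the "exponential in temperature" part, here is a minimal sketch using the Arrhenius acceleration factor that reliability work commonly applies to thermally activated wear-out. The function name, the 0.7 eV activation energy and the 60C reference temperature are illustrative assumptions, not figures for any particular chip; real activation energies differ by mechanism and process.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev=0.7):
    """How many times faster aging runs at t_hot_c than at t_ref_c.

    activation_energy_ev is an ASSUMED illustrative value (0.7 eV);
    it varies by failure mechanism and process.
    """
    t_ref_k = t_ref_c + 273.15  # Celsius to Kelvin
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical comparison: a part held at 95C versus the same part at 60C.
    af = arrhenius_acceleration(60.0, 95.0)
    print(f"aging runs about {af:.1f}x faster at 95C than at 60C")
```

Under those assumed numbers the hotter part ages roughly an order of magnitude faster, which is why a few tens of degrees matter so much more than a few extra months of runtime.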
Solder joints are notorious for failing at a high rate too.
If those don't go, the caps and coils will eventually.
Those are easy and cheap to replace.
Depends. The tiny SMD caps spread across the board do start to fail and drift out of spec over time. They are a right pain to replace, and it is hard to spot the one that has gone out of spec and is causing the chip to crash.
Can you not just move the expensive part (the GPU itself) to a new carrier board in that situation? Also, isn't most of the cost of the board in designing it rather than actually making one, especially if you can move the heat sinks over?
"just"
BGA reflow rework is not rocket science. How do you think the PCBA gets assembled in the first place? It's much easier if you don't care about the boards at all, and with the huge die sizes on these accelerator chips it's worth doing a board swap.
Not if you account for labour.
Caps also age rapidly with temperature.
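For a feel of how steep that is, a common rule of thumb for aluminum electrolytic capacitors is that expected life roughly doubles for every 10C below the rated temperature. A minimal sketch of that rule follows; the function name and the 2000 h at 105C rating are illustrative assumptions, and the rule does not carry over as-is to the small ceramic SMD caps mentioned above.

```python
def electrolytic_life_hours(rated_life_h, rated_temp_c, actual_temp_c):
    """Rule-of-thumb life estimate for an aluminum electrolytic capacitor.

    Applies the "life doubles per 10C cooler" approximation. This is a
    rough heuristic, not a datasheet formula; ripple current and applied
    voltage are ignored here.
    """
    return rated_life_h * 2 ** ((rated_temp_c - actual_temp_c) / 10)


if __name__ == "__main__":
    # Hypothetical part rated for 2000 h at 105C, run at a few temperatures.
    for temp_c in (105, 95, 85, 65):
        hours = electrolytic_life_hours(2000, 105, temp_c)
        print(f"{temp_c}C: about {hours:.0f} h")
```

Read the other way round, every extra 10C of board temperature roughly halves the estimate.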