A good reason to health check the kubelet process and restart it when the checks fail.
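A minimal bash sketch of that suggestion. It assumes the kubelet runs as a systemd unit named `kubelet` and answers on `http://127.0.0.1:10248/healthz`, which is the kubelet's default healthz address and port. The failure threshold, the function names, and the `FAILURES` counter are hypothetical. The probe is kept in its own function because, as the next comment points out, a kubelet can stay alive and answer queries while the Pods around it are being OOM-killed, so you may want to swap in a different check.

```bash
#!/usr/bin/env bash
# Hypothetical kubelet watchdog sketch. ASSUMPTIONS: systemd unit "kubelet",
# health endpoint on the kubelet's default 127.0.0.1:10248/healthz, curl installed.
set -u

HEALTHZ_URL="${HEALTHZ_URL:-http://127.0.0.1:10248/healthz}"
MAX_FAILURES="${MAX_FAILURES:-3}"   # consecutive failures before a restart (illustrative)
FAILURES=0                          # consecutive failed probes so far

# Probe: fails on connection errors, a 5-second timeout, or HTTP status >= 400.
# Replace this with whatever check reflects "healthy" on your nodes.
probe_kubelet() {
  curl -fsS --max-time 5 "$HEALTHZ_URL" >/dev/null 2>&1
}

restart_kubelet() {
  systemctl restart kubelet
}

# One watchdog iteration: reset the counter on success; on failure, count it
# and restart the kubelet once MAX_FAILURES probes in a row have failed.
watchdog_step() {
  if probe_kubelet; then
    FAILURES=0
    return
  fi
  FAILURES=$((FAILURES + 1))
  if [ "$FAILURES" -ge "$MAX_FAILURES" ]; then
    restart_kubelet
    FAILURES=0
  fi
}

# Loop forever, probing every $1 seconds (default 10). Not called here.
run_watchdog() {
  local interval=${1:-10}
  while true; do
    watchdog_step
    sleep "$interval"
  done
}
```

Calling `run_watchdog 10` (as root, since it restarts a systemd unit) probes every 10 seconds and restarts the kubelet after three consecutive failures.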
What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:
# cat /proc/$(pgrep kubelet)/oom_score_adj
-999
(from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)
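To see why -999 shields the kubelet, here is a rough bash sketch. `show_oom` just reads procfs; `badness` is a simplified model of the kernel's `oom_badness()` (memory footprint in pages plus `oom_score_adj` scaled by total pages / 1000, with -1000 meaning "never kill"). Details vary across kernel versions, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Show a process's OOM settings from procfs (Linux only).
show_oom() {
  local pid=$1
  echo "oom_score_adj=$(cat "/proc/$pid/oom_score_adj") oom_score=$(cat "/proc/$pid/oom_score")"
}

# Simplified model of the kernel's oom_badness():
#   badness = used_pages + oom_score_adj * (total_pages / 1000)
# where used_pages stands in for RSS + swap + page tables. An adj of -1000
# exempts the process entirely. Integer arithmetic, as in the kernel.
badness() {
  local used_pages=$1 adj=$2 total_pages=$3
  if [ "$adj" -eq -1000 ]; then
    echo "never"
    return
  fi
  echo $(( used_pages + adj * (total_pages / 1000) ))
}
```

For example, on a 16 GiB node (4194304 pages of 4 KiB), a kubelet using 200 MiB (51200 pages) at -999 gets `badness 51200 -999 4194304` = -4138606, while a BestEffort Pod (Kubernetes gives those `oom_score_adj` 1000) using 2 GiB (524288 pages) gets 4718288, so the OOM killer goes after the Pod long before the kubelet.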
With this score, the Linux OOM killer would almost never pick the kubelet, but any of my Pods were fair game.
At the metrics level, you can compare the old and new releases. I've been bitten before by resource requirements changing dramatically (regardless of whether it was a bug or a functionality change).
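One way to do that comparison, sketched in bash under loud assumptions: you have already exported the same per-component metrics (say, peak memory in MiB) from a run on the old release and a run on the new one, one `name value` pair per line. The file format, the `compare_releases` name, and the 20% default threshold are all hypothetical.

```bash
#!/usr/bin/env bash
# Flag metrics whose value moved by more than threshold_pct percent between
# two releases. ASSUMED input: "metric_name value" lines in each file.
compare_releases() {
  local old_file=$1 new_file=$2 threshold_pct=${3:-20}
  awk -v t="$threshold_pct" '
    BEGIN { lim = t + 0 }                       # force a numeric threshold
    NR == FNR { old[$1] = $2; next }            # first file: remember old values
    ($1 in old) && old[$1] + 0 != 0 {           # skip new-only metrics and zero baselines
      pct = ($2 - old[$1]) * 100 / old[$1]
      if (pct > lim || pct < -lim)
        printf "%s %s -> %s (%+.0f%%)\n", $1, old[$1], $2, pct
    }
  ' "$old_file" "$new_file"
}
```

If `old.txt` contains `kubelet_mem_mib 120` and `new.txt` contains `kubelet_mem_mib 310`, then `compare_releases old.txt new.txt` prints `kubelet_mem_mib 120 -> 310 (+158%)`; metrics that moved by 20% or less in either direction are not printed.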