Hacker News

A good reason to health check the kubelet process and restart it when the checks fail.

What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:

  # cat /proc/$(pgrep kubelet)/oom_score_adj
  -999
  
  (from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)

With this score, the Linux OOM killer wouldn't touch it, but any of my Pods were fair game.
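
A rough sketch of the same check done programmatically, so a node-level script could report how shielded a process is from the OOM killer. The function names and the `proc_root` parameter are hypothetical, added only for illustration and testing. The only assumption about the system is the standard Linux `/proc/<pid>/oom_score_adj` file, whose values range from -1000 to 1000.

```python
from pathlib import Path

# Documented kernel minimum: at -1000 a process is never selected by the OOM killer.
OOM_SCORE_ADJ_MIN = -1000


def read_oom_score_adj(pid: int, proc_root: str = "/proc") -> int:
    """Return the oom_score_adj value for a PID (range -1000..1000).

    proc_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree in tests.
    """
    return int(Path(proc_root, str(pid), "oom_score_adj").read_text().strip())


def describe_oom_protection(adj: int) -> str:
    """Classify an oom_score_adj value into a coarse, illustrative label."""
    if adj <= OOM_SCORE_ADJ_MIN:
        return "exempt"      # never chosen by the OOM killer
    if adj < 0:
        return "protected"   # less likely to be chosen than an unadjusted process
    return "unprotected"     # default (0) or more likely to be chosen (> 0)
```

With the value shown above, `describe_oom_protection(-999)` returns `"protected"`: not formally exempt, but close enough to the minimum that other processes on the node are chosen first in practice.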

nijave an hour ago

At the metrics level, you can compare the old release against the new one. I have been bitten before by resource requirements changing dramatically (regardless of whether it's a bug or a functionality change).
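
A minimal sketch of that kind of comparison, assuming you have already pulled per-release averages out of your metrics system into plain dictionaries. The metric names, the 25% threshold, and both function names are hypothetical and not taken from any particular tool.

```python
def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.5 means +50%)."""
    if old <= 0:
        raise ValueError("old value must be positive to compute a relative change")
    return (new - old) / old


def flag_regressions(old_metrics: dict, new_metrics: dict, threshold: float = 0.25) -> dict:
    """Return metrics whose usage grew by more than `threshold` between releases.

    Both arguments map a metric name to a numeric value (for example an
    average over the same time window for each release). Metrics missing
    from the new release are skipped rather than guessed at.
    """
    flagged = {}
    for name, old_value in old_metrics.items():
        if name not in new_metrics:
            continue
        change = relative_change(old_value, new_metrics[name])
        if change > threshold:
            flagged[name] = change
    return flagged
```

For example, passing `{"memory_bytes": 100.0}` for the old release and `{"memory_bytes": 180.0}` for the new one flags `memory_bytes` with a change of 0.8. It cannot tell a bug from an intended change, so a flagged metric is only a prompt to go and look.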