Managed over 50k servers with zero swap. Set the overcommit ratio to 0, configured min_free_kbytes from a Red Hat formula, and had application teams keep some memory free. Adjusted OOM scores at application startup, especially for database servers, where panic_on_oom is set to 0.
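
To make that concrete, here's a minimal sketch of those knobs as they exist on Linux (run as root). The specific numbers are placeholders for illustration, not the production values or the Red Hat formula:

    #!/usr/bin/env python3
    # Rough sketch of the tuning described above. Values are placeholders;
    # the real min_free_kbytes number came from a sizing formula not shown here.
    import os

    SYSCTLS = {
        "vm/overcommit_ratio": "0",                  # only takes effect with vm.overcommit_memory=2
        "vm/panic_on_oom": "0",                      # database hosts: kill a task, don't panic
        "vm/min_free_kbytes": str(1 * 1024 * 1024),  # placeholder; use your own formula
    }

    def set_sysctl(key, value):
        with open(f"/proc/sys/{key}", "w") as f:
            f.write(value)

    def set_oom_score_adj(pid, adj):
        # -1000 exempts a process from the OOM killer entirely
        with open(f"/proc/{pid}/oom_score_adj", "w") as f:
            f.write(str(adj))

    if __name__ == "__main__":
        for key, value in SYSCTLS.items():
            set_sysctl(key, value)
        set_oom_score_adj(os.getpid(), -1000)  # e.g. call from the DB's startup wrapper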

Servers ranged from 144 GB of RAM to 3 TB, and that memory is heavily utilized. On stateless app and web servers, panic_on_oom was set to 2 to reboot on OOM; those OOMs mostly occurred on the performance team's boxes, which were constantly load testing hardware and apps, and on a few dev machines where developers were not sharing nicely. Engineered correctly, OOM will be very rare, and this only gets better with time as applications gain more control over memory allocation and as tools like namespaces/cgroups mature. Java will always leak; just leave more room for it.
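
On the cgroups point, a hedged sketch of what that containment looks like with cgroup v2: cap an app's memory so an OOM kill happens inside the group instead of taking the host down. The group name, limits, and PID here are all made up for illustration; it needs root and a cgroup v2 mount at /sys/fs/cgroup:

    #!/usr/bin/env python3
    import os

    CG = "/sys/fs/cgroup/webapp"  # hypothetical group name

    os.makedirs(CG, exist_ok=True)

    def write(name, value):
        with open(os.path.join(CG, name), "w") as f:
            f.write(str(value))

    write("memory.high", 6 * 1024**3)  # soft cap: reclaim kicks in at 6 GiB
    write("memory.max", 8 * 1024**3)   # hard cap: OOM kill within the group at 8 GiB
    write("cgroup.procs", 12345)       # placeholder PID: move the app into the group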

There's a chance those servers would run more efficiently with some swap space, for the reasons mentioned many times in this thread. Swap is not just for overcommitting.

These theories are repeated often, but I have never seen any empirical data to back them up, assuming one sets the options I mentioned. The anecdotes usually come from servers with default settings, no attempt to tune for the intended workloads, and no capacity planning for application resources. Even OS maintainers are starting to recognize this and have created daemons such as tuned for people who never touch settings. The next evolution will be dynamic adjustment from continuous BPF traces. I just keep it simple and avoid the circular arguments altogether.

Oh sure, it might or might not make a significant difference. Chances are, if you do a lot of I/O over a large (or very large) dataset and you also have a lot of rarely used but resident anonymous memory, then swap space should help, since that anonymous memory can get paged out in favor of disk cache. But I have no idea how common that is.
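
A rough way to eyeball whether a box is in that situation: compare how much anonymous memory sits on the inactive LRU list against the size of the page cache. The field names are real /proc/meminfo keys; treating Inactive(anon) as "cold and swappable" is an approximation, not a guarantee:

    #!/usr/bin/env python3
    def meminfo():
        out = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                out[key] = int(rest.split()[0])  # values are in kB
        return out

    m = meminfo()
    print(f"cold-ish anonymous memory: {m['Inactive(anon)'] / 1024**2:.1f} GiB")
    print(f"page cache:                {m['Cached'] / 1024**2:.1f} GiB")
    # If the first number is large and the workload is I/O-bound, some swap
    # would let the kernel trade that cold anon memory for more cache.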

Yeah, I know what you mean, but this is where it gets into circular reasoning. I will always have operations groups move the workload to a node with more memory if that is what is needed. In my case, swap on disk would have to be encrypted, since our contracts require any customer data touching a disk to be encrypted, so I avoid that altogether and just add more memory. If 2 TB of RAM isn't enough, they get 3 TB, and so on. We pushed vendors and OEMs to grow their motherboard capacity. At some point application groups just get more servers.

Yeah, that seems like a reasonable approach for your case!