Hacker News

One direction to explore would be running rsyslog on every node: use regex to match all the known patterns, use the various plugins/add-ons to send every application's logs to the local rsyslog instance (with a local spooler), and then forward from each local rsyslog to the centralized logging servers over an encrypted connection. Rsyslog supports a local spool (disk-assisted queues), so if the upstream server is offline for whatever reason, logs are spooled locally and forwarding resumes once upstream is back online.
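rsyslog does the spooling itself, so none of this has to be written by hand. Purely as a hypothetical model of the spool-and-forward behavior described above, the Python sketch below sends upstream when it can, appends to a local spool file while upstream is down, and drains the spool in order before sending anything new. `SpoolingForwarder`, the spool path, and the `send` callable (assumed to raise `ConnectionError` when upstream is offline) are illustrative names, not rsyslog APIs.

```python
import os
import tempfile
from typing import Callable, List


class SpoolingForwarder:
    """Hypothetical model of spool-and-forward (not rsyslog's implementation).

    Messages go upstream when possible; while upstream is down they are
    appended to a local spool file, one message per line (so messages are
    assumed not to contain newlines).
    """

    def __init__(self, send: Callable[[str], None], spool_path: str) -> None:
        self.send = send              # assumed to raise ConnectionError when upstream is offline
        self.spool_path = spool_path  # local on-disk spool

    def _spooled(self) -> List[str]:
        if not os.path.exists(self.spool_path):
            return []
        with open(self.spool_path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

    def _append(self, message: str) -> None:
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def flush(self) -> int:
        """Drain the spool in order, stopping at the first failed send."""
        pending = self._spooled()
        sent = 0
        for msg in pending:
            try:
                self.send(msg)
            except ConnectionError:
                break
            sent += 1
        # Rewrite the spool with only the messages that were not delivered.
        with open(self.spool_path, "w", encoding="utf-8") as f:
            f.writelines(m + "\n" for m in pending[sent:])
        return sent

    def log(self, message: str) -> None:
        self.flush()
        if self._spooled():
            # Upstream is still down: spool behind older messages to keep order.
            self._append(message)
            return
        try:
            self.send(message)
        except ConnectionError:
            self._append(message)


if __name__ == "__main__":
    upstream_up = False
    received: List[str] = []

    def fake_send(msg: str) -> None:
        if not upstream_up:
            raise ConnectionError("upstream offline")
        received.append(msg)

    fwd = SpoolingForwarder(fake_send, os.path.join(tempfile.mkdtemp(), "spool.log"))
    fwd.log("first")   # upstream down: spooled
    fwd.log("second")  # upstream down: spooled
    upstream_up = True
    fwd.log("third")   # spool drained first, then "third" sent
    print(received)    # ['first', 'second', 'third']
```

Draining the spool before sending a new message is what keeps the central servers' view in the same order the node produced it.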

Regex matching on logs is slow, but performing it on every node spreads the CPU load across the fleet instead of concentrating it on the upstream servers. Configuration management can push the regex rules to all the nodes. This won't catch unknown unknowns, but once one is discovered, a new rule can be peer-reviewed and rolled out to every node quickly through configuration management.
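A minimal sketch of the per-node matching step, assuming the point of matching is to mask secrets before a line leaves the node and that configuration management ships the rules as a JSON file. The rule names, the two patterns, and the `load_rules`/`scrub` helpers are hypothetical examples, not a vetted ruleset; in rsyslog itself this would live in its own config rather than in Python.

```python
import json
import re
from typing import List, Tuple

# Example rule file as configuration management might push it to every node.
# Rule names and patterns are illustrative assumptions, not a vetted ruleset.
RULES_JSON = r"""
[
  {"name": "aws_access_key_id", "pattern": "AKIA[0-9A-Z]{16}"},
  {"name": "password_kv", "pattern": "(?i)(?<=password=)\\S+"}
]
"""


def load_rules(text: str) -> List[Tuple[str, re.Pattern]]:
    """Compile every rule once; reload whenever a new rule file is pushed."""
    return [(rule["name"], re.compile(rule["pattern"])) for rule in json.loads(text)]


def scrub(line: str, rules: List[Tuple[str, re.Pattern]]) -> str:
    """Mask each match, labeled with the rule that caught it."""
    for name, pattern in rules:
        line = pattern.sub(f"[REDACTED:{name}]", line)
    return line


if __name__ == "__main__":
    rules = load_rules(RULES_JSON)
    print(scrub("login ok password=hunter2 user=bob", rules))
    # login ok password=[REDACTED:password_kv] user=bob
```

Compiling once per rule-file push, rather than per line, keeps the per-node cost down; handling a newly discovered unknown unknown is just another reviewed entry in the JSON.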

Rsyslog also supports encrypting the log stream with TLS, so any secret leakage is limited to the sending nodes and the central nodes rather than the network in between, and it checks a few compliance boxes.
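To show what goes over the wire when the stream is encrypted, here is a small sketch of syslog over TLS as RFC 5425 describes it: each message is prefixed with its length in octets and a space ("octet-counting" framing), and sent over a TLS connection whose server certificate is verified, conventionally on port 6514. rsyslog handles all of this natively; the `frame`/`send_tls` helpers, host, and CA file below are hypothetical placeholders.

```python
import socket
import ssl
from typing import Optional


def frame(msg: str) -> bytes:
    """RFC 5425 octet-counting framing: MSG-LEN SP SYSLOG-MSG."""
    payload = msg.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def send_tls(host: str, msg: str, port: int = 6514, cafile: Optional[str] = None) -> None:
    """Send one framed message over TLS, verifying the server certificate.

    6514 is the IANA port for syslog over TLS; host and cafile are placeholders.
    """
    ctx = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED + hostname check
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(frame(msg))


if __name__ == "__main__":
    # An RFC 5424-style message (facility local0, severity info); names are made up.
    print(frame("<134>1 2024-01-01T00:00:00Z web01 app - - - hello"))
```

Length-prefixing counts bytes, not characters, so a message containing non-ASCII text still frames correctly.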

Another thing that helps is sending only warning and above upstream, and running an agent on the local nodes that watches info through debug messages for keywords and lets someone know to go check that node's logs. That means less junk on the centralized servers, which may have SOC 1/SOC 2/PCI/FedRAMP log retention requirements. One cannot leak what is not sent in the first place.
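The severity split can be sketched as a routing function. This assumes RFC 5424 numeric severities (0 = emergency through 7 = debug, lower is more severe), so "warning and above" means severity 4 or lower. The `KEYWORDS` list and the `route` function are hypothetical stand-ins for whatever the local agent actually matches; in a real deployment the upstream filter would be an rsyslog config rule rather than Python.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Severity(IntEnum):
    # RFC 5424 numeric severities: lower number = more severe.
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


# Hypothetical keywords the local agent watches for.
KEYWORDS = ("traceback", "permission denied")


def route(records: Iterable[Tuple[Severity, str]]) -> Tuple[List[str], List[str]]:
    """Split records into (sent upstream, local keyword alerts)."""
    upstream: List[str] = []
    alerts: List[str] = []
    for sev, msg in records:
        if sev <= Severity.WARNING:
            upstream.append(msg)  # warning and above leave the node
        elif Severity.INFO <= sev <= Severity.DEBUG:
            # Stays on the node; only a "go check this node" alert is raised.
            if any(k in msg.lower() for k in KEYWORDS):
                alerts.append(msg)
        # NOTICE stays local and unscanned in this sketch.
    return upstream, alerts


if __name__ == "__main__":
    up, alerts = route([
        (Severity.ERR, "disk failed"),
        (Severity.INFO, "Traceback (most recent call last)"),
        (Severity.DEBUG, "token=abc123"),
    ])
    print(up)      # ['disk failed']
    print(alerts)  # ['Traceback (most recent call last)']
```

Note that the debug line carrying a token never leaves the node and triggers no alert, which is the "cannot leak what is not sent" property in miniature.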