>It is where I attach a debugger, it is where I install iotop and use it for the first time. It is where I cat out mysterious /proc and /sys values to discover exotic things about cgroups I only learned about 5 minutes prior in obscure system documentation.
It is, and SSH is indeed the tool for that, but only because until recently we did not have better tools and interfaces.
Once you try newer tools, you don't want to go back.
Here's an example from a fairly recent debugging session of mine:
- Network is really slow on the home server, no idea why
- Try just rebooting it, no change
- Run kernel perf, check the flame graph
- Kernel spends A LOT of time in nf_* (netfilter functions, iptables)
- Check iptables rules
- sshguard has banned 13000 IP addresses in its table
- Each network packet travels through all the rules
- Fix: clean the rules/skip the table for established connections/add timeouts
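A toy model of why the ban list hurts: rules in a chain are checked top to bottom, so a packet that matches none of 13,000 per-address rules gets compared against all of them. This is a simplified Python model, not netfilter itself, and it assumes one DROP rule per banned address; `Packet`, `build_chain`, `evaluate` and `fake_banlist` are hypothetical names, and the `established` flag merely stands in for conntrack state. It shows why putting an "accept established" rule first helps: most packets stop at rule 1.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    established: bool  # stand-in for conntrack's ESTABLISHED state (assumption)


def fake_banlist(n: int) -> list[str]:
    """Hypothetical ban list: n consecutive addresses starting at 10.0.0.0."""
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    return [str(ipaddress.IPv4Address(base + i)) for i in range(n)]


def build_chain(banned, established_first: bool):
    """Build a toy chain of (name, match, verdict) rules, checked top to bottom."""
    rules = []
    if established_first:
        rules.append(("accept-established", lambda p: p.established, "ACCEPT"))
    for ip in banned:
        # One DROP rule per banned address (assumed layout); ip=ip binds the current value.
        rules.append((f"drop-{ip}", lambda p, ip=ip: p.src == ip, "DROP"))
    rules.append(("default", lambda p: True, "ACCEPT"))
    return rules


def evaluate(chain, pkt: Packet):
    """Walk the chain in order; return (verdict, number_of_rules_checked)."""
    for checked, (_name, match, verdict) in enumerate(chain, start=1):
        if match(pkt):
            return verdict, checked
    raise RuntimeError("chain has no default rule")


if __name__ == "__main__":
    banned = fake_banlist(13_000)
    pkt = Packet("192.0.2.1", established=True)
    for early in (False, True):
        verdict, checked = evaluate(build_chain(banned, early), pkt)
        print(f"established-first={early}: {verdict} after {checked} rule(s)")
```

With the 13,000-address list, an established-connection packet from an unbanned address is checked against 13,001 rules without the shortcut and against 1 rule with it.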
You don't need debugging facilities for many issues. You need observability and tracing. Instead of spending at least tens of minutes debugging the issue, I just used an observability tool, which showed me the path in 2 minutes.
See, I would not reboot the server before figuring out what is happening. You lose a lot of information by doing that, and the worst that can happen is that the problem goes away for a little while.
To be fair, turning it off and on again is unreasonably effective.
I recently diagnosed and fixed an issue with Veeam backups that suddenly stopped partway through the usual window and kept failing from then on. This particular setup has three sites (prod, my home and DR) and five backup proxies. Anyway, I read logs and Googled a bit. I rebooted the backup server: no joy, even though that looked like where the issue was. I restarted the proxies and things started working again.
The error was basically: there are no available proxies, even though they were all available (not actually working, but not giving off "not working" vibes either).
I could have dug into what went wrong, but life is too short. This is the first time that pattern has happened to me (I've made a mental note, and it is logged in our incident log).
So, OK, I'll agree that a reboot should not generally be the first option. Whilst sciencing it or nerding harder is the purist approach, often a cheeky reboot gets the job done. However, do be aware that a Windows box will often decide to install updates if you are not careful 8)
No, you didn't diagnose and fix an issue.
You just temporarily mitigated it.
My job as a DevOps engineer is to ensure customer uptime. If rebooting is the fastest fix, we do that. Figuring out the why is the primary developers' job.
This is also a good reason to log everything all the time in a human readable way. You can get services up and then triage at your own pace after.
My job may be different from others' as I work at an ITSP and we serve business phone lines. When business phones do not work, it is immediately clear to our customers. We have to get them back up, not just for their business but so they can dial 911.
Most fail states aren't worth preserving in an SMB environment. In larger environments, or ones equipped for it, a snapshot can be taken before rebooting in case the issue repeats.
Once is chance, twice is coincidence, three times makes a pattern.
Alternatively, if it doesn't happen again it's not worth fixing, if it does happen again then you can investigate it when it happens again.
I've debugged so many issues in my life that sometimes I'd prefer things to just work, and if a reboot at least postpones the problem, I'll choose that :D
I fail to understand how your approach is different to your parent.
perf is a shell tool. iptables is a shell tool. sshguard is a log reader and ultimately you will use the CLI to take action.
If you are advocating newer tools, look into nft - iptables is sooo last decade 8) I've used the lot: ipfw, ipchains, iptables and nftables. You might also try fail2ban - it is still worthwhile even in the age of the massively distributed botnet, and covers more than just ssh.
I also recommend a VPN and not exposing ssh to the wild.
Finally, 13,000 addresses in an ipset is nothing particularly special these days. I hope sshguard is building a properly optimised ipset table and that you are running appropriate hardware.
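To illustrate what a properly optimised set buys you: one rule per address means a linear scan, so a miss costs one comparison per entry, while a single hash-based set (the general idea behind ipset's `hash:ip` type) does roughly constant work per lookup regardless of size. This Python model only counts comparisons and is not how the kernel implements either structure; `LinearRules` and `HashedSet` are hypothetical names.

```python
import ipaddress


def to_int(ip: str) -> int:
    return int(ipaddress.IPv4Address(ip))


class LinearRules:
    """One rule per address: a miss has to be compared against every entry."""

    def __init__(self, ips):
        self.ips = [to_int(ip) for ip in ips]

    def lookup(self, ip: str):
        """Return (matched, comparisons_made)."""
        target = to_int(ip)
        for n, candidate in enumerate(self.ips, start=1):
            if candidate == target:
                return True, n
        return False, len(self.ips)


class HashedSet:
    """All addresses in one hash table, the general idea behind a hash-type ipset."""

    def __init__(self, ips):
        self.ips = {to_int(ip) for ip in ips}

    def lookup(self, ip: str):
        # Counted as a single probe: expected O(1) regardless of set size.
        return to_int(ip) in self.ips, 1


if __name__ == "__main__":
    base = to_int("10.0.0.0")
    ips = [str(ipaddress.IPv4Address(base + i)) for i in range(13_000)]
    print(LinearRules(ips).lookup("192.0.2.1"), HashedSet(ips).lookup("192.0.2.1"))
```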
My home router is a pfSense jobbie running on a rather elderly APU4-based box, and it has over 200,000 IPs in its pfBlocker-NG IP block tables and about 150,000 records in its DNS tables.
>perf is a shell tool. iptables is a shell tool. sshguard is a log reader and ultimately you will use the CLI to take action.
Well yes, and to be honest, in this case I did all of that over SSH: run `perf`, generate a flame graph, copy the .svg to the PC over SFTP, open it in the file viewer.
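For reference, the flame-graph step mostly boils down to "folding" stack samples: one line per unique stack, frames joined with semicolons, followed by a sample count (the format Brendan Gregg's FlameGraph scripts consume). Below is a minimal Python sketch that folds invented, root-first sample stacks and computes what share of samples pass through `nf_*` functions; real `perf script` output has to be parsed and reordered first, which is not shown here.

```python
from collections import Counter


def fold(samples):
    """Collapse root-first stack samples into {'f1;f2;f3': count}, the folded format."""
    return Counter(";".join(stack) for stack in samples)


def share(folded, predicate) -> float:
    """Fraction of samples whose stack contains at least one frame matching predicate."""
    total = sum(folded.values())
    hits = sum(
        count
        for stack, count in folded.items()
        if any(predicate(frame) for frame in stack.split(";"))
    )
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Invented samples for illustration only, not real perf output.
    samples = [["net_rx_action", "ip_rcv", "nf_hook_slow", "ipt_do_table"]] * 3 + [
        ["net_rx_action", "ip_rcv", "ip_local_deliver"]
    ]
    folded = fold(samples)
    for stack, count in sorted(folded.items()):
        print(stack, count)  # one folded line per unique stack
    print(f"nf_* share: {share(folded, lambda f: f.startswith('nf_')):.0%}")
```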
What I really want is a web interface that just shows me EVERYTHING it knows about the system in the form of charts and graphs, so I can skim through it and visually check whether everything is all right, without using the shell and running each command individually.
Take a look at the Netflix presentation, especially the screenshots of their web interface: https://archives.kernel-recipes.org/wp-content/uploads/2025/...
>look into nft - iptables is sooo last decade
It doesn't matter in this context: my iptables already runs on the new nf_tables backend (I'm not using iptables-legacy), and this exact scenario is 100% possible with native nft rules.
>Finally, 13,000 address in an ipset is nothing particularly special these days
Oh, the other day I had just 70 `iptables -m set --match-set` rules, and did you know how inefficient the source/destination address hashing for the set match apparently is?! I debugged that with perf as well, but I wish I had just had it as a dashboard picture from the start.
I'm talking about ~4Gbit/s sudden limitation on a 10Gbit link.
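A back-of-envelope way to see why per-packet rule work matters at these speeds. The assumptions are mine, not measurements: 1500-byte packets, Ethernet framing overhead ignored, a single CPU core handling the traffic, and all 70 rules evaluated for every packet. Under those assumptions 10 Gbit/s leaves about 1.2 µs per packet, roughly 17 ns per rule, and a 4 Gbit/s ceiling would correspond to about 43 ns per rule.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packet rate for a saturated link, ignoring Ethernet framing overhead (assumption)."""
    return link_bps / (packet_bytes * 8)


def ns_per_packet(link_bps: float, packet_bytes: int) -> float:
    """Time budget per packet in nanoseconds if a single core must keep up (assumption)."""
    return packet_bytes * 8 / link_bps * 1e9


def ns_per_rule(link_bps: float, packet_bytes: int, rules: int) -> float:
    """Budget per rule if every rule is evaluated for every packet (worst-case assumption)."""
    return ns_per_packet(link_bps, packet_bytes) / rules


if __name__ == "__main__":
    for gbps in (10, 4):
        bps = gbps * 1e9
        print(
            f"{gbps} Gbit/s: {packets_per_second(bps, 1500):,.0f} pkt/s, "
            f"{ns_per_packet(bps, 1500):.0f} ns/pkt, {ns_per_rule(bps, 1500, 70):.1f} ns/rule"
        )
```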
"What I really wanted is a web interface which will just show me EVERYTHING it knows about the system in a form of charts, graphs, so I can just skim through it and check if everything allright visually, without using the shell and each individual command."
Yes, we all want that. I've been running monitoring systems for over 30 years and it is quite a tricky thing to get right. .1.3.6.1.4.1.33230 is my company's enterprise number, which I registered a while back.
The thing is that even though we are now in 2026, monitoring is still a hard problem. There are, however, lots of tools, way more than we had back in the day, but just as a saw can rip your finger off instead of cutting a piece of wood... well, I'm sure you can fill in the blanks.
Back in the day we had a thing called Ethereal, which was OK and nearly got buried. However, you needed some impressive hardware to use it. Wireshark is a modern marvel and we all have decent hardware. SNMP is still relevant too.
Although we have stonking hardware these days, you do also have to be aware of the effects of "watching". All those stats have to be gathered, stashed somewhere and analysed. That requires some effort from the very system you are trying to watch. That's why things like SNMP and RRD were invented.
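The RRD idea in miniature: a round-robin archive has a fixed number of slots, each holding a consolidated value (here the average of `steps` raw samples), and the oldest slot is overwritten once the archive is full, so storage never grows no matter how long you collect. This is a simplified Python sketch with a hypothetical `RoundRobinArchive` class; real RRDtool also deals with time steps, heartbeats and several consolidation functions, none of which are modelled here.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive: each slot is the average of `steps` raw samples."""

    def __init__(self, slots: int, steps: int):
        self.rows = deque(maxlen=slots)  # oldest row is dropped automatically when full
        self.steps = steps
        self._pending = []  # raw samples not yet consolidated

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # Consolidate (average) a full batch into one stored row.
            self.rows.append(sum(self._pending) / self.steps)
            self._pending.clear()


if __name__ == "__main__":
    arc = RoundRobinArchive(slots=3, steps=2)
    for v in range(10):
        arc.update(v)
    print(list(arc.rows))
```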
Anyway, it is 2026 and IT is still properly hard (as it damn well should be)!
>Oh, the other day I had just 70 `iptables -m set --match-set` rules, and did you know how apparently inefficient source/destination address hashing algorithm for the set match is?! It was debugged with perf as well!
>I'm talking about ~4Gbit/s sudden limitation on a 10Gbit link.
I think you need to look into things if 70 IPs in a table are causing issues to the point that a 10 Gb/s link ends up at 4 Gb/s. I presume that if you remove the ipset, the full 10 Gb/s is restored?
Testing throughput and latency is also quite a challenge - how do you do it?
How did you use tracing to check the current state of a machine’s iptables rules?
In this case I used `perf` utility, but only because the server does not have a proper observability tool.
Take a look at this Netflix presentation, especially the screenshots of their web interface tool: https://archives.kernel-recipes.org/wp-content/uploads/2025/...
That is a command line tool run over ssh. If you have invented a new way to run command line tools, that's great (and very possible: write a service that can fork+exec and map stdio), but it is equivalent to using ssh. You cannot run commands using traces.
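A minimal sketch of the "service that can fork+exec and map stdio" idea, in Python: `subprocess.run` starts the command as a child process (fork+exec, or an equivalent spawn mechanism) and captures its stdout and stderr. The `ALLOWED` table and the `run_allowed` name are hypothetical; the allowlist shows one way such a service can differ from a full shell over SSH, since it can only run commands someone approved in advance. The network-facing part of such a service is left out.

```python
import subprocess
import sys

# Hypothetical allowlist: only these pre-approved argv lists can ever be executed.
ALLOWED = {
    "hello": [sys.executable, "-c", "print('hello')"],
    "uptime": ["uptime"],  # assumes a Unix-like host with uptime installed
}


def run_allowed(name: str, timeout: float = 5.0) -> dict:
    """Start an allowlisted command as a child process and return its exit code and stdio."""
    argv = ALLOWED[name]  # raises KeyError for anything not on the list
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {
        "argv": argv,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    print(run_allowed("hello"))
```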
With that mindset anything is equivalent to ssh. The command line is not the pinnacle of user interfaces and giving admins full control of the machine isn't the pinnacle of security either.
We need to accept that UNIX did not get things right decades ago and be willing to evolve UX and security to a better place.
Happy to try an alternative. Traces I have tried, and they are not an alternative.