I was skeptical of the claim that it's faster than traditional SSH, but the README specifies that it is faster at establishing a connection, and that active connections are the same speed. That makes a lot of sense and seems like a reasonable claim to make.

It is not faster in this sense. However, an SSH connection can have multiple substreams, especially for port forwarding. Over a single classical connection, this can lead to head-of-line blocking, where an issue in one stream slows everything down. The QUIC/HTTP/3 protocol can solve this.

Does this implementation do that too, or does it just use a single h3 stream?

The answer is yes according to code and documentation [0]:

> The stream multiplexing capabilities of QUIC allow reducing the head-of-line blocking that SSHv2 encounters when multiplexing several SSH channels over the same TCP connection

....

> Each channel runs over a bidirectional HTTP/3 stream and is attached to a single remote terminal session

[0] https://www.ietf.org/archive/id/draft-michel-remote-terminal...

Fun fact: SSH also supports multiple streams. It's called multiplexing.

Multiple streams at the application level, which can be head-of-line blocked due to all being multiplexed on the same transport layer connection.

The former kind of multiplexing addresses functionality, the latter performance.

Doesn't it run over a single TCP connection in all cases, unless you manually launch multiple and manually load-balance your clients across them? As in, it won't/can't open a new TCP connection when you open a new connection in the SOCKS proxy or port forward. They'll all share one head-of-line and block each other.

Not that I've ever noticed this being an issue (no matter how much we complain, internet here is pretty decent)

Edit: seeing as someone downvoted your hour-old comment just as I was adding this first reply, I guess maybe they 'voted to disagree'... Would be nice if the person would comment. It wasn't me anyway

Although, dollars-to-donuts my bet is that this tool/protocol is much faster than SSH over high-latency links, simply by virtue of using UDP. Not waiting for ACKs before sending more data might be a significant boost for things like scp'ing large files from one part of the world to another.

SSH has low throughput on high latency links, but not because it uses TCP. It is because SSH hardcodes a too-small maximum window size in its protocol, in addition to the one of TCP.

This SSH window size limit is per ssh "stream", so it could be overcome by many parallel streams, but most programs do not make use of that (scp, rsync, piping data through the ssh command), so they are much slower than plain TCP as measured eg by iperf3.
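To put rough numbers on the per-stream limit: a fixed window caps each stream at about window / RTT, so parallel streams add up until the link itself becomes the bottleneck. A minimal sketch, assuming a 2 MiB window (the figure cited elsewhere in this thread), a 100 ms round trip, and a 10 Gbit/s link; the function name and all values are illustrative, not taken from OpenSSH:

```python
def aggregate_throughput_bps(link_bps, window_bytes, rtt_s, streams):
    """Upper bound on throughput when each stream is capped by a fixed window."""
    # At most one window of data can be in flight per stream per round trip.
    per_stream_bps = window_bytes * 8 / rtt_s
    return min(link_bps, streams * per_stream_bps)

WINDOW = 2 * 1024 * 1024   # assumed per-stream window, in bytes
RTT = 0.100                # assumed round-trip time, in seconds
LINK = 10e9                # assumed link capacity, in bits per second

for n in (1, 8, 64):
    mbps = aggregate_throughput_bps(LINK, WINDOW, RTT, n) / 1e6
    print(f"{n:>2} stream(s): {mbps:.0f} Mbit/s")
```

Under these assumptions a single stream tops out at roughly 168 Mbit/s, and it takes dozens of parallel streams before the link itself is the limit.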

I think it's silly that this exists. They should just let TCP handle this.

> I think it's silly that this exists. They should just let TCP handle this.

No, unfortunately it's necessary so that the SSH protocol can multiplex streams independently over a single established connection.

If one of the multiplexed streams stalls because its receiver is blocked or slow, and the receive buffer (for that stream) fills up, then without window-based flow control, that causes head-of-line blocking of all the other streams.

That's fine if you don't mind streams blocking each other, but it's a problem if they should flow independently. It's pretty much a requirement for opportunistic connection sharing by independent processes, as SSH does.

In some situations, this type of multiplexed stream blocking can even result in a deadlock, depending on what's sent over the streams.

Solutions to the problem are to either use window-based flow control, separate from TCP, or to require all stream receive buffers to expand without limit, which is normally unacceptable.

HTTP/2 does something like this.

I once designed a protocol without this, thinking multiplexing was enough by itself, and found out the hard way when processes got stuck for no apparent reason.
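The failure mode described above can be shown with a toy simulation. This is only a sketch under simplifying assumptions (two streams, one in-order pipe standing in for the TCP connection, a stalled receiver on stream A, bounded per-stream buffers); the names `simulate`, `buf_limit`, and the credit scheme are hypothetical and not how SSH or HTTP/2 implement it:

```python
from collections import deque

def simulate(use_windows: bool, steps: int = 20, buf_limit: int = 4) -> int:
    """Return how many frames stream B's receiver gets while A's receiver is stalled."""
    pipe = deque()                              # the single in-order transport connection
    buffers = {"A": 0, "B": 0}                  # frames waiting in each stream's receive buffer
    window = {"A": buf_limit, "B": buf_limit}   # send credit the sender holds per stream
    delivered_b = 0

    for _ in range(steps):
        # Sender: try to send one frame on each stream per step.
        for stream in ("A", "B"):
            if use_windows:
                if window[stream] == 0:
                    continue                    # no credit left: hold the frame at the sender
                window[stream] -= 1
            pipe.append(stream)

        # Demultiplexer: frames come out in order; it must stop at the first
        # frame whose destination buffer is full.
        while pipe and buffers[pipe[0]] < buf_limit:
            buffers[pipe.popleft()] += 1

        # Receiver B drains its buffer; receiver A is stalled and reads nothing.
        delivered_b += buffers["B"]
        if use_windows:
            window["B"] += buffers["B"]         # consuming data returns credit to the sender
        buffers["B"] = 0

    return delivered_b

print(simulate(use_windows=False))  # 4: B stalls once A's buffer fills
print(simulate(use_windows=True))   # 20: B keeps flowing
```

Without per-stream windows, a frame for the stalled stream ends up at the head of the shared pipe and everything behind it waits. With windows, the sender stops queuing frames for A once its credit runs out, so B is never stuck behind them.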

Then:

* Give users a config option so I can adjust it to my use case, like I can for TCP. Don't just hardcode some 2 MB (which was even raised to this in the past, showing how futile it is to hardcode it because it clearly needs adjustments to people's networks and ever-increasing speeds). It is extremely silly that within my own networks, controlling both endpoints, I cannot achieve TCP speeds over SSH, but I can with nc and a symmetric encryption piped in. It is silly that any TCP/HTTP transfer is reliably faster than SSH.

* Implement data dropping and retransmissions to handle blocking -- like TCP does. It seems obviously asking for trouble to want to implement multiplexing, but then only implement half of the features needed to make it work well.

When one designs a network protocol, shouldn't one of the first sanity checks be "if my connection becomes 1000x faster, does it scale"?

Yeah, the longstanding hpn-ssh fork started off by adjusting ssh’s window sizes for long fat pipes.

https://github.com/rapier1/hpn-ssh

You're mixing application layer multiplexing and transport layer multiplexing.

If you use the former without the latter, you'll inevitably have head-of-line blocking issues if your connection is bandwidth or receiver limited.

Of course not every SSH user uses protocol multiplexing, but many do, as it can avoid repeated and relatively expensive (in terms of CPU, performance, and logging volume) handshakes.

Off the top of your head do you know of any file transfer tools that do utilize multiple streams?

Yes, I wrote down some that do and don't support it here:

https://github.com/libfuse/sshfs/issues/300

I tend to use 'rclone', which does SSH and more. The '--transfers' arg is useful for handling several files, lol. A single file transfer, if I recall correctly, isn't parallelized.

That's not really a common TCP problem. Only when there's something severely weird going on in the return path (e.g. an extremely asymmetric and/or congested return path connection dropping ACKs while the forward path has enough capacity) does the ACK mechanism limit TCP.

Also, HTTP/3 must obviously also be using some kind of acknowledgements, since for fairness reasons alone it must be implementing some congestion control mechanism, and I can't think of one that gets by entirely without positive acknowledgements.

It could well be more efficient than TCP's default "ack every other segment", though. (This helps in the type of connection mentioned above; as far as I know, some DOCSIS modems do this via a mechanism called "ack compression", since TCP is generally tolerant of losing some ACKs.)

In a sense, the win of QUIC/HTTP/3 here isn’t that it’s not TCP (it actually provides all the components of TCP per stream!); it’s rather that the application layer can “provide its own TCP”, which might well be more modern than the operating system’s.

Yeah, there’s a replacement for scp that uses ssh for setup and QUIC for bulk data transfer, which is much faster over high-latency paths.

https://github.com/crazyscot/qcp

That's why mosh exists, as it is purpose built for terminals over high latency / high packet loss links.

But mosh doesn't actually do any of what ssh does, let alone do it faster - it wins by changing the problem, to the vastly narrower one of "getting characters in front of human eyeballs". (Which is amazing if that's what you were trying to do - but that has nothing to do with multiple data streams...)

mosh is hard to get into. There are many subtle bugs; a random sample that I ran into is that it fails to connect when the LC_ALL variables diverge between the client and the server[0]. On top of it, development seems abandoned. Finally, when running a terminal multiplexer, the predictive system breaks the panes, which is distracting.

[0]: https://github.com/mobile-shell/mosh/issues/98

Of course it has ACKs. There are protocols without ACKs but they are exotic and HTTP3 is not one of them.

He said not waiting for ACKs.

That makes even less sense. Unless we are talking about XMODEM, every protocol uses windowing to avoid getting stuck waiting for ACKs.

Of course you need to wait for ACKs at some point though, otherwise they would be useless. That's how we detect, and potentially recover from, broken links. They are a feature. And HTTP3 has that feature.

Is it better implemented than the various TCP algorithms we use underneath regular SSH? Perhaps. That remains to be seen. The use case of SSH (long lived connections with shorter lived channels) is vastly different from the short lived bursts of many connections that QUIC was intended for. My best guess is that it could go both ways, depending on the actual implementation. The devil is in the details, and there are many details here.

Should you find yourself limited by the default buffering of SSH (10+Gbit intercontinental links), that's called "long fat links" in network lingo, and is not what TCP was built for. Look at pages like this one on Linux tuning for high-latency networks: https://fasterdata.es.net/host-tuning/linux/

There is also the HPN-SSH project which increases the buffers of SSH even more than what is standard. It is seldom needed anymore since both Linux and OpenSSH have improved, but can still be useful.

> Is it better implemented than the various TCP algorithms we use underneath regular SSH? Perhaps. That remains to be seen.

SSH multiplexes multiple channels on the same TCP connection which results in head of line blocking issues.

> Should you find yourself limited by the default buffering of SSH (10+Gbit intercontinental links), that's called "long fat links" in network lingo, and is not what TCP was built for.

Not really, no. OpenSSH has a 2 MB window size (in the 2000s, 64K); even at just ~gigabit speeds, it only takes around 10-20 ms of latency to start being limited by the BDP.
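A quick check of that claim: the window starts to bind once the bandwidth-delay product (link rate times RTT) exceeds it. A small sketch, assuming "2 MB" and "64K" mean 2 MiB and 64 KiB and the link is exactly 1 Gbit/s; the function name is made up for illustration:

```python
def rtt_where_window_binds(window_bytes, link_bps):
    """RTT (seconds) at which the bandwidth-delay product equals the window."""
    return window_bytes * 8 / link_bps

# Window sizes are assumed to be binary units (MiB / KiB).
for label, window in (("2 MiB", 2 * 1024 * 1024), ("64 KiB", 64 * 1024)):
    rtt_ms = rtt_where_window_binds(window, 1e9) * 1000
    print(f"{label} window at 1 Gbit/s: limited above {rtt_ms:.2f} ms RTT")
```

That gives roughly 16.8 ms for the 2 MiB window, in line with the 10-20 ms estimate, and about half a millisecond for the old 64 KiB window.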

Well, you could peruse the code. Then see what it does and explain it.

Not really that relevant - anybody regularly using SSH over high latency links is using SSH+mosh already anyway.

The huge downside of mosh is it handles its own rendering and destroys the scrollback buffer. (Yes I know I can add tmux for a middle ground.)

But it's still irrelevant here; specifically called out in README:

> The keystroke latency in a running session is unchanged.

"huge downside" (completely mitigated by using tmux)

The YouTube and social media eras made everyone so damn dramatic. :/

Mosh solves a problem. tmux provides a "solution" for some that resolves a design decision that can impact some user workflows.

I guess what I'm saying here is, if you NEED mosh, then running tmux is not even a hard ask.

No it’s not completely mitigated by tmux. mosh has two main use cases (that I know of)

1. High latency, maybe even packet-dropping connections;

2. You’re roaming and don’t want to get disconnected all the time.

For 2, sure tmux is mostly okay, it’s not as versatile as the native buffer if you use a good terminal emulator but whatever. For 1, using tmux in mosh gives you an awful, high latency scrollback buffer compared to the local one you get with regular ssh. And you were specifically talking about 1.

For read-heavy, reconnectable workloads over high latency connections I definitely choose ssh over mosh or mosh+tmux and live with the keystroke latency. So saying it’s a huge downside is not an exaggeration at all.

I believe this depends on the intent of your connection! The first sentence of your last paragraph: "For read-heavy, reconnectable workloads" - A-ha!

From my stance, and where I've used mosh has been in performing quick actions on routers and servers that may have bad connections to them, or may be under DDoS, etc. "Read" is extremely limited.

So from that perspective and use case, the "huge downside" has never been a problem.

Honestly, it feels like the one being dramatic here is you. Because the one you’re replying to added “huge”, you added a whole sentence calling everyone “so damn dramatic”. But oh well.

You know what has a "huge downside"? Radiation therapy.

Not a scroll back buffer workflow issue.

If you believe that, you clearly haven't had to work with mosh in a heavily firewalled environment.

Filtering inbound UDP on one side is usually enough to break mosh, in my experience. Maybe they use better NAT traversal strategies since I last checked, but there's usually no workaround if at least one network admin involved actively blocks it.

SSH is actually really slow on high latency high bandwidth links (this is what HPN-SSH patches fix: https://www.psc.edu/hpn-ssh-home/hpn-ssh-faq). It's very apparent if you try running rsync between two datacenters on different continents.

HTTP/3 (and hopefully this project) does not have this problem.

Sounds like a complex change to fix a security protocol but, reading the page, it seems to just increase the send buffer, which indeed makes sense for high-latency links

It also tracks with HTTP/3 and QUIC as a whole, as one of the main "selling points" has always been reduced round trips leading to faster connection setup.

If being faster at making a connection reduced latency even a little, it would mean a really big improvement for other protocols built on top of it, like rsync. If rsync reuses an active connection to stream the files and calculate changes, then the impact might be negligible.

openssh is generally not praised for its speed but its security track record. i hope this thing doesn't sacrifice it for a little more speed in something that generally doesn't require more speed.

Should be genuinely faster over many VPNs, because it avoids the "TCP inside TCP" tar pit.

I read this and thought “who cares”?

I use ssh everywhere, maybe establish 200+ SSH sessions a day for my entire career of 20 years and never once have I thought “I wish establishing this connection was faster”

Good for you.

There are a lot of automation use cases for SSH where connection setup time is a significant impediment; if you’re making dozens or hundreds of connections to hundreds or thousands of hosts, those seconds add up.
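As a rough illustration of how that adds up when connections are made one after another, here is a back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (connection count, RTT, and the round-trip counts for a "slower" and a "faster" handshake); none are measurements of SSH or of this project:

```python
def total_setup_seconds(connections, round_trips, rtt_s):
    """Time spent purely on handshakes for sequential connections."""
    return connections * round_trips * rtt_s

CONNECTIONS = 1000   # hypothetical: hosts touched by one automation run
RTT = 0.050          # hypothetical: 50 ms round trip to each host

# Hypothetical handshake costs, in round trips per connection.
for label, round_trips in (("slower handshake", 6), ("faster handshake", 3)):
    seconds = total_setup_seconds(CONNECTIONS, round_trips, RTT)
    print(f"{label}: {seconds:.0f} s spent on connection setup")
```

Parallelism or connection reuse changes the picture, but for sequential runs the setup cost scales linearly with both the round-trip count and the number of hosts.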
