For most any 5G network you should be safe to 1420 - 80 = 1340 bytes if using IPv6 transport or 1420 - 60 = 1360 bytes if using IPv4 transport.
For testing I recommend starting from 1280 as a "does this even work" baseline and then tweaking from there. I.e. 1280 either as the "outside" MTU if you only care about IPv4 or as the "inside" MTU if you want IPv6 to work through the tunnel. This leverages that IPv6 demands a 1280 byte MTU to work.
Hah! I just ran into this recently and can confirm. The coax to my DOCSIS ISP was damaged during a storm, which was causing upstream channels to barely work at all. (Amusingly, downstream had no trouble.) While waiting for the cable person to come around later in the week, I hooked my home gateway device up to an old phone instead of the modem. I figured there would be consequences, but surprisingly, everything went pretty smoothly... But my Wireguard-encapsulated connections all hung during the TLS handshake! What gives?
The answer is MTU. The MTU on my network devices were all set to 1500, and my Wireguard devices 1420, as is customary. However, I found that 1340 ( - 80) was the maximum I could use safely.
Wait, though... Why in the heck did that only impact Wireguard? My guess is that TCP connections were discovering the correct MSS value automatically. Realistically that does make sense, but something bothers me:
1. How come my Wireguard packets seemed to get lost entirely? Shouldn't they get fragmented on one end and re-assembled on the other? UDP packets are IP packets, surely they should fragment just fine?
2. Even if they don't, if the Linux TCP stack is determining the appropriate MSS for a given connection then why doesn't that seem to work here? Shouldn't the underlying TCP connection be able to discover the safe MSS relatively easily?
I spelunked through Linux code for a while looking for answers but came up empty. Wonder if anyone here knows.
My best guess is that:
1. A stateless firewall/NAT somewhere didn't like the fragmented UDP packets because it couldn't determine the source/dest ports and just dropped them entirely
2. Maybe MSS discovery relies on ICMP packets that were not able to make it through? (edit: Yeah, on second thought, this makes sense: if the Wireguard UDP packets are not making it to their destination, then the underlying encapsulated packets won't make it out either, which means there won't be any ICMP response when the TCP stack sends a packet with Don't Fragment set.)
But I couldn't find anything to strongly support that.
Basically the only parts of the Internet which actually work reliably, around the globe, are the bits needed so that web pages basically kinda work. If you break literally everything else your service is crap, and some customers might notice, but many won't and also some won't have a choice so, sucks to be them. But if you break the Web, now everybody notices that you broke stuff and they're angry.
This is why DoH (DNS over HTTPS) is a thing. It obviously makes no actual sense to use the web protocol to move DNS packet, but, this works and most things don't work for everybody so eh, this is what we have. Smashing the Path MTU discovery doesn't break the web.
Breaking literally everything so long as the web pages work even means you can't upgrade parts of the web unless you get creative. TLS 1.3 the modern security protocol that is used for most of your web pages today, would not work for most people if it admitted that it's TLS 1.3, if you send packets with TLS version 1.3 on them people's "intelligent" "best in classs security" protective garbage (in the industry we call these "middle boxes") thinks it is being attacked by some unknown and unimaginable dastardly foe and kills the data. So TLS 1.3 really, I am not making this up, always pretends it is a TLS 1.2 re-connection, and despite the fact that no such connection ever existed these same "best in class security" technologies just have no idea what's happening and wave it through. It's very very stupid that they do that, but it was needed to make the web work, which matters, whereas actual security eh, suckers already bought the device, who cares.
This situation is deeply sad but, one piece of good news is that while "This Iranian woman can't even talk confidentially to her own mother without using code words because the people in charge there intercept her communications" won't attract as much sympathy as you'd like from some bearded white guy who has never left Ohio, the fact that those people broke his network protocol to do that interception infuriates him, and he's well up for ensuring they can't do that to the next version.
Your ultimate conclusion is correct, to my understanding. I know wireguard sought to be ultra minimal but I do wish they had included DPLPMTUD as something which is required to be supported (but not mandated to be used e.g. if the user wants to hard set it as they would currently) because it's one of those cases where "do it yourself separately the UNIX way™" or "have the tunneled things do it if they need it" instead are both significantly more complex and fragile.
On that note, from the TCP layer it should just look like an ICMP blackhole, which makes me wonder if enabling `net.ipv4.tcp_mtu_probing` will magically make TCP connections under Wireguard work even with the MTU set wrong. I'd try it, but unfortunately with a similar configuration I am unable to get the fragmentation behavior I was getting before; which makes me wonder if it was my UniFi Security Gateway that actually didn't like the fragmented packets.
Why 1320 and not larger?
For most any 5G network you should be safe to 1420 - 80 = 1340 bytes if using IPv6 transport or 1420 - 60 = 1360 bytes if using IPv4 transport.
For testing I recommend starting from 1280 as a "does this even work" baseline and then tweaking from there. I.e. 1280 either as the "outside" MTU if you only care about IPv4 or as the "inside" MTU if you want IPv6 to work through the tunnel. This leverages that IPv6 demands a 1280 byte MTU to work.
Hah! I just ran into this recently and can confirm. The coax to my DOCSIS ISP was damaged during a storm, which was causing upstream channels to barely work at all. (Amusingly, downstream had no trouble.) While waiting for the cable person to come around later in the week, I hooked my home gateway device up to an old phone instead of the modem. I figured there would be consequences, but surprisingly, everything went pretty smoothly... But my Wireguard-encapsulated connections all hung during the TLS handshake! What gives?
The answer is MTU. The MTU on my network devices were all set to 1500, and my Wireguard devices 1420, as is customary. However, I found that 1340 ( - 80) was the maximum I could use safely.
Wait, though... Why in the heck did that only impact Wireguard? My guess is that TCP connections were discovering the correct MSS value automatically. Realistically that does make sense, but something bothers me:
1. How come my Wireguard packets seemed to get lost entirely? Shouldn't they get fragmented on one end and re-assembled on the other? UDP packets are IP packets, surely they should fragment just fine?
2. Even if they don't, if the Linux TCP stack is determining the appropriate MSS for a given connection then why doesn't that seem to work here? Shouldn't the underlying TCP connection be able to discover the safe MSS relatively easily?
I spelunked through Linux code for a while looking for answers but came up empty. Wonder if anyone here knows.
My best guess is that:
1. A stateless firewall/NAT somewhere didn't like the fragmented UDP packets because it couldn't determine the source/dest ports and just dropped them entirely
2. Maybe MSS discovery relies on ICMP packets that were not able to make it through? (edit: Yeah, on second thought, this makes sense: if the Wireguard UDP packets are not making it to their destination, then the underlying encapsulated packets won't make it out either, which means there won't be any ICMP response when the TCP stack sends a packet with Don't Fragment set.)
But I couldn't find anything to strongly support that.
Basically the only parts of the Internet which actually work reliably, around the globe, are the bits needed so that web pages basically kinda work. If you break literally everything else your service is crap, and some customers might notice, but many won't and also some won't have a choice so, sucks to be them. But if you break the Web, now everybody notices that you broke stuff and they're angry.
This is why DoH (DNS over HTTPS) is a thing. It obviously makes no actual sense to use the web protocol to move DNS packet, but, this works and most things don't work for everybody so eh, this is what we have. Smashing the Path MTU discovery doesn't break the web.
Breaking literally everything so long as the web pages work even means you can't upgrade parts of the web unless you get creative. TLS 1.3 the modern security protocol that is used for most of your web pages today, would not work for most people if it admitted that it's TLS 1.3, if you send packets with TLS version 1.3 on them people's "intelligent" "best in classs security" protective garbage (in the industry we call these "middle boxes") thinks it is being attacked by some unknown and unimaginable dastardly foe and kills the data. So TLS 1.3 really, I am not making this up, always pretends it is a TLS 1.2 re-connection, and despite the fact that no such connection ever existed these same "best in class security" technologies just have no idea what's happening and wave it through. It's very very stupid that they do that, but it was needed to make the web work, which matters, whereas actual security eh, suckers already bought the device, who cares.
This situation is deeply sad but, one piece of good news is that while "This Iranian woman can't even talk confidentially to her own mother without using code words because the people in charge there intercept her communications" won't attract as much sympathy as you'd like from some bearded white guy who has never left Ohio, the fact that those people broke his network protocol to do that interception infuriates him, and he's well up for ensuring they can't do that to the next version.
Your ultimate conclusion is correct, to my understanding. I know wireguard sought to be ultra minimal but I do wish they had included DPLPMTUD as something which is required to be supported (but not mandated to be used e.g. if the user wants to hard set it as they would currently) because it's one of those cases where "do it yourself separately the UNIX way™" or "have the tunneled things do it if they need it" instead are both significantly more complex and fragile.
On that note, from the TCP layer it should just look like an ICMP blackhole, which makes me wonder if enabling `net.ipv4.tcp_mtu_probing` will magically make TCP connections under Wireguard work even with the MTU set wrong. I'd try it, but unfortunately with a similar configuration I am unable to get the fragmentation behavior I was getting before; which makes me wonder if it was my UniFi Security Gateway that actually didn't like the fragmented packets.