Almost a year ago now I moved house which naturally required signing up for a new nbn plan. After looking through the deals, my partner suggested one and upon seeing that the ISP seemed to actually know what IPv6 is, I was happy to give them a go.
Pretty much everything was working fine, though I did notice that Whirlpool Forums would periodically hang for a full 10 seconds while trying to mark a page as ‘read’. After a few other people reported the same thing, I decided to look into it.
At first glance at a packet capture, I noted that Whirlpool doesn’t support IPv6, but does support HTTP/3. On the server side, this is a bit of an uncommon combination, as many of the CDNs and large sites that have so far rolled out HTTP/3 have long supported IPv6. However, given the current state of the global IPv6 rollout, HTTP/3 over IPv4 is not rare on the Internet. If I didn’t have working IPv6, it turns out I probably would’ve noticed this issue a lot more broadly.
For the uninitiated, HTTP/3 is the newest version of the venerable Hypertext Transport Protocol and runs on top of QUIC, a newish transport protocol that incorporates parts of TCP, TLS and previous HTTP versions, and itself runs over UDP. The benefits of QUIC aren’t really relevant here, but broadly it aims to improve performance of connections over the Internet, and provide an alternative to TCP that is able to evolve much more easily.
From the pcap, I could see that Chrome was repeatedly sending an HTTP/3 packet but receiving absolutely no response; after 10 seconds it would retry successfully over HTTP/2 (which uses TCP). Wanting a bit more control, I started testing against nginx on a fresh VPS, which allowed me to confirm that I could only reproduce the issue over IPv4, and only with approximately a 20 second gap between requests. That may sound like a strange set of conditions, but it pretty strongly points to a mapping timing out in either my router’s NAT or the ISP’s CGNAT.
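For reference, reproducing this only needs an HTTP/3-capable server; a minimal nginx config along these lines is enough (assuming nginx 1.25+ built with HTTP/3 support; the certificate paths are hypothetical):

```nginx
server {
    # QUIC/HTTP3 listens on UDP 443; keep the TCP listener for the HTTP/2 fallback
    listen 443 quic reuseport;
    listen 443 ssl;
    http2 on;

    ssl_certificate     /etc/ssl/example.crt;  # hypothetical paths
    ssl_certificate_key /etc/ssl/example.key;
    ssl_protocols       TLSv1.3;               # QUIC requires TLS 1.3

    # Advertise HTTP/3 so browsers will upgrade from the initial TCP connection
    add_header Alt-Svc 'h3=":443"; ma=86400';
}
```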
After running some more captures from different perspectives, I was able to narrow it down to the ISP’s CGNAT rebinding the UDP connection. Essentially, the NAT assumed the connection was finished because no packets had been sent for a while, and when further packets were sent, it mapped them to a new port on the external address. This timeout period can usually be adjusted, and in this case appeared to be 20 seconds, shorter than the 30 second max_idle_timeout that Chrome and Firefox seem to use for QUIC.
There is a trade-off here between longer timeouts, which make the ugly hack that is NAT less disruptive, and shorter timeouts, which make it possible to fit more clients behind the limited ports of a single IP address. Here’s what RFC 4787 says on the issue:
REQ-5: A NAT UDP mapping timer MUST NOT expire in less than two minutes, unless REQ-5a applies.
a) For specific destination ports in the well-known port range (ports 0-1023), a NAT MAY have shorter UDP mapping timers that are specific to the IANA-registered application running over that specific destination port.
Vendors? Not following RFCs? Groundbreaking.
Thankfully in this case, a config tweak to the CGNAT appliances was able to fix the issue.
To be fair, it has previously been possible to mostly get away with short UDP timeouts, as many protocols that use UDP are either short-lived question-and-answer type deals like DNS and NTP, or very chatty like VoIP and multiplayer games, which brings me to:
Hang on a minute…isn’t QUIC supposed to fix this?
QUIC’s RFC 9000 makes several references to ‘middleboxes’, mainly when describing the exact NAT rebinding scenario I experienced. More broadly, this falls under the category of ossification, a whole topic in and of itself, covering the overly-restrictive assumptions made by some intermediary devices on the Internet that hinder the deployment of new protocols and the evolution of existing ones.
Given that QUIC’s design is heavily based around combating ossification, you might think it would include a mitigation for NAT rebinding. Well, the good news is it does, so why wasn’t it working here?
Well first we have to take a slight diversion. It’s now been many months since the issue was fixed, so I need a way to deliberately simulate a NAT rebinding on command.
After asking ChatGPT to write me a script and then rewriting 90% of it myself because it was nowhere near what I asked for, I arrived at this slightly crude Python script which will proxy a UDP connection and change the outgoing source port every time you hit return.
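The idea can be sketched roughly like this (a simplified reconstruction, not the exact script; the class name and addresses are my own):

```python
import socket
import threading

class RebindingProxy:
    """Forward UDP datagrams between a client and an upstream server, opening a
    brand new outgoing socket (and therefore a new source port) each time
    rebind() is called -- roughly what a CGNAT does when its mapping times out."""

    def __init__(self, listen_addr, upstream_addr):
        self.upstream = upstream_addr
        self.client_addr = None
        self.listen_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.listen_sock.bind(listen_addr)
        self.out_sock = self._new_out_sock()
        threading.Thread(target=self._client_to_server, daemon=True).start()

    def _new_out_sock(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", 0))  # OS picks a fresh source port
        threading.Thread(target=self._server_to_client, args=(sock,),
                         daemon=True).start()
        return sock

    def _client_to_server(self):
        while True:
            data, self.client_addr = self.listen_sock.recvfrom(65535)
            self.out_sock.sendto(data, self.upstream)

    def _server_to_client(self, sock):
        while True:
            try:
                data, _ = sock.recvfrom(65535)
            except OSError:
                return  # this socket was closed by a rebind
            if self.client_addr:
                self.listen_sock.sendto(data, self.client_addr)

    def rebind(self):
        old, self.out_sock = self.out_sock, self._new_out_sock()
        old.close()
```

Point a QUIC client at the listen address, and a small `while True: input(); proxy.rebind()` loop gives you the change-port-on-return behaviour.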
Playing around with this setup, I came across these two messages in the nginx debug log:
```
*520 quic no available client ids for new path while handling decrypted packet, client: 192.0.2.2, server: 0.0.0.0:443
*520 quic packet done rc:-4 level:app decr:1 pn:50 perr:0
```
As some background, QUIC packets have a destination and (sometimes) a source connection ID, which are essentially opaque tokens that allow endpoints to associate packets with a connection, independent of IP address or port. The active IDs are transported in clear text, which could allow outside observers to track devices across networks. To combat this privacy implication, endpoints can provide a list of new connection IDs to their peer, which will be switched out according to some rules. One of these rules is:
[A]n endpoint MUST NOT reuse a connection ID when sending to more than one destination address. Due to network changes outside the control of its peer, an endpoint might receive packets from a new source address with the same Destination Connection ID field value, in which case it MAY continue to use the current connection ID with the new remote address while still sending from the same local address. RFC 9000 section 9.5
With this context, it seems like the nginx error is saying “I can’t respond to this packet because the client has changed address and I don’t have any fresh connection IDs for it”. nginx doesn’t seem to utilise the exception in that second sentence.
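To make the connection ID layout concrete, here’s a toy parser for long-header packets (like the Initial), where both IDs appear in the clear with explicit lengths. Field names follow RFC 9000; everything else is illustrative, and it deliberately ignores short-header packets, where the DCID length is implicit:

```python
def parse_quic_long_header(datagram: bytes):
    """Extract (version, DCID, SCID) from a QUIC long-header packet.

    Long header layout (RFC 9000 section 17.2): first byte with the high
    bit set, 4-byte version, then length-prefixed destination and source
    connection IDs.
    """
    if not datagram or not (datagram[0] & 0x80):
        raise ValueError("not a long-header packet")
    version = int.from_bytes(datagram[1:5], "big")
    pos = 5
    dcid_len = datagram[pos]; pos += 1
    dcid = datagram[pos:pos + dcid_len]; pos += dcid_len
    scid_len = datagram[pos]; pos += 1
    scid = datagram[pos:pos + scid_len]
    return version, dcid, scid
```

Feeding it a hand-built packet with a 4-byte DCID and a zero-length SCID (as a Chrome-like client would send) shows how a zero-length ID simply occupies no bytes on the wire, leaving nothing for the server to route on if the address changes.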
But why doesn’t nginx have any more connection IDs for the client? Well digging deep into another (this time decrypted) packet capture, I found that Chrome uses zero-length connection IDs, which again are described in the RFC:
A zero-length connection ID can be used when a connection ID is not needed to route to the correct endpoint. However, multiplexing connections on the same local IP address and port while using zero-length connection IDs will cause failures in the presence of peer connection migration, NAT rebinding, and client port reuse. RFC 9000 section 5.1
And there we go! An explicit call out that NAT rebinding will break, though Chrome doesn’t seem to multiplex connections in that way so if you were nit-picking then maybe this doesn’t apply.
But the plot thickens: reading into the next section, we see this:
An endpoint that selects a zero-length connection ID during the handshake cannot issue a new connection ID. A zero-length Destination Connection ID field is used in all packets sent toward such an endpoint over any network path. RFC 9000 section 5.1.1
Which seems to imply that the whole connection ID reuse logic shouldn’t apply to zero-length IDs anyway. Some clarifications to this RFC probably wouldn’t go astray.
Firefox, on the other hand, does use connection IDs and handles my rebind simulator quite well, as long as there are a few seconds between rebinds to allow new connection IDs to be exchanged. At the time of writing, Safari seems to have hidden their HTTP/3 feature flag, so I wasn’t able to test it.
On the server side, I’ve also observed the issue with Caddy and Cloudflare, so nginx’s behaviour seems to be a somewhat widespread interpretation of the RFC.
So what’s the solution?
Well, if you’re a network operator fielding complaints from customers about intermittent lag on websites, then I’d start by making sure your UDP NAT timeout is at least 2 minutes if possible, in line with RFC 4787. You could also try to argue that 30 seconds for port 443 specifically would be acceptable under REQ-5a, as that value is given a passing mention in the QUIC RFC.
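On a Linux-based NAT box, the relevant knobs are the netfilter conntrack timeouts; a hedged example follows (defaults vary by kernel, and dedicated CGNAT appliances will have their own equivalents):

```shell
# Timeout for UDP mappings that have only seen traffic in one direction
# (commonly defaults to 30 seconds).
sysctl -w net.netfilter.nf_conntrack_udp_timeout=120

# Timeout once traffic has flowed in both directions and the mapping is
# considered an established "stream" -- this is the one QUIC connections hit.
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=180
```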
From a QUIC standards and implementation perspective, RFC 9308 section 11 seems to concur with the idea that NAT rebinding will break connections using zero-length connection IDs, which just makes me wonder why Chrome uses them. Poorly configured NATs aren’t that rare.
The behaviour described in the above quote from section 5.1.1 (i.e. that the connection ID reuse logic is disabled for zero-length IDs) is so poorly conveyed in the RFC that I can only assume it’s not the intended behaviour. Clarification here would definitely be good.
While I wait for replies from the QUIC Working Group (which I’ll post as an update if anything interesting happens), I would like to point out that all this effort is because NAT, while it has served us well, is an awful band-aid solution to the problem of running out of IP addresses. Continuing to accelerate the deployment of IPv6 is the only pathway towards the NAT-free utopia we deserve.