Cam’s Blog

Docker IPv6 networking: The redux

2024-02-22T18:50:00+11:00

A few years ago, I wrote about setting up IPv6 connectivity in Docker, focusing mainly on hooking up a RIPE Atlas probe using NDP proxying. Since then, I’ve moved house and now want to set up another probe in a slightly different environment.

Firstly, I’ll mention that the Proper Way™ to get IPv6 connectivity to Docker containers for general use is probably to enable IP forwarding on the Docker host and allocate a GUA range to it via a static route/RA/DHCPv6-PD. Then the Docker documentation should be able to get you the rest of the way. This aspect seems to have improved a lot since my previous post! 🎉

But I’m trying to do something a bit more obscure. I want the container to have a v6 address:

In whatever dynamic prefix is announced on my LAN. My ISP does offer static IPv6 prefixes but I haven’t found any other use for one yet so haven’t set it up. This isn’t possible under my previous approach nor the aforementioned Proper Way™.
With a specific suffix/token, mainly so I can easily filter Atlas traffic out of packet captures. This was possible under my previous approach, but only because the address was fully statically defined anyway.

The solution I found is the macvlan network driver with some funky config. Firstly, the relevant compose directives:

services:
  probe:
    image: jamesits/ripe-atlas:latest-armv7l
    # ...
    configs:
      - source: setIP6.sh
        target: /usr/local/bin/setIP6.sh
        mode: 0555
    entrypoint: setIP6.sh
    cap_add:
      # ...
      - NET_ADMIN
    sysctls:
      net.ipv6.conf.all.disable_ipv6: 0
    networks:
      probe-network:

configs:
  setIP6.sh:
    content: |
      #!/usr/bin/env bash
      ip token set ::a:71a5 dev eth0
      exec tini -- entrypoint.sh atlas

networks:
  probe-network:
    driver: macvlan
    driver_opts:
      parent: eth0
    ipam:
      config:
        - subnet: "192.0.2.0/24"
          ip_range: "192.0.2.47/32"
          gateway: "192.0.2.1"

The networks section is pretty standard, just restricting the available IPv4 addresses to a single address outside the DHCP range of my network. This is also an improvement over my previous solution, which NATted the container behind the host’s v4 address.

IPv6 is more tricky. macvlan still wants static v6 ranges like the other drivers, which I specifically didn’t want to do. However because macvlan essentially puts the container in the same broadcast domain as the host, simply flipping off the disable_ipv6 sysctl that Docker sets by default is enough to get the container picking up addresses via SLAAC.

The only remaining hurdle is setting the suffix I want the container to use. This is easily done using the ip token set command, which I’m calling through an inline override of the container’s entrypoint that then execs the original entrypoint. The only other change is that ip needs the NET_ADMIN capability.

Introducing: The IPv6 Canvas

2023-02-06T19:20:00+11:00

A few years ago the IPv6 christmas tree emerged as a festive (ab)use of IPv6’s vast address space. There’ve been a few variants on the concept in the meantime, but none of them seem to be active currently. I’ve wanted to play with IPv6 ranges on public cloud providers, so this seemed as good an excuse as any. Have play with the canvas, or keep reading for the behind-the-scenes.

Update: The project has come to an end. Thanks for joining in!

If you’re interested, take a look at the source code, but be warned, it’s not well optimised or neat. If I were to run the canvas again, there’s a number of improvements I’d want to make.

Coverage elsewhere: Jae’s Website, Evan Pratten

Routing the prefix

The base concept behind these previous projects has been that pinging a specific IPv6 address will change something according to parameters encoded in the IP address itself. For example, pinging 2001:db8::0405:ff:00:00 would change the pixel at coordinates (4, 5) to the colour #ff0000, or red.

To achieve that, first you need a range (aka prefix) of IPv6 addresses routed to the server where the project will run. The way this works varies between cloud providers.

I started with Vultr. They provide a 64-bit prefix (/64) to all VMs and state “You may use any IPs within the assigned prefix”, which sounded promising. What I discovered fairly quickly however, is that this is provided as an “on-link” prefix, which is exactly what your home Internet router does.

The way this works is that when a packet from the Internet arrives for an address within my prefix, say 2001:db8::1234, Vultr’s router will send a Neighbour Discovery Protocol (NDP) broadcast asking “Who is 2001:db8::1234?”

For the use case here, this is unhelpful, as I need to be able to receive pings on many addresses. There are ways to force NDP to reply to all queries, but they’re kludgy and inefficient.

So my next stop was Linode. They offer the choice of a /64 or /56 and explicitly state that it will be routed to the VM. This means that Linode’s router already knows that my VM is responsible for 2001:db8::/64 and sends the packets directly to me.

Accepting packets

So we’ve got packets making it to my VM, however the VM doesn’t know what to do with them. The default behaviour of most devices when they receive packets not addressed to them is naturally to ignore them.

This is easily solved with the addition of a “local” route on the VM, which tells the kernel that a given range of addresses belong to the local machine.

With systemd’s networkd, and Linode’s managed configuration of it, this is easily accomplished by creating a file at /etc/systemd/network/05-eth0.network.d/canvasprefix.conf:

[Route]
Destination=2001:db8::/64
Type=local

Receiving pings

Seeing as the protocol ping uses, ICMP, is normally handled by the kernel with no user interaction at all, you might wonder how a normal program can receive ping packets.

The answer is actually very simple and is similar to accepting any other network connection:

import socket

# Open a raw INET6 (IPv6) socket, set to the ICMPv6 protocol
sock = socket.socket(socket.AF_INET6, socket.SOCK_RAW, socket.IPPROTO_ICMPV6)

# Set the RECVPKTINFO flag, which will provide us with the *destination* address of packets
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)

while True:
    # Receive packets from the socket
    # Keep only 1 byte of the packet body, and 32 bytes of the additional data
    data = sock.recvmsg(1, 32)

    # The first byte of ICMPv6 packets is its type.
    # We only care about ping (echo request), which are type 0x80
    if data[0] == b"\x80":

        # Dig the destination address out of the additional data
        pingDst = data[1][0][2][:16]

        ...

Drawing the canvas

From here, the implementation is fairly simple. We just need to keep track of what colour each pixel is, update it according to incoming pings, and send this information to the webpage that renders the canvas to viewers.

I decided to do this with MQTT, which is not really what it’s designed for, but it does all the things I needed ¯\_(ツ)_/¯

Initially, I had one MQTT topic per pixel, which on a canvas of 256×256 makes 65,536 topics. This, rather predictably, did not work too well, with approximately a 200% network overhead on the pixel data, the canvas took around 20 seconds to load 😂

Moving to one topic per line (a total of 256 topics), the network overhead comes to approximately 1%, and load times are 2-3 seconds. Much better.

Running some numbers

To demonstrate the vastness of the IPv6 address space, I did some calculations.

For this project, I’m using a single /64 prefix, which means I have the remaining 64 bits of address available to use. I should note that a /64 is generally the smallest subnet size used in IPv6.

Standard 8-bit colour uses 24 bits per pixel (8 bits each for red, green and blue). That leaves 40 bits for coordinate information.

If we divide that by 1 million, then we get megapixels:

\[\frac {2^{40}} {1,000,000} = 1,099,511.628...\]

Over a million megapixels! The equivalent of 22,906.49 iPhone 14 Pros, which have a 48MP main camera.

So with a single /64, again the smallest subnet size commonly used in IPv6, we could address every possible colour of every pixel in an image with the resolution of 23 thousand iPhone cameras. That’s…big 😳

If you’re used to dealing with IPv4, that may seem extremely wasteful, and there was indeed much debate around this in the 90’s when IPv6 was being designed, but the core of it is that IPv6 is not designed to save on addresses as there’s simply a near-infinite number of them. The overhead of a few extra bits in IP headers is, in the context of today’s massive amounts of Internet traffic, minimal, and is outweighed by management and privacy benefits.

And now that world IPv6 adoption is at 42%, musing on alternative solutions is basically pointless.

End to end encrypted external access with Tailscale and nginx

2022-10-05T18:40:00+11:00

So you’re a nerd, you’re hosting services at home, and you want to access them externally.

The traditional, independent, tried and true method is to poke a hole in your firewall or create a port forwarding mapping, then set up dynamic DNS to follow your home’s dynamic IP address(es), or pay your ISP for a static allocation.

This still works fine, but there are a few things to consider:

IPv4 exhaustion means an increasing number of ISPs are being forced to implement CGNAT or other IPv6 transition mechanisms, which will prevent IPv4 port forwarding from working at all
If you change ISP, you’ll have to go through the rigmarole again (assuming it’s even possible): disabling CGNAT, ordering a static IP, getting their default firewall disabled, etc
If you are unfortunate enough to have an ISP that follows the discouraged practice of dynamic IPv6 prefix allocations, exposing services over IPv6 will range from annoying to impossible
If you have a backup(or primary) Internet connection over 5G/4G, there’s basically a 100% chance that it’ll be behind a firewall you can’t control, so external access will be impossible

These hurdles are among the reasons behind products like Cloudflare Tunnel, which in short creates a persistent outbound connection from your server to Cloudflare which external traffic can be proxied back through, getting around firewalls, NAT, etc.

I was using Cloudflare Tunnel until fairly recently, when Cloudflare made some extremely concerning executive decisions. I have since decided to work on minimising my use of their services, and Tunnel was one of the easiest products I could cross off.

Mapping out a solution

To start off, the services I want to expose are Home Assistant and my ADS-B receiver. These both run on Raspberry Pis on my home network.

The Pis already have appropriate Let’s Encrypt certificates to serve their services over HTTPS on domain names pointed to them via local overrides in the network’s DNS resolver.

I also already have a Linode cloud VPS running nginx, so my rough plan is to point the domains to my VPS, and then have some form of tunnel from the VPS back to the Pis.

The fun part! (setting up the tunnels)

The first tool I reached for was WireGuard, which you really should have a look at if you haven’t before. It’s described as a VPN, but I feel that really does it a disservice, conjuring images of being elbow-deep in a complicated config file. It’s essentially a lightweight, stateless, mutually authenticated encrypted transport with some routing sauce. The Conceptual Overview on their homepage does a good job of…overviewing the concepts 😉

With WireGuard, I was very quickly able to create tunnels between the Pis and the VPS. I then wanted to restrict access to only the necessary ports, so gazed wearily at ufw and iptables…thankfully though, I was reminded of a much nicer way to do this: tailscale!

Tailscale is essentially WireGuard bundled with some very useful features including access control, a central control plane and NAT traversal (which is a very interesting read if you think, like I did, a peer-to-peer connection on the modern Internet near impossible). With it, you can set up a peer-to-peer network with very little configuration.

I started by installing it on the three devices and enrolling them using auth keys, tagging the VPS with linode and the Pis with external-https.

After enabling the MagicDNS feature, I was off to the races:

cameron@linode:~$ ping hasspi.tugzrida.github.beta.tailscale.net
PING hasspi.tugzrida.github.beta.tailscale.net 56 data bytes
64 bytes from fd7a:115c:a1e0:ab12:4843:cd96:6279:651b: icmp_seq=1 ttl=64 time=8.90 ms
64 bytes from fd7a:115c:a1e0:ab12:4843:cd96:6279:651b: icmp_seq=2 ttl=64 time=8.46 ms
64 bytes from fd7a:115c:a1e0:ab12:4843:cd96:6279:651b: icmp_seq=3 ttl=64 time=8.72 ms
^C
--- hasspi.tugzrida.github.beta.tailscale.net ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 8.463/8.693/8.899/0.178 ms

Now I said this was going to make access control nicer, and it does! Here’s all that’s needed in the tailscale dash to accomplish what I need:

{
    "acls": [
        {
            "action": "accept",
            "src"   : ["tag:linode"],
            "proto" : "tcp",
            "dst"   : ["tag:external-https:4433"]
        }
    ],

    "tagOwners": {
        "tag:linode"        : ["tugzrida@github"],
        "tag:external-https": ["tugzrida@github"]
    },

    "disableIPv4": true
}

You’ll notice I also disabled IPv4 inside my tailnet because I live in the future 😎

If all you want is external access on your own devices, then this is all you really need: put your laptop or phone in the place of the VPS in my setup and you’re set. But I want public external access, so I need to go a bit further.

The fun part, part 2 (setting up nginx)

From here, there’s a number of ways you could proceed. The most common of which would be to set up an HTTPS server in nginx on the VPS and then proxy connections back to the origin over HTTPS. Even plaintext HTTP would be acceptable in this case as the tunnel is encrypted. I won’t go over that here as there’s endless nginx guides which cover it, but it’s a perfectly valid solution.

But there’s a less common way, and it’s how the titular end to end encryption is achieved: nginx’s stream and ssl_preread modules! They let you use the power of nginx, but one layer down, proxying TCP (or even UDP) rather than HTTP.

Seeing as my Pis already have certificates and are serving HTTPS, I’ve also chosen this route so I don’t need to double up on certificate management – the VPS doesn’t need certificates for the exposed services as it’s not decrypting the TLS, just passing it over to the Pis.

Here’s what my config looks like:

stream {
    resolver 127.0.0.53;

    server {
        listen 443;
        listen [::]:443;

        ssl_preread on;

        proxy_pass $upstream;
        proxy_protocol on;
    }

    map $ssl_preread_server_name $upstream {
        default              unix:/run/nginx-locally-terminated-https.sock;

        hass.example.com     hasspi.tugzrida.github.beta.tailscale.net:4433;

        adsb.tugzrida.xyz    adsbpi.tugzrida.github.beta.tailscale.net:4433;
        adsb-pf.tugzrida.xyz adsbpi.tugzrida.github.beta.tailscale.net:4433;
    }
}

There’s two things to draw attention to here:

The stream module will completely take over port 443, so it will conflict with any existing nginx HTTPS servers. This is why the default stream route above is a unix socket. I can still run HTTPS servers on the VPS like this:
```
server {
    listen unix:/run/nginx-locally-terminated-https.sock ssl http2 proxy_protocol;
    real_ip_header proxy_protocol;
    set_real_ip_from unix:;
    #...
}
```
The resolver directive is crucial for nginx to be able to resolve the tailscale hostnames. In my case, it’s pointing to systemd-resolved, which will automatically route queries to MagicDNS. If you’re not using resolved, you’ll need to include a MagicDNS IP directly: fd7a:115c:a1e0::53 or 100.100.100.100

The final step is the config on the Pis. I have nginx running on both of mine, and here’s the relevant config:

server {
    listen [::]:4433 ssl http2 proxy_protocol;
    set_real_ip_from insert_linode_tailscale_ip_here;
    real_ip_header proxy_protocol;
    #...
}

Once nginx is restarted, and the DNS records are pointed to the VPS, everything is good to go. Requests are HTTPS encrypted by the client device, routed through the VPS while still encrypted, sent over the tunnel to the origin server, and finally decrypted there.

I’ve also added a basic server to nginx on the VPS solely to redirect HTTP requests for the relevant hostnames to HTTPS.

Tweaks

Because I’ve used nginx on both ends of the tunnel, and it supports the PROXY protocol, I’ve used it to pass the original client’s IP address to the origin server. If you don’t care about keeping the client’s address (or your origin server software doesn’t support it), there’s no other benefit to using PROXY.

A more common way to preserve the client address is the X-Forwarded-For or X-Real-IP HTTP headers. Those can also be used here, though not with the end-to-end encrypted, stream module solution. You’ll need to do a standard nginx HTTP proxy and add the appropriate header there. That method would also allow you to add HTTPS to an origin service that doesn’t support it.

Additional services can also be exposed very easily. For example, a Prometheus exporter on my Home Assistant Pi is exposed to Prometheus on my VPS by adding the tailscale tag prometheus-exporter to the Pi and tweaking the ACL like so:

{
    "action": "accept",
    "src"   : ["tag:linode"],
    "proto" : "tcp",
    "dst"   : ["tag:external-https:4433", "tag:prometheus-exporter:9100"]
}

Prometheus is then simply pointed to hasspi.tugzrida.github.beta.tailscale.net:9100

This way of exposing services is really quite flexible, so I hope I’ve sparked some ideas! If you discover another interesting or unusual use of tailscale or nginx, share it below!

Pssst, my referral link for Linode will give you a $100, 60 day credit, and give me $25 after your first payment of $25.

80% fully vaccinated. We (finally) did it!

2021-11-06T15:20:00+11:00

We did it! 80% of Australians aged 16+ are fully vaccinated against COVID-19, putting Australia into the “Consolidation” phase of the national transition plan, meaning only highly targeted lockdowns will be necessary, and travel restrictions will loosen significantly. The 12+ cohort isn’t far behind at 78.5%, and the whole population is at 66.7%.

I’ll be honest, at the beginning of the year I doubted we could get this far, but we’re now headed to be amongst the highest vaccinated countries on earth.

I never made a blog post about the vaccine clock I made to track the rollout (not that I have much of an exclusive audience here), but despite only promoting it through my Twitter, friends, and family, it’s ended up with a few thousand views per day and even landed me an interview with Nine News Sydney.

To be able to impact positively on so many people is quite rewarding, but it really just started one night in August when I was daydreaming in the shower and realised that the number of vaccines we were administering per day was greater than the number of seconds in a day. I thought it would be cool to visualise that, so I traded in some lockdown boredom to cobble together a proof-of-concept.

I realised how cheering it was to be able to see every small step we were making towards the end of lockdowns(and me finally getting a haircut!) and thought it would be cool to share that feeling. So I tidied up the code and published the site. The rest is history. And a decent helping of effort 😂

So where to from here? Well now that we’ve met the 80%-of-16+ national target, I’ll change the clock to default to showing the 12+ age group(which is actually quite similar to the 16+ group). As vaccines are approved for younger ages, I’ll default to showing those groups too, though I’ll leave all the age groups selectable as an option as long as their data is still being updated.

I am going to try and step back a little from the day-to-day happenings of the vaccine clock. While the data generation is mostly automated, I have still been checking in every day to see how things are going. And whilst I am still unemployed due to a healthy dose of post-high-school-covid-interrupted-gap-year inertia, I would like to be able to do other things without feeling tied to push notifications and my laptop every day.

In a broader sense, I’m sure many Australians who’ve been trapped overseas(and those who’ve been trapped here) for what’s now coming up to two years will be very grateful that border restrictions are easing. I hope that privileged countries like Australia will also redouble their efforts to support the rest of the world through COVID so that we can all feel a little closer again.

Personally, I really have no idea what’s next. I finished high school at the end of 2019 with a vague intent to relax for a few months and then pick up some work around the middle of 2020, with perhaps some part-time study on the side. Well, we all know what happened then. I’ve been lucky enough to be able to stay in the “relax” phase of that “plan” for far longer than I intended, even if it has lead to some inertia, and even though it wasn’t really very relaxing(!) For the short term, I think I’m going to try and get some proper relaxing and fun in. I think we all deserve some joy after…all that.

Whatever way things go, I’ve been making little doodads like the vaccine clock for years now, and I don’t think that’ll change any time soon. I can’t guarantee that they’ll have as wide of an appeal, but I’ll keep on doing things that interest me :)

To sign off, I can’t mention Australian COVID statistics without giving a massive shout out to Ken Tsang of COVID-19 Near Me and the vaccine data feed, Anthony Macali of COVID Live, and Casey Briggs at the ABC. I can’t thank them all enough for their countless hours of work bringing data to the public. I think we’ve all learned the value of accessible and well-presented data over the past two years, and I’d love to see some of the transparency that we (eventually) achieved grow and continue into the future…just maybe in a CSV this time instead of a PDF of a PowerPoint of a database 😑

And of course, thanks to everyone who got vaccinated. Whether six months ago or just yesterday, we’ve all carried the torch in this relay, though only for about 1 second each on average :p

❤️

DNSSEC KSK Ceremony 43

2021-10-06T10:00:00+11:00

Well somehow it’s coming towards the end of 2021 already, which means it’s nearly time again for one of my favourite Internet quirks: a DNSSEC KSK ceremony, number 43 to be exact. What’s a DNSSEC KSK ceremony? Well I covered it in quite verbose detail in my series on DNS earlier this year, but here’s a shorter version:

The domain names we’re all familiar with (like google.com, twitter.com, etc.) are recorded in and distributed via the Domain Name System, or DNS. The DNS is a hierarchy of servers, each responsible for one zone. A zone is essentially a list of information for all the domain names with a given suffix. For example, in the domain name google.com.au, there is the au zone, the com.au zone, and the google.com.au zone.

There’s also the most important zone, the root zone. The root zone is the top of the hierarchy and contains information about all the top-level domains, such as com, net, org, au, xyz, etc.

From the early days of the Internet until around the 2000s, there was no way to verify the authenticity of DNS data, meaning anyone between say, your computer and Google’s DNS servers, could tamper with DNS responses and make you think you were visiting Google, whilst they actually lead you somewhere else entirely.

Many of the risks caused by this lack of authenticity have now been mitigated via the deployment of encryption standards like TLS, but there are still merits in ensuring DNS data can be authenticated.

Enter Domain Name System Security Extensions, or DNSSEC. DNSSEC allows for a chain of authenticity to be built from any domain name all the way back to the root zone. I won’t go into the specifics of DNSSEC, as I covered it in my earlier series, but the long and short of it is that something needs to be the final “anchor” of this chain of authenticity. This anchor is the Root Key Signing Key (Root KSK).

Because the Root KSK is the anchor of authenticity for the whole of the DNS, it of course needs to be handled both very securely and very transparently, so that the Internet community at large can be sure that it is trustworthy. This transparency manifests primarily through quarterly “ceremonies”, which are the only times that the Root KSK can be accessed, and are when digital signatures for the upcoming three months of DNSSEC operation are generated. Again, there’s more coverage of the specifics in part 3 of my earlier series.

Where did we leave off at the last ceremony?

Before we get to the upcoming Ceremony 43, we’ve got to cover a few details going back to Ceremony 40 in February 2020, which was the last pre-COVID ceremony.

During Ceremony 40, the lock on one of the safes at the California DNSSEC facility wouldn’t open and it took a locksmith around 20 hours to drill it out. Due to the massive delay this caused, one of the scheduled tasks for Ceremony 40, the destruction of Hardware Security Module HSM3, was postponed until the next ceremony scheduled to happen at the California facility, Ceremony 42.

But that would soon go even more pear-shaped with the outbreak of COVID-19 around the world. People travelling from around the globe and gathering in a small room for a few hours was just about the best way to guarantee spread of COVID, and would also be difficult or impossible due to travel restrictions, and so ceremonies 41 and 42 were drastically modified to make them as safe as possible.

The main modifications were that the majority of participants participated remotely, both ceremonies were held at the California facility instead of alternating with the Virginia facility, only critical tasks were performed, and nine months’ worth of digital signatures were generated instead of three to allow the ceremonies to happen less frequently.

With all that, HSM3 in California is still currently pending destruction, but that won’t happen this ceremony either, as now that we’re moving towards COVID normality, Ceremony 43 is being held at the Virginia facility.

Ceremony 43

Ceremony 43 is happening on Thursday Oct 14th at 1700 UTC (which is 4am here in Sydney!) at the Virginia DNSSEC facility. It’ll be the first ceremony to happen there since Ceremony 39 in November 2019, so hopefully we won’t have any issues with safes having stubborn locks!

Well actually, spoiler alert, both safes at the Virginia facility were opened back in June to allow locks and combinations to be changed, so they should(fingers crossed!) open up fine during the ceremony next week. HSM4 and 5E were also booted up (but not activated/unlocked) to ensure they’re still functioning normally.

Most of the COVID alterations present in ceremony 41 and 42 won’t be used for Ceremony 43: only the standard three months’ worth of signatures will be generated and three Crypto Officers (COs) will attend in-person. Participating remotely will be representatives from Verisign, external auditors, and a fourth CO as a backup, who has sent their safe deposit box key to IANA in a tamper-evident bag, as all required COs did for ceremonies 41 and 42.

Besides the standard signature generation, one additional task will happen: the commissioning of a new Hardware Security Module, designated HSM6E.

You can find the proposed script for the ceremony on the ceremony’s page on IANA’s website, along with records from the Administrative Ceremonies in June when the Virginia safes were last opened.

I’ll be following along and live tweeting the ceremony. You can watch live too on YouTube. If you have any questions in the mean time, feel free to reach out on Twitter or down in the comments.

UPDATE (14 Oct): The ceremony went without too many hiccups. Read this Twitter thread I wrote during the ceremony for all the details 👇🏻

Good morning from Sydney! DNSSEC KSK ceremony starts in 10 minutes.

If you want to follow along, the YouTube live stream is here: https://t.co/jsZvLKhZPL

The script for the ceremony is here: https://t.co/vq5EZ0djoX

Feel free to ask me any questions!
— Cameron Steel (@Tugzrida) October 14, 2021

UPDATE (3 Nov): The final materials from the ceremony are now available on the IANA website, including the audit camera footage and annotated scripts.

The main exception that I wanted to confirm in the annotated script happened at Act 5 Step 9. The CA sealed the KSK backup card into its new TEB without first putting it into its plastic holder. I mentioned this on Twitter as it happened and also wrote down the new TEB’s number, as it would now be different from the one in the draft script. The number recorded in the exception in the annotated script matches the number I wrote down during the ceremony: BB46584614.

That’s all from Ceremony 43 now. Ceremony 44 will take place around February 2022.

CheerLights, MQTT and Home Assistant

2021-07-30T18:45:00+10:00

One of the projects I’ve done so far during the 2021 Sydney COVID lockdown is properly setting up Home Assistant with some IoT devices around the house.

CheerLights

I very much wanted to integrate the CheerLights project into Home Assistant so I could have lights at home change colour in sync with CheerLights, and hence lights all over the world, as controlled by Twitter.

Quite a few years ago, I put some LEDs on the back deck for christmas and wrote some code to control them, first using an Arduino Yún, then upgrading to a Raspberry Pi the following year. Both of these were quite janky (you can look at the code for the Pi version if you really want to, but I wouldn’t recommend it 😅), but both were able to set the deck lights to the CheerLights colour.

Coming back to the present and Home Assistant, you could use the REST integration to poll the ThingSpeak API for the CheerLights colour like I did in my previous project, but there’s a much more efficient way now supported by CheerLights: MQTT.

Message Queuing Telemetry Transport is a lightweight messaging protocol that allows a client to subscribe to a given “topic” and receive a notification when a message is sent to that topic. It runs over a simple persistent TCP connection. CheerLights runs an MQTT server at mqtt.cheerlights.com which publishes two topics: cheerlights, which contains the current colour name, and cheerlightsRGB, which contains the current colour hex code.

Getting it into HA

There’s a few ways to get the CheerLights MQTT into Home Assistant. The simplest is to use HA’s MQTT integration pointing to the mqtt.cheerlights.com broker on port 1883, however this will only work if you don’t use MQTT for anything else in HA.

If you already have a local MQTT broker running for Home Assistant, then you can bridge in the cheerlights topic with this addition to the Mosquitto config:

connection cheerlights
address mqtt.cheerlights.com
bridge_protocol_version mqttv50
notifications_local_only true
topic cheerlights in

Because I love to encrypt and IPv6 all the things, I made a mirror of the CheerLights feed on my own server, and so my home broker has a few different options:

address cheerlights-mirror.tugzrida.xyz:8883
bridge_cafile /etc/ssl/certs/ca-certificates.crt

My mirror supports secure MQTT on port 8883, and also MQTT over WebSockets on port 8884, which means it can be used in a web browser with the Paho JS MQTT library, as demonstrated on the mirror splash page.

Using the colour in HA

Anyway, back on track: to use the CheerLights colour in Home Assistant, I first created a template sensor by adding this to my configuration.yaml:

template:
  - trigger:
    - platform: mqtt
      topic: cheerlights
    sensor:
      name: CheerLights Colour RGB
      unique_id: sensor.cheerlights_colour_rgb
      icon: mdi:palette
      state: >
        {% if trigger.payload == "red" %}
          [255,0,0]
        {% elif trigger.payload == "green" %}
          [0,255,0]
        {% elif trigger.payload == "blue" %}
          [0,0,255]
        {% elif trigger.payload == "cyan" %}
          [0,255,255]
        {% elif trigger.payload == "white" %}
          [255,255,255]
        {% elif trigger.payload == "oldlace" %}
          [255,228,146]
        {% elif trigger.payload == "purple" %}
          [181,109,255]
        {% elif trigger.payload == "magenta" %}
          [255,0,255]
        {% elif trigger.payload == "yellow" %}
          [255,255,0]
        {% elif trigger.payload == "orange" %}
          [255,165,0]
        {% elif trigger.payload == "pink" %}
          [255,146,173]
        {% endif %}

You might wonder why I went this route instead of just using the cheerlightsRGB topic that already has the RGB hex code, and there’s two reasons:

Home Assistant doesn’t accept colour values as hex
Manually defining each colour allows them to be tweaked to work best with physical lights rather than on screen. Green, for example, doesn’t come from CheerLights as #00FF00 as you might expect, but rather a darker #008000, because the CSS colour “green” that it’s based on is actually a dark green

Now that we have an RGB value in Home Assistant, all that’s left is to make an automation. This can trigger on the state of sensor.cheerlights_colour_rgb and then call the light.turn_on service with the data attribute rgb_color: '{{ states("sensor.cheerlights_colour_rgb") }}'

Here’s my full automation, which also triggers on the deck LED strip being turned on, such that the strip will always show the CheerLights colour, but can be turned on and off as desired by other automations or manually:

alias: Maintain CheerLights colour on deck strip
trigger:
  - platform: state
    entity_id: sensor.cheerlights_colour_rgb
  - platform: state
    entity_id: light.deck_handrail_strip
    from: 'off'
    to: 'on'
condition:
  - condition: state
    entity_id: light.deck_handrail_strip
    state: 'on'
action:
  - service: light.turn_on
    target:
      entity_id:
        - light.deck_handrail_strip
    data:
      effect: Twinkle
      rgb_color: '{{ states("sensor.cheerlights_colour_rgb") }}'
mode: single

And finally, here’s the LEDs on the deck, which I upgraded to an addressable strip two years ago, and moved to an ESP32 running ESPHome last week:

Docker IPv6 networking, routing, and NDP proxying

2021-06-26T22:00:00+10:00

Today I was working on migrating my RIPE Atlas probe from running natively on a Raspberry Pi to running on a different Pi inside Docker.

You don’t really need to know about the Atlas project to follow along here, though it is an interesting and useful project nonetheless.

The background

By default, a Docker container will be assigned an IPv4 address in some private (RFC1918) range, which the Docker daemon will then NAT to the host’s address. You can expose ports (essentially port forward) from containers to be visible at ports on the host’s IP.

Also by default, Docker just doesn’t do IPv6, which after today has definitely been added to my list of Docker pet peeves! If you want to expose ports from containers over IPv6, then a current practice seems to be:

Create a Docker network with an IPv6 ULA(unique local address) range
Use the docker-ipv6nat container to NAT this to the host’s IPv6 address.

My cringing at the thought of NATv6 aside, NAT does, for both v4 and v6, work *okay* for exposing many services, and in both cases does allow you to specify an address on the host to bind to.

My problem

The difference with my Atlas container however, is that Atlas probes don’t actually need to expose any ports. They simply open an outbound connection to the RIPE servers and then conduct measurements as instructed.

Because of this, my objective was actually to specify an IPv6 address (in the range assigned by my ISP) for outgoing connections from the Atlas container. The add-on NATv6 container, and also the built-in NATv4 for that matter, doesn’t seem to have this capability.

In my specific circumstance, I don’t mind NATing IPv4 in Docker — I only have a single public v4 address so my router does NATv4 anyway. But I did want to set the outgoing IPv6 address for neatness, ease of identification, and separation of Atlas traffic.

A solution or two

In the end, I think there’s probably two ways to accomplish my objective.

The first, which isn’t the one I ended up using, is to connect the container to a Docker network using the macvlan driver. This essentially makes the container the same as any other device on the network. It can receive DHCP and RA’s directly from a router and set itself up however you configure it. The main reason I didn’t go this route is that I would’ve had to make my own Docker image and embed my chosen IPv6 address within the guest OS’s settings (or write a script to pull it from an environment var). This is kinda clunky and also goes against the whole Docker mojo of portability.

The second option I think is sneakily clever. First up, here’s the relevant parts of my docker-compose.yaml, with generic IP addresses (assume 2001:db8::/56 is the range assigned to me from my ISP):

services:
  probe:
    # other container options...
    networks:
      probe-network:
        ipv6_address: 2001:db8::a:71a5

networks:
  probe-network:
    driver: bridge
    enable_ipv6: true
    ipam:
      driver: default
      config:
        - subnet: 2001:db8::a:71a5/125

This alone will establish Docker’s default IPv4+NAT setup, but also will give the container 2001:db8::a:71a5, and add the necessary routes to the host for the 2001:db8::a:71a5/125 range. As a side note, a /125 was the smallest range that worked for me, due to the address required for the interface on the host and other overheads.

At this point, the host should be able to ping the container at 2001:db8::a:71a5, but nothing beyond the host will work, and that’s simply because nothing else knows where to find that address on the local network.

To fix this, firstly some sysctl options on the Docker host need to be set to allow it to route packets. This is generally net.ipv6.conf.all.forwarding=1, as well as net.ipv6.conf.INTERFACE.accept_ra=2 if the host requires SLAAC. If the host uses systemd’s networkd, then adding IPForward=ipv6 to the interface’s config file has the same effect, just with a little better readability!

The host will now be able to route packets to the container, but other devices still won’t know that the host is responsible for the container’s address. There’s a few ways to fix this depending on your exact network setup, such as having the Docker host make Router Advertisements, or adding the subnet to a router as a static route (with the latter noteworthy for also being possible with IPv4 if desired).

Both of those options would be great for an IPv6 range, but for individual addresses, particularly ones within the range of an existing network, there’s a simpler way…

NDP proxying

One of the neatest features of IPv6 I’ve discovered so far is NDP proxying. NDP is the IPv6 equivalent of IPv4’s ARP. They’re both essentially the protocols devices use to find the MAC address for a given IP address on their local network segment.

Adding the line IPv6ProxyNDPAddress=2001:db8::a:71a5 to the relevant networkd config on the Docker host will make it attract traffic from the local network bound for that address by way of Neighbour Advertisements, but without actually assigning it to an interface on the host. Once a packet arrives, the host will see that it matches the route for 2001:db8::a:71a5/125 and pass it to the Docker interface. If you’re not using systemd, then you can also control NDP proxying via the net.ipv6.conf.INTERFACE.proxy_ndp=1 sysctl option and ip -6 neigh command.

NDP proxying works in my case as the address of the container is within the range of my LAN, so routes already exist for it. Something on the network just needs to stick up its hand and say “Hey! That’s me!”, which is pretty much exactly what NDP proxying makes the host do. If I wanted the container address to be outside of my LAN, then I’d have to go with the RA or static route option.

I should note that ARP proxying is also a thing, but from my very brief look at it, it doesn’t seem to be selectable like NDP proxying. At least from the systemd manual, it sounds like a method a router may use to attract all traffic from a network segment so it can route it itself, so definitely something that could brick a network if you’re not careful.

I’ll end by noting that I haven’t extensively tested and researched these solutions, so you should before implementing them in something important 😛

If you discover any big drawbacks then let me know.

UPDATE: I’ve noticed that systemd has a bug where proxied addresses added through the IPv6ProxyNDPAddress option are sometimes dropped when an interface goes down and not re-added when it comes back up. This issue was fixed in systemd 248, and also backported to 247. When using older versions of systemd, you can make a workaround using networkd-dispatcher and a script in /etc/networkd-dispatcher/configured.d like this (substituting your interface of course):

#!/usr/bin/sh
ip -6 neigh add proxy 2001:db8::a:71a5 dev INTERFACE

It’s 2021, what do you still need a personal VPN for?

2021-05-10T12:10:00+10:00

If you’ve watched more than a handful of YouTube videos in the past few years, you’ve probably seen sponsored advertisements for VPNs impressing on you the great security benefits of using a VPN. But in a time where a lot of Internet communications are already encrypted, what are the actual benefits of using a VPN?

And before I get into the details, I’ll just say that I don’t hold anything against creators who accept sponsorship from VPN companies. They’re not responsible for having a deep technical knowledge of the things they advertise, but I strongly believe that VPN companies should be more honest in their marketing.

What does a VPN actually do?

At its core, a VPN or Virtual Private Network creates an encrypted “tunnel” between your device and the VPN company’s server, such that all your Internet traffic is encrypted between these two points, and appears to the broader Internet to originate from the VPN company, rather than your device specifically.

Security

One of the most common benefits touted in VPN advertising is security – that if you’re not using a VPN, then all your bank details and passwords will be stolen by some shady person wearing a hoodie in a dark room 😱.

Nowadays, this point is misleading. At the time of writing, somewhere around 85% of web browser traffic is encrypted with HTTPS, also known more generally as TLS. All content sent between you and a website using HTTPS is only accessible to you and that website, be that your bank, a social media platform, etc.

While reading this blog, you’ll probably (depending on your browser) see a padlock icon in your browser indicating the connection is encrypted. If you go to a site without HTTPS, then you’ll see (again depending on your browser) a “Not Secure” badge warning you that your connection is, you guessed it, not secure.

Adding a VPN on top of a connection already encrypted with TLS(when it’s setup correctly) accomplishes very little from a security perspective. One layer of modern encryption is strong enough that, for all practical purposes, it’s unbreakable.

To break TLS encryption would require something on the level of a supercomputer working for thousands of years, which is why pretty much all attacks on it centre around intercepting the connection and causing it to be encrypted for the attacker rather than the original site. In the context of a modern web browser, doing this pretty much requires the attacker to install a root certificate on your device, and if they have access to your device to do that, then they can pretty much do whatever they want, VPN or not.

So what about connections outside of a web browser, like in apps? Well in a lot of cases, encrypted communications are fairly likely to be happening anyway, as both Apple and Google have been strongly pushing for TLS in apps, but there are some caveats. App developers can still use unencrypted connections in some cases, and even if they are using TLS, it is possible to configure it without the verification necessary to prevent it being intercepted. With Apple and Google’s efforts here, this problem should eventually be solved, but we’re not there yet.

In these situations where there is no encryption, or it’s configured poorly, then VPNs can help, though it’s important to remember what VPNs actually do: they encrypt traffic between you and the VPN company. This will protect your unencrypted traffic through your local network and ISP, however both the VPN company and any networks between them and the final destination will be able to see the unencrypted traffic. This is why it’s important that you pick a VPN company that you trust, though the true long-term solution in this case is for developers to simply use encrypted protocols to begin with, rather than relying on every user using a VPN as a partial bandaid solution.

Another thing to be mindful of in this case is that commercial VPN services effectively end up redirecting traffic from a lot of users through a single point, which makes bulk surveillance or interception easier. It’s another reason to put some care into picking a VPN provider.

So to sum up, VPNs will provide a security benefit(albeit only partial) when sites and apps aren’t already encrypting your traffic(or are doing it improperly), but a better solution is for said sites and apps to simply encrypt traffic themselves. And in that latter case, which is increasingly becoming the norm, VPNs don’t provide a meaningful security benefit.

Privacy

So, if TLS is so great, then what’s the catch? Well, even when using TLS, there are certain pieces of metadata which remain unencrypted, and some of them can’t really be encrypted with a protocol like TLS because they’re simply necessary for the way the Internet functions.

The most widely-known pieces of metadata visible on a network are IP addresses. IP addresses have been quite overblown as some sort of secret identifier, but really they’re just how the origin and destination of Internet traffic is identified. They need to be visible to every network a piece of data travels through so that routers know where to send it, and so that the server at the other end knows where to send the response.

One of the IP addresses of this blog is 104.21.95.118, but all the information that I could gain from that as someone inspecting traffic on the network is that you’re visiting a website hosted behind Cloudflare. Given that Cloudflare hosts around 25 million domains, that alone doesn’t really lead to any privacy issues, however for sites that aren’t hosted behind a large company, just an IP address could very well identify the exact website you’re visiting, even though the content of your communication would remain encrypted.

When using a VPN, your local network and ISP won’t see the IP addresses of the sites you’re using, just the IP address of the VPN server. This is another thing to note: using a VPN doesn’t make you invisible. To any intermediary networks, it will probably be quite obvious that there’s a VPN connection from your device(or at least your network), which in certain circumstances could look suspicious.

The next bit of metadata is DNS. DNS is the mechanism through which your device finds the IP address for a given website, and as I’ve written before, it’s completely unencrypted by default. DNS will reveal the domain you’re browsing (like google.com, twitter.com, etc) while the content, if protected by TLS, of course remains encrypted.

Another bit of metadata that exposes the domain name is actually a part of TLS called SNI, Server Name Indication. This tells the website’s server what website you’re looking for, in case multiple websites are hosted on the same server. There is a standard in development called Encrypted ClientHello (ECH) which would encrypt the SNI, but it’s still in development and therefore not at all widely used at the moment.

Pretty much any VPN will hide the SNI, as it’s integral to the actual website traffic, but DNS is not as guaranteed. The DNS leak test website can help you test this out.

Of course, your VPN provider will still see this metadata, which is yet another reason to pick a reputable VPN provider, and why a lot of emphasis is put on “no log” policies. But in general, VPNs are great for increasing your privacy, particularly if your school/family/country doesn’t … agree … with your sexuality/gender identity/political views/religion/etc.

Anonymity

Another potential benefit of VPNs is anonymity, however it’s important to understand the limitations here.

There are many ways a website can identify you when you go to their site, the most obvious of which is asking you to log in, but your IP address, cookies and other modern fingerprinting techniques can all aid a website in identifying you.

Out of all these identifiers, using a VPN alone will only hide your IP address. Using the incognito/private/guest mode in your browser can help with some of the other identifiers, but if anonymity is a big concern to you, then something like the Tor Browser would be a better option.

Tor takes the approach shown in some spy/hacker movies: it bounces your connection through a number of different relay stations across the world. This, combined with its other anti-fingerprinting measures, means that using Tor makes tracing your identity almost impossible.

You should know however, that using Tor may raise even more suspicion than using a VPN, as its robust anonymity can unfortunately lend itself to criminal activity. Relaying your connection through multiple places across the world also slows things down quite a bit, so like many things, it’s a trade-off.

Accessing blocked or remote content

This is probably the main reason that people use a VPN these days, and indeed the reason that the concept of a VPN was created in the first place.

The more traditional VPN use case would be where a private entity sets up a VPN server intended for, for example, accessing work files away from the office, or if you’re particularly techy, accessing your home network away from home.

The common use of VPNs today is slightly different: because using a VPN means that your traffic appears to originate from the VPN provider, then they can setup servers in different countries to allow you to circumvent geo-restrictions on streaming services, or indeed to access sites blocked by the network you’re on (or by your country), though in this case, VPN connections may also be blocked.

For these use cases, VPNs are a perfect solution, and there’s not really any other general tools that will do the same job, except for perhaps zero-trust mechanisms in the “office” use case, but that’s quite out of the scope of this post :p

What I use

Personally, I only regularly use a VPN for remote access to my home network.

From a security perspective, the only website I can think of that I regularly use which doesn’t have HTTPS is the Australian Bureau of Meteorology, which I’m not too worried about (though more generally, I’m quite disappointed they haven’t gotten around to supporting it yet).

Public WiFi networks can carry a higher risk, as anyone is able to connect to them, or potentially impersonate them. Personally, I don’t often use public WiFi, and when I do, it’s most commonly on my laptop, where I can keep a closer eye on HTTPS in the browser anyway. If I really had to use my phone on public WiFi, then I could use my back-to-home VPN to get me back to a trusted network.

From a combined security and privacy standpoint, I pretty much always use DNS over HTTPS or DNS over TLS, which also gives a bunch of other benefits from speed to guaranteed DNSSEC validation, given the wide variability in the DNS resolvers used by different networks. The easiest cross-platform way to do this is with Cloudflare’s app, but there are also non-proprietary ways which just require a little more effort to setup(this is what I use and I may do a post on it in the future).

For privacy itself, I’m rarely on a network where I’m worried about being monitored, however when I was still at school (or in the rare case I’m using public WiFi for something sensitive), I’d regularly use my VPN back to home or hotspot my laptop to my phone and use mobile data instead. If I want to search for something sensitive, like medical information, then I’ll often use an incognito browser window to reduce the amount of data that sites have to correlate my activity with. In my specific setup, using my mobile data would also increase my anonymity compared to my home connection, just based on how their IP addresses are assigned, though this will likely vary with your exact setup.

tl;dr

Using VPNs can benefit your privacy online, and, to a lesser degree, your security and anonymity. They also allow you to access geo-blocked content or other sites blocked by your network or country.

When not using a VPN, your local network and ISP can see the names of websites you visit, and the content of any communications that aren’t already encrypted with something like HTTPS/TLS. You’ll also potentially be vulnerable to TLS interception by an attacker on your local network if you use apps that don’t verify their TLS connections.

Using a VPN simply shifts these metadata and unencrypted content “leaks” away from your local network and ISP, to between the VPN company and the destination of your communications. This area of the Internet is generally less accessible to small-scale attackers. A VPN will also prevent any attacker on your local network from intercepting poorly-configured TLS connections.

If you have any questions or I’ve gotten something wrong, let me know below or on Twitter.

DNS, security and key ceremonies, oh my! – Part 3

2021-03-31T14:45:00+11:00

Now that I’ve explained both DNS and DNSSEC, I can cover the most widely known part of the DNSSEC infrastructure – the Root KSK Ceremonies. These ceremonies exist to provide transparency to the Internet community around the creation, use, and storage of the root KSK. Transparency is essential in establishing trust of the KSK – asking the Internet to just blindly trust something wouldn’t work, and rightly so!

Who runs the Internet?

Answering this question completely would take a very long time. Over the life of the Internet, it’s moved from being a US military project maintained by a handful of people, to today, where the Internet has penetrated a solid majority of the developed world, and a fair amount of the developing world – thus its history is long and complicated.

In short, operation of the DNS root(among other functions) is overseen by the Internet Assigned Numbers Authority (IANA), which is a function¹ of ICANN, the Internet Corporation for Assigned Names and Numbers. On October 1st 2016, ICANN gained independence as a nonprofit organisation, governed by an international multistakeholder community. Prior to then, ICANN was contracted and overseen by the US Department of Commerce.

Another party involved in DNS root management is Verisign, a public US-based company. Aside from operating the A and J root name servers, Verisign is the Root Zone Maintainer(RZM), meaning they’re responsible for maintaining the root zone file at the direction of IANA.

The components of a ceremony

Ceremonies are held every three months and alternate between two redundant Key Management Facilities(KMFs) in the US – in Culpeper, Virginia, and El Segundo, California. These facilities consist of a secure ceremony room, subject to dual occupancy. Inside this room is a metal cage ‘safe room’, also subject to dual occupancy, which contains two safes.

Safe 1, the equipment safe, contains, each sealed in its own tamper-evident bag (TEB):

Hardware security modules. An HSM is a specialised device for securely storing private keys and allows interacting with them without the key itself leaving the HSM. HSMs are generally tamper-resistant, meaning they’ll delete the key and stop working if tampering is detected – usually through a combination of accelerometers, tamper switches and other mechanisms.

There are often multiple HSMs in the safe at various stages of being commissioned or decommissioned, as well as two operational HSMs, which are alternated between each ceremony.
Two laptops, which are again alternated between. These are fairly normal laptops, however they are operated as airgapped machines with no permanent storage, so they have no WiFi, Bluetooth, battery, hard drive or SSD, and are never plugged into a network.
A DVD and USB flash drive. The DVD contains the operating system for the laptop, which is a minimal Debian build with some extra utilities for the ceremony.

The flash drive, known as the HSMFD, serves as a record and backup of the signatures and logs created during the ceremony. Multiple copies are made during the ceremony to be distributed.
A smartcard containing an encrypted backup of the KSK

Safe 2, the credential safe, contains safe deposit boxes, which in turn contain more smartcards in TEBs that are required to activate the HSMs. The safe deposit boxes in this safe need two (physical) keys to be opened. One is retained by IANA, and the second key for each box remains under the custody of a Crypto Officer (CO).

COs are Trusted Community Representatives(TCRs), meaning they aren’t affiliated with IANA, ICANN or Verisign. Anyone can apply to be a CO, however a certain level of knowledge is required. There are seven COs for each KMF, though only three are required to operate an HSM.

There is another group of TCRs called Recovery Key Share Holders. There are seven RKSHs in total and they are custodians of smartcards containing parts of the encryption key under which the KSK is stored. The RKSHs are only needed in disaster recovery situations, and five RKSHs are required to recover the storage key.

As transparency and auditing is such a core part of the ceremonies, the participants follow a script to ensure correct policies are followed and deviations are recorded. The whole ceremony is also filmed from multiple angles and since 2018 is also live streamed on YouTube. The full resources from every ceremony since the start of DNSSEC signing in 2010 are on IANA’s website.

What happens at a Root KSK ceremony?

Besides the COs, there are a few other people involved in the ceremony, mostly IANA and ICANN staff. The leader of the ceremony is the Ceremony Administrator (CA). They read out the script and conduct the main tasks of the ceremony. The time every step is completed is recorded by the Internal Witness (IW), as are deviations from the script, called exceptions. There is also a Safe Security Controller (SSC) for each safe, responsible for opening and closing the safe and recording what is taken and put back.

Also present are representatives from the RZM Verisign, external auditors, and both staff and external witnesses. Anyone can apply to attend a ceremony (in non-COVID times) as an external witness. The final key role is system administrator, who is responsible for the filming and live-streaming of the ceremony, as well as the physical access system for the ceremony and safe rooms, and other support tasks.

As I mentioned in part 2, Verisign generates the root ZSKs, and in order to be useful, the ZSK needs to be signed by the root KSK. This is the main objective of the ceremonies – to generate signatures of the ZSK using the KSK. There are however a multitude of other tasks, including HSM introduction and destruction, KSK generation, safe maintenance and TCR rotation.

To start off, I’ll go through the simplest ceremony, where signature generation is the only task. I’m going to abbreviate things substantially as the scripts usually start at around 30 pages. If you want to follow along in the actual script, then I’d suggest using the one from ceremony 36, as it’s the simplest recent ceremony at time of writing. I’ll get to the complexities of more recent ceremonies later.

After starting the cameras and welcoming the participants, the CA, IW, SSC for safe 2, and COs enter the safe room. The SSC opens the safe, then each CO in turn(along with the CA using the common key) opens their safe deposit box to retrieve the needed smartcard. Each safe deposit box contains a second smartcard used for more advanced tasks, but if it’s not needed, it’s left in the safe. During the process, the TEBs of both cards are checked for integrity and their serial numbers are matched against what they were during the last ceremony. The safe is then locked and everyone exits the safe room.
After a short delay enforced by the physical access system, the CA and IW go back into the safe room with the SSC for safe 1. After safe 1 is opened, the CA retrieves the HSM, laptop, and OS DVD required for the ceremony. Again, the TEBs of both the removed and remaining items are verified, the safe is locked, and everyone exits the safe room.
The CA now sets up the equipment on a table at the front of the ceremony room. This involves plugging in and booting the laptop, which is only connected to its power cable, a USB printer, and an external monitor so the screen can be easily seen in the room. The hash of the OS DVD is then verified and the time set – as the laptop doesn’t have a battery and is never connected to the Internet, it won’t keep the time itself.

After verifying the contents of the HSMFD (which was with the OS DVD from the last ceremony) and starting audit logging, it’s time to setup the HSM.
The HSM is connected to the laptop by both a serial cable for logging purposes, and an ethernet cable over which the laptop and HSM communicate. At this stage, the HSM will not do anything as it’s inactive. To activate it, three COs come to the table and present their smartcards to the CA, who inserts them into the HSM in turn to activate it. The HSM is now ready for signing.
The CA now plugs in another flash drive(KSRFD) to the laptop, which contains the public part of the ZSK from the RZM. The CA runs a command on the laptop which prints out the hash of the ZSK. A representative from the RZM in the room reads off the hash from their own documentation, which all participants verify matches the hash displayed by the laptop. If the hashes match and there are no objections, the CA types “y” on the laptop to confirm the signing. The laptop communicates with the HSM and the generated signatures are saved to the KSRFD, to be given back to the RZM. First, however, the contents of the KSRFD are copied to the HSMFD as a backup. The signing also generates a log, which is printed and distributed to everyone in the room.
Now that the signatures are generated, the HSM can be deactivated, which again requires three CO cards. The HSM is sealed into a new TEB which has its serial number recorded so it can be verified at the next ceremony. The audit logging on the laptop is now stopped and saved to the HSMFD, which has five copies made for various audit processes. The audit logs are also printed. The laptop is now shut down and the HSMFD and OS DVD are put into a new TEB. The laptop and CO smartcards are also returned to their new TEBs.
The final stage is returning the equipment to the safes, starting with the HSM, laptop and OS DVD into safe 1, and then the CO smartcards into safe 2, in essentially the inverse process of steps 1 and 2. All participants must then come and sign the IW’s script to attest that it is an accurate record of the ceremony. The cameras are then stopped and various materials collected for auditing purposes, including logs from the physical access system.

Other ceremony tasks

TCR replacement

Some of the simplest secondary tasks are the replacement of RKSHs and COs. For an RKSH replacement, the outgoing RKSH attends a ceremony with their smartcards, and while an HSM is out of the safe for use in the ceremony, their smartcards are verified to still work, then are repackaged and given to the incoming RKSH.

For a CO replacement, the outgoing and incoming CO go into the safe room along with the CA, IW and SSC2. The outgoing CO opens their safe deposit box, removes the two TEBs and gives them to the incoming CO, who verifies their integrity, then places them into a new safe deposit box and collects its key.

In the event that an outgoing CO isn’t available for the transition, then a locksmith will attend to drill out the lock on the safe deposit box. This highlights another priority of the ceremonies: reliability. There are transparent and auditable processes to recover from pretty much any problem. In this case, as the cards are still in their TEBs inside the box, the audit trail remains intact.

Safe maintenance

On recovering from problems, ceremony 40 is a good example. Besides the standard signature generation, HSM3 (West) was scheduled to be decommissioned, a new set of replacement safe deposit box locks were to be prepared, and the locks on both safes were to be replaced.

On the Monday, the safe deposit box ceremony went smoothly, however on Tuesday, the SSC wasn’t able to open safe 1 for the lock to be changed. The backup SSC tried, but also couldn’t open it. It was determined that the lock had failed as it was accepting the code, however wasn’t physically unlocking. The lock on safe 2 worked fine and was successfully replaced, however with safe 1 unable to be opened, the ceremony wouldn’t be able to go ahead either, so on Friday, a locksmith attended to drill out the lock. It took almost 20 hours to open the safe, which is a testament to the strength of the lock! Once the safe was finally repaired and the lock replaced, the HSMs were tested to ensure their tamper mechanisms hadn’t tripped.

The ceremony could then take place late on Saturday evening. The destruction of HSM3 was postponed for a future ceremony due to the massive delay caused by the lock malfunction.

HSM tasks

The policies for the DNSSEC root management include HSM rotation schedules to ensure the HSMs do not fail over time. When a new HSM is to be introduced, multiple temporary smartcards are generated using an existing HSM in order to transfer the storage and access keys to the new HSM. After the new HSM is setup, the KSK can be imported from the KSK backup smartcard in safe 1. After this, the temporary smartcards are erased and then shredded.

When an HSM is due to be retired, the KSK is deleted from it, then its tamper mechanism is manually triggered to render in inoperable. It’s then disassembled and the sensitive components are sealed in a TEB to be sent to a third party for shredding.

When a new KSK is generated(which at this stage has only happened in 2010 and 2017), it’s generated inside one HSM and transferred to the other HSMs in both KMFs, again through smartcards(in TEBs when couriering between the two KMFs). An HSM can hold multiple KSKs at once and during the transition to a new KSK, both will be used to sign the ZSK(and each other). After the transition, the old KSK is deleted, and its backup smartcards wiped and shredded.

COVID-19 provisions

The two most recent ceremonies at time of writing, 41 and 42, were conducted in April 2020 and February 2021 respectively. As the COVID-19 pandemic made travel and indoor gatherings unsafe, a few major changes were implemented:

The West KMF was used as using the East KMF would have required IANA staff to fly across the country.
All participants who were able to, participated remotely via video call. This included the COs, who couriered their safe deposit box keys to IANA in TEBs prior to the ceremony. Each CO remotely granted the IANA staff permission to use their key and witnessed them being put into new TEBs to be sent back to them after the ceremony.
Instead of the usual three months’ worth of signatures being generated per ceremony, both of these ceremonies generated nine months’ worth of signatures each. The additional generated signatures were retained by IANA until the time they’d normally have been generated.
Non-essential tasks scheduled for the ceremonies were postponed.

At the time of writing, the next ceremony needs to happen around November 2021. The state of international travel and vaccination in the lead-up to then will determine whether a third minimal COVID-style ceremony will take place.

On the ‘Seven keys to the Internet’ analogy and accountability…

As I mentioned at the beginning of part 1, the DNSSEC process is often dramatised when presented to the general public. I’d like to counter some of the more common arguments, as it’s irritating to see people getting angry at things they don’t understand, and it isn’t helpful to the objective of trust in the DNSSEC processes.

The root KSK ceremonies are only for DNS, and only DNSSEC at that. Even if we assume that DNSSEC eventually reaches 100% adoption(which as I mentioned before we’re a long way from at the moment), DNSSEC keys do not, and never will, directly protect communications such as HTTPS.

But taking a step back from that, we need to look at the likelihood of various situations. Firstly, failure of the ceremony and DNSSEC would essentially require simultaneous disasters of extreme proportions at both KMFs such that all copies of the KSK were destroyed. In this situation, worldwide DNS recursor operators may have to disable DNSSEC validation temporarily if a new KSK couldn’t be setup before signatures expired.

Another common concern is various parties going rogue. In the case of COs, all they retain outside of a ceremony is the key to their safe deposit box. In the case of trying to break into the safes, this gives them barely any benefit over any person off the street, as the key is only useful once in the safe room, with the safe open. Getting to this point without detection would be almost impossible, and even then, all the materials are inside TEBs which would need to be broken in order to attain any access to the KSK.

So in a sense, there is a group of seven people with “keys to the Internet”, but on their own, without ICANN (and the tens-of-thousands-strong Internet community backing them), they’re powerless, so bringing is up as some sort of mind-blowing fact is disingenuous at best.

In the case of the RKSHs, if five or more were to go rogue, then they could only recover the HSM storage key, which without access to safe 1(and again breaking TEBs) does not provide them with the KSK.

Another thing people worry about is ICANN going rogue. This is difficult to address briefly, but the truth of the matter is that ICANN, and the Internet in general, is governed, maintained, and improved by a massive global community comprised of many different subgroups. There are almost certainly many people with the same ideas as you in the community who are making their voices heard. And while the community is, like many corporate environments, quite straight-white-male biased, change is slowly coming and it’s the responsibility of everyone to ensure that the Internet community is as representative of the world as possible.

The last party to talk about is the US government. While they no longer have any special “veto” over the Internet, many Internet companies, including ICANN and Verisign, are US-based and thus subject to US law. The US has used this power against Verisign, who also operate the .com TLD, in order to take down .com websites that violate US law. To attempt to use this power in a broader manner against ICANN would be an almost unimaginable step for the US government to take, and would undoubtedly face very vocal resistance from the Internet community.

There are some pushing for the physical aspects of the Internet to be more globalised, in addition to the multistakeholder communities and such that already exist. I’m broadly in support of that, however I’m not holding my breath. Internet governance has been a complicated topic pretty much since its inception and big changes take time.

I hope this series has been interesting. It got more technical than I anticipated, but I feel like I’ve covered everything pretty well. I’ve started a list of other useful resources around DNS and DNSSEC below and I’ll add to it if I find anything else interesting or useful. If you have any questions, leave a comment or ask me on Twitter.

Additional resources

The Key to the Internet and Key Ceremonies: An explainer — A blog article from Kim Davies, VP of IANA Services at ICANN, addressing some common questions around DNSSEC
The Problem with “The Seven Keys” — A blog article from ICANN debunking the “seven keys” myth
DNSSEC – What Is It and Why Is It Important? — Another article from ICANN explaining DNSSEC more generally
Episode 61 of the “Ask Mr. DNS” podcast — Kim Davies discusses DNSSEC and key ceremonies during COVID-19. I only found this podcast through the writing of this series, but it looks like it’s been running since 2008, so there’s probably a few interesting historical episodes, as DNSSEC was only rolled out in the root zone in 2010.
Cloudflare articles on both DNSSEC and Root KSK ceremonies, and DNS Security in general

Technically, the IANA functions are operated for ICANN by Public Technical Identifiers (PTI), an affiliate. Knowing this isn’t essential to understanding the general operations of IANA, but if you look deeper into the structure, then it’ll probably come up. PTI exists mainly to seperate IANA operations from the policy-making roles of ICANN. ↩︎

DNS, security and key ceremonies, oh my! – Part 2

2021-03-14T16:00:00+11:00

In my last post, I gave a pretty detailed overview of DNS itself and touched on its weaknesses in terms of authenticity and privacy. This time, I’ll go into detail on DNS Security Extensions (DNSSEC) and how it allows clients to establish a chain of authenticity back to an authoritative DNS server, preventing tampering by intermediate parties.

Why is DNSSEC important?

As I mentioned in my last post, DNS by itself is completely unencrypted and unauthenticated. This means that any networks between you and authoritative DNS servers can alter DNS responses, which can facilitate phishing and intercepting basically all traffic headed to a domain. By establishing an authentication chain, DNSSEC eliminates the ability of any intermediate party to alter DNS answers.

DNSSEC Overview

You’ll remember that zones in the DNS hierarchy are denoted by NS records in the parent zone that point to the child zone’s DNS servers. DNSSEC adds some additional records both around these zone boundaries and within the zone itself to build the chain of authenticity. They quite heavily involve public-private key cryptography and digital signatures, so if you need a primer on those, then I’d recommend this Computerphile video on public key cryptography by Rob Miles, and this one on digital signatures by Mike Pound.

Most zones using DNSSEC utilise two keypairs called the Key Signing Key (KSK) and Zone Signing Key (ZSK). It is possible to use one keypair, however using two makes it easier to facilitate the changing of keys over time.

When using two keypairs, all the RRsets in a zone are signed by the ZSK, then the ZSK is signed by the KSK, which is then added to the parent zone. An RRset (resource record set) is simply the group of records for a given name and type.

When a KSK is added to its parent zone, it is in turn signed by the parent’s ZSK, which is signed by the parent’s KSK, and so on up the DNS hierarchy. Thinking about this can be a bit confusing but luckily DNSViz is a great tool for visualising the DNS and DNSSEC hierarchies together.

In order to make sense of the DNSViz diagram, you’ll need to know some of the DNSSEC record types:

DNSKEY: These records exist at the top of a zone and store the public part of the KSK and ZSK for that zone. Depending on which domain you’re looking at and when, you might see more than just two DNSKEYs, which usually indicates that the keys are in the process of being changed.

In DNSViz, KSKs are shown with a grey background.
RRSIG: The most fundamental DNSSEC record, RRSIGs hold a signature of an RRset generated with the private key of the KSK or ZSK. Every authoritative RRset¹ in a DNSSEC-signed zone has an RRSIG, normally from the ZSK, except for the DNSKEY RRset, which is signed by the KSK.

RRSIGs also include an inception and expiration timestamp, outside of which they’re invalid. This guards against long-term replay attacks.

RRSIGs are represented in DNSViz by the arrow between a DNSKEY and RRset.
DS: Short for Delegation Signer, these records exist in a parent zone and store a hash of the KSK of a child zone. They have an RRSIG from the parent’s ZSK just like any other record in the zone, and therefore bridge the authenticity chain to the child zone.

Multiple DS records can exist for a given child zone, either using different hashing algorithms, or representing multiple KSKs in the child zone, for example when the child zone is changing its KSK.

Setting up and maintaining all these records manually would be nearly impossible, so modern DNS server software will generally auto-generate them and keep it all up to date for you, and you just have ask the parent zone to add the appropriate DS record. This is usually done out-of-band, such as through your domain registrar’s website. There are in-band DNS mechanisms to do this, but they’re not very widely supported.

DNSSEC Hierarchy

If you’ve looked at DNSViz, then you’ll have seen that the DNSSEC hierarchy ends much like the DNS hierarchy: in the root zone. The very top is the root KSK, the first of which was generated in mid 2010 and used until late 2018, when it was replaced by the current root KSK. The public component of the root KSK is, like the root server list, pretty much hard-coded into every single DNSSEC-validating resolver on the Internet, so it’s only changed very rarely.

The next step down is the root ZSK. Verisign, the root zone maintainer, generates the root ZSKs, with each one to be used for three months. However in order for DNS servers around the world to trust these keys, they of course need to be signed by the root KSK, which is maintained by ICANN through IANA.

The private part of the root KSK is stored in two redundant and extremely secure facilities in the US. It’s accessed during root key ceremonies(which are every quarter in non-COVID times) to generate a series of signatures authenticating the next root ZSK, valid for about three weeks each. I’ll expand on the root key ceremonies in part 3.

Authenticated non-existence

Here’s an issue you might not have thought about: we can authenticate DNS records that do exist through signatures, but what do we do about DNS records that don’t exist?

For say, the A records of a website, authenticating an empty answer doesn’t really do a whole lot besides maybe allowing the showing of a different error message. If an intermediate party has removed a record from a DNS answer, then we can’t get it back through authentication, and if they wanted to stop you from accessing a particular site, they could just block its DNS outright. In this situation, DoH or DoT could be a more useful fix.

The more important use of authenticated non-existence is at zone boundaries. When resolving, the lack of a DS record in a parent zone is taken to mean that the child does not have DNSSEC enabled. In this case, not authenticating the missing DS would mean that an intermediate party could have removed it from the answer, which would cause the resolver to believe that the child did not use DNSSEC. This bypasses the chain of authenticity, allowing the intermediate party to completely forge the child zone even though it had DNSSEC enabled.

To facilitate authenticated non-existence, there are two more record types: NSEC(next secure) and NSEC3(next secure v3). I’m not going to get too far into the differences between them here as it’s not really necessary to understanding how they work, which is wonderfully simple at its core: every name that does exist in a zone is given an NSEC record, which contains the next name in the zone when in alphabetical order, and also a list of the record types that exist for the current name.

This link to the next name essentially builds a loop around a zone: if you request a name that doesn’t exist, you’ll get an NSEC record (and its RRSIG) guaranteeing that the name doesn’t exist by essentially stating “nothing exists between these two names, and the name you wanted falls in that gap”.

If you request a name that does exist, but not with the type you requested(like the missing DS in the delegation to a non DNSSEC-enabled child), you’ll also get an NSEC and RRSIG stating “you wanted a DS record, but I only have an NS record”. You can see this in DNSViz for a zone that doesn’t have DNSSEC. In this case, the child zones can be said to be covered by the NSEC, meaning they’re not expected to have DNSSEC enabled.

You might have noticed that forming this NSEC loop makes it very easy to find the names that exist in a zone – if you request a name that doesn’t exist, the server will literally tell you two that do. This is called zone walking and while DNS is not at all intended to store secret data, there are some cases where being able to discover the whole zone’s contents is not ideal from a privacy perspective.

This is the main benefit of NSEC3 – it uses hashed versions of the names to make it more difficult to walk the zone. If you’re looking for a more detailed breakdown of NSEC and NSEC3, then RFC7129 is a good place to start.

There are also a few extra approaches to implementing NSEC in environments where the DNS servers are able to generate and sign records on the fly. They allow for complete mitigation of zone walking issues, decreased response sizes, and can reduce database load in environments where DNS records are more fluid than the traditional “zone file” paradigm. If you’re interested, take a look at Cloudflare’s blog article on the techniques they use.

DNSSEC validation

If you simply point a client to a public DNSSEC-validating recursor like 1.1.1.1, 9.9.9.9 or 8.8.8.8, then the recursor does all the DNSSEC validation and the path from the recursor to client is still open to tampering. Using DoH or DoT will fix this(and also give privacy) with the slight caveat that you must trust the recursor’s DNSSEC validation. This is what I do on my home network, as my router is able to act as a stub resolver with DoT.

The solution to the trust issue(if you consider it an issue) is either to run a local DNSSEC-validating recursor(which removes the possibility of using DoH/DoT for privacy) or to run a local DNSSEC-validating stub resolver.

Implementations of validating stub resolvers tend to be somewhat buggy as they’re not widely used, but theoretically a local stub resolver can re-verify DNSSEC on responses it gets from a recursor (optionally over DoH or DoT for privacy).

DNSSEC lookups

The basic sequence of operations for a DNSSEC lookup is the same as I covered for basic DNS in part 1, however all outgoing queries from the recursor will have the EDNS DO or “DNSSEC OK” option set, which tells authoritative servers to include DNSSEC records in their answers, and the recursor will of course have to validate the answers before returning an answer to the client.

DNSSEC behaviour differs slightly between DNS server software, so I’m basing my walk through here on one of the most popular suites, BIND.

On startup, BIND loads both the list of root servers and the root KSK from files. It then conducts a priming query, where it sends two queries to a root server, one asking for the root DNSKEY records and the other for the root NS records. This means that as long as the local files that BIND loaded are current to within a few years, it’ll be able to bootstrap itself into an up-to-date state.

As long as one root server still exists at the IP address recorded in the local file, BIND will be able to ask it for the addresses of the other roots. The root server IP addresses are only changed very rarely.

Similarly with the root KSK, as long as the key recorded in the local file is still in use, then BIND will be able to obtain a new root KSK should it exist, as when the KSK is changed, both keys coexist in the root zone for some period of time with each having an RRSIG from the other to authenticate the transition.

From the priming query, BIND also now has the current root ZSK and is ready to start resolution. BIND will repeat priming queries as needed when their TTL is passed.

As a recursor traverses the DNS hierarchy with the DO option set, it will acquire both the usual NS and other answer records, as well as DS, NSEC and RRSIG records. Once it’s gotten an answer through the recursive process, a recursor will also need to send an additional query to each zone in the current hierarchy for its DNSKEY records.

The recursor now has all the data it needs to validate the answer. There’s three verdicts a recursor can reach:

SECURE: The recursor was able to build a complete chain from the root KSK to the final answer. It will return the answer to the client who requested it with another DNS option, AD or “Authentic Data”, set.
INSECURE: The recursor couldn’t complete the authentication chain, however the requested name does not have DNSSEC enabled – it’s covered by an NSEC at some point in the hierarchy. It will return this answer to the client who requested it without AD set.
BOGUS: The recursor couldn’t complete the chain but expected to be able to – there was no NSEC covering the requested name. This is the situation that would occur if the DNS traffic had been tampered with, and so the recursor returns an empty answer to the client with the SERVFAIL error code.

A note on DNSSEC adoption

Much like IPv6, DNSSEC is an important Internet standard that’s been around in some form since the late 1990s (although admittedly not really a useful one until 2010) yet still faces dismal adoption in the real world. Outside of some Scandinavian countries (who offer discounts on domain registration for using DNSSEC), the rate of DNSSEC-enabled domains is usually less than 5%. DNSSEC validating resolvers are slightly more commonplace at around 25%.

With negative UI indicators and free certificates driving deployment of HTTPS, and the IPv4-pocalypse (slowly) driving IPv6 adoption, I don’t think it’s too far-fetched to think that at some point browser and device makers will add mechanisms to push DNSSEC adoption – Chrome and Firefox already have DoH auto-upgrade, so they’re clearly keen to make progress in this space.

If you’re starting with a new domain, or looking to move your DNS hosting to a new provider, it’s worth putting in the little bit of effort to find a provider which supports DNSSEC, and not just because of some vague potential for issues in the future that I’m theorising on, but because it offers a great(and usually quite easy) improvement to the security of your domain, whether or not you handle sensitive information (going back to the same idea as Does my site need HTTPS?).

A lot of the big providers do support DNSSEC, so have a look at your current provider – it could be as simple as a single option!

And if you’re interested to see if your device is receiving DNSSEC-validated answers, then have a look at the great Internet.nl tool from the Dutch Internet Standards Platform. The site also has tools to test a domain name, in addition to DNSViz.

Again, I’m open to answering questions in the comments or on Twitter. I’ll be back with part 3 soon :)

This note will probably make more sense once you’ve read the whole post, but there’s a nuance here:

In general, only the authoritative nameservers for a given zone can contain records for that zone. For example, the nameserver for xyz won’t return records for blog.tugzrida.xyz, because tugzrida.xyz is a new zone, delegated to a different server.

What the xyz zone does need to contain is NS, DS, NSEC, and possibly glue records for tugzrida.xyz, however it’s not really considered authoritative for them – the authoritative records for tugzrida.xyz are in tugzrida.xyz’s authoritative nameservers.

Because they’re not authoritative for them, parent zones will not contain RRSIG records covering a child’s NS or glue records. An exception from this rule is made for a child’s DS and NSEC records as they must be signed in order to maintain the chain of authenticity. If the key in the DS matches the key in the child’s DNSKEY, and that key is then used to sign the zone, then the NS records must be correct, or at the very worst, pointing to the wrong servers but which happen to be serving the unaltered zone anyway. ↩︎