DNS, security and key ceremonies, oh my! – Part 1

If you follow me on Twitter, you might have seen me live tweeting the 42nd DNSSEC root key signing ceremony a few weeks ago. Without any context, it was probably pretty boring and didn’t make much sense, but to me, it’s one of the most interesting and human parts of the massive global infrastructure that makes up the Internet.

While catching up on the specifics of the ceremony the day before, I realised that there’s not a whole lot of easily accessible (and to be honest, correct) information about it. In quite a few articles meant for the general public, it’s been clickbaited as “These Seven People can Turn Off the Internet!”, which really isn’t true. So I thought I’d write a bit about it myself, and try to present a factual but easy to understand explanation. A cornerstone of the DNSSEC key ceremony is, after all, establishing trust, and so some public explanations surely can’t hurt!

But before understanding DNSSEC, we really need to understand DNS itself, so I’ll start by explaining that.

What is DNS?

DNS, the Domain Name System, is often described as the “phonebook of the Internet”, which is pretty much accurate: its main job is converting domain names such as tugzrida.xyz to IP addresses. Down at the hardware level, computers can’t reach each other by the human readable names we enter. Every device on the Internet has a (usually) unique IP (Internet Protocol) address, and data travelling around the Internet is routed by blocks of IP addresses. This means that computers need a mechanism to find out which IP address to talk to when you enter a domain name, and DNS is that mechanism.

Pulling the abstraction back a layer, DNS is essentially a distributed hierarchical database. Clients can look up records by two attributes, name and type, and receive an answer containing some number of records, each consisting of two more attributes, time to live (TTL) and data. Expanding on each of these:

  • Name: Simply the domain names we’re all familiar with: twitter.com, youtube.com, etc. Each part of a domain name, separated by a dot, is called a label, and often (but not always) denotes a layer in the DNS hierarchy, called a zone.

    For example in www.google.com.au., the first zone is the DNS root, the top of the hierarchy, which can be illustrated as a trailing dot. Next is au, delegated to auDA for use by Australia. They’ve chosen to denote com.au for commercial purposes and allow companies to buy names under there. Google has then bought google.com.au and can do whatever they want with any further labels. www (world wide web) is commonly used for websites, however it’s not usually necessary to type it in anymore, depending on how the website is configured.

  • Type: The type of data that is requested, whether it be an IPv4 address (type A), an IPv6 address (AAAA), the server to send emails to (MX), arbitrary text data (TXT), or a few dozen others.

    There is also a somewhat meta type, NS, which points to the DNS servers responsible for the named zone. For example, in the com.au zone, there are NS records with a name of google.com.au containing Google’s DNS servers: ns1.google.com, ns2.google.com, ns3.google.com and ns4.google.com.

  • TTL: This is a number of seconds. Devices may cache the record for this many seconds, which greatly increases the efficiency of DNS: your device doesn’t have to look up an answer every single time it needs it, it can use a previous answer if it’s still valid.

  • Data: The actual answer you requested. For some types, this is further structured for specific pieces of data, but in general you can think of it as one block.
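
To make those four attributes a bit more concrete, here's a minimal Python sketch of a record and a name-and-type lookup. The Record class, the labels and lookup helpers, the TTL values and the 2001:db8:: address (from the range reserved for documentation) are all made up for illustration; real DNS software is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str  # e.g. "google.com.au."
    type: str  # e.g. "A", "AAAA", "MX", "TXT", "NS"
    ttl: int   # seconds the answer may be cached
    data: str  # the answer itself


def labels(name: str) -> list[str]:
    """Split a domain name into labels, top of the hierarchy first."""
    return [label for label in reversed(name.rstrip(".").split(".")) if label]


def lookup(records: list[Record], name: str, type_: str) -> list[Record]:
    """Return every record matching both the name and the type."""
    wanted = name.rstrip(".").lower()
    return [
        r for r in records
        if r.name.rstrip(".").lower() == wanted and r.type == type_
    ]


# Hypothetical data: the TTLs and the IPv6 address are invented.
RECORDS = [
    Record("google.com.au.", "NS", 86400, "ns1.google.com."),
    Record("google.com.au.", "NS", 86400, "ns2.google.com."),
    Record("www.example.com.", "AAAA", 300, "2001:db8::1"),
]

print(labels("www.google.com.au."))
for record in lookup(RECORDS, "google.com.au", "NS"):
    print(record)
```

Note that a lookup can return several records at once, which is exactly what happens with the NS example above.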

So how does DNS work in practice? Well first, we need to learn the types of DNS servers that we’ll encounter:

  • Authoritative nameserver: A DNS server that holds the definitive set of records for a given zone. These servers make up the actual hierarchy of DNS.

  • Recursive resolver/recursor: A DNS server that traverses the hierarchy of authoritative servers to find an answer. In theory, every device on the Internet could act as its own recursor, but this would be quite inefficient and place a massive amount of load on authoritative nameservers, rather than distributing it between thousands of recursors, which implement caching so they don’t need to query authoritative servers quite as often.

  • Stub resolver/forwarder: A DNS server that simply passes all queries onto a recursive resolver – it doesn’t query authoritative servers itself. The main purpose of stub resolvers is caching, and in fact they can also be called caching servers. This can be a bit of a confusing name, because as I mentioned, recursors generally also cache answers.

    There can be any number of stub resolvers between your device and a recursor (or zero if your device is set to use a recursor directly), though having too many will slow things down.
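
Since caching comes up for both recursors and stub resolvers, here's a rough sketch of how a TTL-based cache might behave. TTLCache and its methods are names I've invented for this example, the address is from the documentation range, and real resolvers handle plenty of extra details (like counting the TTL down as it ages), so treat it as an illustration only.

```python
import time


class TTLCache:
    """Toy DNS cache: an answer is dropped once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example can fake time
        self._entries = {}    # (name, type) -> (expiry time, data)

    def put(self, name, type_, ttl, data):
        self._entries[(name, type_)] = (self._clock() + ttl, data)

    def get(self, name, type_):
        entry = self._entries.get((name, type_))
        if entry is None:
            return None       # never cached: ask upstream
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._entries[(name, type_)]
            return None       # expired: ask upstream again
        return data


# A fake clock lets us "wait" without actually sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("blog.tugzrida.xyz.", "AAAA", 300, ["2001:db8::1"])

print(cache.get("blog.tugzrida.xyz.", "AAAA"))  # still within the TTL
now[0] = 301.0
print(cache.get("blog.tugzrida.xyz.", "AAAA"))  # TTL has passed
```

A server built around something like this only needs to go upstream when get comes back empty.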

DNS servers are often referred to as though they’re a single machine, but a single logical DNS server behind one IP address is frequently serviced by many physical servers around the world, to improve response times and reliability.

DNS Lookups

When your device conducts a DNS lookup, assuming it doesn’t have a valid cached answer itself, it will ask the DNS server configured in its settings. This is often your router, which usually acts as a stub resolver. The real work of finding the answer begins when your query eventually reaches a recursor. Recursors are often run by ISPs for their customers, or by other parties for public use (like Cloudflare, Quad9, Google, etc.).

Here’s an example of what a recursor would do to find the answer to a query for the AAAA record of blog.tugzrida.xyz. For simplicity, I’m assuming the recursor has no answers cached for any of the steps. If it did, it’d simply use the cached answer instead of querying a server.

  1. The recursor will ask one of the 13 root DNS servers for the AAAA record of blog.tugzrida.xyz.

    The root servers are the authoritative nameservers for the DNS root. The list of root servers and their IP addresses is hard-coded into every recursor – seeing as this is the starting point, there’s no other mechanism through which this list could be determined.

    As the root servers only know the root zone, they can’t return the requested AAAA record, but will instead return NS records for xyz, saying that the authoritative servers for xyz are x.nic.xyz, y.nic.xyz, z.nic.xyz and generationxyz.nic.xyz.

    If you’ve been following closely, then you might wonder how we get the IP addresses for these servers, seeing as they’re all inside xyz, but we can’t look up any xyz names without being able to talk to these servers! Well, the response from the root server also contains glue records, which are the A and AAAA address records for these very servers.

  2. The recursor will pick one of the xyz authoritative servers and ask it for the AAAA record of blog.tugzrida.xyz.

    As these servers only know the xyz zone, they’ll again return NS records, this time for tugzrida.xyz and listing bella.ns.cloudflare.com and darwin.ns.cloudflare.com. In this case, no glue records are necessary as the nameserver names aren’t under tugzrida.xyz, however the recursor will need to conduct this whole lookup process to get the IP addresses for those server names before continuing if it doesn’t have them cached.

  3. The recursor will then ask one of the servers authoritative for tugzrida.xyz for the AAAA record of blog.tugzrida.xyz.

    In this case, two AAAA records, each containing an IPv6 address, will be returned to the recursor, and then back to the original client. However, there are two other things that could happen:

    • The authoritative server could return an NS record delegating blog.tugzrida.xyz to a different server. This brings up a subtlety I mentioned earlier: each label in a domain name can represent a new zone, but doesn’t always. Julia Evans actually did a thread on this very topic as I was writing this post.

      If I wanted, I could add a record like foo.bar.boop.honk.tugzrida.xyz to the authoritative server for tugzrida.xyz, which would still be in the tugzrida.xyz zone. Alternatively, I could add an NS record delegating honk.tugzrida.xyz to another server, which would form a new zone.

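    • If the authoritative server doesn’t have any data matching the request, it’ll return a negative answer: NXDOMAIN (nonexistent domain) if the name doesn’t exist at all, or NODATA if the name exists, but not with the requested type. (Strictly speaking, only NXDOMAIN is an error code; NODATA is a successful response that simply contains no answer records.)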
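
The walkthrough above can be sketched as a tiny simulation. To keep it short, I've collapsed each set of authoritative servers into a single entry per zone, skipped glue records and caching entirely, and invented the two IPv6 addresses (they're from the 2001:db8::/32 documentation range). ZONES, is_under, ask_zone and resolve are all names made up for this example, and no real DNS traffic is involved.

```python
# A toy hierarchy: each zone knows its delegations and its own records.
ZONES = {
    ".": {"delegations": ["xyz."], "records": {}},
    "xyz.": {"delegations": ["tugzrida.xyz."], "records": {}},
    "tugzrida.xyz.": {
        "delegations": [],
        "records": {
            # Invented addresses, for illustration only.
            ("blog.tugzrida.xyz.", "AAAA"): ["2001:db8::1", "2001:db8::2"],
        },
    },
}


def is_under(name, zone):
    """True if name is the zone itself or sits somewhere below it."""
    return zone == "." or name == zone or name.endswith("." + zone)


def ask_zone(zone, name, type_):
    """Roughly what an authoritative server for `zone` would reply."""
    data = ZONES[zone]
    for child in data["delegations"]:
        if is_under(name, child):
            return ("REFERRAL", child)   # "go and ask these servers instead"
    if (name, type_) in data["records"]:
        return ("ANSWER", data["records"][(name, type_)])
    if any(n == name for n, _ in data["records"]):
        return ("NODATA", None)          # name exists, but not with this type
    return ("NXDOMAIN", None)            # simplified: name not known at all


def resolve(name, type_):
    """Start at the root and follow referrals until a final reply."""
    zone, trace = ".", []
    while True:
        kind, value = ask_zone(zone, name, type_)
        trace.append((zone, kind))
        if kind != "REFERRAL":
            return kind, value, trace
        zone = value


print(resolve("blog.tugzrida.xyz.", "AAAA"))
print(resolve("blog.tugzrida.xyz.", "MX"))
print(resolve("nope.tugzrida.xyz.", "AAAA"))
```

The trace returned by resolve lists the zones that were asked, in order, which lines up with steps 1 to 3 above.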

DNS’s Weaknesses

An important thing to note is that, like many protocols developed in the early stages of the Internet and world wide web, standard DNS traffic is completely unencrypted and unauthenticated. This means that any party in the right place on any network between your device, the recursor, and authoritative server can see the queries and answers, and alter the answers if they want.

In theory, this means a malicious party could redirect a legitimate website to a malicious site without any visible change to the user. Given that nearly 85% of web traffic is itself authenticated with HTTPS these days, such an attack is less feasible, but not impossible, as HTTPS certificates are themselves commonly authorized through DNS! Also, other protocols such as email are still somewhat lacking in security, so DNS security is definitely important.
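
To show just how readable a standard query is, here's a small sketch that builds one by hand in the classic DNS wire format and prints the raw bytes. The build_query helper and the 0x1234 query ID are arbitrary choices of mine; the code only constructs the packet and doesn't send it anywhere.

```python
import struct

AAAA = 28  # numeric code for the AAAA record type


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query (one question, nothing else)."""
    # Header: ID, flags (only "recursion desired" set), 1 question,
    # then zero answer, authority and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label prefixed by its length, finished with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question type, then class (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


packet = build_query("blog.tugzrida.xyz", AAAA)
print(packet)
print(b"tugzrida" in packet)
```

The labels of the name sit in the packet as plain ASCII, so anyone who can see the packet can read exactly which name is being looked up.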

There are a number of technologies involved in patching these privacy and authentication gaps:

  • Query name minimization (QNAME minimization or qmin): In the walkthrough of a recursive lookup, I said that the recursor will begin by asking a root server for the AAAA record of blog.tugzrida.xyz. This isn’t really necessary, as all the root server is ever going to tell us is the NS of xyz. qmin is the idea of asking for as little information as possible from authoritative servers for the benefit of privacy: the root servers don’t need to know the full name of the websites you’re going to, so there’s no reason a recursor should tell them. There are some technical challenges with qmin regarding zone boundaries, so it’s still in the experimental phase. If you’re interested in more detail, the experimental RFC for qmin explains it pretty well.

  • DNS over HTTPS (DoH) and DNS over TLS (DoT): These are both methods of encrypting DNS traffic, primarily between client devices and recursors. This serves both a privacy and authenticity purpose: no one between you and the recursor can see which names you’re looking up or modify the responses. Both protocols are becoming pretty widely implemented: all three public recursive resolvers I mentioned before support them, as do iOS 14 and macOS Big Sur (via profiles), and Android has supported DoT since version 9/Pie. Alone, either of these two protocols still requires the user to ‘trust’ the recursor itself, and the traffic between the recursor and authoritative servers is still completely unprotected.

  • DNSSEC: Using DNSSEC (DNS Security Extensions), it’s possible for a client to establish a chain of authenticity right back to the authoritative server. This prevents any party other than the owner of the domain from making changes to answer data.
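
As a rough illustration of the qmin idea from the first bullet, here's a sketch of how a recursor might decide which name to send at each step: the zone it's currently talking to, plus just one more label. The minimised_name function is my own invention, and it deliberately ignores the zone boundary problem mentioned above by assuming it already knows which zone it's asking.

```python
def minimised_name(full_name: str, zone: str) -> str:
    """Name to send to the servers for `zone`: the zone plus one label."""
    full = [l for l in full_name.lower().rstrip(".").split(".") if l]
    known = [l for l in zone.lower().rstrip(".").split(".") if l]
    # The zone's labels must match the tail end of the full name.
    if full[len(full) - len(known):] != known:
        raise ValueError(f"{full_name} is not inside {zone}")
    keep = min(len(known) + 1, len(full))
    return ".".join(full[-keep:]) + "."


for zone in [".", "xyz.", "tugzrida.xyz."]:
    print(zone, "is asked about", minimised_name("blog.tugzrida.xyz.", zone))
```

With this approach the root servers only ever see xyz, and the full name is only revealed to the servers that actually need it to answer.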

And I think I’ll leave it there for now. Hopefully I didn’t explode your brain! In part 2, I’ll go into the details of DNSSEC.

In the meantime, if you have any questions, feel free to leave a comment or tweet me :)