DNS, security and key ceremonies, oh my! – Part 3

Now that I’ve explained both DNS and DNSSEC, I can cover the most widely known part of the DNSSEC infrastructure – the Root KSK Ceremonies. These ceremonies exist to provide transparency to the Internet community around the creation, use, and storage of the root KSK. Transparency is essential in establishing trust in the KSK – asking the Internet to just blindly trust something wouldn’t work, and rightly so!

Who runs the Internet?

Answering this question completely would take a very long time. Over its life, the Internet has grown from a US military project maintained by a handful of people into a network that reaches a solid majority of the developed world and a fair amount of the developing world, so its history is long and complicated.

In short, operation of the DNS root (among other functions) is overseen by the Internet Assigned Numbers Authority (IANA), which is a function[1] of ICANN, the Internet Corporation for Assigned Names and Numbers. On October 1st 2016, ICANN gained independence as a nonprofit organisation, governed by an international multistakeholder community. Prior to then, ICANN was contracted and overseen by the US Department of Commerce.

Another party involved in DNS root management is Verisign, a public US-based company. Aside from operating the A and J root name servers, Verisign is the Root Zone Maintainer (RZM), meaning they’re responsible for maintaining the root zone file at the direction of IANA.

The components of a ceremony

Ceremonies are held every three months and alternate between two redundant Key Management Facilities (KMFs) in the US – in Culpeper, Virginia, and El Segundo, California. These facilities consist of a secure ceremony room, subject to dual occupancy. Inside this room is a metal cage ‘safe room’, also subject to dual occupancy, which contains two safes.

Safe 1, the equipment safe, contains the following, each sealed in its own tamper-evident bag (TEB):

  • Hardware security modules. An HSM is a specialised device that securely stores private keys and allows them to be used without the key itself ever leaving the HSM. HSMs are generally tamper-resistant, meaning they’ll delete the key and stop working if tampering is detected – usually through a combination of accelerometers, tamper switches and other mechanisms.

    There are often multiple HSMs in the safe at various stages of being commissioned or decommissioned, as well as two operational HSMs, which are alternated between each ceremony.

  • Two laptops, which are also alternated between ceremonies. These are fairly normal laptops, but they are operated as airgapped machines with no permanent storage, so they have no WiFi, Bluetooth, battery, hard drive or SSD, and are never plugged into a network.

  • A DVD and USB flash drive. The DVD contains the operating system for the laptop, which is a minimal Debian build with some extra utilities for the ceremony.

    The flash drive, known as the HSMFD, serves as a record and backup of the signatures and logs created during the ceremony. Multiple copies are made during the ceremony to be distributed.

  • A smartcard containing an encrypted backup of the KSK.

Safe 2, the credential safe, contains safe deposit boxes, which in turn contain more smartcards in TEBs that are required to activate the HSMs. The safe deposit boxes in this safe need two (physical) keys to be opened. One is retained by IANA, and the second key for each box remains under the custody of a Crypto Officer (CO).

COs are Trusted Community Representatives (TCRs), meaning they aren’t affiliated with IANA, ICANN or Verisign. Anyone can apply to be a CO, although a certain level of knowledge is required. There are seven COs for each KMF, though only three are required to operate an HSM.

There is another group of TCRs called Recovery Key Share Holders. There are seven RKSHs in total and they are custodians of smartcards containing parts of the encryption key under which the KSK is stored. The RKSHs are only needed in disaster recovery situations, and five RKSHs are required to recover the storage key.
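To make the five-of-seven idea concrete, here’s a minimal Python sketch of a threshold scheme. I’m assuming Shamir’s secret sharing purely for illustration; the mechanism the HSMs actually use to split the storage key may well differ, and the names `split_secret` and `recover_secret` are made up for this example.

```python
import secrets

# Assumption: a toy prime field, big enough for the demo secret below.
# Real systems choose field sizes to match the key being protected.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the share count")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    # One point per share holder, at x = 1..shares (x = 0 would leak the secret).
    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    storage_key = 123456789  # stand-in for the real storage key
    cards = split_secret(storage_key, threshold=5, shares=7)
    print(recover_secret(cards[:5]) == storage_key)  # any five cards suffice
```

With fewer than five shares, the interpolation produces an unrelated value, which is the property that lets any two RKSHs be unavailable without the storage key being lost, while four colluding holders learn nothing.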

As transparency and auditing are such a core part of the ceremonies, the participants follow a script to ensure correct policies are followed and deviations are recorded. The whole ceremony is also filmed from multiple angles and, since 2018, live streamed on YouTube. The full resources from every ceremony since the start of DNSSEC signing in 2010 are on IANA’s website.

What happens at a Root KSK ceremony?

Besides the COs, there are a few other people involved in the ceremony, mostly IANA and ICANN staff. The leader of the ceremony is the Ceremony Administrator (CA). They read out the script and conduct the main tasks of the ceremony. The time at which each step is completed is recorded by the Internal Witness (IW), as are deviations from the script, called exceptions. There is also a Safe Security Controller (SSC) for each safe, responsible for opening and closing the safe and recording what is taken and put back.

Also present are representatives from the RZM (Verisign), external auditors, and both staff and external witnesses. Anyone can apply to attend a ceremony (in non-COVID times) as an external witness. The final key role is the system administrator, who is responsible for the filming and live-streaming of the ceremony, as well as the physical access system for the ceremony and safe rooms, and other support tasks.

As I mentioned in part 2, Verisign generates the root ZSKs, and in order to be useful, the ZSK needs to be signed by the root KSK. This is the main objective of the ceremonies – to generate signatures of the ZSK using the KSK. There are however a multitude of other tasks, including HSM introduction and destruction, KSK generation, safe maintenance and TCR rotation.

To start off, I’ll go through the simplest ceremony, where signature generation is the only task. I’m going to abbreviate things substantially as the scripts usually start at around 30 pages. If you want to follow along in the actual script, then I’d suggest using the one from ceremony 36, as it’s the simplest recent ceremony at time of writing. I’ll get to the complexities of more recent ceremonies later.

  1. After starting the cameras and welcoming the participants, the CA, IW, SSC for safe 2, and COs enter the safe room. The SSC opens the safe, then each CO in turn (along with the CA using the common key) opens their safe deposit box to retrieve the needed smartcard. Each safe deposit box contains a second smartcard used for more advanced tasks, but if it’s not needed, it’s left in the safe. During the process, the TEBs of both cards are checked for integrity and their serial numbers are matched against what they were during the last ceremony. The safe is then locked and everyone exits the safe room.

  2. After a short delay enforced by the physical access system, the CA and IW go back into the safe room with the SSC for safe 1. After safe 1 is opened, the CA retrieves the HSM, laptop, and OS DVD required for the ceremony. Again, the TEBs of both the removed and remaining items are verified, the safe is locked, and everyone exits the safe room.

  3. The CA now sets up the equipment on a table at the front of the ceremony room. This involves plugging in and booting the laptop, which is only connected to its power cable, a USB printer, and an external monitor so the screen can be easily seen in the room. The hash of the OS DVD is then verified and the time set – as the laptop doesn’t have a battery and is never connected to the Internet, it won’t keep the time itself.

    After verifying the contents of the HSMFD (which was with the OS DVD from the last ceremony) and starting audit logging, it’s time to set up the HSM.

  4. The HSM is connected to the laptop by both a serial cable for logging purposes, and an ethernet cable over which the laptop and HSM communicate. At this stage, the HSM will not do anything as it’s inactive. To activate it, three COs come to the table and present their smartcards to the CA, who inserts them into the HSM in turn to activate it. The HSM is now ready for signing.

  5. The CA now plugs another flash drive (the KSRFD) into the laptop, which contains the public part of the ZSK from the RZM. The CA runs a command on the laptop which prints out the hash of the ZSK. A representative from the RZM in the room reads off the hash from their own documentation, which all participants verify matches the hash displayed by the laptop. If the hashes match and there are no objections, the CA types “y” on the laptop to confirm the signing. The laptop communicates with the HSM and the generated signatures are saved to the KSRFD, to be given back to the RZM. First, however, the contents of the KSRFD are copied to the HSMFD as a backup. The signing also generates a log, which is printed and distributed to everyone in the room.

  6. Now that the signatures are generated, the HSM can be deactivated, which again requires three CO cards. The HSM is sealed into a new TEB which has its serial number recorded so it can be verified at the next ceremony. The audit logging on the laptop is now stopped and saved to the HSMFD, which has five copies made for various audit processes. The audit logs are also printed. The laptop is now shut down and the HSMFD and OS DVD are put into a new TEB. The laptop and CO smartcards are also sealed into new TEBs.

  7. The final stage is returning the equipment to the safes, starting with the HSM, laptop and OS DVD into safe 1, and then the CO smartcards into safe 2, in essentially the inverse process of steps 1 and 2. All participants must then come and sign the IW’s script to attest that it is an accurate record of the ceremony. The cameras are then stopped and various materials collected for auditing purposes, including logs from the physical access system.
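The hash comparison in step 5 is the part of the process that’s easiest to picture in code. Below is a rough Python sketch of the idea only: I’m assuming SHA-256 and hexadecimal output for illustration, the function names are hypothetical, and the real ceremony tooling and the format in which the hash is read aloud may differ.

```python
import hashlib


def request_digest(data: bytes) -> str:
    """Hash the bytes of the signing request (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def hashes_match(displayed: str, read_aloud: str) -> bool:
    """Compare two hex digests, ignoring spacing and letter case.

    People reading a hash off paper tend to group and capitalise it
    differently from a terminal, so both sides are normalised first.
    """
    def normalise(value: str) -> str:
        return "".join(value.split()).lower()

    return normalise(displayed) == normalise(read_aloud)


if __name__ == "__main__":
    # Stand-in bytes; the real input would be the file on the KSRFD.
    request = b"example key signing request"
    on_screen = request_digest(request)
    # The RZM representative's independently produced copy of the hash,
    # written in groups of four characters.
    from_paper = " ".join(on_screen[i:i + 4] for i in range(0, len(on_screen), 4)).upper()
    print("proceed with signing" if hashes_match(on_screen, from_paper) else "stop: hash mismatch")
```

The point of the design is that the two hashes arrive by independent routes: one is computed in the room from the flash drive, the other comes from the RZM’s own records, so a request that was altered in transit would show up as a mismatch before anything is signed.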

Other ceremony tasks

TCR replacement

Some of the simplest secondary tasks are the replacement of RKSHs and COs. For an RKSH replacement, the outgoing RKSH attends a ceremony with their smartcards, and while an HSM is out of the safe for use in the ceremony, their smartcards are verified to still work, then are repackaged and given to the incoming RKSH.

For a CO replacement, the outgoing and incoming CO go into the safe room along with the CA, IW and the SSC for safe 2. The outgoing CO opens their safe deposit box, removes the two TEBs and gives them to the incoming CO, who verifies their integrity, then places them into a new safe deposit box and collects its key.

In the event that an outgoing CO isn’t available for the transition, then a locksmith will attend to drill out the lock on the safe deposit box. This highlights another priority of the ceremonies: reliability. There are transparent and auditable processes to recover from pretty much any problem. In this case, as the cards are still in their TEBs inside the box, the audit trail remains intact.
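The TEB checks that keep this audit trail intact boil down to comparing the serial numbers recorded at the last ceremony with what’s found in the safe at this one. Here’s a small Python sketch of that comparison; the function name, item labels and serial numbers are all invented for illustration, and in reality this is a manual check against the printed script rather than software.

```python
def audit_tebs(recorded: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Return a list of discrepancies between recorded and observed TEB serials.

    An empty list means every item is present and in the bag it was sealed
    into last time. Anything else would be noted as an exception.
    """
    exceptions = []
    for item, serial in recorded.items():
        seen = observed.get(item)
        if seen is None:
            exceptions.append(f"{item}: missing (expected TEB {serial})")
        elif seen != serial:
            exceptions.append(f"{item}: TEB {seen} does not match recorded {serial}")
    # Items found in the safe that the previous record doesn't mention.
    for item in sorted(observed.keys() - recorded.keys()):
        exceptions.append(f"{item}: not in previous record")
    return exceptions


if __name__ == "__main__":
    # Hypothetical serial numbers, as written down at the previous ceremony.
    last_ceremony = {"CO card (OP)": "BB0001", "CO card (SO)": "BB0002"}
    found_today = {"CO card (OP)": "BB0001", "CO card (SO)": "BB0002"}
    problems = audit_tebs(last_ceremony, found_today)
    print(problems or "all TEBs match the previous record")
```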

Safe maintenance

On recovering from problems, ceremony 40 is a good example. Besides the standard signature generation, HSM3 (West) was scheduled to be decommissioned, a new set of replacement safe deposit box locks were to be prepared, and the locks on both safes were to be replaced.

On the Monday, the safe deposit box ceremony went smoothly, but on Tuesday, the SSC wasn’t able to open safe 1 for the lock to be changed. The backup SSC tried, but also couldn’t open it. It was determined that the lock had failed: it was accepting the code but wasn’t physically unlocking. The lock on safe 2 worked fine and was successfully replaced. With safe 1 unable to be opened, however, the ceremony couldn’t go ahead either, so on Friday, a locksmith attended to drill out the lock. It took almost 20 hours to open the safe, which is a testament to the strength of the lock! Once the safe was finally repaired and the lock replaced, the HSMs were tested to ensure their tamper mechanisms hadn’t tripped.

The ceremony could then take place late on Saturday evening. The destruction of HSM3 was postponed for a future ceremony due to the massive delay caused by the lock malfunction.

HSM tasks

The policies for the DNSSEC root management include HSM rotation schedules to ensure the HSMs do not fail over time. When a new HSM is to be introduced, multiple temporary smartcards are generated using an existing HSM in order to transfer the storage and access keys to the new HSM. After the new HSM is set up, the KSK can be imported from the KSK backup smartcard in safe 1. After this, the temporary smartcards are erased and then shredded.

When an HSM is due to be retired, the KSK is deleted from it, then its tamper mechanism is manually triggered to render it inoperable. It’s then disassembled and the sensitive components are sealed in a TEB to be sent to a third party for shredding.

When a new KSK is generated (which at this stage has only happened in 2010 and 2017), it’s generated inside one HSM and transferred to the other HSMs in both KMFs, again through smartcards (in TEBs when couriering between the two KMFs). An HSM can hold multiple KSKs at once and during the transition to a new KSK, both will be used to sign the ZSK (and each other). After the transition, the old KSK is deleted, and its backup smartcards wiped and shredded.

COVID-19 provisions

The two most recent ceremonies at time of writing, 41 and 42, were conducted in April 2020 and February 2021 respectively. As the COVID-19 pandemic made travel and indoor gatherings unsafe, a few major changes were implemented:

  • The West KMF was used, as using the East KMF would have required IANA staff to fly across the country.

  • All participants who were able to do so participated remotely via video call. This included the COs, who couriered their safe deposit box keys to IANA in TEBs prior to the ceremony. Each CO remotely granted the IANA staff permission to use their key and witnessed it being put into a new TEB to be sent back to them after the ceremony.

  • Instead of the usual three months’ worth of signatures being generated per ceremony, both of these ceremonies generated nine months’ worth of signatures each. The additional generated signatures were retained by IANA until the time they’d normally have been generated.

  • Non-essential tasks scheduled for the ceremonies were postponed.

At the time of writing, the next ceremony needs to happen around November 2021. The state of international travel and vaccination in the lead-up to then will determine whether a third minimal COVID-style ceremony will take place.

On the ‘Seven keys to the Internet’ analogy and accountability…

As I mentioned at the beginning of part 1, the DNSSEC process is often dramatised when presented to the general public. I’d like to counter some of the more common arguments, as it’s irritating to see people getting angry at things they don’t understand, and it isn’t helpful to the objective of trust in the DNSSEC processes.

The root KSK ceremonies are only for DNS, and only DNSSEC at that. Even if we assume that DNSSEC eventually reaches 100% adoption (which as I mentioned before we’re a long way from at the moment), DNSSEC keys do not, and never will, directly protect communications such as HTTPS.

But taking a step back from that, we need to look at the likelihood of various situations. Firstly, failure of the ceremony and DNSSEC would essentially require simultaneous disasters of extreme proportions at both KMFs such that all copies of the KSK were destroyed. In this situation, worldwide DNS recursor operators may have to disable DNSSEC validation temporarily if a new KSK couldn’t be set up before signatures expired.

Another common concern is various parties going rogue. In the case of COs, all they retain outside of a ceremony is the key to their safe deposit box. In the case of trying to break into the safes, this gives them barely any benefit over any person off the street, as the key is only useful once in the safe room, with the safe open. Getting to this point without detection would be almost impossible, and even then, all the materials are inside TEBs which would need to be broken in order to attain any access to the KSK.

So in a sense, there is a group of seven people with “keys to the Internet”, but on their own, without ICANN (and the tens-of-thousands-strong Internet community backing them), they’re powerless, so bringing it up as some sort of mind-blowing fact is disingenuous at best.

In the case of the RKSHs, if five or more were to go rogue, then they could only recover the HSM storage key, which without access to safe 1 (and again breaking TEBs) does not provide them with the KSK.

Another thing people worry about is ICANN going rogue. This is difficult to address briefly, but the truth of the matter is that ICANN, and the Internet in general, is governed, maintained, and improved by a massive global community comprised of many different subgroups. There are almost certainly many people with the same ideas as you in the community who are making their voices heard. And while the community is, like many corporate environments, quite straight-white-male biased, change is slowly coming and it’s the responsibility of everyone to ensure that the Internet community is as representative of the world as possible.

The last party to talk about is the US government. While they no longer have any special “veto” over the Internet, many Internet companies, including ICANN and Verisign, are US-based and thus subject to US law. The US has used this power against Verisign, who also operate the .com TLD, in order to take down .com websites that violate US law. To attempt to use this power in a broader manner against ICANN would be an almost unimaginable step for the US government to take, and would undoubtedly face very vocal resistance from the Internet community.

There are some pushing for the physical aspects of the Internet to be more globalised, in addition to the multistakeholder communities and such that already exist. I’m broadly in support of that, but I’m not holding my breath. Internet governance has been a complicated topic pretty much since its inception and big changes take time.

I hope this series has been interesting. It got more technical than I anticipated, but I feel like I’ve covered everything pretty well. I’ve started a list of other useful resources around DNS and DNSSEC below and I’ll add to it if I find anything else interesting or useful. If you have any questions, leave a comment or ask me on Twitter.

Additional resources

  1. Technically, the IANA functions are operated for ICANN by Public Technical Identifiers (PTI), an affiliate. Knowing this isn’t essential to understanding the general operations of IANA, but if you look deeper into the structure, then it’ll probably come up. PTI exists mainly to separate IANA operations from the policy-making roles of ICANN.