Andrea Corbellini

Running the operating system that you're currently using in a virtual machine (with Secure Boot and TPM emulation)

andreacorbellini — Sun, 19 Nov 2023 16:33:00 +0000

In this article I will show you how to start your current operating system inside a virtual machine. That is: launching the operating system (with all your settings, files, and everything), inside a virtual machine, while you’re using it.

This article was written for Ubuntu, but it can be easily adapted to other distributions, and with appropriate care it can be adapted to non-Linux kernels and operating systems as well.

Motivation

Before we start, why would a sane person want to do this in the first place? Well, here’s why I did it:

To test changes that affect Secure Boot without a reboot.

Recently I was doing some experiments with Secure Boot and the Trusted Platform Module (TPM) on a new laptop, and I got frustrated by how time consuming it was to test changes to the boot chain. Every time I modified a file involved during boot, I would need to reboot, then log in, then re-open my terminal windows and files to make more modifications… Plus, whenever I screwed up, I would need to manually recover my system, which would be even more time consuming.

I thought that I could speed up my experiments by using a virtual machine instead.

To predict the future TPM state (in particular, the values of PCRs 4, 5, 8, and 9) after a change, without a reboot.

I wanted to predict the values of my TPM PCR banks after making changes to the bootloader, kernel, and initrd. Writing a script to calculate the PCR values automatically is in principle not that hard (and I actually did it before, in a different context), but I wanted a robust, generic solution that would work on most systems and in most situations, and emulation was the natural choice.
And, of course, just for the fun of it!

To be honest, I’m not a big fan of Secure Boot. The reason why I’ve been working on it is simply that it’s the standard nowadays and so I have to stick with it. Also, there are no real alternatives out there to achieve the same goals. I’ll write an article about Secure Boot in the future to explain the reasons why I don’t like it, and how to make it work better, but that’s another story…

Procedure

The procedure that I’m going to describe has 3 main steps:

create a copy of your drive
emulate a TPM device using swtpm
emulate the system with QEMU

I’ve tested this procedure on Ubuntu 23.04 (Lunar) and 23.10 (Mantic), but it should work on any Linux distribution with minimal adjustments. The general approach can be used for any operating system, as long as appropriate replacements for QEMU and swtpm exist.

Prerequisites

Before we can start, we need to install:

QEMU: a virtual machine emulator
swtpm: a TPM emulator
OVMF: a UEFI firmware implementation

On a recent version of Ubuntu, these can be installed with:

sudo apt install qemu-system-x86 ovmf swtpm

Note that OVMF only supports the x86_64 architecture, so we can only emulate that. If you run a different architecture, you’ll need a UEFI implementation other than OVMF (for 64-bit ARM, for example, EDK2 also provides firmware for QEMU, packaged on Ubuntu as qemu-efi-aarch64), and the commands below will need adjusting.

Create a copy of your drive

We can decide to either:

Choice #1: run only the components involved early at boot (shim, bootloader, kernel, initrd). This is useful if you, like me, only need to test those components and how they affect Secure Boot and the TPM, and don’t really care about the rest (the init process, login manager, …).
Choice #2: run the entire operating system. This can give you a fully usable operating system running inside the virtual machine, but may also result in some instability inside the guest (because we’re giving it a filesystem that is in use), and may also lead to some data loss if we’re not careful and make typos. Use with care!

Choice #1: Early boot components only

If we’re interested in the early boot components only, then we need to make a copy of the following from our drive: the GPT partition table, the EFI partition, and the /boot partition (if we have one). Usually all these 3 pieces are at the “start” of the drive, but this is not always the case.

To figure out where the partitions are located, run:

sudo parted -l

On my system, this is the output:

Model: WD_BLACK SN750 2TB (nvme)
Disk /dev/nvme0n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  525MB   524MB   fat32              boot, esp
 2      525MB   1599MB  1074MB  ext4
 3      1599MB  2000GB  1999GB                     lvm

In my case, the partition number 1 is the EFI partition, and the partition number 2 is the /boot partition. If you’re not sure what partitions to look for, run mount | grep -e /boot -e /efi. Note that, on some distributions (most notably the ones that use systemd-boot), a /boot partition may not exist, so you can leave that out in that case.

Anyway, in my case, I need to copy the first 1599 MB of my drive, because that’s where the data I’m interested in ends: those first 1599 MB contain the GPT partition table (which is always at the start of the drive), the EFI partition, and the /boot partition.
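Reading the sizes off the table by hand works, but the same number can also be pulled out of parted’s machine-readable output. The sketch below assumes the colon-separated format printed by parted -m <drive> unit MiB print, where the third field of each partition line is the partition’s end; partition_end_mib is a hypothetical helper, not part of parted. Asking parted for MiB means the result can be used directly as the count for dd bs=1M. With the layout above you would pass the number of the /boot partition (2), or the EFI partition if there is no separate /boot.

```shell
# Hypothetical helper: print the end of partition number $1 in whole MiB,
# rounded up, reading `parted -m <drive> unit MiB print` output from stdin.
# Partition lines look like: 2:501MiB:1525MiB:1024MiB:ext4::;
partition_end_mib() {
  awk -F: -v part="$1" '
    $1 == part {
      stop = $3
      sub(/MiB$/, "", stop)   # strip the unit suffix
      stop += 0               # force a numeric value
      n = int(stop)
      if (n < stop) n++       # round up so the whole partition gets copied
      print n
      found = 1
    }
    END { if (!found) exit 1 }  # fail if the partition was not listed
  '
}

# Usage (replace /dev/nvme0n1 and the partition number with yours):
#   count=$(sudo parted -m /dev/nvme0n1 unit MiB print | partition_end_mib 2)
#   sync && sudo -g disk dd if=/dev/nvme0n1 of=drive.img bs=1M count="$count" conv=sparse
```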

Now that we have identified how many bytes to copy, we can copy them to a file named drive.img with dd (maybe after running sync to make sure that all changes have been committed):

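# replace '/dev/nvme0n1' with your main drive (which may be '/dev/sda' instead),
# and 'count' with the number of MBs to copy (bs=1M means MiB, which is slightly
# larger than parted's MB, so this copies a little more than needed: harmless)
sync && sudo -g disk dd if=/dev/nvme0n1 of=drive.img bs=1M count=1599 conv=sparse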

Choice #2: Entire system

If we want to run our entire system in a virtual machine, then I would recommend creating a QEMU copy-on-write (COW) file:

# replace '/dev/nvme0n1' with your main drive (which may be '/dev/sda' instead)
sudo -g disk qemu-img create -f qcow2 -b /dev/nvme0n1 -F raw drive.qcow2

This will create a new copy-on-write image using /dev/nvme0n1 as its “backing storage”. Be very careful when running this command: you don’t want to mess up the order of the arguments, or you might end up writing to your storage device (leading to data loss)!
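One way to reduce that risk is to wrap the command in a small function that checks its arguments before touching anything. This is a hypothetical sketch (make_overlay is not an existing tool): it refuses to run if the overlay path already exists, or if the backing path is not a block device, which also catches the case where the two arguments are swapped.

```shell
# Hypothetical safety wrapper around the qemu-img command above.
# Argument order: backing drive first, new overlay file second.
make_overlay() {
  backing=$1
  overlay=$2
  # Never let qemu-img write over an existing file or device (swapped args).
  if [ -e "$overlay" ] || [ -L "$overlay" ]; then
    echo "make_overlay: refusing to overwrite existing '$overlay'" >&2
    return 1
  fi
  # The backing storage is expected to be the physical drive.
  if [ ! -b "$backing" ]; then
    echo "make_overlay: '$backing' is not a block device" >&2
    return 1
  fi
  sudo -g disk qemu-img create -f qcow2 -b "$backing" -F raw "$overlay"
}

# Usage (replace /dev/nvme0n1 with your main drive):
#   make_overlay /dev/nvme0n1 drive.qcow2
```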

The advantage of using a copy-on-write file, as opposed to copying the whole drive, is that this is much faster. Also, if we had to copy the entire drive, we might not even have enough space for it (even when using sparse files).

The big drawback of using a copy-on-write file is that, because our main drive likely contains filesystems that are mounted read-write, any modification to the filesystems on the host may be perceived as data corruption on the guest, and that in turn may cause all sorts of bad consequences inside the guest, including kernel panics.

Another drawback is that, with this solution, later we will need to give QEMU permission to read our drive, and if we’re not careful enough with the commands we type (e.g. we swap the order of some arguments, or make some typos), we may potentially end up writing to the drive instead.

Emulate a TPM device using swtpm

There are various ways to run the swtpm emulator. Here I will use the “vTPM proxy” way, which is not the easiest, but has the advantage that the emulated device will look like a real TPM device not only to the guest, but also to the host, so that we can inspect its PCR banks (among other things) from the host using familiar tools like tpm2_pcrread.

First, load the tpm_vtpm_proxy module (which is not loaded by default on Ubuntu):

sudo modprobe tpm_vtpm_proxy

If that worked, we should have a /dev/vtpmx device. We can verify its presence with:

ls /dev/vtpmx

swtpm in “vTPM proxy” mode will interact with /dev/vtpmx, but in order to do so it needs the sys_admin capability. On Ubuntu, swtpm ships with this capability explicitly disabled by AppArmor, but we can enable it with:

sudo sh -c "echo '  capability sys_admin,' > /etc/apparmor.d/local/usr.bin.swtpm"
sudo systemctl reload apparmor

Now that /dev/vtpmx is present, and swtpm can talk to it, we can run swtpm in “vTPM proxy” mode:

sudo mkdir -p /tmp/swtpm-state
sudo swtpm chardev --tpmstate dir=/tmp/swtpm-state --vtpm-proxy --tpm2

Upon start, swtpm should create a new /dev/tpmN device and print its name on the terminal. On my system, I already have a real TPM on /dev/tpm0, and therefore swtpm allocates /dev/tpm1.

The emulated TPM device will need to be readable and writable by QEMU, but by default it is accessible only by root, so either we run QEMU as root (not recommended), or we relax the permissions on the device:

# replace '/dev/tpm1' with the device created by swtpm
sudo chmod a+rw /dev/tpm1

Make sure not to accidentally change the permissions of your real TPM device!
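A way to avoid picking the wrong device is to compare the list of /dev/tpmN devices before and after starting swtpm, and relax the permissions only on the one that appeared. The helper below (new_tpm_device, a hypothetical name) is a minimal sketch of that idea; it assumes swtpm is started between the two listings, for example in another terminal.

```shell
# Hypothetical helper: given the space-separated /dev/tpmN devices that existed
# before and after starting swtpm, print only the ones that are new.
new_tpm_device() {
  before=$1
  after=$2
  for dev in $after; do
    case " $before " in
      *" $dev "*) ;;                # already present: the real TPM, skip it
      *) printf '%s\n' "$dev" ;;    # appeared after starting swtpm
    esac
  done
}

# Usage sketch:
#   before=$(echo /dev/tpm[0-9]*)
#   # ...start swtpm in another terminal, as shown above...
#   after=$(echo /dev/tpm[0-9]*)
#   tpm=$(new_tpm_device "$before" "$after")
#   sudo chmod a+rw "$tpm"
```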

Emulate the system with QEMU

Inside the QEMU emulator, we will run the OVMF UEFI firmware. On Ubuntu, the firmware comes in 2 flavors:

with Secure Boot enabled (/usr/share/OVMF/OVMF_CODE_4M.ms.fd), and
with Secure Boot disabled (/usr/share/OVMF/OVMF_CODE_4M.fd)

(There are actually even more flavors, see this AskUbuntu question for the details.)

In the commands that follow I’m going to use the Secure Boot flavor, but if you need to disable Secure Boot in your guest, just replace .ms.fd with .fd in all the commands below.

To use OVMF, first we need to copy the EFI variables to a file that can be read & written by QEMU:

cp /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/

This file (/tmp/OVMF_VARS_4M.ms.fd) will be the equivalent of the EFI flash storage, and it’s where OVMF will read and store its configuration, which is why we need to make a copy of it (to avoid modifications to the original file).

Now we’re ready to run QEMU:

If you copied only the early boot files (choice #1):

# replace '/dev/tpm1' with the device created by swtpm
qemu-system-x86_64 \
  -accel kvm \
  -machine q35,smm=on \
  -cpu host \
  -smp cores=4,threads=1 \
  -m 4096 \
  -vga virtio \
  -drive if=pflash,unit=0,format=raw,file=/usr/share/OVMF/OVMF_CODE_4M.ms.fd,readonly=on \
  -drive if=pflash,unit=1,format=raw,file=/tmp/OVMF_VARS_4M.ms.fd \
  -drive if=virtio,format=raw,file=drive.img \
  -tpmdev passthrough,id=tpm0,path=/dev/tpm1,cancel-path=/dev/null \
  -device tpm-tis,tpmdev=tpm0

If you have a copy-on-write file for the entire system (choice #2):

# replace '/dev/tpm1' with the device created by swtpm
sudo -g disk qemu-system-x86_64 \
  -accel kvm \
  -machine q35,smm=on \
  -cpu host \
  -smp cores=4,threads=1 \
  -m 4096 \
  -vga virtio \
  -drive if=pflash,unit=0,format=raw,file=/usr/share/OVMF/OVMF_CODE_4M.ms.fd,readonly=on \
  -drive if=pflash,unit=1,format=raw,file=/tmp/OVMF_VARS_4M.ms.fd \
  -drive if=virtio,format=qcow2,file=drive.qcow2 \
  -tpmdev passthrough,id=tpm0,path=/dev/tpm1,cancel-path=/dev/null \
  -device tpm-tis,tpmdev=tpm0

Note that this last command makes QEMU run as the disk group: on Ubuntu, this group has the permission to read and write all storage devices, so be careful when running this command, or you risk losing your files forever! If you want to add more safety, you may consider using an ACL to give the user running QEMU read-only permission to your backing storage.
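The two commands above differ only in the drive file, its format, and the sudo -g disk prefix, and switching off Secure Boot means editing two paths. A hypothetical run_vm function (a sketch, not part of QEMU) can keep all of that in one place. It assumes the OVMF variable store of the matching flavor has already been copied to /tmp as shown earlier, and it reads a QEMU variable that can be set to "sudo -g disk qemu-system-x86_64" for choice #2, or to echo to just print the command line.

```shell
# Hypothetical helper: boot the copied system under QEMU.
# Arguments: drive file, drive format (raw or qcow2), TPM device created by
# swtpm, and optionally "nosb" to use the OVMF flavor without Secure Boot.
# QEMU overrides the emulator command (word-split on purpose, so a
# "sudo -g disk ..." prefix works).
run_vm() {
  drive=$1
  format=$2
  tpm=$3
  sb=.ms                        # ".ms" flavor: Secure Boot enabled
  [ "${4:-}" = nosb ] && sb=    # plain flavor: Secure Boot disabled
  ${QEMU:-qemu-system-x86_64} \
    -accel kvm \
    -machine q35,smm=on \
    -cpu host \
    -smp cores=4,threads=1 \
    -m 4096 \
    -vga virtio \
    -drive "if=pflash,unit=0,format=raw,file=/usr/share/OVMF/OVMF_CODE_4M${sb}.fd,readonly=on" \
    -drive "if=pflash,unit=1,format=raw,file=/tmp/OVMF_VARS_4M${sb}.fd" \
    -drive "if=virtio,format=${format},file=${drive}" \
    -tpmdev "passthrough,id=tpm0,path=${tpm},cancel-path=/dev/null" \
    -device tpm-tis,tpmdev=tpm0
}

# Choice #1:  run_vm drive.img raw /dev/tpm1
# Choice #2:  QEMU="sudo -g disk qemu-system-x86_64" run_vm drive.qcow2 qcow2 /dev/tpm1
```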

In either case, after launching QEMU, our operating system should boot… while running inside itself!

In some circumstances though it may happen that the wrong operating system is booted, or that you end up at the EFI setup screen. This can happen if your system is not configured to boot from the “first” EFI entry listed in the EFI partition. Because the boot order is not recorded anywhere on the storage device (it’s recorded in the EFI flash memory), of course OVMF won’t know which operating system you intended to boot, and will just attempt to launch the first one it finds. You can use the EFI setup screen provided by OVMF to change the boot order in the way you like. After that, changes will be saved into the /tmp/OVMF_VARS_4M.ms.fd file on the host: you should keep a copy of that file so that, next time you launch QEMU, you’ll boot directly into your operating system.

Reading PCR banks after boot

Once our operating system has launched inside QEMU, and after the boot process is complete, the PCR banks will be filled and recorded by swtpm.

If we choose to copy only the early boot files (choice #1), then of course our operating system won’t be fully booted: it’ll likely hang waiting for the root filesystem to appear, and may eventually drop to the initrd shell. None of that really matters if all we want is to see the PCR values stored by the bootloader.

Before we can extract those PCR values, we first need to stop QEMU (Ctrl-C is fine); then we can read them with tpm2_pcrread:

# replace '/dev/tpm1' with the device created by swtpm
tpm2_pcrread -T device:/dev/tpm1

Using the method described in this article, PCRs 4, 5, 8, and 9 inside the emulated TPM should match the PCRs in our real TPM. And here comes an interesting application of this method: if we upgrade our bootloader or kernel, and we want to know the future PCR values that our system will have after reboot, we can simply follow this procedure and obtain those PCR values without shutting down our system! This can be especially useful if we use TPM sealing: we can reseal our secrets against the new values, so that they can still be unsealed after the next reboot.
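Before changing anything in the boot chain, it can be worth checking that the emulated TPM really does agree with the real one. The hypothetical pcr_compare helper below is a sketch that reads the sha256 bank of PCRs 4, 5, 8, and 9 from two devices with tpm2_pcrread and compares the outputs. It assumes your real TPM has an active sha256 bank; the PCRREAD variable exists only so the comparison can be exercised without a TPM.

```shell
# Hypothetical helper: compare a PCR selection between two TPM devices.
# Arguments: real TPM device, emulated TPM device, optional PCR selection.
# Returns 0 if identical, 1 if different, 2 if a device could not be read.
pcr_compare() {
  real=$1
  emulated=$2
  sel=${3:-sha256:4,5,8,9}
  a=$(${PCRREAD:-tpm2_pcrread} -T "device:$real" "$sel") || return 2
  b=$(${PCRREAD:-tpm2_pcrread} -T "device:$emulated" "$sel") || return 2
  if [ "$a" = "$b" ]; then
    echo "PCRs match ($sel)"
  else
    echo "PCRs differ ($sel)"
    return 1
  fi
}

# Usage (reading the real TPM may require root or membership in the tss group):
#   pcr_compare /dev/tpmrm0 /dev/tpm1
```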

Restarting the virtual machine

If we want to restart the guest inside the virtual machine, and obtain a consistent TPM state every time, we should start from a “clean” state every time, which means:

restart swtpm
recreate the drive.img or drive.qcow2 file
launch QEMU again

If we don’t restart swtpm, the virtual TPM state (and in particular the PCR banks) won’t be cleared, and new PCR measurements will simply be added on top of the existing state. If we don’t recreate the drive file, it’s possible that some modifications to the filesystems will have an impact on the future PCR measurements.

We don’t necessarily need to recreate the /tmp/OVMF_VARS_4M.ms.fd file every time. In fact, if you need to modify any EFI setting to make your system bootable, you might want to preserve it so that you don’t need to change EFI settings at every boot.
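If you do keep the variable store, a small guard avoids overwriting it by accident when you script the restart. prepare_vars is a hypothetical sketch: it copies the pristine file from /usr/share/OVMF only when no working copy exists yet, unless you explicitly ask for a fresh one. Keep in mind that /tmp may be wiped when the host reboots, so a location outside /tmp is a safer choice for a copy you want to keep (the QEMU command would then need to point there).

```shell
# Hypothetical helper: create the working OVMF variable store only if it is
# missing, so boot-order changes made in the OVMF setup screen are preserved.
# Pass "fresh" as the third argument to start over from the pristine file.
prepare_vars() {
  src=$1
  dst=$2
  if [ "${3:-}" = fresh ] || [ ! -e "$dst" ]; then
    cp "$src" "$dst"
  fi
}

# Usage:
#   prepare_vars /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/OVMF_VARS_4M.ms.fd
```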

Automating the entire process

I’m (very slowly) working on turning this entire procedure into a script, so that everything can be automated. Once I find some time I’ll finish the script and publish it, so if you liked this article, stay tuned, and let me know if you have any comment/suggestion/improvement/critique!

How to run Remark42 on Fly.io

andreacorbellini — Tue, 19 Sep 2023 02:12:00 +0000

As I wrote in my previous post, I recently switched from Disqus to Remark42 for the comments on my blog. Here I will explain how I set it up on Fly.io.

Overview

The setup that I ended up with looks like the following:

Something to note about this setup is that the “machine” (more on that later) and the storage volume are both a single instance. This is not a distributed setup. This is because Remark42 stores comments in a single file and does not make use of a distributed database. This is listed as a “feature” on the Remark42 website. How one is supposed to implement replication, I have no idea. Thankfully Fly.io seems to be fast to provision machines, and the Remark42 daemon also seems fast to start, so hopefully if a problem occurs (or when updates are required), the downtime will be minimal.

It is imperative however to understand that, because of the non-distributed/non-replicated nature of this setup, backups should be made periodically to avoid the risk of losing your comments forever.

Preliminaries

Before setting up Remark42, I had never used Fly.io. As a Fly.io newbie, I would describe it as a cloud provider focused on Docker containers. Fly.io uses some concepts (like “apps” and “machines”) that make sense after you practice a bit with them, but as a beginner they are not the easiest to learn. Most of the complexity, I think, comes from the fact that the Fly.io documentation is poorly written. On top of that, it appears that Fly.io is migrating their offering from “V1 apps” to “V2 apps”, and today some documentation applies only to “V1 apps”, while other pieces apply only to “V2 apps”, resulting in a big mess. The error messages you get are also far from clear.

But don’t get too scared: once you get to know Fly.io, it can actually be fun to use.

Creating resources on Fly.io requires installing their command line client: flyctl. Because I do not like to run unknown software unconfined, I packaged it as a snap that you can install using:

snap install andrea-flyctl

Another source of confusion that I had at the beginning was that, from reading the documentation, it looked like a second command line tool named fly was needed in addition to flyctl. It turns out that fly and flyctl are the same thing; they’re just transitioning from one name to the other. If you installed the tool through the snap, you can set up these aliases so that you can copy and paste commands without trouble:

alias fly=/snap/bin/andrea-flyctl.fly
alias flyctl=/snap/bin/andrea-flyctl.fly

According to the documentation (and assuming it’s up-to-date), flyctl does not support everything that Fly.io supports, so sometimes curl is used to interact directly with the Fly.io API. In order to use that, you’ll need to download an authentication token from the Fly.io interface and store it in a file (that I’ll call ~/fly-token from now on).

I’m going to skip over the steps to create and configure a Fly.io account and to obtain an authentication token, as those were easy steps in my opinion.

Creating a machine

A Fly.io “machine” is a virtual machine running a single Docker container with a persistent volume attached to it. In order to create my Fly.io machine to run Remark42 in it, I loosely followed this page from the Fly.io documentation: Run User Code on Fly Machines. “Loosely” because it turned out that some pieces on that page are not fully correct, but anyway…

Before creating a machine, you first need to create an “app”. A Fly.io app is basically an endpoint, which consists of a DNS name (in the form ${app_name}.fly.dev), and a set of IP addresses. Behind these IP addresses there are Fly.io load balancers that will forward requests to the machines inside the app.

You can do that through the API like this:

curl -X POST \
  -H "Authorization: Bearer $(<~/fly-token)" \
  -H 'Content-Type: application/json' \
  'https://api.machines.dev/v1/apps' \
  -d "{ \"app_name\": \"${app_name}\", \"org_slug\": \"personal\" }"

(Replace ${app_name} with some identifier of your choice; I chose remark42 without knowing that this would have removed the possibility for other people to register an app with the same name.)

IP addresses need to be manually allocated:

fly ips allocate-v4 --app=${app_name} --shared
fly ips allocate-v6 --app=${app_name}

The --shared option to allocate-v4 tells Fly.io to allocate an IP address that may be shared with other Fly.io apps, even outside of your account/organization. Remove --shared if you want to use a dedicated IP, but note that dedicated IPv4 addresses are a paid feature.

Allocating IPs is an important step: it can be done later, after creating the machine, but it must be done, otherwise your machine will be unreachable and it won’t be obvious why.

You should now create a persistent volume for your machine:

fly volume create remark42_db_0 --app=${app_name} --size=1

This will display a warning about replication, but you can ignore it because, sadly, Remark42 does not support replication.

Remark42 needs to be given a secret key (I guess for the purpose of signing JWT tokens). Fly.io has a handy, albeit poorly documented, feature to manage secrets and make them available to machines. You can set the Remark42 secret like this:

fly secrets set --app=${app_name} SECRET='a very secret string'

(You can generate a random secret string with a command like cat /dev/urandom | tr -Cd 'a-zA-Z0-9' | head -c64, which means: get some random bytes, keep only alphanumeric characters, get the first 64 characters.)
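The same command can be wrapped in a reusable function. This is a sketch of a slightly more defensive variant (gen_secret is a hypothetical name): it reads a fixed 4096 bytes from /dev/urandom, which yields about 990 alphanumeric characters on average, so it is meant for lengths well below that; and it sets LC_ALL=C so that tr treats the input as raw bytes, since some tr implementations otherwise reject invalid multibyte sequences.

```shell
# Hypothetical helper: print a random alphanumeric string of length $1
# (default 64). Reading a bounded amount of randomness keeps every stage of
# the pipeline finite.
gen_secret() {
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "${1:-64}"
}

# Usage:
#   fly secrets set --app="${app_name}" SECRET="$(gen_secret 64)"
```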

You may be wondering: how is the container running inside the machine supposed to access this secret? The Fly.io documentation doesn’t say a word about it, but after experimenting I was able to find that all the app secrets are passed as environment variables, which is great, because this is exactly what Remark42 expects.

Note: it’s important to set SECRET before creating the machine, or Remark42 will refuse to start.

Now you’re ready to spin up the machine: create a configuration file for it…

{
  "name": "remark42-0",
  "config": {
    "image": "umputun/remark42:latest",
    "env": {
      "SITE": "andrea.corbellini.name",
      "REMARK_URL": "https://${app_name}.fly.dev",
      "ALLOWED_HOSTS": "'self',https://andrea.corbellini.name",
      "AUTH_SAME_SITE": "none",
      "AUTH_ANON": "true",
      "AUTH_EMAIL_ENABLE": "true",
      "AUTH_EMAIL_FROM": "Andrea's Blog <hi@andrea.corbellini.name>",
      "AUTH_EMAIL_SUBJ": "Andrea's Blog - Email Confirmation",
      "NOTIFY_USERS": "email",
      "NOTIFY_ADMINS": "email",
      "NOTIFY_EMAIL_FROM": "Andrea's Blog <hi@andrea.corbellini.name>",
      "ADMIN_SHARED_EMAIL": "corbellini.andrea@gmail.com"
    },
    "mounts": [
      {
        "volume": "${volume_id}",
        "path": "/srv/var"
      }
    ],
    "services": [
      {
        "ports": [
          {
            "port": 443,
            "handlers": [
              "tls",
              "http"
            ]
          },
          {
            "port": 80,
            "handlers": [
              "http"
            ]
          }
        ],
        "protocol": "tcp",
        "internal_port": 8080
      }
    ],
    "checks": {
      "httpget": {
        "type": "http",
        "port": 8080,
        "method": "GET",
        "path": "/ping",
        "interval": "15s",
        "timeout": "10s"
      }
    },
    "metadata": {
      "fly_platform_version": "v2"
    }
  }
}

…and give it to Fly.io:

curl -X POST \
  -H "Authorization: Bearer $(<~/fly-token)" \
  -H 'Content-Type: application/json' \
  "https://api.machines.dev/v1/apps/${app_name}/machines" \
  -d @config.json

There’s a lot here, so let me break it down for you:

"image": "umputun/remark42:latest": this is the Docker image for Remark42.
"env": { ... }: these are all the environment variables to pass to our container. They are briefly documented on the Remark42 website, and here’s a bit more detailed explanation of some of them:
- "SITE": "andrea.corbellini.name": this is the internal identifier for the site, it can be an arbitrary string, it won’t be visible, and you can omit it.
- "REMARK_URL": "https://${app_name}.fly.dev": this is the URL where Remark42 will be serving requests from. I set it to the Fly.io app endpoint. It’s important that you do not put a trailing slash, or Remark42 will error out later on. It’s also important that the protocol (http or https) matches your blog’s protocol, or Remark42 will refuse to display comments (this makes local testing a bit annoying).
- "ALLOWED_HOSTS": "'self',https://andrea.corbellini.name": this is the list of sources that will be put into the Content-Security-Policy: frame-ancestors header of HTTP responses. Essentially, this defines where the Remark42 comments can be displayed.
- "AUTH_SAME_SITE": "none": this disables the “same site” policy for cookies. Disabling it is necessary because, in my setup, comments are served from one domain (remark42.fly.dev) to another domain (andrea.corbellini.name).
- "AUTH_ANON": "true": allows anonymous commenters. You may or may not want it.
- "AUTH_EMAIL_ENABLE": "true" and friends: allows email-based authentication of commenters.
- "NOTIFY_USERS": "email": allows readers and commenters to be notified of new comments via email.
- "NOTIFY_ADMINS": "email" and "ADMIN_SHARED_EMAIL": "corbellini.andrea@gmail.com": makes Remark42 send me an email every time there’s a new comment.
"mounts": [ ... ]: this tells Fly.io to attach the volume that you created earlier to the container at the path /srv/var, which is what Remark42 uses to store its database as well as daily backups.
"services": [ ... ]: this tells Fly.io what to expose through the load balancer. With the configuration that I provided, the Fly.io endpoint (${app_name}.fly.dev) will provide both HTTP and HTTPS to the internet. However, the load balancer will talk to the machine over plain HTTP on port 8080 (meaning that TLS is terminated at the load balancer).

I think in the future I will set up certbot inside the container so that I can do TLS termination on the machine, but not today.
"checks": { ... }: this tells Fly.io to check if the Remark42 daemon is healthy by using its /ping endpoint.
"metadata": { "fly_platform_version": "v2" }: this tells Fly.io to use a “V2 machine”, or something like that. Setting this metadata is very important, or certain things won’t work later on. The Fly.io documentation doesn’t tell you to do it, but this is needed if you need to update the environment variables or the secrets inside the machine.

Note that all of this configuration can be changed at any time, so if you make any mistakes or you just want to experiment, you don’t need to worry too much. You can even destroy your machine and recreate it from scratch if you want.
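One detail that is easy to miss: curl’s -d @config.json sends the file as-is, so the ${app_name} and ${volume_id} placeholders in config.json are not expanded by the shell and have to be filled in before the request is sent. The hypothetical render_config helper below is a minimal sed-based sketch; it assumes neither value contains |, & or a backslash, which holds for typical app names and volume IDs.

```shell
# Hypothetical helper: replace the literal ${app_name} and ${volume_id}
# placeholders in a config template read from stdin.
# $1 = app name, $2 = volume ID (as printed by `fly volume create`).
render_config() {
  sed -e "s|\${app_name}|$1|g" -e "s|\${volume_id}|$2|g"
}

# Usage:
#   render_config "$app_name" "$volume_id" < config.json > config.rendered.json
#   (then send config.rendered.json instead of config.json)
```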

To view the configuration of an existing machine use the following:

curl \
  -H "Authorization: Bearer $(<~/fly-token)" \
  "https://api.machines.dev/v1/apps/${app_name}/machines/${machine_id}"

And to update it:

curl -X POST \
  -H "Authorization: Bearer $(<~/fly-token)" \
  -H 'Content-Type: application/json' \
  "https://api.machines.dev/v1/apps/${app_name}/machines/${machine_id}" \
  -d @new-config.json
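Since the create, view, and update calls all share the same base URL and headers, a small wrapper can cut down on copy-and-paste mistakes. fly_api below is a hypothetical sketch: it takes the HTTP method, the path under https://api.machines.dev/v1, and any extra curl arguments. It assumes the token is stored in ~/fly-token (overridable through a FLY_TOKEN_FILE variable introduced here), and the CURL variable lets you swap in another command for testing.

```shell
# Hypothetical wrapper for the Fly.io Machines API calls shown above.
# Arguments: HTTP method, API path, then any extra curl arguments.
fly_api() {
  method=$1
  path=$2
  shift 2
  ${CURL:-curl} -X "$method" \
    -H "Authorization: Bearer $(cat "${FLY_TOKEN_FILE:-$HOME/fly-token}")" \
    -H 'Content-Type: application/json' \
    "https://api.machines.dev/v1${path}" \
    "$@"
}

# Usage:
#   fly_api GET  "/apps/${app_name}/machines/${machine_id}"
#   fly_api POST "/apps/${app_name}/machines/${machine_id}" -d @new-config.json
```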

I was also successful at changing configuration using fly machines update, although it can’t be used for everything (for example: it can be used to add or change environment variables, but not to remove them).

Testing the setup

If everything went well, you should be able to interact with Remark42 at https://${app_name}.fly.dev/web. This should let you read and post new comments.
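For a quick check from the command line, you can also query the same /ping endpoint that the Fly.io health check uses. The function below is a hypothetical sketch that assumes Remark42 answers /ping with the plain-text body pong; CURL can be overridden for testing.

```shell
# Hypothetical smoke test: succeed only if <url>/ping replies with "pong".
check_remark42() {
  url=${1%/}   # tolerate a trailing slash in the argument
  reply=$(${CURL:-curl} -fsS --max-time 10 "$url/ping") || return 1
  [ "$reply" = pong ]
}

# Usage:
#   check_remark42 "https://${app_name}.fly.dev" && echo "Remark42 is up"
```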

Configuring Remark42 to send emails

For sending emails, I chose to use ~~Elastic Email~~ Mailtrap, which is an email-delivery service that supports SMTP with STARTTLS. Creating a Mailtrap account, setting up DKIM and SPF, and obtaining SMTP credentials was extremely easy, so I won’t cover it here.

UPDATE: I initially chose to go with Elastic Email, but I found it to be garbage. They force the insertion of tracking URLs into every one of your emails, and they refuse to disable tracking if you ask them to.

Setting up email delivery with Remark42 is pretty easy once you have the SMTP credentials. Set the necessary (non-secret) configuration like this:

fly machines update ${machine_id} --app=${app_name} \
  -e SMTP_HOST=live.smtp.mailtrap.io \
  -e SMTP_PORT=587 \
  -e SMTP_STARTTLS=true \
  -e SMTP_USERNAME=...

And then set the SMTP password as a Fly.io secret:

fly secrets set --app=${app_name} SMTP_PASSWD='a very secret password'

Doing both machines update and secrets set will automatically restart the machine so that Remark42 can pick up the new configuration. Pretty neat, heh?

Configuring authentication providers for Remark42

Remark42 can let your users log in from a variety of providers, including: GitHub, Google, Facebook, Telegram, and more. There are specific instructions for each provider in the Remark42 documentation. There’s really not much to add on top of what’s already written there. Just remember: set non-secret environment variables with fly machines update, and set secrets with fly secrets set.

Creating an administrator account

If you want to be able to moderate comments, you’ll need an administrator account. With Remark42, this is a 3 step process: first you create an account (like any other user would do), then you copy the ID of the user you just created, and lastly you add that user ID to the ADMIN_SHARED_ID environment variable:

fly machines update ${machine_id} --app=${app_name} -e ADMIN_SHARED_ID=...

A step-by-step guide is available in the Remark42 documentation.

Importing comments from Disqus (or any other platform)

In order to import comments into Remark42, first you need to temporarily set an “admin password” for Remark42 (here the word “admin” has nothing to do with the administrator account you just created; it’s a totally separate concept):

fly secrets set --app=${app_name} ADMIN_PASSWD='this is super secret'

You can now copy your Disqus (or equivalent) backup to the machine and import it. I could not find an easy way to do it through flyctl (but I also did not spend too much time looking for an option). I did, however, find a way to open a console on the machine, so I simply copied and pasted the base64-encoded backup:

# on my laptop
base64 < disqus-export.xml.gz  # copy the output

# attach to the machine
fly console --app=${app_name} --machine=${machine_id}

# on the machine
cd /srv/var
base64 -d > disqus-export.xml.gz  # paste the output from earlier, then press Ctrl-D
gunzip disqus-export.xml.gz
import --provider=disqus --file=/srv/var/disqus-export.xml --url=http://localhost:8080
rm disqus-export.xml
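Copying and pasting a large base64 blob through a terminal is easy to get slightly wrong. A cheap safeguard is to compare checksums on both sides. The sketch below assumes the container provides sha256sum (BusyBox-based images usually do), and verify_sha256 is a hypothetical helper you would paste into the console and run before gunzip.

```shell
# Optional integrity check for the copy-and-paste transfer.
# On the laptop, note the checksum:
#   sha256sum disqus-export.xml.gz
# On the machine, before gunzip:
#   verify_sha256 <hash-from-laptop> disqus-export.xml.gz
verify_sha256() {
  # sha256sum -c expects lines of the form "<hex digest>  <file name>"
  printf '%s  %s\n' "$1" "$2" | sha256sum -c -
}
```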

Note: importing comments will clear the Remark42 database. Any pre-existing comment will be deleted. See also the Remark42 documentation for more information.

Another note: for some reason, my Disqus export referenced my blog posts using http:// URLs instead of https://. Because of that, Remark42 did correctly import all the Disqus comments in its database, but would not display them under my blog posts. Remember: Remark42 is very picky when it comes to URL schemes. To fix this, I simply created a backup from Remark42, modified the backup to change all http entries to https, and then restored the backup. This was quite trivial given that the format used by the backups is extremely intuitive.
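The http to https rewrite can be done with sed over the decompressed backup. The sketch below assumes the backup is a gzip-compressed text file with the post URLs stored verbatim (check yours first); https_backup is a hypothetical name, and it only rewrites URLs on the given host, so unrelated http:// links inside comment text are left alone.

```shell
# Hypothetical helper: rewrite http://<host> to https://<host> in a
# gzip-compressed backup. Arguments: input backup, host, output file.
https_backup() {
  src=$1
  host=$2
  dst=$3
  # Escape dots so that the host is matched literally by sed.
  host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
  gzip -dc "$src" | sed "s|http://$host_re|https://$host|g" | gzip > "$dst"
}

# Usage (file names are placeholders):
#   https_backup backup.gz andrea.corbellini.name backup-https.gz
```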

Final remarks

That was it!

Setting up Remark42 on Fly.io wasn’t particularly difficult, but it took me way more time than expected due to the poor documentation of both Remark42 and Fly.io. I had to resort to trial-and-error multiple times to make things work.

One big drawback of Remark42 is that it does not allow replication. This means that:

if the machine running my instance of Remark42 goes down, or becomes unreachable for any reason, there will be downtime;
some people who are “far away” from the Remark42 instance may experience higher latency than others;
I need to periodically take backups of my Remark42 database and copy it somewhere, otherwise if my single storage volume is lost, I will lose all the comments.

Nonetheless I think both Remark42 and Fly.io are very interesting products. I love Remark42’s features, and Fly.io is easy enough to use once you get familiar with it. I think I’m gonna stick with them for a long time.

My journey from Disqus to Remark42

andreacorbellini — Tue, 05 Sep 2023 08:30:00 +0000

Readers of this blog might have noticed a few changes recently. For example, I’ve been working on improving the look of the blog (maybe with questionable results), as well as improving the experience on mobile. But one of the biggest changes that perhaps some have noticed is that all of the comments on all of my articles suddenly disappeared in February 2023. Now, almost 7 months later, all comments have finally been restored.

The reason for this 7-month blackout of comments is that I decided to change the platform that hosts comments: I got rid of Disqus, and eventually replaced it with Remark42. Here I will describe why I did it. There will be another (more technical) blog post about my new setup.

Premise

My blog is a static website that has been using Disqus as a commenting platform for a long time: since at least 2015 (8 years ago), or maybe even earlier (back when my blog was on WordPress). Disqus at that time was gaining a lot of popularity, it was free, and it was very attractive to me because it was easy to set up. I might be wrong, but at that time, Disqus did not look to me like the data-savvy, privacy-invading, revenue-oriented company that it is today. Maybe I was just naive, but I kept using Disqus all these years without paying too much attention to it: after all, it worked, so why would I spend any time thinking about it?

Advertisements on my blog!?

Fast-forward to February 2023: one day, a person very close to me, with the utmost kindness that characterizes her, came to me and said: “the ads on your blog suck! They’re the worst kind of ads!”

At the beginning I had no idea what she was talking about. I have never intentionally run any sort of advertisements on my blog. I hate advertisements!

Then I realized what was going on: precisely because I hate advertisements, I run ad-blockers on all my devices. Maybe there were ads on my blog, but I never noticed because I block those ads. The only third-party service that I used to run on my blog was Disqus, so I immediately turned my attention to it. I disabled my ad-blockers, refreshed my blog, scrolled down to the comments section, and… the sad truth was revealed: Disqus was showing ads to my readers. And yes, those ads were some of the worst kind of ads.

And I knew that, together with those ads, there was massive tracking, collection of data, and maybe even data sharing with third parties. People who know me, know that I deeply care about privacy, and having Disqus on my blog tracking my readers was the complete opposite of what I wanted.

I was extremely disappointed.

Leaving Disqus

I did some quick research and I discovered that (1) I could not disable Disqus ads without paying, and (2) Disqus was no longer the nice commenting platform that I met in 2015. It had mutated into something obsessed with revenue, and it was clear that its business model was completely based on ads. My fears about tracking were quickly confirmed. Let’s just say that Disqus turned out to be something that does not really align with my values.

I made the difficult decision to completely remove Disqus from my blog on the same day. But I firmly believe that a blog without comments is not a blog, and so I had to find an alternative.

Looking for a new platform

I quickly started to look for new commenting platforms that could replace Disqus. The basic criteria that this new platform had to meet were (in no particular order):

be free of charge
display only comments, no ads
respect the privacy of users
allow users to comment anonymously (at least to some extent)

The last time that I searched for a commenting platform was in 2015. Back in those days, there were not many solutions, and that’s one reason why I ended up with Disqus. I thought: 8 years have passed since then, surely the space must have improved, and alternatives must be proliferating, right? Well, no, not really. I struggled to find a managed platform that met those criteria.

I did find some solutions that were using Mastodon or GitHub as a backend to store comments, but I really did not like the idea of forcing my readers to have a Mastodon or GitHub account to comment on my blog.

Trying Cactus Comments

One platform that came up multiple times during my search was Cactus Comments. Quoting the homepage of the project:

Cactus Comments is a federated comment system built on Matrix. It respects your privacy, and puts you in control. The entire thing is completely free and open source.

That sounded interesting, although I did not really know what Matrix was to begin with (if you, like me earlier this year, do not know what Matrix is: it is a team communication platform, somewhat similar to Slack). I thought that I could give Cactus a try. So, a few days after removing Disqus, I onboarded onto Cactus Comments.

Onboarding was not hard, but it was not trivial either, mostly because I was not familiar with Matrix. The frontend shown to readers was a bit disappointing: even though Matrix supports threads, Cactus Comments does not. Overall, the features available to commenters were scarce: people could only post a comment, and not much else; they had no ability to edit their comments, or delete them. But it did allow people to post even without creating a Matrix account, and that was great for me.

The “administrative interface” (if we can call it that) was also disappointing. All the administration and moderation had to be done through Matrix, sometimes by communicating with a bot, and could not be done by clicking buttons on my blog. Every blog post had to have its own Matrix channel and I (the author) had to manually join each channel in order to get some sort of notification for new comments.

I needed a Matrix client to spot new comments, and to perform moderation actions, and I chose Element for that purpose. Sadly, Element was totally unreadable on small displays like my phone. And apparently there’s no web-based Matrix client that works well on mobile. I could have installed an app for my phone, but I hate installing apps, especially for activities that can in theory be done through a web browser.

Cactus Comments also did not support importing comments from Disqus, so moving to this platform meant that all the conversations that happened over the years on my blog were lost. But because Cactus Comments is free & open source software, I thought that I could add support for importing comments from Disqus if I decided to settle on Cactus Comments, so this was not a deal breaker.

Overall my experience with Cactus Comments was not great, but I was willing to accept that in exchange for a platform that was free, managed by someone else, and respecting the privacy of my readers.

There was, however, one big problem that eventually led me to remove Cactus Comments from my blog: Cactus did not support sending email notifications. This meant that if you left a comment on this blog, I would not get notified. And if I responded to your comment, you would not get notified. In order to spot new comments, I had to check the Matrix channels periodically, and readers had to check my blog periodically. Maybe if I installed a Matrix app I could have received push notifications on my phone, but that’s not what I wanted, and this wouldn’t have solved the problem for my commenters anyway.

I was pretty bad at checking for new comments on Cactus. What happened multiple times is that people would leave comments or questions on my blog, but I wouldn’t notice until 2 weeks later. At that point, it was pointless for me to respond because so much time had passed that those commenters surely wouldn’t be checking my blog for a response…

I would say that with Cactus I had a blog that allowed comments, but did not allow conversations. Not allowing conversations made the comments pointless in my opinion. I might as well have had no comments at all: at least people would stop leaving questions there that were destined to be unanswered, and instead they would have emailed me directly.

Meet Remark42

Between August and September 2023, I decided that I had to restart my quest for a commenting platform. This time I knew that I had to look for a solution that I would install and manage myself. I was not super-excited about it, but since my first search for a Disqus alternative, I hadn’t been able to find any managed solution that I really liked.

Initially I thought about writing my own commenting platform in Rust with a key-value store, but then I figured that if I looked for software to install instead of a managed platform, maybe I could find something I liked.

After some research, I decided to go with Remark42. There were a few contenders, but Remark42 won because it looked like it had all of the features I needed, and more:

it supports sending email notifications, both to me, and to my readers;
it supports various authentication mechanisms, including: email, GitHub, Google, Facebook, etc. (it’s nice to give commenters a choice);
it supports leaving comments anonymously, without logging in or leaving an email address;
commenters can edit and delete comments;
it supports importing comments from Disqus;
in fact, it supports importing comments from any platform: the format it uses for restoring backups is JSON-based and very easy to replicate (in theory I could import the comments from Cactus, even though I have not done that yet);
it’s privacy-focused, and it looks like it’s implemented with security in mind.

I decided to host it on Fly.io, which offers some compute and storage capacity for free. I was introduced to Fly.io on Mastodon, but I had never used it before.

For sending emails, I chose Elastic Email, which also offers the features I needed for free. I also had never used this service before, and did not know much about it: it showed up while searching for a free SMTP provider. Elastic Email describes itself as a marketing service, which does not sound great from the point of view of privacy, but I figured that all the emails being sent here contain only public information (all comments are public after all), so there’s not much to protect besides email addresses. And people are free to use temporary email providers like Mailinator if they don’t want to leave their real email, or even leave no email address at all. (Should I be concerned about Elastic Email, like I should have been concerned about Disqus? Let me know… in the comments below.)

Setting up Remark42 on Fly.io was relatively easy, but it took me way longer than I had expected, mostly because the Fly.io documentation was quite inconsistent and confusing, and also the Remark42 documentation was not fully clear. In the end I managed to make everything work and I’m pretty happy with the setup I ended up with. I’m going to publish details about my setup in a future blog post, in case you’re interested (update: said blog post is now published).

Conclusion

That’s all I have to say for now! Remark42 has been running on my blog for a few days, so it’s too early for me to say whether I’ll stick with it or I will look for a new solution, but so far it looks very promising, and I’m very happy with it. I hope this is the beginning of a long journey!

On ignoring mistakes, resilience, and the hidden dangers therein

andreacorbellini — Sat, 18 Mar 2023 08:20:00 +0000

As a scuba diver who often explores new places, I can say that I have found myself in some dangerous situations, but I always made it back to the surface without facing any negative consequences. Does this mean that I never made any mistakes? Absolutely not: mistakes were made, and lessons were learned.

We can all agree that learning from mistakes is good, but sometimes, when mistakes happen and consequences don’t manifest themselves immediately, we run the risk of not noticing them, not learning from them, repeating them, and over time developing a false sense of confidence, which can drive us to believe that our repeated mistakes are actually good practices.

Why do we ignore mistakes? Because sometimes outcomes are positive even if we make mistakes. “I made it out of the water this time too, which means that my dive was executed perfectly.” This is a common way of reasoning, but in reality, things are much more complex than that. There is a difference between correct execution and successful outcome, and the two should not be confused. In fact, everyone should know from experience that goals can be achieved even if the execution was sloppy and full of mistakes. Catastrophic consequences may happen if we fail to see that.

An example of the consequences of ignoring mistakes is given by the two space shuttle disasters: the Challenger disaster of 1986, and the Columbia disaster of 2003. Both these instances were caused by NASA leadership ignoring the concerns from the engineering teams. Problems that occurred in previous shuttle launches should have been a wake-up call for NASA leadership. Instead, all the previous successful launches and re-entries despite the problems were seen as accomplishments, and nourished the leadership’s overconfidence. “We have made it this time too, this means that all those concerns that engineers raised were excessive.”

The tendency to divert from proper procedures, dismiss valid concerns, and ignore problems has a name: it’s called normalization of deviance. The driving force of normalization of deviance is overconfidence and the false belief that positive outcomes are inherently caused by correct execution.

Overconfidence and normalization of deviance can spread like a virus in an organization. It is important to be vigilant for signs of overconfidence in individuals, before it infects other people. I once had to deal with a manager who was a self-declared micromanager (and proud of it) but lacked technical foundations and knowledge of the product. He would consistently and quickly dismiss anything that he did not understand, and focus on short-term goals of questionable usefulness. Whenever his team would accomplish a goal, he would send a pumped-up announcement, often containing inaccuracies, and carefully skipping over the shortcomings of the solutions implemented. Given the apparent success of this management style, other managers started to follow his example. Soon after (in less than a year), the entire organization became a toxic environment where raising even the slightest concern was seen as an attack on the “great new vision”.

I see many parallels between this manager story and what is happening with ‘Twitter 2.0’ right now (although, I must say, in my case engineers did not get fired on the spot for speaking the truth). And with that manager, just like with ‘Twitter 2.0’, whenever problems occurred, those problems would either be ignored or blamed on the preexisting components built before the manager joined, never on the new, careless developments.

The truth, however, was that the problems that occurred had been predicted weeks or months before, but the concerns around them had been promptly dismissed for being too challenging to address, and because “everything works right now, so that’s not a concern”.

The idea that everything must be correct because everything works, goals are achieved, and outcomes are successful, is a dangerous idea that can potentially have catastrophic consequences. It’s important to be critical and analytical, regardless of the outcome. This does not mean that success shouldn’t be celebrated, but that mistakes should be captured so that lessons can be learned from them, even if the final outcome was successful. Not learning from mistakes does not allow us to advance, and on the contrary can only lead us to repeat them. And if we keep repeating the same mistakes, sooner or later, those will have some negative consequences.

A common practice in the aviation industry is to write reports on incidents, close calls, and near misses, whenever they occur, even if the flight was concluded successfully and no injuries or damages occurred. These reports are collected in databases like the Aviation Safety Reporting System (which can be freely consulted online), so that flight safety experts and regulators can identify common failure scenarios and eventually introduce mechanisms to improve safety in the aviation industry. A key element of these reports is that they are not meant to put the blame on certain people, but rather focus on what chain of events led to a certain mistake. “Human mistake” is generally not a valid root cause: if a human was able to make a mistake, it means that a mechanism is missing that can either prevent the mistake or detect it before it causes any negative consequences.

Some companies in other industries have similar processes for writing reports or retrospectives when a mistake happens (regardless of the outcome), with the goal of finding proper root causes and preventing future mistakes. Amazon with its Correction of Error practice is a famous example.

I think introducing these practices in an organization can help to establish a healthy culture where finding mistakes and raising concerns is encouraged, rather than suppressed. However, these practices, by themselves, may not be enough to ensure that such a culture can be maintained over time, because people can always disagree on what is considered a ‘mistake’. Empathy is probably the key to a truly healthy culture that allows people to learn and advance.

There are also cases where we are aware of problems, and we see them as such, but we deliberately choose not to do anything about them. This is where resilience comes into play.

Resilience is generally a good quality to have. Resilience can give us the strength to go through long-term hardships, and can have positive effects on our tenacity and determination. But even resilience, when taken to the extreme, can be dangerous. Resilience can lead us to ignore problems, and not react to them. Resilience can make us tolerate a negative situation, without finding a proper strategy to cope with it.

Poor planning forces you to consistently work extra hours? Resist and keep going, until you burn out. The relationship with your partner doesn’t satisfy you? Resist and think that things will get better, while the relationship slowly deteriorates. Feel pain in your knee every time you run for more than 30 minutes? Resist and don’t go to see a doctor, the pain will go away, sooner or later… until you cannot run anymore.

When we let resilience become an excuse to avoid solving problems, we can end up in situations from which it’s difficult to recover.

It’s important to make a distinction between what is under our control and what is not. We can fix problems that are under our control, but in situations where we cannot directly change the course of things, finding an alternative strategy is the only way. Resisting and hoping that things will get better often does not give the expected outcome; on the contrary, it can be detrimental.

In the end, I think that the ‘practice’ of ignoring mistakes (because of the overconfidence built from successful outcomes) or ignoring problems (because of resilience taken to the extreme) are hidden time bombs, silently ticking, waiting for the right conditions before exploding. We need to be aware that just because things seem to work today, it doesn’t mean that we’re making the right decisions, and this can have consequences in the future. Being critical, analytical, empathetic, and honest is important to avoid these behaviors and the dangers that come with them.

Authenticated encryption: why you need it and how it works

andreacorbellini — Thu, 09 Mar 2023 18:35:00 +0000

In this article I want to explore a common problem of modern cryptographic ciphers: malleability. I will explain that problem with some hands-on examples, and then look in detail at how that problem is solved through the use of authenticated encryption. I will describe in particular two algorithms that provide authenticated encryption: ChaCha20-Poly1305 and AES-GCM, and briefly mention some of their variants.

The problem

If we want to encrypt some data, a very common approach is to use a symmetric cipher. When we use a symmetric cipher, we hold a secret key, which is generally a sequence of bits chosen at random of some fixed length (nowadays ranging from 128 to 256 bits). The symmetric cipher takes two inputs: the secret key, and the message that we want to encrypt, and produces a single output: a ciphertext. Decryption is the inverse process: it takes the secret key and the ciphertext as the input and yields back the original message as an output. With symmetric ciphers, we use the same secret key both to encrypt and decrypt messages, and this is why they are called symmetric (this is in contrast with public key cryptography, or asymmetric cryptography, where encryption and decryption are performed using two different keys: a public key and a private key).

Generally speaking, symmetric ciphers can be divided into two big families:

stream ciphers, which can encrypt data bit-by-bit;
block ciphers, which can encrypt data block-by-block, where a block has a fixed size (usually 128 bits).

As we will discover soon, both of these families exhibit the same fundamental problem, although they differ slightly in the way this problem manifests itself. To understand this problem, let’s take a close look at how these two families of algorithms work and how we can manipulate the ciphertexts they produce.

Stream ciphers

A good way to think of a stream cipher is as a deterministic random number generator that yields a sequence of random bits. The secret key can be thought of as the seed for the random number generator. Every time we initialize the random number generator with the same secret key, we will get exactly the same sequence of random bits out of it.

The bits coming out of the random number generator can then be XOR-ed together with the data that we want to encrypt: ciphertext = random sequence XOR message, like in the following example:

random sequence: 3bAWC5ThFSPXX1W8P94q3XV35TG6CRVTNAPW27Q69F
                                     ⊕
        message: I would really like an ice cream right now
                                     =
     ciphertext: zB686Y0H46144HwT9RQQR6vZV1gU1779n390ZCqXV1

The XOR operator acts as a toggle that can either flip bits or keep them unchanged. Let me explain with an example:

a XOR 0 = a
a XOR 1 = NOT a

If we XOR “something” with a 0 bit, we get “something” out; if we XOR “something” with a 1 bit, we get the opposite of “something”. And if we use the same toggle twice, we return to the initial state:

a XOR b XOR b = a

This works for any a and any b, and it’s due to the fact that b XOR b is always equal to 0. In more technical terms, every value is its own inverse under the XOR operator (XOR is self-inverse).
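Python’s ^ operator applies XOR bit by bit to whole integers, so we can quickly check these identities on every possible byte value rather than on single bits. The small sketch below is just a brute-force sanity check of the properties described above (the function name is made up for this example):

```python
def xor_identities_hold() -> bool:
    """Brute-force the XOR identities over every pair of byte values."""
    for a in range(256):
        if a ^ 0x00 != a:               # XOR with 0 bits keeps the value
            return False
        if a ^ 0xFF != (~a & 0xFF):     # XOR with 1 bits flips every bit (NOT)
            return False
        for b in range(256):
            if b ^ b != 0:              # every value is its own inverse
                return False
            if a ^ b ^ b != a:          # applying the same toggle twice cancels out
                return False
    return True

print(xor_identities_hold())  # True
```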

The self-inverse property gives us a way to decrypt the message that we encrypted above: all we have to do is to replay the random sequence and XOR it together with the ciphertext:

random sequence: 3bAWC5ThFSPXX1W8P94q3XV35TG6CRVTNAPW27Q69F
                                     ⊕
     ciphertext: zB686Y0H46144HwT9RQQR6vZV1gU1779n390ZCqXV1
                                     =
        message: I would really like an ice cream right now

This works because ciphertext = random sequence XOR message, therefore random sequence XOR ciphertext = random sequence XOR random sequence XOR message. The two random sequences are the same, so they cancel each other out (self-inverse), leaving only message:

random sequence: 3bAWC5ThFSPXX1W8P94q3XV35TG6CRVTNAPW27Q69F
                                     ⊕
random sequence: 3bAWC5ThFSPXX1W8P94q3XV35TG6CRVTNAPW27Q69F
                                     ⊕
        message: I would really like an ice cream right now
                                     =
        message: I would really like an ice cream right now

Only the owner of the secret key will be able to generate the random sequence, therefore only the owner of the secret key should, in theory, be able to recover the message using this method.
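To make this more concrete, here’s a toy stream cipher in Python. It is only a sketch under a big assumption: instead of a real stream cipher like ChaCha20, the “deterministic random number generator” is built by hashing the key together with a counter using SHA-256. The names keystream and xor_cipher are hypothetical, and this construction should never be used to protect real data. Note how encryption and decryption are the very same operation:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic 'random number generator' seeded by the key.

    Concatenates SHA-256(key || counter) digests. NOT a real stream cipher:
    it only illustrates the idea. Use ChaCha20 or similar in practice.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: both are simply data XOR random sequence."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

key = b"my secret key"
message = b"I would really like an ice cream right now"
ciphertext = xor_cipher(key, message)

# Replaying the same random sequence over the ciphertext gives back the message.
print(xor_cipher(key, ciphertext) == message)  # True
```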

Playing with stream ciphers

The self-inverse property not only allows us to recover the message from the random sequence and the ciphertext, but it also allows us to recover the random sequence if we can correctly guess the message:

        message: I would really like an ice cream right now
                                     ⊕
     ciphertext: zB686Y0H46144HwT9RQQR6vZV1gU1779n390ZCqXV1
                                     =
random sequence: 3bAWC5ThFSPXX1W8P94q3XV35TG6CRVTNAPW27Q69F

This “feature” opens the door to at least two serious problems. If we are able to correctly guess the message or a portion of it, then we can:

decrypt other ciphertexts produced by the same secret key (or at least portions of them, depending on what portions of the random sequence we were able to recover);
modify ciphertexts.

And we can do all of this without any knowledge of the secret key.
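Here’s a sketch of the first problem, using the same kind of toy SHA-256-based keystream (a stand-in for a real stream cipher, not something to use in practice). Two different messages are encrypted with the same key; an attacker who correctly guesses the first message can recover the random sequence and use it to read the second message, without ever seeing the key. The second message and all function names are made up for this example:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy SHA-256(key || counter) keystream, standing in for a real stream cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR two byte strings, truncating to the shorter one.
    return bytes(x ^ y for x, y in zip(a, b))

key = b"my secret key"  # known only to the legitimate parties

# Two messages encrypted with the SAME key (key reuse).
m1 = b"I would really like an ice cream right now"
m2 = b"Meet me at the station at 9pm, bring cash"
c1 = xor_bytes(m1, keystream(key, len(m1)))
c2 = xor_bytes(m2, keystream(key, len(m2)))

# The attacker correctly guesses m1: message XOR ciphertext = random sequence...
recovered_stream = xor_bytes(m1, c1)
# ...and uses it to decrypt c2, with no knowledge of the key.
print(xor_bytes(recovered_stream, c2))  # b'Meet me at the station at 9pm, bring cash'
```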

The first problem implies that key reuse is forbidden with stream ciphers. Every time we want to encrypt something with a stream cipher, we need a new key. This problem is easily solved by the use of a nonce (also known as initialization vector, IV, or starting variable, SV): a random value that is generated before every encryption, and that is combined in some way with the secret key to produce a new value to initialize the random number generator. If the nonce is unique per encryption, then we can be sufficiently confident that the random sequence generated will also be unique. The nonce does not need to be kept secret, but it must be known at decryption time. Nonces are usually generated at random at encryption time and stored alongside the ciphertext.
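Here’s how a nonce changes the picture, again with a toy SHA-256-based keystream. Mixing the nonce in by simple concatenation (key || nonce || counter) is an assumption made for illustration only; real ciphers like ChaCha20 define their own way of combining key and nonce. Encrypting the same message twice now gives two different ciphertexts, and decryption still works as long as the nonce travels with the ciphertext:

```python
import hashlib
import secrets

NONCE_SIZE = 16  # bytes; an arbitrary choice for this toy example

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream seeded by key AND nonce: SHA-256(key || nonce || counter).

    Not a real cipher: it only shows how a fresh nonce gives each encryption
    its own random sequence while the key stays the same.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, message: bytes) -> tuple[bytes, bytes]:
    nonce = secrets.token_bytes(NONCE_SIZE)  # fresh random nonce for every encryption
    ciphertext = xor_bytes(message, keystream(key, nonce, len(message)))
    return nonce, ciphertext  # the nonce is stored next to the ciphertext, in the clear

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    return xor_bytes(ciphertext, keystream(key, nonce, len(ciphertext)))

key = b"my secret key"
message = b"I would really like an ice cream right now"
n1, c1 = encrypt(key, message)
n2, c2 = encrypt(key, message)
print(c1 != c2)                         # same key and message, different ciphertexts
print(decrypt(key, n1, c1) == message)  # decryption needs the matching nonce
```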

The second problem is a bit more subtle: if we have a ciphertext and we can correctly guess the original message that produced it, we can modify it using the XOR operator to “cancel” the original message and “insert” a new message, like in this example:

         ciphertext: zB686Y0H46144HwT9RQQR6vZV1gU1779n390ZCqXV1
                                         ⊕
            message: I would really like an ice cream right now
                                         ⊕
    altered message: I would really like to go to bed right now
                                         =
tampered ciphertext: zB686Y0H46144HwT9RQQG7vTZt3Yc030n390ZCqXV1

This tampered ciphertext, when decrypted with the secret key, will return the altered message, and nothing will reveal that it was tampered with!

Note that I do not need to know the full message to carry out this technique. In fact, the following example (where unknown parts have been replaced by hyphens) produces the same result as the one above:

         ciphertext: zB686Y0H46144HwT9RQQR6vZV1gU1779n390ZCqXV1
                                         ⊕
            message: --------------------an ice cream----------
                                         ⊕
    altered message: --------------------to go to bed----------
                                         =
tampered ciphertext: zB686Y0H46144HwT9RQQG7vTZt3Yc030n390ZCqXV1

This problem is known as malleability, and it’s a serious issue in the real world because most of the messages that we exchange are in practice relatively easy to guess.
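The hyphen-padded strings in the examples above are just a way to line up the guessed words with the ciphertext. A sketch of the same manipulation, expressed with an explicit byte offset instead, could look like this (tamper is a hypothetical helper, and the keystream is once again a toy SHA-256-based one, used only to produce a ciphertext to play with):

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy SHA-256(key || counter) keystream, standing in for a real stream cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def tamper(ciphertext: bytes, offset: int, guessed: bytes, replacement: bytes) -> bytes:
    """Swap `guessed` for `replacement` at `offset`, without the key.

    XOR-ing (guessed XOR replacement) into the ciphertext cancels the old
    bytes and inserts the new ones. Lengths must match.
    """
    if len(guessed) != len(replacement):
        raise ValueError("replacement must have the same length as the guessed text")
    if offset < 0 or offset + len(guessed) > len(ciphertext):
        raise ValueError("guessed text does not fit in the ciphertext at this offset")
    patched = bytearray(ciphertext)
    for i, (g, r) in enumerate(zip(guessed, replacement)):
        patched[offset + i] ^= g ^ r
    return bytes(patched)

key = b"my secret key"
message = b"I would really like an ice cream right now"
ciphertext = xor_bytes(message, keystream(key, len(message)))

# The attacker only needs the guessed words and their position:
offset = 20  # "I would really like " is 20 bytes long
tampered = tamper(ciphertext, offset, b"an ice cream", b"to go to bed")
print(xor_bytes(tampered, keystream(key, len(tampered))))
# b'I would really like to go to bed right now'
```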

Suppose, for example, that I have control over a WiFi network, and I can inspect and alter the internet traffic that passes through it. Suppose that I know that you are connected to my WiFi network, that you are visiting an e-commerce website, and that you’re interested in a particular item. The traffic that your browser exchanges with the e-commerce website may be encrypted, and therefore I won’t be able to decrypt its contents, but I might be able to guess certain parts of it, like the HTTP headers sent by the website, or some parts of the HTML that are common to all pages on that website, or even the name and the price of the item you want to buy. If I can guess that information (which is public information, not a secret, and it’s generally easy to guess), then I might be able to alter some parts of the web page, showing you false information, and altering the price that you see in an attempt to trick you into buying that item.

An example of malleability using ChaCha20 with OpenSSL

Here’s a practical example of how we can take the output of a stream cipher, and alter it as we wish without knowledge of the secret key. I’m going to use the OpenSSL command line interface to encrypt a message with a stream cipher: ChaCha20. This is a modern, fast, stream cipher with a good reputation:

openssl enc -chacha20 \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 0123456789abcdef0123456789abcdef \
    -in <(echo 'I would really like an ice cream right now') \
    -out ciphertext.bin

The -K option specifies the key in hexadecimal format (256 bits, or 32 bytes, or 64 hex characters), the -iv is the nonce, also known as initialization vector (128 bits, or 16 bytes, or 32 hex characters).
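The key and nonce used here are fixed, easy-to-read values chosen for the demonstration. In practice they should be random, and a key/nonce pair should never be reused. One way to generate hex strings of the right length is Python’s secrets module (a sketch; how you store and distribute the key is a separate problem):

```python
import secrets

# 256-bit key: 32 random bytes, i.e. 64 hex characters (for -K)
key_hex = secrets.token_hex(32)
# 128-bit nonce/IV: 16 random bytes, i.e. 32 hex characters (for -iv)
iv_hex = secrets.token_hex(16)

print(f"-K {key_hex} -iv {iv_hex}")
```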

This trivial Python script can tamper with the ciphertext:

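# Read the ciphertext produced by the openssl command above
with open('ciphertext.bin', 'rb') as file:
    ciphertext = file.read()

# Hyphens mark the parts we don't know: they appear in both strings, so they
# cancel out under XOR and leave those ciphertext bytes untouched.
# The trailing '\n' is the newline added by echo.
guessed_message     = b'--------------------an ice cream----------\n'
replacement_message = b'--------------------to go to bed----------\n'

# ciphertext XOR guessed XOR replacement: cancel the old words, insert the new ones
tampered_ciphertext = bytes(x ^ y ^ z for (x, y, z) in
                            zip(ciphertext, guessed_message, replacement_message))

with open('tampered-ciphertext.bin', 'wb') as file:
    file.write(tampered_ciphertext)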

This script is using partial knowledge of the message. It knows (thanks to an educated guess) that the original message contained the words “an ice cream” at a specific offset, and uses that knowledge to replace those words with new ones (“to go to bed”) which add up to the same length. Note that this technique cannot be used to remove parts from the message or add parts to it, only to modify them without changing their length.

Now if we run this script and we decrypt the tampered-ciphertext.bin file with the same key and nonce as before, we get “to go to bed” instead of “an ice cream”, without any error indicating that tampering occurred:

openssl enc -d -chacha20 \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 0123456789abcdef0123456789abcdef \
    -in tampered-ciphertext.bin

Block ciphers

We have seen that stream ciphers alone have a serious problem (malleability) that allows anyone to modify arbitrary portions of ciphertexts without detection. Let’s take a look at the alternative: block ciphers. Will they have the same problem?

While a stream cipher can encrypt variable amounts of data, a block cipher can only take as input a block of data of a fixed size, and produce as output another block of data. A good block cipher produces an output that is indistinguishable from random.

The block size is generally small, usually 128 bits (16 bytes), so if we want to encrypt larger amounts of data, we have to split the data into multiple blocks, and encrypt each block individually. If the data is too short to fill the last block, it will also need to be padded.

   message: The cat is on th e table.........
            |______________| |______________|
                block #1         block #2
                                 (padded)

ciphertext: c2TNPW3r09hZ6f1P Vc32VX41XSy579Y9
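The dots in the diagram above are just a placeholder for padding. A common scheme in practice is PKCS#7, which OpenSSL’s enc command applies by default to block ciphers (its documentation calls it PKCS#5, or standard block padding): every padding byte is set to the number of padding bytes added, and a whole block of padding is added if the data is already a multiple of the block size. Here’s a sketch of padding and splitting into 16-byte blocks:

```python
BLOCK_SIZE = 16  # bytes (128 bits), the block size of AES

def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """PKCS#7 padding: append n bytes, each with value n (1 <= n <= block_size)."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n

def unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Remove PKCS#7 padding, rejecting malformed input."""
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]

def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut already-padded data into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_blocks(pad(b"The cat is on the table"))
print(blocks)  # [b'The cat is on th', b'e table\t\t\t\t\t\t\t\t\t']
```

The 23-byte message needs 9 bytes of padding, each with value 9 (which Python displays as \t). Because the padding bytes encode their own count, unpad can strip them unambiguously after decryption.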

This approach, however, has a problem: if we encrypt multiple blocks with the same secret key, then identical blocks of plaintext will produce identical blocks of ciphertext. This makes it possible to analyze a ciphertext and find patterns in it without knowledge of the secret key. This problem is famously evident when encrypting pictures:

Example of applying a block cipher to an uncompressed image. The original colors are lost, but the overall layout of the image is still understandable. That's because multiple blocks of the image (containing the RGB values of each pixel), for example from the white background, are repeated multiple times, yielding the same exact encrypted blocks. The inspiration for making this image came from Wikipedia.
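Encrypting each block independently with the same key, as described above, is what the ECB (electronic codebook) mode does, and it’s the mode selected by the -aes-256-ecb option used below. Python’s standard library has no AES, so this sketch assumes a stand-in: HMAC-SHA256 truncated to 16 bytes, which is a keyed deterministic function but not a real block cipher (it cannot be decrypted). That’s enough to show the leak, because equal input blocks always give equal output blocks:

```python
import hashlib
import hmac

BLOCK_SIZE = 16

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES on one 16-byte block.

    Truncated HMAC-SHA256 is NOT a block cipher (no decryption), but like AES
    it maps equal inputs to equal outputs under the same key.
    """
    assert len(block) == BLOCK_SIZE
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Encrypt every block independently, with no link between blocks."""
    assert len(data) % BLOCK_SIZE == 0, "data must already be padded"
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK_SIZE])
                    for i in range(0, len(data), BLOCK_SIZE))

# Three blocks of white background pixels and one different block.
white = b"\xff" * BLOCK_SIZE
other = b"\x00\x10\x20\x30" * 4
ct = ecb_encrypt(b"secret key", white + other + white + white)
cblocks = [ct[i:i + BLOCK_SIZE] for i in range(0, len(ct), BLOCK_SIZE)]

print(cblocks[0] == cblocks[2] == cblocks[3])  # True: the repetition is visible
print(cblocks[0] == cblocks[1])                # False
```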

How this image was generated

Before jumping into how I encrypted the image, let me spend a few words on how I did NOT encrypt the image: I did not use a modern image format. Modern image formats are very sophisticated: they’re not a simple sequence of RGB values. Instead, they have some control structures mixed in with the image data, they implement compression to reduce the image size, etc. This complexity means that if I simply take an image in any format and encrypt it, the result won’t be viewable in an image viewer: the image viewer would just throw an error because it would find invalid data structures.

Note that this does not imply that encrypting modern image formats is more secure: people can still analyze patterns in them, but it simply means that a modern image format, once encrypted, would not produce the sensational visualization that I showed above.

In order to produce this visualization I had to find an uncompressed image format without too much metadata in it. Thankfully the Wikipedia article on image file formats provided a list, which included the Netpbm family of formats (something I had never heard of before). Among the formats in this family, I chose PPM, because it’s the one that supports colors.

The PPM file format is very simple: it has 3 lines of metadata, followed by the RGB values for each pixel. No compression. Definitely the right format for this kind of experiment!
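To get a feel for how simple the format is, here’s a sketch that builds a tiny PPM image from scratch. It assumes the binary P6 variant with a maximum color value of 255 (one byte per color channel) and no comment lines in the header; make_ppm is a name made up for this example:

```python
def make_ppm(width: int, height: int, pixels: bytes) -> bytes:
    """Build a binary PPM (P6): 3 lines of ASCII header, then raw RGB bytes."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected exactly 3 bytes (R, G, B) per pixel")
    # Line 1: magic number, line 2: width and height, line 3: max color value
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + pixels

# A 4x2 image: one red pixel in the top-left corner, white everywhere else.
# Pixels are stored row by row, left to right, with no compression.
white = b"\xff\xff\xff"
red = b"\xff\x00\x00"
ppm = make_ppm(4, 2, red + white * 7)
```

Writing ppm to a file with a .ppm extension should give an image that most image viewers can open.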

So here’s what I did: first of all I downloaded an image (the Ubuntu “Circle of Friends” logo, obtained from Wikipedia) and converted it to PPM with ImageMagick:

convert UbuntuCoF.png img.ppm

I separated the header from the RGB values:

head -n3 img.ppm > ppm-header
tail -n+4 img.ppm > ppm-image

The reason why I separated the header from the RGB values is that I won’t encrypt the header. If I did, then the image would not be viewable in an image viewer, just as if I had used a modern image format. In a real-world scenario, a person would be able to easily guess the header even if it was encrypted.

I encrypted the RGB values with AES-256, a modern, strong block cipher with a good reputation:

openssl enc -aes-256-ecb \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -in ppm-image \
    -out ppm-image-encrypted

Then I joined the header and the encrypted RGB values in a PPM file:

cat ppm-header ppm-image-encrypted > img-encrypted.ppm

This results in a randomized image that can be viewed without problems on an image viewer. It’s interesting to see that, if you change the encryption key, you will get a different randomized image!

The problem we have just seen is known as lack of diffusion. This is kinda analogous to the first problem we identified with stream ciphers: at the root of both problems there is key reuse. We solved this problem for stream ciphers by combining a key with a random nonce. We could use the same strategy here, but it would be costly, because initializing a block cipher with a new key is a relatively expensive operation. It’s much cheaper to initialize the block cipher once, and reuse it for every block encryption. We need a way to “link” blocks to each other, so that if two linked blocks contain the same plaintext, their encryption will give different results.

There are various strategies to do that. These strategies are known as modes of operation of block ciphers. Let’s take a look at two of them: Cipher Block Chaining (CBC) and Counter Mode (CTR).

Cipher Block Chaining (CBC)

This mode of operation, as the name suggests, ‘chains’ each block to the next one. The way it works is by using the XOR operator in the following way:

First of all, a random nonce is generated. The purpose of the nonce is the same as before (with stream ciphers): ensuring that using the same secret key to perform multiple encryptions yields different results each time, so that secret information or patterns are not revealed.

The nonce does not need to be kept secret and is normally stored alongside the ciphertext so that it can be easily used during decryption. It is however important that the nonce is unique and, in CBC mode specifically, also unpredictable to an attacker.
The first block of message m[0] is XOR-ed with the nonce, and then encrypted with the block cipher, producing the first block of ciphertext c[0] = block_encrypt(m[0] XOR nonce)
The second block of message m[1] is XOR-ed with c[0], and then encrypted with the block cipher: c[1] = block_encrypt(m[1] XOR c[0])
…
The last block of message m[n] is XOR-ed together with c[n-1], and then encrypted with the block cipher: c[n] = block_encrypt(m[n] XOR c[n-1])
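The chaining steps above can be sketched in Python as follows. This is a minimal sketch under a few assumptions: a hypothetical toy Feistel cipher (built on HMAC-SHA256) stands in for AES and is not AES, the nonce comes from `os.urandom`, and the message is already a multiple of 16 bytes (real implementations add padding, such as the PKCS#7 padding that OpenSSL applies). Unlike in the ECB case, two identical plaintext blocks now produce different ciphertext blocks:

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 16
ROUNDS = 4


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function of the toy cipher (HMAC-SHA256, truncated)."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical toy block cipher (Feistel network), NOT AES."""
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_block_encrypt: runs the Feistel rounds backwards."""
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    assert len(message) % BLOCK_SIZE == 0  # no padding in this sketch
    previous, out = nonce, []
    for i in range(0, len(message), BLOCK_SIZE):
        # c[i] = block_encrypt(m[i] XOR c[i-1]), using the nonce for the first block
        previous = toy_block_encrypt(key, xor(message[i:i + BLOCK_SIZE], previous))
        out.append(previous)
    return b''.join(out)


def cbc_decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    previous, out = nonce, []
    for i in range(0, len(ciphertext), BLOCK_SIZE):
        block = ciphertext[i:i + BLOCK_SIZE]
        # m[i] = block_decrypt(c[i]) XOR c[i-1]
        out.append(xor(toy_block_decrypt(key, block), previous))
        previous = block
    return b''.join(out)


key = os.urandom(32)
nonce = os.urandom(BLOCK_SIZE)
message = b'WHITE PIXELS....' * 2
ciphertext = cbc_encrypt(key, nonce, message)

print(ciphertext[:16] == ciphertext[16:])              # False
print(cbc_decrypt(key, nonce, ciphertext) == message)  # True
```

The decryption function also makes the malleability discussed next visible: each plaintext block depends on the previous ciphertext block only through a XOR.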

The XOR operator is back! With stream ciphers, the XOR operator allowed us to tamper with ciphertexts. Can we do the same thing here? Yes, of course! The approach is slightly different though: instead of acting directly on the block that we want to change, we will act on the block that precedes it.

For example, if we want to change the sentence “I came home in the afternoon and the cat was on the table” so that it reads ‘dog’ instead of ‘cat’, we would need to change the block right before the one that contains the word ‘cat’. If we want to change the very first block, for example to change the word ‘came’ to ‘left’, we would need to change the nonce instead.

                             nonce+ciphertext: yzURZRbP6X1w3ZRL XRDnPbEkx3JUP2Fv C2ZWt19EdAXDi76H pkbk8qTgaSdzerbF 8CWYqscBqE6cSLmx
                                                                ⊕
        message (shifted 1 block to the left): I came home in t he afternoon and  the cat was on  the table.......
                                                                ⊕
altered message (shifted 1 block to the left): I left home in t he afternoon and  the dog was on  the table.......
                                                                =
                    tampered nonce+ciphertext: yzZVQCbP6X1w3ZRL XRDnPbEkx3JUP2Fv C2ZWt67VdAXDi76H pkbk8qTgaSdzerbF 8CWYqscBqE6cSLmx

If we do the above, and then decrypt the tampered ciphertext, we will get something like this:

I left home in t���������������the dog was on the table

How to get this result using AES-CBC with OpenSSL

Here’s a step-by-step guide on how to tamper with a ciphertext encrypted with AES-256 in CBC mode.

First, generate a valid ciphertext:

openssl enc -aes-256-cbc \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 0123456789abcdef0123456789abcdef \
    -in <(echo 'I came home in the afternoon and the cat was on the table') \
    -out ciphertext.bin

Like in the stream cipher example, -K is the key in hexadecimal format (256 bits), while -iv is the nonce (128 bits).

We can perform the tampering with this Python script:

with open('ciphertext.bin', 'rb') as file:
    ciphertext = file.read()

guessed_message     = b'---------------------cat----------------------------------------'
replacement_message = b'---------------------dog----------------------------------------'

tampered_ciphertext = bytes(x ^ y ^ z for (x, y, z) in
                            zip(ciphertext, guessed_message, replacement_message))

with open('tampered-ciphertext.bin', 'wb') as file:
    file.write(tampered_ciphertext)

Note that OpenSSL does not store the nonce along with the ciphertext, but instead expects it to be passed as a command line argument. We need to modify it separately, so here’s another Python script just for the nonce:

nonce = bytes.fromhex('0123456789abcdef0123456789abcdef')

guessed_message     = b'--came----------'
replacement_message = b'--left----------'

tampered_nonce = bytes(x ^ y ^ z for (x, y, z) in
                       zip(nonce, guessed_message, replacement_message))

print(tampered_nonce.hex())

If we run that script, we get: 01234a6382bacdef0123456789abcdef.

Now to decrypt the tampered ciphertext with the tampered nonce:

openssl enc -d -aes-256-cbc \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 01234a6382bacdef0123456789abcdef \
    -in tampered-ciphertext.bin

It’s interesting to see that we were successful in changing the word ‘cat’ to ‘dog’ but in doing so we had to sacrifice a block, which, when decrypted, resulted in random bytes.

In a real-world scenario, seeing some random bytes could raise some suspicion, and maybe generate some errors in applications; however, that’s not always the case (how many times have we seen garbled text on our monitors without ever worrying that somebody was tampering with our communications?). Also, when dealing with formats like HTML, one could conceal tampering attempts using comment blocks or JavaScript. One example of what I’m describing is the EFAIL vulnerability: discovered in 2017, it affected some popular email clients including Gmail, it targeted the use of AES in CBC mode (as well as another mode very similar to it: Cipher Feedback, CFB), and allowed the injection of malicious content in HTML emails.

We can conclude that block ciphers in CBC mode, just like stream ciphers, are also malleable.

Counter Mode (CTR)

Are all other modes of operation malleable like CBC, or are some of them different? Let’s take a look at another, very common, mode of operation: Counter Mode (CTR), so that we can get a better sense of how the problem of malleability can affect the world of block ciphers.

The mechanism behind Counter Mode is very simple:

A random nonce is generated. The purpose of the nonce is the usual one: make sure that repeated encryptions using the same key produce different results.
A counter (usually an integer) is initialized to 1 (or any other starting value of your choice).
The nonce is concatenated with the counter, and encrypted using the block cipher: r[0] = block_encrypt(nonce || counter).

Because the block cipher can only accept as input a block of a fixed size, it follows that the length of the nonce plus the length of the counter must be equal to the block size. For example, for a 128-bit block cipher, a common choice is to have a 96-bit nonce and a 32-bit counter.
The counter is incremented: counter = counter + 1 (the increment does not necessarily need to be by 1, but that’s a common choice). The nonce and the new counter are concatenated again, and encrypted using the block cipher: r[1] = block_encrypt(nonce || counter).
The counter is incremented again (counter = counter + 1), and a new block is encrypted, just like before: r[2] = block_encrypt(nonce || counter).
…

This mechanism produces a sequence of blocks r[0], r[1], r[2], … which are indistinguishable from random. This sequence of random blocks can be XOR-ed with the message to produce the ciphertext.
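A minimal sketch of this keystream, assuming the same kind of hypothetical toy Feistel cipher (not AES) as a stand-in block cipher, and the 96-bit nonce plus 32-bit counter layout mentioned above with the counter encoded big-endian (the encoding is an assumption of this sketch). Because the ciphertext is just the XOR of the message with the keystream, the same function both encrypts and decrypts, and no padding is needed:

```python
import hashlib
import hmac

BLOCK_SIZE = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical toy block cipher (Feistel network over HMAC-SHA256), NOT AES."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def ctr_keystream(key: bytes, nonce: bytes, length: int, counter: int = 1) -> bytes:
    """r[0], r[1], ... = block_encrypt(nonce || counter), incrementing the counter."""
    assert len(nonce) == 12  # 96-bit nonce + 32-bit counter = 128-bit block
    out = b''
    while len(out) < length:
        # to_bytes raises OverflowError once the 32-bit counter is exhausted,
        # instead of silently wrapping around and repeating the keystream
        out += toy_block_encrypt(key, nonce + counter.to_bytes(4, 'big'))
        counter += 1
    return out[:length]


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encryption and decryption are the same operation: XOR with the keystream."""
    return xor(data, ctr_keystream(key, nonce, len(data)))


key = bytes(range(32))
nonce = bytes(12)
message = b'Can you give me a ride to the party?'  # 36 bytes, not a multiple of 16
ciphertext = ctr_crypt(key, nonce, message)

print(len(ciphertext))                               # 36
print(ctr_crypt(key, nonce, ciphertext) == message)  # True
```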

It’s important that the values for the counter never repeat. If, for example, we’re using a 32-bit counter, the counter will “reset” (go back to the starting value) after 2³² iterations, and will start repeating the same sequence of random blocks as it did at the beginning. This introduces the problem of lack of diffusion that we have seen before, just at a larger scale. If we’re using a 32-bit counter with a 128-bit block cipher, we cannot encrypt more than 128·2³² bits = 64 GiB of data at once. This is a very important detail: exceeding these limits may allow the decryption of portions of ciphertext without knowledge of the secret key.
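The 64 GiB figure is simple arithmetic: 2³² distinct counter values, each producing one 128-bit block of keystream. A quick check:

```python
counter_values = 2 ** 32  # distinct values of a 32-bit counter
block_bits = 128          # block size of AES
limit_bytes = counter_values * block_bits // 8

print(limit_bytes)            # 68719476736
print(limit_bytes / 2 ** 30)  # 64.0 (GiB)
```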

What Counter Mode is doing is effectively turning a block cipher into a stream cipher. As such, a block cipher in Counter Mode has the exact same malleability problems as stream ciphers that we have seen before.

An example of malleability using AES-CTR with OpenSSL

This example is going to be very similar (almost identical) to the example with ChaCha20 that I showed in the stream cipher section, just that this time I’m going to use AES-256 in CTR mode.

Let’s produce a valid ciphertext:

openssl enc -aes-256-ctr \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 0123456789abcdef0123456700000001 \
    -in <(echo 'Can you give me a ride to the party?') \
    -out ciphertext.bin

Tamper it with Python:

with open('ciphertext.bin', 'rb') as file:
    ciphertext = file.read()

guessed_message     = b'Can you----------------------------?\n'
replacement_message = b'Do not ----------------------------!\n'

tampered_ciphertext = bytes(x ^ y ^ z for (x, y, z) in
                            zip(ciphertext, guessed_message, replacement_message))

with open('tampered-ciphertext.bin', 'wb') as file:
    file.write(tampered_ciphertext)

And now we can decrypt it:

openssl enc -d -aes-256-ctr \
    -K 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -iv 0123456789abcdef0123456700000001 \
    -in tampered-ciphertext.bin

Which prints:

Do not  give me a ride to the party!

The solution: Authenticated Encryption (AE)

We have seen that stream ciphers and block ciphers (in their mode of operation) both exhibit the same problem (in different flavors): malleability. I’ve shown some examples of how this problem can be exploited with modern ciphers like ChaCha20 and AES. These ciphers, alone, cannot guarantee the integrity or authenticity of encrypted data.

In this context, integrity is the assurance that data is not corrupted or modified in any way. Authenticity can be thought of as a stronger version of integrity, and it’s the assurance that a given ciphertext was produced by someone with knowledge of the secret key.

Does this mean that modern ciphers, like ChaCha20 and AES, should be considered insecure and avoided? Absolutely not! The correct answer is that those ciphers cannot be used alone. You should think of them as basic building blocks, and you need some additional building blocks in order to construct a complete and secure cryptosystem. One of these additional building blocks, which we are going to explore in this article, is an algorithm that provides integrity and authenticity: welcome Authenticated Encryption (AE).

When using authenticated encryption, an adversary may be able to modify a ciphertext using the techniques described above, but such modification would be detected by the authentication algorithm, and decryption will fail with an error. The decrypted message at that point should be discarded, preventing the use of tampered data.

There are many different methods to implement authenticated encryption. The most common approach is to use an authentication algorithm to authenticate the ciphertext produced by a cipher. Here I will describe two very popular authentication algorithms:

Poly1305, which is often used in conjunction with the stream cipher ChaCha20 to form ChaCha20-Poly1305;
Galois/Counter Mode (GCM), which is often used with the block cipher AES to form AES-GCM.

These authentication algorithms work by computing a hash of the ciphertext, which is then stored alongside the ciphertext. This hash is not a regular hash, but it’s a keyed hash. A regular hash is a function that takes as input some data and returns a fixed-size bit string:

$$\operatorname{hash}: data \rightarrow bits$$

A keyed hash instead takes two inputs: a secret key and some data, and produces a fixed-size bit string:

$$\operatorname{keyed-hash}: (key, data) \rightarrow bits$$

The output of the keyed hash is more often called Message Authentication Code (MAC), or authentication tag, or even just tag.

During decryption, the same authentication algorithm is run again on the ciphertext, and a new tag is produced. If the new tag matches the original tag (that was stored alongside the ciphertext), then decryption succeeds. Otherwise, if the tags don’t match, it means that the ciphertext was modified (or the stored tag was modified), and decryption fails. This gives us a way to detect tampering and the opportunity to reject ciphertexts that were not produced with the secret key.
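To make the tag check concrete, here is a minimal sketch that uses HMAC-SHA256 from Python's standard library as the keyed hash. This is a substitution for illustration only: HMAC-SHA256 is neither Poly1305 nor GCM, and the key and ciphertext bytes below are placeholders. The comparison uses `hmac.compare_digest`, which compares the tags in constant time:

```python
import hashlib
import hmac
import os


def compute_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """keyed-hash: (key, data) -> bits, here instantiated with HMAC-SHA256."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recomputes the tag and compares it with the stored one in constant time."""
    return hmac.compare_digest(compute_tag(mac_key, ciphertext), tag)


mac_key = os.urandom(32)
ciphertext = b'placeholder ciphertext bytes'
tag = compute_tag(mac_key, ciphertext)  # stored alongside the ciphertext

tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
print(verify(mac_key, ciphertext, tag))         # True: accept
print(verify(mac_key, tampered, tag))           # False: reject, do not decrypt
print(verify(os.urandom(32), ciphertext, tag))  # False: wrong key
```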

The secret key passed to the keyed hash function is not necessarily the same secret key used for the encryption. In fact, both ChaCha20-Poly1305 and AES-GCM operate on a subkey derived from the key used for encryption.

Poly1305

Poly1305 is a keyed hash function proposed in 2004 by Daniel J. Bernstein, who is also the author of ChaCha20. It works by using polynomials evaluated modulo the prime 2¹³⁰ - 5, hence the name.

The key to Poly1305 is a 256-bit string, and it’s split into two halves:

the first half (128 bits) is called $r$;
the second half (128 bits) is called $s$.

We’ll see later how this key is generated when Poly1305 is used to implement authenticated encryption. For now, let’s assume that the key is a random (unpredictable) bit string provided as an input.

The first half $r$ is also clamped by setting some of its bits to 0. This is a performance-related optimization that some Poly1305 implementations can take advantage of when doing multiplication using 64-bit registers. Clamping is performed by applying the following hexadecimal bitmask:

0ffffffc0ffffffc0ffffffc0fffffff
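Read byte by byte in little-endian order, this mask clears the top 4 bits of bytes 3, 7, 11 and 15 of $r$, and the bottom 2 bits of bytes 4, 8 and 12. A short Python check of that reading, which also clamps the $r$ half of the Poly1305 test vector from RFC 8439 (section 2.5.2):

```python
MASK = 0x0ffffffc0ffffffc0ffffffc0fffffff
mask_bytes = MASK.to_bytes(16, 'little')  # the mask laid out like r's bytes

top_bits_cleared = [i for i, b in enumerate(mask_bytes) if b == 0x0f]
low_bits_cleared = [i for i, b in enumerate(mask_bytes) if b == 0xfc]
print(top_bits_cleared)  # [3, 7, 11, 15]
print(low_bits_cleared)  # [4, 8, 12]


def clamp(r_bytes: bytes) -> int:
    """Interprets r as a little-endian integer and applies the mask."""
    return int.from_bytes(r_bytes, 'little') & MASK


# r from the Poly1305 test vector in RFC 8439, section 2.5.2
print(hex(clamp(bytes.fromhex('85d6be7857556d337f4452fe42d506a8'))))
# 0x806d5400e52447c036d555408bed685
```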

The message to authenticate is split into chunks of 128 bits each: $m_1$, $m_2$, $m_3$, … $m_n$. If the length of the message is not a multiple of 128 bits, then the last block may be shorter. The authentication tag is then calculated as follows:

Interpret $r$ and $s$ as two 128-bit little-endian integers.
Initialize the Poly1305 state $a_0$ to the integer 0. As we shall see later, this state will need to hold at most 131 bits.
For each block $m_i$:
- Interpret the block $m_i$ as a little-endian integer.
- Compute $\overline{m}_i$ by appending a 1-bit to the end of the block $m_i$. If $m_i$ is 128 bits long, then this is equivalent to computing $\overline{m}_i = 2^{128} + m_i$. In general, if the length of the block $m_i$ in bits is $\operatorname{len}(m_i)$, then this is equivalent to $\overline{m}_i = 2^{\operatorname{len}(m_i)} + m_i$.
  
  This step ensures that the resulting block $\overline{m}_i$ is always non-zero, even if the original block $m_i$ is zero. This is important for the security of the algorithm, as explained later.
- Compute the new state $a_i = (a_{i-1} + \overline{m}_i) \cdot r \pmod{2^{130} - 5}$. Note that, because the operation is modulo $2^{130} - 5$, the result will always fit in 130 bits.
Once each block has been processed, compute the final state $a_{n+1} = a_n + s$. Note that the state $a_n$ is at most 130 bits long, and $s$ is at most 128 bits long, hence the result will be at most 131 bits long.
Truncate the final state $a_{n+1}$ to 128 bits by removing the most significant bits.
Return the truncated final state $a_{n+1}$ as a little-endian byte string.

What this method is doing is computing the following polynomial in $r$ and $s$:

$$\begin{align*} tag & = ((((((\overline{m}_1 \cdot r) + \overline{m}_2) \cdot r) + \cdots + \overline{m}_n) \cdot r) \bmod{(2^{130} - 5)}) + s \\ & = (\overline{m}_1 r^n + \overline{m}_2 r^{n-1} + \cdots + \overline{m}_n r) \bmod{(2^{130} - 5)} + s \end{align*}$$
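The equivalence between the step-by-step state updates and the expanded polynomial (before $s$ is added) can be checked numerically. In this sketch, `blocks` are random example values standing in for the $\overline{m}_i$ (each with the extra 1-bit at position 128), and `r` is a random clamped value:

```python
import random

p = (1 << 130) - 5
r = random.getrandbits(128) & 0x0ffffffc0ffffffc0ffffffc0fffffff
# m_bar_i = 2^128 + m_i for full 16-byte blocks
blocks = [random.getrandbits(128) | (1 << 128) for _ in range(5)]
n = len(blocks)

# Iterative form: a_i = (a_{i-1} + m_bar_i) * r mod p
a = 0
for m in blocks:
    a = ((a + m) * r) % p

# Expanded form: m_bar_1 r^n + m_bar_2 r^(n-1) + ... + m_bar_n r  (mod p)
expanded = sum(m * pow(r, n - i, p) for i, m in enumerate(blocks)) % p

print(a == expanded)  # True
```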

$r$ and $s$ are secrets, and they come from the Poly1305 key. Note that if we didn’t add $s$ at the end, then the resulting polynomial would be a polynomial in $r$, and one could use polynomial root-finding methods to figure out $r$ from the authentication tag, without knowledge of the key. Therefore it’s important that $s$ is non-zero.

In Python, this is what a Poly1305 implementation could look like (disclaimer: this is for learning purposes, and not necessarily secure or optimized for performance):

def iter_blocks(message: bytes):
    """
    Splits a message in blocks of 16 bytes (128 bits) each, except for the last
    block, which may be shorter.
    """
    start = 0
    while start < len(message):
        yield message[start:start+16]
        start += 16

def poly1305(key: bytes, message: bytes):
    assert len(key) == 32  # 256 bits

    # Prime for the evaluation of the polynomial
    p = (1 << 130) - 5

    # Split the key into two parts r and s
    r = int.from_bytes(key[:16], 'little')  # 128 bits
    s = int.from_bytes(key[16:], 'little')  # 128 bits
    # Clamp r
    r = r & 0x0ffffffc0ffffffc0ffffffc0fffffff

    # Initialize the state
    a = 0

    # Update the state with every block
    for block in iter_blocks(message):
        # Append a 1-bit to the end of each block
        block = block + b'\1'
        # Convert the block to an integer
        c = int.from_bytes(block, 'little')
        # Update the state
        a = ((a + c) * r) % p

    # Add s to the state and truncate it to 128 bits, removing the most
    # significant bits and keeping only the least significant 128 bits
    a = (a + s) & ((1 << 128) - 1)

    # Convert the state from an integer to a 16-byte string (128 bits)
    return a.to_bytes(16, 'little')

And here is an example of how that code could be used:

key = bytes.fromhex('0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef')
msg = b'I had a very nice day today at the beach'
print(poly1305(key, msg).hex())

This returns b0c4cb74b3089e9a982e3baa90c1bb5f, which is the same result that we would get using OpenSSL:

openssl mac \
    -macopt hexkey:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
    -in <(echo -n 'I had a very nice day today at the beach') \
    poly1305
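As an extra cross-check, the same pure-Python algorithm can be run against the Poly1305 test vector published in RFC 8439 (section 2.5.2). The block repeats a compact version of the `poly1305` function from above so that it runs on its own:

```python
def iter_blocks(message: bytes):
    """Splits a message into 16-byte blocks (the last one may be shorter)."""
    for start in range(0, len(message), 16):
        yield message[start:start + 16]


def poly1305(key: bytes, message: bytes) -> bytes:
    """Same algorithm as the poly1305 function shown above."""
    assert len(key) == 32
    p = (1 << 130) - 5
    r = int.from_bytes(key[:16], 'little') & 0x0ffffffc0ffffffc0ffffffc0fffffff
    s = int.from_bytes(key[16:], 'little')
    a = 0
    for block in iter_blocks(message):
        # Append the 1-bit, interpret as little-endian, update the state
        a = ((a + int.from_bytes(block + b'\1', 'little')) * r) % p
    return ((a + s) & ((1 << 128) - 1)).to_bytes(16, 'little')


# Test vector from RFC 8439, section 2.5.2
key = bytes.fromhex('85d6be7857556d337f4452fe42d506a8'
                    '0103808afb0db2fd4abff6af4149f51b')
tag = poly1305(key, b'Cryptographic Forum Research Group')
print(tag.hex())  # a8061dc1305136c6c22b8baf0c0127a9
```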

A few things to note:

The same key must not be reused to construct two distinct tags. In fact, suppose that we use the same key to compute tag1 = Poly1305(key, msg1) and tag2 = Poly1305(key, msg2). Then, because $s$ is the same for both, we could subtract the two tags (tag1 - tag2) to remove the $s$ part and obtain a polynomial in $r$. From there, we could use algebraic methods to figure out $r$. Once we have $r$, we can use either one of the tags and compute $s$, therefore recovering the full secret key.

Similarly, if the keys were generated using a predictable algorithm (for example, incrementally: key[i+1] = key[i] + 1), it would still be possible to use a similar approach to figure out the secret key.

For this reason, Poly1305 keys must be unique and unpredictable. Generating Poly1305 keys randomly or pseudo-randomly is an acceptable approach. Authentication functions like Poly1305 are called one-time authenticators because they can be used only one time with the same key.
If we didn’t add the 1-bits at the end of each block (in other words, if we used the $m_i$ blocks instead of $\overline{m}_i$), then authenticating a message full of zero bits would produce the same tag as authenticating an empty message. Adding the 1-bits is a way to ensure that the length of the message always has an effect on the output.
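The effect of the appended 1-bit can be observed directly. The sketch below uses a hypothetical variant, `poly1305_variant`, with a flag that turns the 1-bit off (for demonstration only); the key is the arbitrary one used earlier in this article. Without the 1-bit, a message made of zero bytes leaves the state at 0, so its tag is just $s$, exactly like the tag of the empty message:

```python
def poly1305_variant(key: bytes, message: bytes, append_one_bit: bool = True) -> bytes:
    """Same computation as the poly1305 function shown earlier, with a
    hypothetical switch to disable the appended 1-bit (demonstration only)."""
    p = (1 << 130) - 5
    r = int.from_bytes(key[:16], 'little') & 0x0ffffffc0ffffffc0ffffffc0fffffff
    s = int.from_bytes(key[16:], 'little')
    a = 0
    for start in range(0, len(message), 16):
        block = message[start:start + 16]
        if append_one_bit:
            block += b'\1'
        a = ((a + int.from_bytes(block, 'little')) * r) % p
    return ((a + s) & ((1 << 128) - 1)).to_bytes(16, 'little')


key = bytes.fromhex('0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef')
zeros = bytes(32)

# Without the 1-bit: zero blocks never change the state, so both tags equal s
print(poly1305_variant(key, zeros, False) == poly1305_variant(key, b'', False))  # True
# With the 1-bit: the zero blocks change the state, and the tags differ
print(poly1305_variant(key, zeros) == poly1305_variant(key, b''))  # False
```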

Use of Poly1305 with ChaCha20 (ChaCha20-Poly1305)

Let’s see how we can combine ChaCha20 and Poly1305 to construct an authenticated cipher. To recap:

ChaCha20 is a stream cipher;
Poly1305 is a one-time authenticator;
ChaCha20, like most ciphers, requires the use of a unique nonce to allow key reuse.

Putting the two together gives birth to ChaCha20-Poly1305. Here I’m going to describe how to implement it as standardized in RFC 8439.

The inputs to the ChaCha20-Poly1305 encryption function are:

a 256-bit secret key;
a 96-bit nonce;
a variable-length plaintext message.

The outputs from the ChaCha20-Poly1305 encryption function are:

a variable-length ciphertext (same length as the input plaintext);
a 128-bit authentication tag.

The ChaCha20-Poly1305 decryption function will accept the same secret key, nonce, ciphertext, and authentication tag as the input, and produce either the plaintext or an error as the output. The error is returned in case the authentication fails.

Data flow during a ChaCha20-Poly1305 encryption. This shows the inputs in blue, the outputs in green, and the intermediate objects in red.

ChaCha20-Poly1305 works in the following way:

The ChaCha20 stream cipher is initialized with the 256-bit secret key and the 96-bit nonce.
The stream cipher is used to encrypt a 256-bit string of all zeros. The result is the Poly1305 subkey.

If you recall how a stream cipher works, you should know that encrypting using a stream cipher is equivalent to performing the XOR of a random bit stream with the plaintext. Here the plaintext is all zeros, so the process of generating the Poly1305 subkey is equivalent to grabbing the first 256 bits from the ChaCha20 bit stream.

We previously saw that the Poly1305 subkey must be unpredictable and unique in order for Poly1305 to be secure. The use of ChaCha20 with a unique nonce ensures that: because ChaCha20 is a stream cipher, its output will be random and unpredictable. Therefore, with this construction, the subkey will be unpredictable even if the nonce is predictable.
The stream cipher is used to encrypt another 256-bit string. The result is discarded. This is equivalent to advancing the stream cipher state by 256 bits.

This step may seem odd, and in fact it is not needed for security purposes: it’s a mere implementation detail. This step is here because ChaCha20 has an internal state of 512 bits. In the previous step we obtained the first 256 bits of the state, and this next step is to discard the rest of the state to start with a fresh state. There is no particular reason for requiring a fresh state. The reason why RFC 8439 does that is because… spoiler alert: ChaCha20 is a block cipher under the hood. Its block size is 512 bits. If you read the RFC, you’ll see that it asks to call the ChaCha20 block encryption function once, grab the first 256 bits, and discard the rest. Here I’m treating ChaCha20 as a stream cipher, so I have to include this extra step to discard the bits.
The plaintext is encrypted using the stream cipher.

Note that this is done without resetting the state of the cipher. We are continuing to use the same stream cipher instance that was used to generate the Poly1305 subkey.
The ciphertext is padded with zeros to make its length a multiple of 16 bytes (128 bits) and is authenticated using Poly1305, via the subkey generated in step 2.

This step may be done in parallel to the previous one, that is: every time we generate a chunk of ciphertext, we feed it to the Poly1305 authentication function.

Why pad the ciphertext before passing it to Poly1305? After all, ChaCha20 is a stream cipher, and Poly1305 can accept arbitrary-sized messages. Again, this is a detail of RFC 8439 and padding does not serve any specific purpose.
The length of the ciphertext (in bytes) is fed into the Poly1305 authenticator. This length is represented as a 64-bit little-endian integer, preceded by 64 zero bits.

The reason why the length is represented as 64 bits and padded (instead of representing it as 128 bits) will be clearer later: what I have given you so far is a simplified view of ChaCha20-Poly1305 and authenticated encryption in general. I will give you the full picture when talking about associated data later on, and at that point this step will be slightly modified.
The ciphertext from ChaCha20 and the authentication tag from Poly1305 are returned.
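The bytes that the steps above feed into Poly1305 can be assembled with a small helper. `poly1305_input` is a hypothetical name used only for this sketch, which follows the simplified description above (no associated data): zero padding to a multiple of 16 bytes, then 64 zero bits, then the 64-bit little-endian ciphertext length:

```python
def poly1305_input(ciphertext: bytes) -> bytes:
    """Data authenticated by Poly1305 in ChaCha20-Poly1305, with no associated data."""
    padding = b'\0' * (-len(ciphertext) % 16)         # pad to a multiple of 16 bytes
    lengths = (0).to_bytes(8, 'little')               # 64 zero bits
    lengths += len(ciphertext).to_bytes(8, 'little')  # 64-bit ciphertext length
    return ciphertext + padding + lengths


data = poly1305_input(bytes(54))  # a 54-byte ciphertext
print(len(data))        # 80: 54 bytes + 10 bytes of padding + 16 bytes of lengths
print(data[-8:].hex())  # 3600000000000000 (54 as a little-endian 64-bit integer)
```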

The decryption algorithm works in a very similar way: ChaCha20 is initialized in the same way, the subkey is generated in the same way, the Poly1305 authentication tag is calculated from the ciphertext in the same way. The only difference is that ChaCha20 is used to decrypt the ciphertext (instead of encrypting the plaintext) and that the input authentication tag is compared to the calculated authentication tag before returning.

Here is a Python implementation of ChaCha20-Poly1305, based on the implementations of ChaCha20 and Poly1305 from pycryptodome (usual disclaimer: this code is for educational purposes, and is not necessarily secure or optimized for performance):

from Crypto.Cipher import ChaCha20
from Crypto.Hash import Poly1305

def chacha20poly1305_encrypt(key, nonce, message):
    assert len(key) == 32  # 256 bits
    assert len(nonce) == 12  # 96 bits

    # Initialize the ChaCha20 cipher with the key and nonce
    cipher = ChaCha20.new(key=key, nonce=nonce)

    # Derive the Poly1305 subkey using the ChaCha20 cipher
    subkey = cipher.encrypt(b'\0' * 32)  # 256 bits
    subkey_r = subkey[:16]
    subkey_s = subkey[16:]

    # Initialize the Poly1305 authenticator with the subkey
    authenticator = Poly1305.Poly1305_MAC(r=subkey_r, s=subkey_s, data=None)

    # Discard the rest of the internal ChaCha20 state
    cipher.encrypt(b'\0' * 32)  # 256 bits

    # Encrypt the message
    ciphertext = cipher.encrypt(message)

    # Authenticate the ciphertext
    authenticator.update(ciphertext)
    # Pad the ciphertext with zeros (to make it a multiple of 16 bytes)
    if len(ciphertext) % 16 != 0:
        authenticator.update(b'\0' * (16 - len(ciphertext) % 16))
    # Authenticate the length of the associated data (0 for simplicity)
    authenticator.update((0).to_bytes(8, 'little'))  # 64 bits
    # Authenticate the length of the ciphertext
    authenticator.update(len(ciphertext).to_bytes(8, 'little'))  # 64 bits
    # Generate the authentication tag
    tag = authenticator.digest()

    return (ciphertext, tag)

def chacha20poly1305_decrypt(key, nonce, ciphertext, tag):
    assert len(key) == 32  # 256 bits
    assert len(nonce) == 12  # 96 bits
    assert len(tag) == 16  # 128 bits

    # Initialize the ChaCha20 cipher and the Poly1305 authenticator, in the
    # same exact way as it was done during encryption
    cipher = ChaCha20.new(key=key, nonce=nonce)

    subkey = cipher.encrypt(b'\0' * 32)
    subkey_r = subkey[:16]
    subkey_s = subkey[16:]
    authenticator = Poly1305.Poly1305_MAC(r=subkey_r, s=subkey_s, data=None)

    cipher.encrypt(b'\0' * 32)

    # Generate the authentication tag, like during encryption
    authenticator.update(ciphertext)
    if len(ciphertext) % 16:
        authenticator.update(b'\0' * (16 - len(ciphertext) % 16))
    authenticator.update((0).to_bytes(8, 'little'))
    authenticator.update(len(ciphertext).to_bytes(8, 'little'))
    expected_tag = authenticator.digest()

    # Compare the input tag with the generated tag. If they're different, the
    # plaintext must not be returned to the caller
    if tag != expected_tag:
        raise ValueError('authentication failed')

    # The two tags match; decrypt the plaintext and return it to the caller
    # Note that, because ChaCha20 is a symmetric cipher, there is no difference
    # between the encrypt and decrypt method: here we are reusing the same
    # exact code used during decryption
    message = cipher.encrypt(ciphertext)

    return message

And here is how it can be used:

key = bytes.fromhex('0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef')
nonce = bytes.fromhex('0123456789abcdef01234567')
message = b'I wanted to go to the beach, but now I changed my mind'

ciphertext, tag = chacha20poly1305_encrypt(key, nonce, message)
decrypted_message = chacha20poly1305_decrypt(key, nonce, ciphertext, tag)
assert message == decrypted_message

print(f'ciphertext: {ciphertext.hex()}')
print(f'       tag: {tag.hex()}')
print(f' plaintext: {decrypted_message}')

Running it produces the following output:

ciphertext: 5d9b09cc5d90ca9ddff2d3470cfd6b563c5158e952bfae6acf1ebf9a3b968a488a41969567ef5ccfe05dcf9e548567028ff374a754af
       tag: dac3c05d261920e278ceb22e2800aa95
 plaintext: b'I wanted to go to the beach, but now I changed my mind'

This is the same output we would obtain by using the ChaCha20-Poly1305 implementation from pycryptodome directly:

from Crypto.Cipher import ChaCha20_Poly1305

key = bytes.fromhex('0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef')
nonce = bytes.fromhex('0123456789abcdef01234567')
message = b'I wanted to go to the beach, but now I changed my mind'

cipher = ChaCha20_Poly1305.new(key=key, nonce=nonce)
ciphertext, tag = cipher.encrypt_and_digest(message)

print(f'ciphertext: {ciphertext.hex()}')
print(f'       tag: {tag.hex()}')

As already stated, it is extremely important that the nonce passed to ChaCha20-Poly1305 is unique. It may be predictable, but it must be unique. If the same nonce is used twice or more with the same key, we can:

Decrypt the other messages encrypted with the same key and nonce, without using the secret key, if we can guess at least one of those messages.

This can be done using the techniques described at the beginning of this article: by recovering the random bit string from the XOR of the ciphertext with the guessed message.
Recover the Poly1305 subkey, and, at that point, tamper with ciphertexts and forge new, valid authentication tags.

This can be done by using algebraic methods on the polynomial of the authentication tag.
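The first consequence can be simulated without any cipher library: two messages encrypted with the same key and nonce are XOR-ed with the same bit stream, so XOR-ing the two ciphertexts cancels it out. In this sketch the ChaCha20 bit stream is simulated with `os.urandom` (an assumption made to keep the example dependency-free), and the two messages are made up:

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


message1 = b'Meet me at the station at noon.'
message2 = b'Bring the documents on Tuesday.'
keystream = os.urandom(len(message1))  # stands in for ChaCha20(key, reused nonce)

ciphertext1 = xor(message1, keystream)
ciphertext2 = xor(message2, keystream)

# An attacker who guesses message1 recovers message2 without the key
recovered = xor(xor(ciphertext1, ciphertext2), message1)
print(recovered)  # b'Bring the documents on Tuesday.'
```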

There is also a variant of ChaCha20-Poly1305, called XChaCha20-Poly1305, that features an extended 192-bit nonce (the X stands for ‘extended’). This is described in an RFC draft, but so far it hasn’t been accepted as a standard. I won’t cover XChaCha20 in detail here, because it’s slightly more complex and does not add much to the topic of this article, but XChaCha20-Poly1305 has better security properties than ChaCha20-Poly1305, so you should prefer it in your applications if you can use it. The reason why XChaCha20-Poly1305 has better properties than ChaCha20-Poly1305 is that, having a longer nonce, the probability of generating two random nonces with the same value is much lower.
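One rough way to quantify this is the birthday bound: with $q$ random $b$-bit nonces, the probability of at least one collision is approximately $q^2 / 2^{b+1}$. The sketch below applies that approximation to an arbitrary example of 2³² messages encrypted under one key:

```python
import math


def collision_probability(num_messages: int, nonce_bits: int) -> float:
    """Birthday-bound approximation q^2 / 2^(b+1), valid when the result is small."""
    return num_messages ** 2 / 2 ** (nonce_bits + 1)


q = 2 ** 32  # example: about 4 billion messages encrypted with the same key
for bits in (96, 192):
    p = collision_probability(q, bits)
    print(f'{bits}-bit nonce: about 2^{math.log2(p):.0f}')
# 96-bit nonce: about 2^-33
# 192-bit nonce: about 2^-129
```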

Galois/Counter Mode (GCM)

Let’s now take a look at Galois/Counter Mode (GCM). This is commonly used with the Advanced Encryption Standard (AES), to construct the authenticated cipher AES-GCM. One main difference between Poly1305 and GCM is that Poly1305 can work with any stream or block cipher, while GCM is designed to work with block ciphers with a block size of 128 bits.

GCM was proposed by David McGrew and John Viega in 2004 and is standardized in NIST Special Publication 800-38D as well as RFC 5288. It takes its name from Galois fields, also known as finite fields, which in turn get their name from the French mathematician Évariste Galois, who introduced the concept of finite fields as we know them today.

As we did before with Poly1305, we are going to first see how the keyed hash function used by GCM works, and then we will see how to use it to construct an authenticated cipher like AES-GCM on top of it. Before we can do that though, we need to understand what finite fields are, and what specific types of finite fields are used in GCM.

Finite Fields (Galois Fields)

What is a field? A field is a mathematical structure that contains a bunch of elements, and those elements can interact with each other using addition and multiplication. For both these operations there’s an identity element, and every element has an inverse element (with one exception: 0 has no multiplicative inverse). Addition and multiplication in a field must obey the usual properties that we’re used to: commutativity, associativity, and distributivity.

A well-known example of a field is the field of fractions. Here is why fractions form a field:

the elements of the field are the fractions;
addition is well-defined: if we add two fractions, we get a fraction out (example: $5/3 + 3/2 = 19/6$);
multiplication is also well-defined: if we multiply two fractions, we get a fraction out (example: $1/2 \cdot 8/3 = 4/3$);
the additive identity element is 0: if we add 0 to any fraction, we get the same fraction back;
the additive inverse element is the negated fraction (example: $5/1$ is the additive inverse of $-5/1$ because $5/1 + (-5/1) = 0$);
the multiplicative identity element is 1: multiplying any fraction by 1 yields the same fraction back;
the multiplicative inverse element is what we get if we swap the numerator with the denominator (example: $3/2$ is the multiplicative inverse of $2/3$ because $3/2 \cdot 2/3 = 1$); the only exception is 0, which does not have a multiplicative inverse;
and so on…

On top of addition, multiplication, and inverse elements, we can define derived operations like subtraction and division. Subtracting $b$ from $a$ is equivalent to adding $a$ to the additive inverse of $b$: $a - b = a + (-b)$. Similarly, division can be defined in terms of multiplication with multiplicative inverses ($a / b = a b^{-1}$).
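Python’s standard `fractions.Fraction` type can be used to check the field properties and examples listed above. This is just a quick sanity check, not a proof:

```python
from fractions import Fraction as F

# Closure: sums and products of fractions are fractions
assert F(5, 3) + F(3, 2) == F(19, 6)
assert F(1, 2) * F(8, 3) == F(4, 3)

# Identity elements and inverse elements
x = F(2, 3)
assert x + 0 == x and x * 1 == x
assert x + (-x) == 0     # additive inverse
assert x * (1 / x) == 1  # multiplicative inverse: 1 / x == F(3, 2)

# Derived operations: subtraction and division
a, b = F(5, 3), F(3, 2)
assert a - b == a + (-b)
assert a / b == a * (1 / b)
print(a - b, a / b)

# 0 is the only element without a multiplicative inverse
try:
    1 / F(0)
except ZeroDivisionError:
    print('0 has no multiplicative inverse')
```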

Fields are a generalization of structures where addition, multiplication, subtraction, and division behave according to the rules that we’re used to. Field elements do not necessarily need to be numbers.

An example of something that is not a field is the integers. That’s because most integers don’t have a multiplicative inverse that is also an integer (for example, there’s no integer that multiplied by 5 gives 1). However, there is a way to turn the integers into a field: if we take the integers and a prime number p, then we can construct the field of integers modulo p.

When we work with the integers modulo a prime p, whenever we see p appear in any of our expressions, we can replace it with 0. In other words, in such a field, p and 0 are two different representations of the same element.

Here is an example: in the field of integers modulo 7, the expression 5 + 3 equals 1, because:

5 + 3 evaluates to 8;
8, by definition, is 7 + 1;
if 7 and 0 are the same element, then 7 + 1 is equal to 0 + 1;
0 + 1 evaluates to 1.

What we have just seen is that 8 is just a different representation of 1, just like 7 is a different representation of 0. Different symbols, same object. Just like, in programming languages, we can have multiple variables pointing to the same memory location: here the numbers are like variables, and what they point to is what really matters.

In the field of integers modulo 7, the additive inverse for 5 is 2, because 5 + 2 = 7 = 0. If we manipulate the equation, we get that 5 = −2. In other words, 5 and −2 are two different representations for the same element, and similarly 2 and −5 are also two different representations of the same element. A similar story holds for multiplication: the multiplicative inverse for 5 is 3 because: 5 · 3 = 15 = 7 + 7 + 1 = 1, so we can write 5 = 3⁻¹ as well as 3 = 5⁻¹.
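The same computations in the integers modulo 7 can be reproduced with Python’s `%` operator. The last line relies on `pow(a, -1, p)`, which computes modular inverses and is available since Python 3.8:

```python
p = 7

assert (5 + 3) % p == 1    # 8 is another representation of 1
assert (5 + 2) % p == 0    # 2 is the additive inverse of 5...
assert (-2) % p == 5       # ...so 5 and -2 represent the same element
assert (5 * 3) % p == 1    # 3 is the multiplicative inverse of 5

# Multiplicative inverses of 1, 2, ..., 6 modulo 7
print([pow(a, -1, p) for a in range(1, p)])
```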

What we have just seen is an example of a finite field. It’s different from a general field because it contains a finite number of elements (unlike fractions, of which there are infinitely many). In the case of the integers modulo 7, the number of elements is 7, and the list of elements is: {0, 1, 2, 3, 4, 5, 6}, or {−3, −2, −1, 0, 1, 2, 3}, or {0, 1, 2, 3, 2⁻¹, 3⁻¹, 6}, depending on what representation we like the most.

A few words about terminology, notation, and equivalences of finite fields

There can be many ways to construct a finite field (or even a general field). I have given an example using numbers, but a field does not necessarily need to be formed from numbers. We can also use vectors, matrices, polynomials, and anything you would like. As long as addition, multiplication, identity elements, and inverse elements are well-defined, you can get a field. Using programming terms, you can think of a field as an interface or a trait that can have arbitrary implementations.

An important result in algebra is that finite fields with the same number of elements are “unique up to isomorphism”. This means that if two finite fields have the same number of elements, then there is an isomorphism between them: a one-to-one correspondence between their elements that preserves addition and multiplication. The number of elements of a finite field is therefore enough to define it. It’s not enough to tell us what the elements of the field look like, or how they can be represented, but it’s enough to know how it behaves. To denote a field with $n$ elements, there are two major notations: $GF(n)$ and $\mathbb{F}_{n}$.

Another important result in algebra is that $n$ must be either a prime number or a power of a prime. For example, we can have finite fields with 2 elements, or with 9 (= 3²) elements, but we cannot have a field with 6 (= 2·3) elements. For this reason, you will often find finite fields denoted as $GF(p^k)$ or $\mathbb{F}_{p^k}$, where $p$ is a prime and $k$ is an integer greater than 0. The prime $p$ is called the characteristic of the field, while $n = p^k$ is called the order of the field.

Some common fields also have their own notation: in particular, the field of integers modulo a prime $p$ is denoted as $Z/pZ$. This notation encodes the “building instructions” to construct the field, in fact:

$Z$ denotes the integers: $Z = \{\dots, -2, -1, 0, 1, 2, \dots\}$;
$pZ$ denotes the integers multiplied by $p$: $pZ = \{\dots, -2p, -p, 0, p, 2p, \dots\}$ (example: $2Z = \{\dots, -4, -2, 0, 2, 4, \dots\}$);
$A/B$ is a quotient. This is a way to define an equivalence relation between elements, and its meaning is: within $A/B$, all the elements of $B$ are equivalent to 0. In the case of $Z/pZ$, all the multiples of $p$ are equivalent to 0, which is indeed what happens with the integers modulo $p$. The way I described this equivalence relation earlier is by saying that multiples of $p$ are different representations for 0.

Note that the integers modulo a power of a prime ($Z/p^kZ$, with $k$ greater than 1) do not form a field. The problem is that elements in $Z/p^kZ$ sometimes do not have a multiplicative inverse. For example, in $Z/4Z$, the number 2 does not have a multiplicative inverse (there is no element that multiplied by 2 gives 1). A field $GF(p^k)$ with $k$ greater than 1 needs to be constructed in a different way. One such way is to use polynomials, as described in the next section.
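A brute-force search makes the difference between $Z/4Z$ and $Z/7Z$ visible. The helper name `inverse_table` is just an illustrative choice:

```python
def inverse_table(n: int) -> dict:
    """Map each nonzero residue modulo n to the list of residues that are
    its multiplicative inverse (an empty list means there is no inverse)."""
    return {a: [b for b in range(n) if (a * b) % n == 1] for a in range(1, n)}

print(inverse_table(4))  # 2 has no inverse: Z/4Z is not a field
print(inverse_table(7))  # every nonzero element has exactly one inverse
```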

Polynomial fields

Let’s now move our attention from integers to polynomials, like this one:

$$x^7 + 5x^3 - 9x^2 + 2x + 1$$

Polynomials are sums of coefficients multiplied by a variable (usually denoted by the letter x) raised to a non-negative integer power.

Let’s restrict our view to polynomials that have integer coefficients, like the one shown above. Something that is not a polynomial with integer coefficients is $\frac{1}{2} x^2 + x$, because it has a fraction in it.

Integers and polynomials with integer coefficients are somewhat similar to each other. They kinda behave the same in many aspects. One important property of integers is the unique factorization theorem: every integer greater than 1 can be written as a product of prime factors. For example, the integer 350 can be factored as the product of 2, 5, 5, and 7.

$$350 = 7 \cdot 5 \cdot 5 \cdot 2$$

This factorization is unique: we can change the order of the factors, but it’s not possible to obtain a different set of factors (there’s no way to make the number 3 appear in the factorization of 350, or to make the number 7 disappear).

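Polynomials with integer coefficients also have a unique factorization. In the case of integers, we call the unique factors “prime numbers”; in the case of polynomials we have “irreducible polynomials”. And just like we can have a field of integers modulo a prime, we can have a field of polynomials modulo an irreducible polynomial (strictly speaking, this also requires the coefficients to come from a field, such as the integers modulo a prime, which is the case for the binary polynomials used by GCM; the reduction technique shown next works the same way either way).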

Integers	Polynomials (with integer coefficients)
Unique factorization: $42 = 7 \cdot 3 \cdot 2$	Unique factorization: $x^3 - 1 = (x^2 + x + 1)(x - 1)$
Prime numbers: 2, 3, 5, 7, 11, …	Irreducible polynomials: $x + 1$, $x^2 - 2$, $x^2 + x + 1$, …
Integers modulo a prime number	Polynomials modulo an irreducible polynomial

Let’s take a look at how arithmetic in polynomial fields works. Let’s take, for example, the field of polynomials with integer coefficients modulo $x^3 + x + 1$, and try to compute the result of $(x^2 + 1)(x^2 + 2)$. If we expand the expression, we get:

$$(x^2 + 1)(x^2 + 2) = x^4 + 3x^2 + 2$$

This expression can be reduced. Reducing a polynomial expression is the equivalent of what we were doing with the integers modulo a prime, when we were saying that 8 = 7 + 1 = 1 (mod 7). That “conversion” from 8 to 1 is the equivalent of the reduction that we’re talking about here.

To reduce $x^4 + 3x^2 + 2$, first note that $x^4 = x \cdot x^3$. Also note that $x^3 = x^3 + x + 1 - x - 1$. Here we have just added and removed the term $x + 1$: the result hasn’t changed, but now the irreducible polynomial $x^3 + x + 1$ appears in the expression, and so we can substitute it with 0. Putting everything together, we get:

$$\begin{align*} (x^2 + 1)(x^2 + 2) & = x^4 + 3x^2 + 2 \\ & = x \cdot x^3 + 3x^2 + 2 \\ & = x \cdot (x^3 + x + 1 - x - 1) + 3x^2 + 2 \\ & = x \cdot (0 - x - 1) + 3x^2 + 2 \\ & = -x^2 - x + 3x^2 + 2 \\ & = 2x^2 - x + 2 \end{align*}$$

It’s interesting to note that, if the polynomial field is formed by an irreducible polynomial with degree $n$, then every element of that field can be represented by a polynomial of degree less than $n$. That’s because if any $x^n$ (or higher) term appears in a polynomial expression, then we can use the substitution trick I just showed to reduce its degree.
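Here is a minimal sketch of this reduction, assuming polynomials are stored as lists of integer coefficients ordered from the lowest to the highest degree (a representation chosen only for this example) and that the modulus is monic, like $x^3 + x + 1$. Each iteration subtracts a multiple of the modulus that cancels the leading term, which is the same “replace the modulus with 0” trick used above:

```python
def poly_mul(a: list, b: list) -> list:
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    result = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            result[i + j] += ca * cb
    return result

def poly_reduce(a: list, m: list) -> list:
    """Reduce a modulo the monic polynomial m, lowering the degree below deg(m)."""
    assert m[-1] == 1, 'this sketch assumes a monic modulus'
    a = list(a)
    n = len(m) - 1  # degree of the modulus
    while len(a) - 1 >= n:
        c = a[-1]
        shift = len(a) - 1 - n
        # Subtract c * x^shift * m, which is equivalent to subtracting 0
        for k, cm in enumerate(m):
            a[shift + k] -= c * cm
        a.pop()  # the leading coefficient is now 0
    return a

M = [1, 1, 0, 1]                          # x^3 + x + 1
product = poly_mul([1, 0, 1], [2, 0, 1])  # (x^2 + 1)(x^2 + 2)
print(product)                            # x^4 + 3x^2 + 2
print(poly_reduce(product, M))            # 2x^2 - x + 2
```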

Binary fields

Let’s now look at polynomials where coefficients are from the field of integers modulo 2, meaning that they can be either 0 or 1. This is an example of such a polynomial:

$$x^7 + x^4 + x^2 + 1$$

or, in a more explicit form, where we can clearly see all the coefficients:

$$1 x^7 + 0 x^6 + 0 x^5 + 1 x^4 + 0 x^3 + 1 x^2 + 0 x^1 + 1 x^0$$

These are called binary polynomials. It’s interesting to note that if we ignore the variables and the powers, and keep only the coefficients, then what we get is a bit string:

$$(1 0 0 1 0 1 0 1)$$

This suggests that there’s an interesting duality between binary polynomials and bit strings. This means, in particular, that binary polynomials can be represented in a very compact and natural way on computers.
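A short sketch of this duality, mapping the coefficient of $x^e$ to bit $e$ of a Python `int` (the helper names are illustrative):

```python
def poly_to_int(exponents) -> int:
    """Encode a binary polynomial, given as a list of distinct exponents, as an int."""
    value = 0
    for e in exponents:
        value |= 1 << e  # the coefficient of x^e is bit e
    return value

def int_to_poly(value: int) -> str:
    """Decode an int back into a human-readable binary polynomial."""
    terms = []
    for e in range(value.bit_length() - 1, -1, -1):
        if (value >> e) & 1:
            terms.append('1' if e == 0 else 'x' if e == 1 else f'x^{e}')
    return ' + '.join(terms) or '0'

bits = poly_to_int([7, 4, 2, 0])  # x^7 + x^4 + x^2 + 1
print(bin(bits))
print(int_to_poly(bits))
```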

The duality between binary polynomials and bit strings also suggests that perhaps we can use bitwise operations to perform arithmetic on binary polynomials. And this turns out to be true, in fact:

binary polynomial addition can be computed using the XOR operator on the two corresponding bit strings;
binary polynomial multiplication can be computed using XOR, AND and bit-shifting.

Computers are pretty fast at performing these bitwise operations, and this makes binary polynomials quite attractive for use in computer algorithms and cryptography.

Arithmetic with binary polynomials

The arithmetic of such polynomials is quite interesting: because $1 + 1 = 0$ (modulo 2), we also have $x^k + x^k = 0$:

$$1 \cdot x^k + 1 \cdot x^k = (1 + 1) x^k = 0 \cdot x^k = 0$$

It’s easy to see that addition modulo 2 is equivalent to the XOR binary operator. And addition of two binary polynomials is equivalent to the bitwise XOR of their corresponding bit strings:

$$\begin{array}{ccccc} (x^3 + x^2 + 1) & + & (x^2 + x) & = & x^3 + x + 1 \\ \updownarrow & & \updownarrow & & \updownarrow \\ (1101) & \oplus & (0110) & = & (1011) \end{array}$$

Multiplication of binary polynomials can also be implemented as a bitwise operation on bit strings. First, note that multiplying a polynomial by a monomial is equivalent to bit-shifting:

$$\begin{array}{ccccc} (x^3 + x + 1) & \cdot & x^2 & = & x^5 + x^3 + x^2 \\ \updownarrow & & \updownarrow & & \updownarrow \\ (1011) & \ll & 2 & = & (101100) \end{array}$$

Then note that multiplication of two polynomials can be expressed as the sum of multiplications by monomials:

$$(x^3 + 1)(x^2 + x + 1) = (x^3 + 1) \cdot x^2 + (x^3 + 1) \cdot x^1 + (x^3 + 1) \cdot x^0$$

Putting everything together, we have multiplications by monomials (equivalent to bit-shifts) and sums (equivalent to bitwise XOR). This suggests that multiplication can be implemented on top of bitwise XOR and bit-shifting.
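The three worked examples above can be checked directly with Python’s bitwise operators, using the bit-string representation shown earlier:

```python
# Addition is XOR: (x^3 + x^2 + 1) + (x^2 + x) = x^3 + x + 1
assert (0b1101 ^ 0b0110) == 0b1011

# Multiplying by the monomial x^2 is a left shift by 2
assert (0b1011 << 2) == 0b101100

# (x^3 + 1)(x^2 + x + 1): XOR of shifted copies of (x^3 + 1),
# one for each monomial of (x^2 + x + 1)
a = 0b1001
product = (a << 2) ^ (a << 1) ^ (a << 0)
print(bin(product))  # x^5 + x^4 + x^3 + x^2 + x + 1
```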

Here is some Python code to implement binary polynomial multiplication, where each polynomial is represented compactly as an int:

def multiply(a, b):
    """
    Compute a*b, where a and b are two integers representing binary
    polynomials.

    a and b are expected to have their most significant bit set to
    the monomial with the highest power. For example, the polynomial
    x^4 is represented as the integer 0b10000.
    """
    assert a >= 0
    assert b >= 0

    result = 0
    while b:
        result ^= a * (b & 1)
        a <<= 1
        b >>= 1
    return result

Other than XOR and bit-shifting, this code also uses AND to “query” whether a certain monomial is present or not.

Here is an example of how to use the code:

a = 0b0101_0111              # x^6 + x^4 + x^2 + x + 1
b = 0b0001_1010              # x^4 + x^3 + x
c = multiply(a, b)
assert c == 0b0111_0110_0110 # x^10 + x^9 + x^8 + x^6 + x^5 + x^2 + x

Now that we have introduced binary polynomials, we can of course form binary polynomials modulo a binary irreducible polynomial. These form a finite field, which is more concisely called a binary field.

Note that in a binary field where the modulo is an irreducible polynomial of degree $n$, all polynomials in the field can be represented as $n$-bit strings, and all $n$-bit strings have a corresponding binary polynomial in the field.

Arithmetic in binary fields

If we have three integers $a$, $b$, and $p$, we can compute $(a + b) \bmod{p}$ or $a \cdot b \bmod{p}$ by performing the binary operation (addition or multiplication) and then taking the remainder of the division by $p$. This is a method that returns the results of addition or multiplication using a representation with the lowest number of digits possible.

What if instead of having 3 integers we have three binary polynomials $A$, $B$, and $P$ and we want to compute $(A + B) \bmod{P}$ or $A \cdot B \bmod{P}$? It turns out that these operations can be implemented with code that is even easier than the integer counterpart: no division needs to be involved!

Let’s start with addition: we have already seen that addition with binary polynomials can be implemented with a simple XOR operation. This means that if the degrees of $A$ and $B$ are lower than the degree of $P$, then $A + B$ also has a degree lower than that of $P$, hence no reduction is needed. We can use the result as-is, without any transformation: adding two binary field elements can be implemented with a single XOR operation.

With multiplication the story is different: the product $A \cdot B$ may have degree equal to or higher than $P$. For example, if $A = B = x$ and $P = x^2 + x + 1$, the product $A \cdot B$ is equal to $x^2$, which has the same degree as $P$. We need to find a way to efficiently reduce the higher-degree terms of this product. To see one way to do that, note that we can write $P$ like this:

$$P = x^n + Q$$

where $n$ is the degree of $P$ (the highest power of $x$ appearing in $P$) and $Q$ is another binary polynomial, with degree strictly lower than $n$. Rearranging the equation, we get:

$$x^n = P + Q$$

Note that subtraction and addition are the same operations in a binary field. Because $P$ equals 0, we can write:

$$x^n = Q$$

This equivalence gives us a way to eliminate higher-degree terms that appear during multiplication: whenever we see an $x^n$ appearing in the result, we can remove that term and add $Q$ instead. One way to do that, using binary strings, is to discard the highest bit (the one corresponding to $x^n$) and XOR with the binary string corresponding to $Q$.

Another way to do it is to just add $P$ (XOR with the binary string corresponding to $P$). Because $P$ is equivalent to 0, this doesn’t change the element, but it clears the $x^n$ bit and results in the more compact representation that we’re interested in.
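A quick check that both ways of eliminating an $x^n$ term give the same result, using the arbitrarily chosen irreducible polynomial $P = x^4 + x + 1$:

```python
P = 0b10011          # x^4 + x + 1
n = P.bit_length() - 1
Q = P ^ (1 << n)     # x + 1: P without its x^n term

v = 0b10110          # x^4 + x^2 + x, which contains an x^4 term

reduced_with_q = (v ^ (1 << n)) ^ Q  # discard the x^n bit, then add Q
reduced_with_p = v ^ P               # add P, which is equivalent to 0
print(bin(reduced_with_q), bin(reduced_with_p))  # both are x^2 + 1
```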

We could use similar tricks to eliminate terms like $x^{n+1}$, but these tricks are not necessary if we eliminate $x^n$ terms as soon as they appear in an iterative way.

Here is some Python code for multiplication in binary fields that uses the “add $P$” trick just described:

def multiply(a, b, p):
    """
    Compute a*b modulo p, where a, b and p are three integers representing
    binary polynomials.

    a, b and p are expected to have their most significant bit set to the
    highest power monomial. For example, the polynomial x^4 is represented as
    0b10000.
    """
    bit_length = p.bit_length() - 1
    assert a >= 0 and a < (1 << bit_length)
    assert b >= 0 and b < (1 << bit_length)

    result = 0
    for i in range(bit_length):
        result ^= a * (b & 1)
        a <<= 1
        a ^= p * ((a >> bit_length) & 1)
        b >>= 1
    return result

This code is essentially the same as the binary polynomial multiplication code we had before, except for this line in the for loop:

a ^= p * ((a >> bit_length) & 1)

This line is what “adds $P$” whenever shifting $A$ to the left makes an $x^n$ term appear.

Again, we implemented multiplication using only XOR, AND, and bit-shifting.

Note that the binary polynomial $P$ here does not necessarily need to be an irreducible polynomial for this algorithm to work. However, the resulting algebraic structure won’t be a field unless $P$ is irreducible. A similar story holds for integers: we can have integers modulo a non-prime number, but that’s not a field.
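The difference is easy to observe in a tiny example. The sketch below reuses the multiplication algorithm above and brute-forces multiplicative inverses modulo $x^2 + x + 1$ (irreducible) and modulo $x^2 + 1$, which is not irreducible because $x^2 + 1 = (x + 1)^2$ over the integers modulo 2:

```python
def gf_multiply(a: int, b: int, p: int) -> int:
    """Multiply a and b modulo the binary polynomial p (same algorithm as above)."""
    n = p.bit_length() - 1
    result = 0
    for _ in range(n):
        result ^= a * (b & 1)
        a <<= 1
        a ^= p * ((a >> n) & 1)
        b >>= 1
    return result

def inverses(p: int) -> dict:
    """For each nonzero element, list the elements that multiplied by it give 1."""
    n = p.bit_length() - 1
    elements = range(1, 1 << n)
    return {a: [b for b in elements if gf_multiply(a, b, p) == 1] for a in elements}

print(inverses(0b111))  # x^2 + x + 1: every nonzero element has an inverse
print(inverses(0b101))  # x^2 + 1: x + 1 (0b11) has no inverse
```

Modulo $x^2 + 1$, the element $x + 1$ has no inverse because $(x + 1)(x + 1) = x^2 + 1 = 0$, so that structure is not a field.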

The GHASH keyed hash function

GCM uses a binary field. The irreducible binary polynomial that defines the binary field used by GCM is:

$$P = x^{128} + x^7 + x^2 + x + 1$$

We will call this field the GCM field. Note that this polynomial has degree 128, hence the GCM field elements can be represented as 128-bit strings, and each 128-bit string has a corresponding element in the GCM field.
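One detail worth knowing is how GCM maps 128-bit strings to polynomials: the coefficient of $x^0$ is stored in the most significant (leftmost) bit of the block, and the coefficient of $x^{127}$ in the least significant one. This is the convention assumed by the `multiply` function in the AES-GCM implementation later in this article. The sketch below (the helper name `gcm_encode` is illustrative) shows that, with this convention, $x^7 + x^2 + x + 1$, that is $P$ without its $x^{128}$ term, becomes the constant `0xe1000000000000000000000000000000` used in that code:

```python
def gcm_encode(exponents) -> int:
    """Encode a GCM field element from a list of exponents, using GCM's bit order:
    x^0 is the most significant bit of the 128-bit block, x^127 the least."""
    value = 0
    for e in exponents:
        assert 0 <= e < 128
        value |= 1 << (127 - e)
    return value

q = gcm_encode([7, 2, 1, 0])  # x^7 + x^2 + x + 1
print(hex(q))
```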

The keyed hash function used by GCM is called GHASH and takes as input a 128-bit key. We will call this key $H$. This key is interpreted as an element of the GCM field.

The message to authenticate is split into blocks of 128 bits each: $M_1$, $M_2$, $M_3$, … $M_n$. If the length of the message is not a multiple of 128 bits, then the last block is padded with zeros. Each block of message is also interpreted as an element of the GCM field.

Here is how the authentication tag is computed from $H$ and the padded message blocks $M_1$, …, $M_n$:

The initial state (a GCM field element) is initialized to 0: $A_0 = 0$.
For every block of message $M_i$, the next state $A_i$ is computed as $A_i = (A_{i-1} + M_i) \cdot H \bmod{P}$.
The final state $A_n$ is returned as a 128-bit string.

What this function is doing is computing the following polynomial in $H$:

$$\begin{align*} Tag & = (((M_1 \cdot H + M_2) \cdot H + \cdots + M_n) \cdot H) \bmod{P} \\ & = (M_1 H^n + M_2 H^{n-1} + \cdots + M_n H) \bmod{P} \end{align*}$$
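The equivalence between the iterative definition and the expanded polynomial can be checked numerically. To keep the sketch small, it works in a toy 8-bit binary field defined by $x^8 + x^4 + x^3 + x + 1$ instead of the 128-bit GCM field (an assumption made for illustration only); the algebra is the same:

```python
P = 0x11B  # toy field: x^8 + x^4 + x^3 + x + 1 (NOT the GCM field)

def gf_mul(a: int, b: int, p: int = P) -> int:
    """Multiply a and b in the binary field defined by p."""
    n = p.bit_length() - 1
    result = 0
    for _ in range(n):
        result ^= a * (b & 1)
        a <<= 1
        a ^= p * ((a >> n) & 1)
        b >>= 1
    return result

def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def iterative_tag(h: int, blocks: list) -> int:
    # A_0 = 0, A_i = (A_{i-1} + M_i) * H
    state = 0
    for m in blocks:
        state = gf_mul(state ^ m, h)
    return state

def expanded_tag(h: int, blocks: list) -> int:
    # M_1 H^n + M_2 H^(n-1) + ... + M_n H
    n = len(blocks)
    tag = 0
    for i, m in enumerate(blocks, start=1):
        tag ^= gf_mul(m, gf_pow(h, n - i + 1))
    return tag

h = 0x57
blocks = [0x12, 0xA0, 0x3C, 0xFF]
print(iterative_tag(h, blocks) == expanded_tag(h, blocks))
```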

This construction is somewhat similar to the one from Poly1305, although there are important differences:

In Poly1305, the elements of the tag polynomial are integers modulo a prime; in GHASH they are elements of a binary field.
GHASH does not perform any step to encode the length of the message, hence the tag for an empty message will be the same as the tag for a message made only of all-zero blocks. We will see later that GCM fixes this problem by appending the length of the message to the end of the input passed to GHASH.
Most importantly, if an attacker knows a message and the corresponding output of GHASH, the $Tag$ equation becomes a polynomial equation in a single unknown, $H$, which can be easily recovered using algebraic methods (for example, by finding the roots of the polynomial). For this reason, GHASH is not suitable as a secure one-time authenticator. We will see that GCM fixes this problem by encrypting the output of GHASH.
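Both weaknesses can be demonstrated in a few lines. As before, this sketch uses a toy 8-bit binary field ($x^8 + x^4 + x^3 + x + 1$) in place of the GCM field, and `toy_ghash` is a hypothetical GHASH-like function without any length encoding. For a one-block message, $Tag = M_1 \cdot H$, so knowing $M_1$ and $Tag$ is enough to compute $H = M_1^{-1} \cdot Tag$:

```python
P = 0x11B  # toy field: x^8 + x^4 + x^3 + x + 1 (NOT the GCM field)

def gf_mul(a: int, b: int, p: int = P) -> int:
    """Multiply a and b in the binary field defined by p."""
    n = p.bit_length() - 1
    result = 0
    for _ in range(n):
        result ^= a * (b & 1)
        a <<= 1
        a ^= p * ((a >> n) & 1)
        b >>= 1
    return result

def toy_ghash(h: int, blocks: list) -> int:
    """GHASH-like iteration without length encoding: A_i = (A_{i-1} + M_i) * H."""
    state = 0
    for m in blocks:
        state = gf_mul(state ^ m, h)
    return state

h = 0x57  # the secret subkey (arbitrary value for illustration)

# Weakness 1: an empty message and an all-zero block give the same output
print(toy_ghash(h, []), toy_ghash(h, [0x00]))

# Weakness 2: a known one-block message and its unencrypted output reveal H.
# In this tiny field, M_1^-1 can be found by brute force.
m1 = 0x83
tag = toy_ghash(h, [m1])
m1_inv = next(x for x in range(1, 256) if gf_mul(m1, x) == 1)
recovered = gf_mul(m1_inv, tag)
print(hex(recovered))
```

With longer messages the equation has a higher degree, and in the real 128-bit field brute force is not an option, but root-finding algorithms for polynomials over finite fields are efficient.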

Use of GCM with AES (AES-GCM)

GCM is the combination of a block cipher, Counter Mode (CTR), and the GHASH function that we have just seen. The block cipher is often AES. When we combine AES with GCM, what we get is AES-GCM, which is described below. However, the block cipher does not necessarily need to be AES: what is important is that the block size of the cipher is 128 bits, because GHASH only works on 128-bit blocks.

The inputs to the AES-GCM encryption function are:

a secret key (the length of the key depends on the variant of AES used: 128, 192, or 256 bits for AES-128, AES-192, or AES-256 respectively);
a 96-bit nonce;
a variable-length plaintext message.

The outputs of the AES-GCM encryption function are:

a variable-length ciphertext (same length as the input plaintext);
a 128-bit authentication tag.

The AES-GCM decryption function will accept the same secret key, nonce, ciphertext, and authentication tag as the input, and produce either the plaintext or an error as the output. The error is returned in case the authentication fails.

Data flow during an AES-GCM encryption. This shows the inputs in blue, the outputs in green, and the intermediate objects in red.

AES-GCM works in the following way:

1. The GHASH subkey $H$ is generated by encrypting a zero-block: $H = \operatorname{Encrypt}(key, \underbrace{000\dots0}_\text{128 bits})$.
2. The block cipher AES is initialized in Counter Mode (AES-CTR) with the key, the nonce, and a 32-bit, big-endian counter starting at 2.
3. The plaintext is encrypted using the instance of AES-CTR just created.
4. The GHASH function is run with the following inputs:
   - the subkey $H$, computed in step 1;
   - the ciphertext padded with zeros to make its length a multiple of 16 bytes (128 bits), concatenated with the length (in bits) of the ciphertext represented as a 128-bit big-endian integer.
5. The result is a 128-bit block $S = \operatorname{GHASH}(H, ciphertext || padding || length)$.
6. The AES-CTR counter is set to 1.
7. The block $S$ is then encrypted using AES-CTR. The result of the encryption is the authentication tag. Note that, because $S$ matches the block size of the cipher, this encryption won’t cause the counter value 2 to be reused.
8. The ciphertext and authentication tag are returned.
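To visualize which AES-CTR input blocks are involved, here is a small sketch that builds the 16-byte counter blocks (96-bit nonce followed by a 32-bit big-endian counter) and lists the counter values used for a message of a given length: 1 for the authentication tag, and 2, 3, … for the message blocks. The helper names are hypothetical, and the sketch ignores the upper limit of the 32-bit counter:

```python
def counter_block(nonce: bytes, counter: int) -> bytes:
    """Build an AES-CTR input block: 96-bit nonce || 32-bit big-endian counter."""
    assert len(nonce) == 12
    return nonce + counter.to_bytes(4, 'big')

def counters_used(message_length: int):
    """Counter values used by AES-GCM for a message of message_length bytes:
    (counter for the tag, counters for the message blocks)."""
    num_blocks = (message_length + 15) // 16
    return 1, list(range(2, 2 + num_blocks))

nonce = bytes.fromhex('0123456789abcdef01234567')
tag_counter, data_counters = counters_used(41)  # a 41-byte message
print(tag_counter, data_counters)
print(counter_block(nonce, tag_counter).hex())
```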

Here is how AES-GCM and GHASH can be implemented in Python, using the AES implementation from pycryptodome (usual disclaimer: this code is for educational purposes, and it’s not necessarily secure or optimized for performance):

from Crypto.Cipher import AES

def multiply(a: int, b: int) -> int:
    """
    Compute a*b in the GCM field, where a and b are two integers representing
    elements of the GCM field.

    a and b are expected to have their least significant bit set to the highest
    power monomial. For example, the polynomial x^125 is represented as 0b100.
    """
    bit_length = 128
    q = 0xe1000000000000000000000000000000
    assert a >= 0 and a < (1 << bit_length)
    assert b >= 0 and b < (1 << bit_length)

    result = 0
    for i in range(bit_length):
        result ^= a * ((b >> 127) & 1)
        a = (a >> 1) ^ (q * (a & 1))
        b <<= 1
    return result

def pad_block(data: bytes) -> bytes:
    """
    Pad data with zero bytes so that the resulting length is a multiple of 16
    bytes (128 bits).
    """
    if len(data) % 16 != 0:
        data += b'\0' * (16 - len(data) % 16)
    return data

def iter_blocks_padded(data: bytes):
    """
    Split the given data into blocks of 16 bytes (128 bits) each, padding the
    last block with zeros if necessary.
    """
    start = 0
    while start < len(data):
        yield pad_block(data[start:start+16])
        start += 16

def ghash(subkey: bytes, message: bytes) -> bytes:
    subkey = int.from_bytes(subkey, 'big')
    assert subkey < (1 << 128)

    state = 0
    for block in iter_blocks_padded(message):
        block = int.from_bytes(block, 'big')
        state = multiply(state ^ block, subkey)

    return state.to_bytes(16, 'big')

def aes_gcm_encrypt(key: bytes, nonce: bytes, message: bytes):
    assert len(key) in (16, 24, 32)
    assert len(nonce) == 12

    # Initialize a raw AES instance and encrypt a 16-byte block of all zeros to
    # derive the GHASH subkey H
    cipher = AES.new(mode=AES.MODE_ECB, key=key)
    h = cipher.encrypt(b'\0' * 16)

    # Encrypt the message with AES in CTR mode, with the counter composed of
    # the concatenation of the 12 byte (96 bits) nonce and a 4 byte (32 bits)
    # integer, starting from 2
    cipher = AES.new(mode=AES.MODE_CTR, key=key, nonce=nonce, initial_value=2)
    ciphertext = cipher.encrypt(message)

    # Compute the GHASH of the ciphertext plus the ciphertext length in bits
    s = ghash(h, pad_block(ciphertext) + (len(ciphertext) * 8).to_bytes(16, 'big'))
    # Encrypt the GHASH value using AES in CTR mode, with the counter composed
    # of the concatenation of the 12 byte (96 bits) nonce and a 4 byte (32
    # bits) integer set at 1. The GHASH value fits in one block, so the counter
    # won't be increased during this round of encryption
    cipher = AES.new(mode=AES.MODE_CTR, key=key, nonce=nonce, initial_value=1)
    tag = cipher.encrypt(s)

    return (ciphertext, tag)

def aes_gcm_decrypt(key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes):
    assert len(key) in (16, 24, 32)
    assert len(nonce) == 12
    assert len(tag) == 16

    # Compute the GHASH subkey, the GHASH value, and the authentication tag, in
    # the same exact way as it was done during encryption
    cipher = AES.new(mode=AES.MODE_ECB, key=key)
    h = cipher.encrypt(b'\0' * 16)

    s = ghash(h, pad_block(ciphertext) + (len(ciphertext) * 8).to_bytes(16, 'big'))
    cipher = AES.new(mode=AES.MODE_CTR, key=key, nonce=nonce, initial_value=1)
    expected_tag = cipher.encrypt(s)

    # Compare the input tag with the generated tag. If they're different, the
    # plaintext must not be returned to the caller
    if tag != expected_tag:
        raise ValueError('authentication failed')

    # The two tags match; decrypt the ciphertext and return the plaintext to
    # the caller. Note that, because CTR mode simply XORs the data with a
    # keystream, there is no difference between the encrypt and decrypt
    # methods: here we are reusing the same exact code used during encryption
    cipher = AES.new(mode=AES.MODE_CTR, key=key, nonce=nonce, initial_value=2)
    message = cipher.encrypt(ciphertext)

    return message

And here is how the code can be used:

key = bytes.fromhex('0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef')
nonce = bytes.fromhex('0123456789abcdef01234567')
message = b'I went to the zoo yesterday but not today'

ciphertext, tag = aes_gcm_encrypt(key, nonce, message)
print(f'ciphertext: {ciphertext.hex()}')
print(f'       tag: {tag.hex()}')
decrypted_message = aes_gcm_decrypt(key, nonce, ciphertext, tag)
print(f' plaintext: {decrypted_message}')
assert message == decrypted_message

This snippet produces the following output:

ciphertext: e0c32db2962f9b729c69028d9a1fdfb2c93839fc1188f314c58ee97fd6a242404953bb208df609a33c
       tag: 9fa6fe2f77a0c98282868924ace0e4ec
 plaintext: b'I went to the zoo yesterday but not today'

This is the same output we would obtain by using the AES-GCM implementation from pycryptodome directly:

from Crypto.Cipher import AES

cipher = AES.new(mode=AES.MODE_GCM, key=key, nonce=nonce)
ciphertext, tag = cipher.encrypt_and_digest(message)
print(f'ciphertext: {ciphertext.hex()}')
print(f'       tag: {tag.hex()}')

Nonce reuse is catastrophic for AES-GCM in two ways:

Because the ciphertext produced by AES-GCM is just a variant of AES-CTR, nonce reuse with GCM can have the same consequences as nonce reuse with AES-CTR, or any other stream cipher: if someone is able to guess the plaintext, they can recover the random stream, and use that to decrypt other messages (or portions of them).
The GHASH subkey $H$ depends only on the key, so it is the same for every message; if the same nonce is also used twice or more, the block used to encrypt the output of GHASH in step 7 is the same too. Even though the output of GHASH is encrypted, we can then use the XOR of two authentication tags to “cancel” the encryption and obtain a polynomial in $H$. From there, we can use algebraic methods to recover $H$. This gives us the ability to forge new, valid authentication tags.
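The second point can be sketched in a toy setting. The assumptions here are loud: a toy 8-bit binary field ($x^8 + x^4 + x^3 + x + 1$) replaces the GCM field, and XOR with a fixed secret byte called `mask` stands in for the AES-CTR encryption of the GHASH output with counter 1, which is identical for two messages sharing a nonce. For two one-block ciphertexts $C$ and $C'$ of the same length $L$, the GHASH output is $C H^2 + L H$, so the XOR of the two tags is $(C + C') H^2$: the mask and the length terms cancel out. In such a small field we can simply try every candidate $H$:

```python
P = 0x11B  # toy field: x^8 + x^4 + x^3 + x + 1 (NOT the GCM field)

def gf_mul(a: int, b: int, p: int = P) -> int:
    """Multiply a and b in the binary field defined by p."""
    n = p.bit_length() - 1
    result = 0
    for _ in range(n):
        result ^= a * (b & 1)
        a <<= 1
        a ^= p * ((a >> n) & 1)
        b >>= 1
    return result

def toy_ghash(h: int, blocks: list) -> int:
    state = 0
    for m in blocks:
        state = gf_mul(state ^ m, h)
    return state

def toy_tag(h: int, mask: int, ciphertext_block: int, length_block: int) -> int:
    # S = GHASH(C || length); XOR with mask stands in for encrypting S
    return toy_ghash(h, [ciphertext_block, length_block]) ^ mask

h = 0x57       # secret subkey
mask = 0xC4    # secret, but identical for both messages (reused nonce)
length = 0x08  # both ciphertexts have the same length
c1, c2 = 0x3A, 0x9E
t1 = toy_tag(h, mask, c1, length)
t2 = toy_tag(h, mask, c2, length)

# The mask cancels out: t1 ^ t2 == (c1 + c2) * H^2
candidates = [x for x in range(256) if gf_mul(c1 ^ c2, gf_mul(x, x)) == (t1 ^ t2)]
print([hex(x) for x in candidates])

# With H known, one (ciphertext, tag) pair reveals the mask, and valid
# tags can be computed for new ciphertexts under the same nonce
recovered_h = candidates[0]
recovered_mask = t1 ^ toy_ghash(recovered_h, [c1, length])
forged_c = 0x11
forged_tag = toy_ghash(recovered_h, [forged_c, length]) ^ recovered_mask
print(forged_tag == toy_tag(h, mask, forged_c, length))
```

In a binary field, squaring is a one-to-one operation, which is why a single candidate survives here; with the real 128-bit field, an attacker would find the roots of the polynomial instead of searching exhaustively.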

It’s worth mentioning that there’s a variant of AES-GCM, called AES-GCM-SIV (Synthetic Initialization Vector), specified in RFC 8452. This differs from AES-GCM in that it uses a little-endian version of GHASH called POLYVAL (which is faster on modern CPUs), and in that it allows nonce reuse without the two catastrophic consequences that I mentioned above.

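(Nonce reuse with AES-GCM-SIV, however, still presents a problem, just not as serious as the two above: specifically, it breaks ciphertext indistinguishability, because encrypting the same message twice with the same key and nonce produces the same ciphertext, so an observer can tell when a message is repeated.)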

Authenticated Encryption with Associated Data (AEAD)

The way I have described authenticated encryption, and in particular the constructions ChaCha20-Poly1305 and AES-GCM, is accurate, but incomplete. What I have told you is that when you use an authenticated encryption cipher, the ciphertext is checked for integrity and authenticity. But we can use the same technique to authenticate anything, not just ciphertexts: we can, for example, authenticate some plaintext data, or authenticate a piece of plaintext data and a piece of ciphertext altogether.

When we use a method to authenticate a plaintext message only, what we get is a Message Authentication Code (MAC). We don’t use the word “encryption” in this context, because the confidentiality of the message is not ensured (only its authenticity).

When we use a method to authenticate both a ciphertext and a plaintext message, what we get is Authenticated Encryption with Associated Data (AEAD). In this construction, there are two messages involved: one to be encrypted (resulting in a ciphertext), and one to be kept in plaintext. The plaintext message is called “associated data” (AD) or “additional authenticated data” (AAD). Both the ciphertext and the associated data are authenticated at encryption time, so their integrity and authenticity will be enforced.

The inputs to the encryption function of an AEAD cipher are, generally speaking:

a key;
a nonce;
the additional data;
the message to encrypt.

The outputs of the encryption are:

the ciphertext;
the authentication tag.

Note that there’s only one authentication tag that covers both the additional data and the ciphertext.

The inputs to the decryption function are:

the key used for encryption;
the nonce used for encryption;
the additional data used for encryption;
the ciphertext;
the authentication tag.

And the output of the decryption is either an error or the decrypted message.

It’s important to note that the associated data must be the same at both encryption time and decryption time. Changing a single bit of it will make the entire decryption operation fail.

Both ChaCha20-Poly1305 and AES-GCM (and their variants, XChaCha20-Poly1305 and AES-GCM-SIV) are AEAD ciphers. Here’s how they implement AEAD:

When the Poly1305 or GHASH authenticator is first initialized, it is fed the additional data, padded with zeros to make its size a multiple of 16 bytes (128 bits).
Then the padded ciphertext is fed into the authenticator.
The length of the additional data and the length of the ciphertext are represented as two 64-bit integers, concatenated, and fed into the authenticator.

Updated data flow during a ChaCha20-Poly1305 encryption which shows where the Associated Data (AD) is placed.

Updated data flow during an AES-GCM encryption which shows where the Associated Data (AD) is placed.
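The steps above can be written out precisely for ChaCha20-Poly1305, following the AEAD construction in RFC 8439 (section 2.8). The Python sketch below only builds the byte string that gets authenticated. The Poly1305 computation and the derivation of the one-time Poly1305 key are left out, and the function names are made up for illustration.

```python
import struct


def pad16(data: bytes) -> bytes:
    """Zero bytes that bring len(data) up to a multiple of 16 (nothing if already aligned)."""
    return b"\x00" * (-len(data) % 16)


def chacha20_poly1305_mac_data(aad: bytes, ciphertext: bytes) -> bytes:
    """Byte string authenticated by Poly1305 in ChaCha20-Poly1305 (RFC 8439, section 2.8)."""
    return (
        aad + pad16(aad)
        + ciphertext + pad16(ciphertext)
        # both lengths in bytes, as 64-bit little-endian integers
        + struct.pack("<QQ", len(aad), len(ciphertext))
    )


mac_data = chacha20_poly1305_mac_data(b"resource-id=42", b"\x8f" * 20)
print(len(mac_data))  # 16 + 32 + 16 = 64
```

When the associated data is empty, its part contributes no bytes, and only the padded ciphertext and the two lengths remain. AES-GCM feeds GHASH the same zero-padded blocks, but it encodes the two lengths as 64-bit big-endian counts of bits rather than bytes.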

If the additional data is empty, then what you get are exactly the constructions that I described earlier in this article.

Authenticated Encryption with Associated Data is useful in situations where you want to encode some metadata along with your encrypted data. For example: an identifier for the resource that is encrypted, the type of data encrypted (text, image, video, …), some information that indicates what key and algorithm were used to encrypt the resource, or maybe the expiration date of the data. The associated data is in plaintext, so systems that do not have access to the secret key can gather some properties about the encrypted resource. It must be understood, however, that the associated data cannot be trusted until it’s verified using the secret key. Systems that analyze the associated data must be designed in such a way that, if the associated data is tampered with, nothing bad will happen, and the tampering attempt will be detected sooner or later.

A word of caution

Something very important to understand is that when using authenticated encryption ciphers like ChaCha20-Poly1305 or AES-GCM, decryption can in theory succeed even if the verification of the authentication tag fails.

For example, we can decrypt a ciphertext encrypted with ChaCha20-Poly1305 by using ChaCha20 and ignoring the authentication tag. Similarly, we can decrypt a ciphertext encrypted with AES-GCM by using AES-CTR and, again, ignoring the authentication tag. This possibility opens the doors to all the nasty scenarios that we have seen at the beginning of this article, removing all the benefits of authenticated encryption.

Perhaps the most important thing to remember when using authenticated encryption is: never use decrypted data until you have verified its authenticity.

Why am I emphasizing this? Because some AE or AEAD implementations do return plaintext bytes before verifying their authenticity.

The code samples that I have provided do the following: they first calculate the authentication tag, compare it to the input tag, and perform the decryption only if the comparison succeeds. This is a simple approach, but it may be expensive when decrypting large amounts of data (for example: several gigabytes). The reason why this approach is expensive is that, if the ciphertext is too large, it may not fit entirely in memory, and the ciphertext would have to be read from the storage device twice: once for calculating the tag, and once for decrypting the ciphertext. Also, the ciphertext on the storage device may be modified between the two reads, and such a modification would go undetected.

What some authenticated encryption implementations do when dealing with large amounts of data is calculate the tag and perform the decryption in parallel. They read the ciphertext chunk by chunk and pass each chunk to both the authenticator and the decryption function, returning a chunk of decrypted bytes to the caller at each iteration. Only at the end, when the full ciphertext has been read, is the authenticity checked, so the application can report an error only at that point. With such implementations, it is imperative that the exit status of the application is checked before using any of the decrypted bytes.

An implementation that works like that (returning decrypted bytes before authentication is complete) is GPG. Here is an example of the output that GPG produces when decrypting a tampered message:

gpg: AES256.CFB encrypted data
gpg: encrypted with 1 passphrase
This is a very long message.
gpg: WARNING: encrypted message has been manipulated!

The decrypted message (“This is a very long message”) got printed, together with a warning, and the exit status is 2, indicating that an error occurred. It is important in this case that the decrypted message is not used in any way.

Other implementations avoid this problem by simply not encrypting large amounts of data as a single message. When given a large file to encrypt, they first split the file into multiple chunks of a few KiB, then encrypt each chunk independently, with its own nonce and authentication tag. Because each chunk is small, authentication and decryption can happen in memory, one after the other. If a chunk was tampered with, decryption would stop, returning truncated output, but never tampered output. It’s still important to check the exit status of such an implementation, but the consequences are less catastrophic than before. The drawback of this approach is that the total size of the ciphertext increases, because each chunk requires a nonce, an authentication tag, and some information about the position of the chunk (to prevent the chunks from being reordered). Storing the nonces or the positions can be avoided by using an algorithm to generate them on the fly, but storing the tags cannot be avoided.

This method (splitting long messages into chunks that are individually encrypted and authenticated) is used, for example, in TLS, as well as in the command line tool AGE.
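Here is a minimal sketch of the chunking approach, in standard-library Python. It uses stand-in primitives (SHA-256 in counter mode plus truncated HMAC-SHA256), not a real AEAD cipher. Each chunk’s nonce is derived on the fly from its position plus a "last chunk" flag, loosely in the spirit of the STREAM construction. The chunk size and all names are assumptions made for illustration. Because both the position and the end of the stream are authenticated, tampering with, reordering, or truncating chunks makes decryption stop with an error. Only chunks that have already been verified are released before that point.

```python
import hashlib
import hmac
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4096  # hypothetical: "a few KiB" per chunk


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in stream cipher (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(b"enc" + key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _tag(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Stand-in authenticator: HMAC-SHA256 over nonce and ciphertext, truncated to 16 bytes."""
    return hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()[:16]


def _chunk_nonce(index: int, last: bool) -> bytes:
    # 11-byte position counter + 1-byte "last chunk" flag; never stored, recomputed on decryption
    return index.to_bytes(11, "big") + (b"\x01" if last else b"\x00")


def encrypt_chunks(key: bytes, data: bytes) -> List[Tuple[bytes, bytes]]:
    """Split data into chunks and encrypt each one with its own nonce and tag.

    Assumes a fresh key per message, since the nonces restart from 0 every time."""
    pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    chunks = []
    for index, piece in enumerate(pieces):
        nonce = _chunk_nonce(index, last=index == len(pieces) - 1)
        ct = bytes(p ^ k for p, k in zip(piece, _keystream(key, nonce, len(piece))))
        chunks.append((ct, _tag(key, nonce, ct)))
    return chunks


def decrypt_chunks(key: bytes, chunks: List[Tuple[bytes, bytes]]) -> Iterator[bytes]:
    """Yield plaintext chunk by chunk, each one only after its own tag has been verified."""
    for index, (ct, tag) in enumerate(chunks):
        nonce = _chunk_nonce(index, last=index == len(chunks) - 1)
        if not hmac.compare_digest(_tag(key, nonce, ct), tag):
            raise ValueError(f"chunk {index} failed authentication")
        yield bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))
```

If the last chunk is dropped, the new final chunk is checked with the "last" flag set. It was encrypted without that flag, so truncation is detected too.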

Summary and final considerations

At the beginning of this article we have seen some risks of using bare encryption ciphers: one of them in particular was malleability, that is: the property that ciphertexts may be modified without detection.

This problem was addressed by using Authenticated Encryption (AE) or Authenticated Encryption with Associated Data (AEAD), which are methods to provide integrity and authenticity in addition to confidentiality when encrypting data.

We have seen the details of the two most popular authenticated encryption ciphers and briefly mentioned some of their variants. Their features are summarized here:

Cipher	Cipher Type	Key Size	Nonce Size	Nonce Reuse	Tag Size
ChaCha20-Poly1305	Stream, AEAD	256 bits	96 bits	Catastrophic	128 bits
XChaCha20-Poly1305	Stream, AEAD	256 bits	192 bits	Catastrophic	128 bits
AES-GCM	Stream, AEAD	128, 192, 256 bits	96 bits	Catastrophic	128 bits
AES-GCM-SIV	Stream, AEAD	128 or 256 bits	96 bits	Reduced risk	128 bits

Authenticated encryption is used in most of our modern protocols, including TLS, S/MIME, PGP/GPG, and many more. Failure to implement authenticated encryption correctly has led to some serious issues in the past.

Whenever you’re using encryption, ask yourself: how are integrity and authenticity verified? And remember: it’s essential to verify the authenticity of data before using it.

I hope you enjoyed this article! As usual, if you have any suggestions or spotted some mistakes, let me know in the comments or by contacting me!

What time is it? A simple question with a complex answer. How computers synchronize time

andreacorbellini — Mon, 23 Jan 2023 19:15:00 +0000

Ever wondered how your computer or your phone displays the current date and time accurately? What keeps all the devices in the world (and in space) in agreement on what time it is? What makes applications that require precise timing possible?

In this article, I will explain some of the challenges with time synchronization and explore two of the most popular protocols that devices use to keep their time in sync: the Network Time Protocol (NTP) and the Precision Time Protocol (PTP).

What is time?

It wouldn’t be a good article about time synchronization without spending a few words about time. We all have an intuitive concept of time since childhood, but stating precisely what ‘time’ is can be quite a challenge. I’m going to give you my idea of it.

Here is a simple definition to start with: time is how we measure changes. If the objects in the universe didn’t change and appeared to be fixed, without ever moving or mutating, I think we could all agree that time wouldn’t be flowing. Here by ‘change’ I mean any kind of change: from objects falling or changing shape, to light diffusing through space, or our memories building up in our mind.

This definition may be a starting point but does not capture all we know about time. Something that it does not capture is our concept of past, present, and future. From our day-to-day experience, we know in fact that an apple would fall off the tree due to gravity, under the normal flow of time. If we observed an apple rising from the ground, attaching itself to the tree (without the action of external forces), we could perhaps agree that what we’re observing is time flowing backward. And yet, both the apple falling off the tree and the apple rising from the ground are two valid changes from an initial state. This is where causality comes into play: time flows in such a way that the cause must precede the effect.

We can now refine our definition of time as an ordered sequence of changes, where each change is linked to the previous one by causality.

How do we measure time?

Now we have a more precise definition of time, but we still don’t have enough tools to define what is a second, an hour, or a day. This is where things get more complicated.

If we look at the definition of ‘second’ from the international standard, we can see that it is currently defined from the emission frequency of caesium-133 (¹³³Cs) atoms. If you irradiate caesium-133 atoms with some light having sufficient energy, the atoms will absorb the light, get excited, and release the energy back in the form of light at a specific frequency. That frequency of emission is defined as exactly 9192631770 Hz, and the second is defined as the duration of 9192631770 periods of that radiation. This definition is known as the caesium standard.

Here’s a problem to think about: how do we know that a caesium-133 atom, after getting excited, really emits light at a fixed frequency? The definition of the second implies that the frequency is constant and the same all over the world, but how do we know that this is really the case? This assumption is supported by quantum physics, according to which atoms can only exist in discrete (quantized) energy states. When an atom gets excited, it transitions from an energy state $E_1$ to an energy state $E_2$. Atoms like to be in the lowest energy state, so the atom will not stay in the state $E_2$ for long, and will want to go back to $E_1$. When doing that, it will release an amount of energy of exactly $E_2 - E_1$ in the form of a photon. According to the Planck–Einstein relation, the photon will have frequency $f = (E_2 - E_1) / h$, where $h$ is the Planck constant. Because the energy levels are fixed, the resulting emission frequency is fixed as well.

By the way, this process of absorption and emission of photons is the same process that causes fluorescence.

Visualization of the absorption and emission process for an atom transitioning from a ground state $E_1$ to an excited state $E_2$ and back.
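To get a feel for the numbers, here is a small Python sketch that applies the Planck–Einstein relation to the caesium frequency. The constants are the exact values fixed by the SI, and the function name is only illustrative. The sketch also computes the duration of a single period of the radiation (about 0.1 nanoseconds). A second contains 9192631770 of these periods.

```python
# Values that are exact by definition in the current SI
CAESIUM_HZ = 9_192_631_770          # caesium-133 transition frequency used to define the second, Hz
PLANCK_J_S = 6.626_070_15e-34       # Planck constant h, J*s
JOULES_PER_EV = 1.602_176_634e-19   # elementary charge, i.e. joules per electronvolt


def energy_gap(frequency_hz: float) -> float:
    """E2 - E1 = h * f (Planck-Einstein relation), in joules."""
    return PLANCK_J_S * frequency_hz


period_s = 1 / CAESIUM_HZ           # duration of one oscillation, in seconds
gap_j = energy_gap(CAESIUM_HZ)
print(f"period: {period_s:.4e} s")
print(f"E2 - E1: {gap_j:.4e} J ({gap_j / JOULES_PER_EV:.4e} eV)")
```

The energy gap comes out at roughly $6.1 \times 10^{-24}$ J, which is about 38 micro-electronvolts. Radiation at 9.19 GHz falls in the microwave range rather than visible light.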

Assuming that caesium-133 atoms emit light at a single, fixed frequency, we can now build extremely accurate caesium atomic clocks and measure spans of time with them. Existing caesium atomic clocks are estimated to be so precise that they may lose one second every 100 million years.

The same approach can be applied to other substances as well: atomic clocks have been constructed using rubidium (Rb), strontium (Sr), hydrogen (H), krypton (Kr), ammonia (NH₃), and ytterbium (Yb), each having its own emission frequency and its own accuracy. The most accurate clock ever built is a strontium clock which may lose one second every 15 billion years.

Time dilation

If we have two atomic clocks and we let them run for a while, will they show the same time? This might sound like a rhetorical question: we just established that the frequencies of emission of atoms are fixed, so why would two identical atomic clocks ever get out of sync? Well, as a matter of fact, two identical atomic clocks may get out of sync, and this problem is not due to the clocks, but to time itself: it appears that time does not always flow in the same way everywhere.

Many experiments have shown this effect on our planet, the most famous one probably being the Hafele-Keating experiment. In this experiment, a set of caesium clocks was placed on an airplane flying around the world west-to-east, another set was placed on an airplane flying east-to-west, and another set remained on the ground. The 3 sets of clocks, which were in sync before the planes took off, showed different times once reunited after the trip. This experiment and similar ones have been repeated and refined multiple times, and they all showed consistent results.

These effects were due to time dilation, and the results were consistent with the predictions of special relativity and general relativity.

Time dilation due to special relativity

Special relativity predicts that if two clocks are moving with two different velocities, they are going to measure different spans of time.

Special relativity is based on two principles:

the speed of light is constant;
there are no privileged reference frames.

To understand how these principles affect the flow of time, it’s best to look at an example: imagine that a passenger is sitting on a train with a laser and a mirror in front of them. Another person is standing on the ground next to the railroad and observing the train passing. The passenger points the laser perpendicular to the mirror and turns it on.

The passenger will observe the beam of light from the laser hitting the mirror and coming back along a straight line:

Portion of the beam of light in the train reference frame, emitted from the laser (bottom) and bouncing from the mirror (top). Note how it follows a vertical path.

From the observer perspective, however, things are quite different. Because the train is moving relative to the observer, the beam looks like it’s taking a different, slightly longer path:

The same portion of light beam as before, but this time in the observer reference frame. Note how it follows a diagonal path, longer than the vertical path in the train reference frame.

If both the passenger and the observer measure how long it took for the light beam to return to the source, and if the principles of special relativity hold, then the two persons will record different measurements. If the speed of light is constant, and there is no privileged reference frame, then the speed of light $c$ must be the same in both reference frames. From the passenger’s perspective, the beam has traveled a distance of $2 L$, taking a time $2 L / c$. From the observer’s perspective, the beam has traveled a longer distance $2 M$, with $M > L$, taking a longer time $2 M / c$.

Comparison of the light beams as seen from the two reference frames. In the train reference frame, the light beam is a vertical line of length $L$ (therefore traveling a path of length $2 L$ after bouncing from the mirror). In the observer reference frame, the light beam is distorted due to the velocity of the train. If the train moves at speed $v$, then the light beam travels a total length of $2 M = 2 L c / \sqrt{c^2 - v^2}$.

How can we reconcile these counterintuitive measurements? Special relativity does it by stating that time flows differently in the two reference frames. Time runs “slower” inside the train and runs “faster” for the observer. One consequence of that is that the passenger ages less than the observer.

Time dilation due to special relativity is not easily detectable in our day-to-day life, but it can still cause problems with high-precision clocks. This time dilation may in fact cause clock drifts on the order of hundreds of nanoseconds per day.
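The light-clock argument can be turned into numbers. The Python sketch below computes the two round-trip times from the caption above (with $M = L c / \sqrt{c^2 - v^2}$), the resulting dilation factor $\gamma = c / \sqrt{c^2 - v^2}$, and how much a clock moving at speed $v$ falls behind over one day of the observer’s time. The function names and example speeds are illustrative assumptions. Only special relativity is taken into account here, not gravity.

```python
import math

C = 299_792_458.0        # speed of light in vacuum, m/s (exact)
SECONDS_PER_DAY = 86_400


def light_clock_round_trips(L: float, v: float) -> tuple:
    """Round-trip time of the light beam: (train frame 2L/c, observer frame 2M/c)."""
    M = L * C / math.sqrt(C**2 - v**2)
    return 2 * L / C, 2 * M / C


def gamma_minus_one(v: float) -> float:
    """gamma - 1, where gamma = c / sqrt(c^2 - v^2) = (1 - v^2/c^2) ** -0.5.

    log1p/expm1 keep precision when v is tiny compared to c."""
    beta2 = (v / C) ** 2
    return math.expm1(-0.5 * math.log1p(-beta2))


def drift_per_day(v: float) -> float:
    """Seconds a clock moving at speed v loses over one day of the observer's time."""
    g1 = gamma_minus_one(v)
    return SECONDS_PER_DAY * g1 / (1 + g1)   # t - t / gamma


print(drift_per_day(250.0))    # ~3.0e-08 s
print(drift_per_day(3_874.0))  # ~7.2e-06 s
```

At 250 m/s (roughly an airliner), the drift is about 30 ns per day. At about 3.9 km/s (roughly the orbital speed of a GPS satellite), it is about 7 µs per day.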

Time dilation due to general relativity

Experimental data shows that clocks in a gravitational field do not follow (solely) the rules of special relativity. This does not mean that special relativity is wrong, but it’s a sign that it is incomplete. This is where general relativity comes into play. In general relativity, gravity is not seen as a force, like in classical physics, but rather as a deformation of spacetime. All objects that have mass bend spacetime, and the path of objects traveling through spacetime is affected by its curvature.

An apple falling from a tree is not going towards the ground because there’s a force “pushing” it down, but rather because it is following a geodesic: the equivalent of a straight line in curved spacetime.

Apple falling according to classical physics, following a parabolic motion.

Apple falling according to general relativity, following a straight path in distorted spacetime.

The larger the mass of an object, the larger the curvature of spacetime it produces. Time flows “slower” near large masses, and “faster” away from them. Interesting facts: people on a mountain age faster than people at sea level, and it has been calculated that the core of the Earth is 2.5 years younger than the crust.

The time dilation caused by gravity on the surface of the Earth may amount to clock drifts on the order of hundreds of nanoseconds per day, just like special relativity.
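Near the surface of the Earth, for small height differences, general relativity gives a simple first-order approximation: a clock raised by a height $h$ runs faster by a fraction of about $g h / c^2$, where $g$ is the gravitational acceleration. The Python sketch below applies it. The constant $g$ and the function names are assumptions made for illustration, and the approximation is not meant for large distances such as the depth of the Earth’s core.

```python
C = 299_792_458.0              # speed of light, m/s
STANDARD_GRAVITY = 9.806_65    # m/s^2, assumed constant over the height difference
SECONDS_PER_DAY = 86_400


def rate_difference(height_m: float, g: float = STANDARD_GRAVITY) -> float:
    """Fractional rate by which a clock height_m higher runs faster: g*h/c^2 (weak field)."""
    return g * height_m / C**2


def gain_per_day(height_m: float) -> float:
    """Seconds per day gained by the higher clock relative to the lower one."""
    return SECONDS_PER_DAY * rate_difference(height_m)


print(gain_per_day(1_000.0))   # ~9.4e-09 s: a clock 1 km higher gains about 9 ns per day
```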

Can we actually synchronize clocks?

Given what we have seen about time dilation, and that we may experience time differently, does it even make sense to talk about time synchronization? Can we agree on time if time flows differently for us?

The short answer is yes: the trick is to restrict our view to a closed system, like the surface of our planet. If we place some clocks scattered across the system, they will almost certainly experience different flows of time, due to different velocities, different altitudes, and other time dilation phenomena. We cannot make those clocks agree on how much time has passed since a specific event; what we can do is aggregate all the time measurements from the clocks and average them out. This way we end up with a value that is representative of how much time has passed on the entire system—in other words, we get an “overall time” for the system.

Very often, the system that we consider is not restricted to just the surface of our planet, but involves the Sun, and sometimes the Moon as well. In fact, what we call one year is roughly the time it takes for the Earth to complete an orbit around the Sun; one day is roughly the time it takes for the Earth to rotate once on its axis and face the Sun in the same position again. Including the Sun (or the Moon) in our time measurements is complicated: in part this complexity comes from the fact that precise measurements of the Earth’s position are difficult, and in part from the fact that the Earth’s rotation is not regular, not fully predictable, and slowing down. It’s worth noting that climate and geological events affect the Earth’s rotation in a measurable way, and such events are very hard to model accurately.

What is important to understand here is that the word ‘time’ is often used to mean different things. Depending on how we measure it, we can end up with different definitions of time. To avoid ambiguity, I will classify ‘time’ into two big categories:

Elapsed time: this is the time measured directly by a clock, without using any extra information about the system the clock is in or about other clocks.

We can use elapsed time to measure durations, latencies, frequencies, as well as lengths.
Coordinated time: this is the time measured by using a clock, paired with information about the system where it’s located (like position, velocity, and gravity), and/or information from other clocks.

This notion of time is mostly useful for coordinating events across the system. Some practical examples: scheduling the execution of tasks in the future, checking the expiration of certificates, real-time communication.

Time standards

Over the centuries several time standards have been introduced to measure coordinated time. Nowadays there are three major standards in use: TAI, UTC, and GNSS. Let’s take a brief look at them.

TAI

International Atomic Time (TAI) is based on the weighted average of the elapsed time measured by several atomic clocks spread across the world. The more precise a clock is, the more it contributes to the weighted average. The fact that the clocks are spread across multiple locations, together with the use of an average, mitigates relativistic effects and yields a value that we can think of as the overall time flow experienced by the surface of the Earth.

Note that the calculations for TAI do not take into account the Earth’s position with respect to the Sun.

Distribution of the laboratories that contribute to International Atomic Time (TAI) all over the world as of 2020. Map taken from the BIPM Annual Report on Time Activities.
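The Python sketch below is a toy illustration of the weighting idea, and only that: the BIPM’s actual computation of TAI involves far more than a single weighted mean. It combines readings from several clocks and gives more weight to clocks with lower variance. Inverse-variance weighting, the function name, and the example numbers are all assumptions.

```python
from typing import Sequence


def weighted_ensemble_time(readings: Sequence[float], variances: Sequence[float]) -> float:
    """Combine clock readings into one value, weighting each clock by 1/variance.

    Toy illustration of "more precise clocks count more"; not the BIPM algorithm."""
    if len(readings) != len(variances) or not readings:
        raise ValueError("need one variance per reading, and at least one reading")
    weights = [1.0 / var for var in variances]
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)


# Offsets (ns) of three hypothetical clocks from a common reference, and their variances (ns^2)
print(weighted_ensemble_time([12.0, -3.0, 5.0], [1.0, 4.0, 16.0]))
```

The result is pulled towards the first clock, because its variance is the smallest.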

UTC

Coordinated Universal Time (UTC) is built upon TAI. UTC, unlike TAI, is periodically adjusted to keep it in sync with the Earth’s rotation relative to the Sun. The goal is to make sure that 24 UTC hours are equivalent to a solar day (within a certain degree of precision). Because, as explained earlier, the Earth’s rotation is irregular, not fully predictable, and slowing down, periodic adjustments have to be made to UTC at irregular intervals.

The adjustments are performed by inserting leap seconds: these are extra seconds that are added to the UTC time to “slow down” the UTC time flow and keep it in sync with Earth’s rotation. On days when a leap second is inserted, UTC clocks go from 23:59:59 to 23:59:60.

A visualization of leap seconds inserted into UTC until the end of 2022. Each orange dot represents a leap second (not to scale). When UTC was started in 1972, it started with 10 seconds of offset from TAI. As you can see, the insertion of leap seconds is very irregular: some decades have seen many leap seconds, others have seen far fewer.

It’s worth noting that the practice of inserting leap seconds is most likely going to be discontinued in the future. The main reason is that leap seconds have been the source of complexity and bugs in computer systems, and the benefit-to-pain ratio of leap seconds is not considered high enough to keep adding them. If leap seconds are discontinued, UTC will effectively become equivalent to TAI with a fixed offset: UTC will always differ from TAI by the same number of seconds (37 seconds at the time of writing).

GNSS

Global Navigation Satellite System (GNSS) is based on a mix of accurate atomic clocks on ground and less accurate atomic clocks on artificial satellites orbiting around the Earth. The clocks on the satellites, being less accurate and subject to a variety of relativistic effects, are updated about twice a day from ground stations to correct clock drifts. Nowadays there are several implementations of GNSS around the world, including:

the United States’ Global Positioning System (GPS);
the European Galileo system;
China’s BeiDou (BDS);
the Russian GLONASS.

When GPS was launched, it was synchronized with UTC. However, GPS, unlike UTC, is not adjusted to follow the Earth’s rotation, and due to that, GPS today differs from UTC by 18 seconds (because 18 leap seconds have been inserted since GPS was launched in 1980). Galileo and BeiDou do not implement leap seconds either. GPS, Galileo, and BeiDou therefore differ from TAI by a constant offset, which makes them compatible with TAI.

Other GNSS systems like GLONASS do implement leap seconds and are therefore compatible with UTC.
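The offsets between these time scales are simple constants once the leap seconds are known. The Python sketch below converts a UTC timestamp to TAI, GPS time, and BeiDou time using the offsets in effect since the beginning of 2017: TAI - UTC = 37 s, TAI - GPS = 19 s, and TAI - BeiDou = 33 s. This is where the 18 seconds between GPS and UTC come from (37 - 19). The function names are illustrative, and the sketch deliberately rejects earlier dates instead of guessing their offsets.

```python
from datetime import datetime, timedelta

# Offsets valid from 2017-01-01 00:00:00 UTC onward (the last leap second was inserted
# at the end of 2016), until a new leap second is announced, if ever. Earlier dates would
# need the full leap second table. Naive datetimes are used as plain labels in each
# time scale; note that datetime cannot represent 23:59:60.
VALID_FROM = datetime(2017, 1, 1)
TAI_MINUS_UTC = timedelta(seconds=37)
TAI_MINUS_GPS = timedelta(seconds=19)      # fixed: GPS time has no leap seconds
TAI_MINUS_BEIDOU = timedelta(seconds=33)   # fixed: BeiDou time has no leap seconds


def utc_to_tai(utc: datetime) -> datetime:
    if utc < VALID_FROM:
        raise ValueError("this sketch only knows the TAI-UTC offset since 2017")
    return utc + TAI_MINUS_UTC


def utc_to_gps(utc: datetime) -> datetime:
    return utc_to_tai(utc) - TAI_MINUS_GPS


def utc_to_beidou(utc: datetime) -> datetime:
    return utc_to_tai(utc) - TAI_MINUS_BEIDOU


print(utc_to_gps(datetime(2023, 1, 23, 19, 15)))  # 2023-01-23 19:15:18
```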

Time synchronization protocols

Dealing with coordinated time is not trivial. Different ways to deal with relativistic effects and Earth’s irregular rotation result in different time standards that are not always immediately compatible with each other. Nonetheless, once we agree on a well-defined time standard, we have a way to ask the question “what time is it?” and receive an accurate answer all around the world (within a certain degree of precision).

Let’s now take a look at how computers on a network can obtain an accurate value for the coordinated time given by a time standard. I will describe two popular protocols: NTP and PTP. The two use similar algorithms, but offer different precision: milliseconds (NTP) and nanoseconds (PTP). Both typically use UDP/IP as the transport protocol (PTP can also run directly over Ethernet).

Network Time Protocol (NTP)

The way time synchronization works with NTP is the following: a computer that wants to synchronize its time periodically queries an NTP server (or multiple servers) to get the current coordinated time. The server that provides the current coordinated time may have obtained the time from an accurate source clock connected to the server (like an atomic clock synchronized with TAI or UTC, or a GNSS receiver), or from a previous synchronization from another NTP server.

To record how “fresh” the coordinated time from an NTP server is (how distant the NTP server is from the source clock), NTP has a concept of stratum: this is a number that indicates the number of ‘hops’ from the accurate clock source:

stratum 0 is used to indicate an accurate clock;
stratum 1 is a server that is directly connected to a stratum 0 clock;
stratum 2 is a server that is synchronized from a stratum 1 server;
stratum 3 is a server that is synchronized from a stratum 2 server;
and so on…

The maximum stratum allowed is 15. There’s also a special stratum 16: this is not a real stratum, but a special value used by clients to indicate that time synchronization is not happening (most likely because the NTP servers are unreachable).

Examples of different NTP strata in a distributed network. A stratum n server obtains its time from stratum n - 1 servers.

The major problem with synchronizing time over a network is latency. Networks can be composed of multiple links, some of which may be slow or overloaded. Simply requesting the current time from an NTP server without taking latency into account would lead to an imprecise response. Here is how NTP deals with this problem:

The NTP client sends a request via a UDP packet to an NTP server. The packet includes an originate timestamp $t_0$ that indicates the local time of the client when the packet was sent.
The NTP server receives the request and records the receive timestamp $t_1$, which indicates the local time of the server when the request was received.
The NTP server processes the request, prepares a response, and records the transmit timestamp $t_2$, which indicates the local time of the server when the response was sent. The timestamps $t_0$, $t_1$ and $t_2$ are all included in the response.
The NTP client receives the response and records the timestamp $t_3$, which indicates the local time of the client when the response was received.

The NTP synchronization algorithm.

Our goal is now to calculate an estimate for the network latency and processing delay and use that information to calculate, in the most accurate way possible, the offset between the NTP client clock and the NTP server clock.

The difference $t_3 - t_0$ is the duration of the overall exchange. The difference $t_2 - t_1$ is the duration of the NTP server processing delay. If we subtract these two durations, we get the total network latency experienced, also known as round-trip delay:

$$\delta = (t_3 - t_0) - (t_2 - t_1)$$

If we assume that the transmit delay and the receive delay are the same, then $\delta / 2$ is the one-way network latency (this assumption may not hold in a general network, but it is the assumption that NTP makes).

Under this assumption, the time $t_0 + \delta/2$ is the time on the client’s clock that corresponds to $t_1$ on the server’s clock. Similarly, $t_3 - \delta/2$ on the client’s clock corresponds to $t_2$ on the server’s clock. These correspondences let us calculate two estimates for the offset between the client’s clock and the server’s clock:

$$\begin{align*} \theta_1 & = t_1 - (t_0 + \delta/2) \\ \theta_2 & = t_2 - (t_3 - \delta/2) \end{align*}$$

We can now calculate the client-server offset $\theta$ as an average of those two estimates:

$$\begin{align*} \theta & = \frac{\theta_1 + \theta_2}2 \\ & = \frac{t_1 - (t_0 + \delta/2) + t_2 - (t_3 - \delta/2)}2 \\ & = \frac{t_1 - t_0 - \delta/2 + t_2 - t_3 + \delta/2}2 \\ & = \frac{(t_1 - t_0) + (t_2 - t_3)}2 \\ \end{align*}$$

Note that the offset $\theta$ may be a positive duration (meaning that the client clock is behind the server clock), a negative duration (meaning that the client clock is ahead of the server clock), or zero (meaning that the client clock agrees with the server clock, which is unlikely).
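The delay and offset formulas translate directly into code. Here is a small Python sketch (the function name is made up for illustration) with a worked example. In the example, the client clock is 5 seconds behind the server and the network latency is symmetric, which is exactly the case in which the estimate is exact.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple:
    """Return (theta, delta) computed from the four NTP timestamps.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the response was sent
    t3: client clock when the response was received
    """
    delta = (t3 - t0) - (t2 - t1)            # round-trip delay, minus server processing
    theta = ((t1 - t0) + (t2 - t3)) / 2      # server clock minus client clock
    return theta, delta


# Client 5000 ms behind the server, 100 ms latency each way, 20 ms processing (values in ms)
theta, delta = ntp_offset_and_delay(100_000, 105_100, 105_120, 100_220)
print(theta, delta)  # 5000.0 200
```

A positive `theta` means that the client has to move its clock forward by that amount.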

After calculating the offset $\theta$, the client can update its local clock by shifting it by $\theta$ and from that point the client will be in sync with the server (within a certain degree of precision).

Once the synchronization is done, it is expected that the client’s clock will start drifting away from the server’s clock. This may happen due to relativistic effects and more importantly because often clients do not use high-precision clocks. For this reason, it is important that NTP clients synchronize their time periodically. Usually NTP clients start by synchronizing time every minute or so when they are started, and then progressively slow down until they synchronize time once every half an hour or every hour.

There are some drawbacks with this synchronization method:

The request and response delays may not be perfectly symmetric, resulting in inaccuracies in the calculations of the offset $\theta$. Network instabilities, packet retransmissions, change of routes, queuing may all cause unpredictable and inconsistent delays.
The timestamps $t_1$ and $t_3$ must be set as soon as possible (as soon as the packets are received), and similarly $t_0$ and $t_2$ must be set as late as possible. Because NTP is implemented at the software level, there may be non-negligible delays in acquiring and recording these timestamps. These delays may be exacerbated if the NTP implementation is not very performant, or if the client or server are under high load.
Errors propagate and add up when increasing the number of strata.
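The effect of asymmetric delays can be quantified. If the request takes $d_\text{out}$ and the response takes $d_\text{back}$, substituting into the offset formula gives an estimate of $\theta + (d_\text{out} - d_\text{back})/2$. In other words, half of the asymmetry ends up as an error in the offset, and the client cannot detect it from the four timestamps alone. The Python sketch below simulates one exchange under these assumptions. The names are illustrative.

```python
def estimated_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """NTP offset estimate: ((t1 - t0) + (t2 - t3)) / 2."""
    return ((t1 - t0) + (t2 - t3)) / 2


def simulate_exchange(true_offset, delay_out, delay_back, processing, t0=0):
    """Timestamps seen in one exchange, for a client whose clock is true_offset behind the server."""
    t1 = t0 + true_offset + delay_out      # request arrives (server clock)
    t2 = t1 + processing                   # response leaves (server clock)
    t3 = t2 - true_offset + delay_back     # response arrives (client clock)
    return t0, t1, t2, t3


# Values in microseconds: 30 us on the way out, only 10 us on the way back
stamps = simulate_exchange(true_offset=5_000, delay_out=30, delay_back=10, processing=5)
error = estimated_offset(*stamps) - 5_000
print(error)  # 10.0, i.e. (30 - 10) / 2
```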

For all these reasons, NTP clients do not synchronize time just from a single NTP server, but from multiple ones. NTP clients take into account the round-trip delays, stratum, and jitter (the variance in round-trip delays) to decide the best NTP server to get their time from. Under ideal network conditions, an NTP client will always prefer a server with a low stratum. However, an NTP client may prefer an NTP server with high stratum and more reliable connectivity over an NTP server with low stratum but a very unstable network connection.

The precision offered by NTP is in the order of a few milliseconds.

Precision Time Protocol (PTP)

PTP is a time synchronization protocol for applications that require more accuracy than NTP can provide. The main differences between PTP and NTP are:

Precision: NTP offers millisecond precision, while PTP offers nanosecond precision.
Time standard: NTP transmits UTC time, while PTP transmits TAI time and the difference between TAI and UTC.
Scope: NTP is designed to be used over large networks, including the internet, while PTP is designed to be used in local area networks.
Implementation: NTP is mainly software based, while PTP can be implemented both via software and on specialized hardware. The use of specialized hardware considerably reduces delays and jitter introduced by software.

Time Card: an open-source hardware card with a PCIe interface that can be plugged into a computer to turn it into a PTP master. It can optionally be connected to a GNSS receiver and contains a rubidium (Rb) clock.

Hierarchy: NTP can support a complex hierarchy of NTP servers, organized via strata. While PTP does not put a limitation on the number of nodes involved, the hierarchy is usually only composed of master clocks (the source of time information) and slave clocks (the receivers of time information). Sometimes boundary clocks are used to relay time information to network segments that are unreachable by the master clocks.
Clock selection: in NTP, clients select the best NTP server based on the quality of the server’s clock and of the network connection. In PTP, slaves do not select the best master clock to use. Instead, master clocks perform a selection among themselves using a method called the best master clock algorithm. This algorithm takes into account each clock’s quality and input from system administrators, and does not factor in network quality at all. The master clock selected by the algorithm is called the grandmaster clock.
Algorithm: in NTP, clients poll the time information from servers periodically and calculate the clock offset using the algorithm described above (based on the timestamps $t_0$, $t_1$, $t_2$ and $t_3$). With PTP, the algorithm used by slaves to calculate the offset from the grandmaster clock is somewhat similar to the one used in NTP, but the order of operations is different:
1. the grandmaster periodically broadcasts its time information $T_0$ over the network;
2. each slave records the time $T_1$ at which it receives the broadcast;
3. each slave sends a packet to the grandmaster at time $T_2$;
4. the grandmaster receives the packet at time $T_3$ and sends that value back to the slave.
The average network delay can be calculated as $\delta = ((T_3 - T_0) - (T_2 - T_1)) / 2$. The clock offset can be calculated as $\theta = ((T_1 - T_0) + (T_2 - T_3)) / 2$.

The PTP time synchronization algorithm.
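The two formulas above can be checked with a few lines of Python. This is a minimal sketch, assuming integer nanosecond timestamps and a symmetric network path (the same delay in both directions), which is the assumption both formulas rely on; the function name `ptp_delay_and_offset` is made up for illustration.

```python
def ptp_delay_and_offset(t0: int, t1: int, t2: int, t3: int) -> tuple[float, float]:
    """Return (delay, offset) for one PTP exchange.

    t0: grandmaster broadcasts its time (grandmaster clock)
    t1: slave receives the broadcast (slave clock)
    t2: slave sends its packet (slave clock)
    t3: grandmaster receives that packet (grandmaster clock)
    """
    # Average one-way delay: the whole round trip minus the time the slave
    # spent between receiving and replying, split in two
    delay = ((t3 - t0) - (t2 - t1)) / 2
    # How far the slave clock is ahead of the grandmaster clock
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return delay, offset


# Example: the slave clock runs 5000 ns ahead of the grandmaster, and the
# network delay is 2000 ns in each direction
delay, offset = ptp_delay_and_offset(t0=1_000, t1=8_000, t2=10_000, t3=7_000)
print(delay, offset)  # 2000.0 5000.0
```

The slave would then correct its clock by subtracting `offset`. Real implementations typically smooth the result over many exchanges rather than trusting a single sample.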

Summary

Synchronizing time across a computer network is not an easy task, and first of all requires agreeing on a definition of ‘time’ and on a time standard.
Relativistic effects mean that time does not necessarily flow at the same rate all over the globe, so time has to be measured and aggregated across the planet in order to obtain a value that everyone can agree on.
Atomic clocks and GNSS are the clock sources used for most applications nowadays.
NTP is a time synchronization protocol that can be used on large and distributed networks like the internet and provides millisecond precision.
PTP is a time synchronization protocol for local area networks and provides nanosecond precision.

Can we encrypt data using Elliptic Curves?

andreacorbellini — Mon, 02 Jan 2023 06:30:00 +0000

From time to time, I hear people saying that Elliptic Curve Cryptography (ECC) cannot be used to directly encrypt data, and you can only do key agreement and digital signatures with it. This is a common misconception, but it’s not actually true: you can indeed use elliptic curve keys to encrypt arbitrary data. And I’m not talking about hybrid-encryption schemes (like ECIES or HPKE): I’m talking about pure elliptic curve encryption, and I’m going to show an example of it in this article. It’s true however that pure elliptic curve encryption is not widely used or standardized because, as I will explain at the end of the article, key agreement is more convenient for most applications.

Quick recap on Elliptic Curve Cryptography

I wrote an in-depth article about elliptic curve cryptography in the past on this blog, and here is a quick recap: points on an elliptic curve form an interesting algebraic structure: a cyclic group. This group lets us do some algebra with the points of the elliptic curve: if we have two points $A$ and $B$, we can add them ($A + B$) or subtract them ($A - B$). We can also multiply a point by an integer, which is the same as doing repeated addition ($n A = A + A + \cdots + A$, $n$ times).

We know some efficient algorithms for doing multiplication, but the reverse of multiplication is believed to be a “hard” problem for certain elliptic curves, in the sense that we know efficient methods for computing $B = n A$ given $n$ and $A$, but we do not know very efficient methods to figure out $n$ given $A$ and $B$. This problem of reversing a multiplication is known as Elliptic Curve Discrete Logarithm Problem (ECDLP).

Elliptic Curve Cryptography is based on multiplication of elliptic curve points by integers and its security is given mainly by the difficulty of solving the ECDLP.

In order to use Elliptic Curve Cryptography, we first have to generate a private-public key pair:

the private key is a random integer $s$;
the public key is the result of multiplying the integer $s$ with the generator $G$ of the elliptic curve group: $P = s G$.
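To make the algebra above concrete, here is a toy sketch in pure Python (rather than ECPy, which the article uses later). It assumes the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, whose group has 19 points and is generated by $G = (5, 1)$: far too small to be secure, but small enough to check by hand. `scalar_mult` uses double-and-add, one of the efficient multiplication methods mentioned above; all names are illustrative.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (insecure, for illustration only).
# Its group has prime order 19 and is generated by G = (5, 1).
# B does not appear in the addition formulas; it only defines the curve.
P_FIELD, A, B = 17, 2, 2
G, ORDER = (5, 1), 19
INFINITY = None  # the point at infinity, i.e. the group identity


def point_add(p, q):
    """Add two points of the toy curve with the chord-and-tangent rule."""
    if p is INFINITY:
        return q
    if q is INFINITY:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY  # p + (-p) is the identity
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_FIELD)  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_FIELD)  # chord slope
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def scalar_mult(n, p):
    """Compute n * p with double-and-add: one doubling per bit of n plus one
    addition per set bit, instead of n - 1 repeated additions."""
    result = INFINITY
    while n > 0:
        if n & 1:
            result = point_add(result, p)
        p = point_add(p, p)
        n >>= 1
    return result


# Key pair: a random private key s in [1, ORDER - 1] and the public key P = s G
private_key = secrets.randbelow(ORDER - 1) + 1
public_key = scalar_mult(private_key, G)
print('private key:', private_key)
print('public key: ', public_key)
```

On this toy curve, recovering `private_key` from `public_key` takes at most 18 guesses; on a real 256-bit or 384-bit curve no known method is efficient enough, which is exactly the ECDLP hardness described above.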

Let’s now see a method to use Elliptic Curve Cryptography to encrypt arbitrary data, so that we can demystify the common belief that elliptic curves cannot be used to encrypt.

Elliptic Curve ElGamal

One method to encrypt data with elliptic curve keys is ElGamal. This is not the only method, of course, but it’s the one that I chose because it’s well known and simple enough. ElGamal is a cryptosystem that takes its name from its author, Taher Elgamal, and works on any cyclic group, not just elliptic curve groups.

If we want to encrypt a message using the public key $P$ via ElGamal, we can do the following:

map the message to a point $M$ on the elliptic curve
generate a random integer $t$
compute $C_1 = t G$
compute $C_2 = t P + M$
return the tuple $(C_1, C_2)$

To decrypt an encrypted tuple $(C_1, C_2)$ using the private key $s$, we can do the following:

compute $M = C_2 - s C_1$
map the point $M$ back to a message

The scheme works because: $$\begin{align*} s C_1 & = s (t G) \\ & = t (s G) \\ & = t P \end{align*}$$ therefore: $$\begin{align*} C_2 - s C_1 & = (t P + M) - (t P) \\ & = M \end{align*}$$
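The derivation above can be verified end to end on a tiny curve. This is a toy sketch in pure Python, assuming the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ (group order 19, generator $G = (5, 1)$), and assuming the message has already been mapped to a point, which is the open problem discussed next. The helper names are hypothetical and the curve is far too small for real use.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (insecure, for illustration only):
# its group has prime order 19 and is generated by G = (5, 1).
P_FIELD, A = 17, 2
G, ORDER = (5, 1), 19
INFINITY = None  # point at infinity (group identity)


def point_add(p, q):
    """Chord-and-tangent addition of two toy-curve points."""
    if p is INFINITY:
        return q
    if q is INFINITY:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_FIELD)
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def point_neg(p):
    """-p: mirror the point over the x axis."""
    return INFINITY if p is INFINITY else (p[0], -p[1] % P_FIELD)


def scalar_mult(n, p):
    """n * p via double-and-add."""
    result = INFINITY
    while n > 0:
        if n & 1:
            result = point_add(result, p)
        p = point_add(p, p)
        n >>= 1
    return result


def toy_encrypt(public_key, message_point):
    t = secrets.randbelow(ORDER - 1) + 1  # fresh random t for every message
    c1 = scalar_mult(t, G)  # C1 = t G
    c2 = point_add(scalar_mult(t, public_key), message_point)  # C2 = t P + M
    return c1, c2


def toy_decrypt(secret_key, c1, c2):
    return point_add(c2, point_neg(scalar_mult(secret_key, c1)))  # M = C2 - s C1


s = secrets.randbelow(ORDER - 1) + 1  # private key
P = scalar_mult(s, G)                 # public key
M = scalar_mult(7, G)                 # a message already mapped to a point
C1, C2 = toy_encrypt(P, M)
print(toy_decrypt(s, C1, C2) == M)    # True
```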

There’s however a big problem with this scheme: how do we map a message to a point, and vice versa? How can we perform step 1 of the encryption algorithm, or step 2 of the decryption algorithm?

Mapping a message to a point

A message can be an arbitrary byte string. An elliptic curve point is, generally speaking, a pair of integers $(x, y)$ belonging to the elliptic curve field. How can we transform a byte string into a pair of field integers?

Well, as far as computers are concerned, both byte strings and integers have the same nature: they are just sequences of bits, so there’s a natural map between the two. We could take the message, split it into two parts, and interpret the first part as an integer $x$ and the second part as an integer $y$. This would work for obtaining two arbitrary integers, but there’s a problem: the coordinates $x$ and $y$ of an elliptic curve point are related by a mathematical equation (the curve equation), so we cannot choose two arbitrary $x$ and $y$ and expect them to identify a valid point on the curve. In fact, for curves in Weierstrass form, given $x$ there are at most two possible choices for $y$, so it’s very unlikely that this splitting method will yield a valid point.

Let’s change our strategy a little bit: instead of transforming the message to a pair $(x, y)$, we transform it to $x$ and then we compute a valid $y$ from the curve equation. This is a much better method, but there’s still a problem: generally speaking, not every $x$ will have a corresponding $y$. Not every $x$ can satisfy the curve equation.

Luckily, most of the popular elliptic curves used in cryptography have an interesting property: about half of the possible field integers are valid $x$-coordinates. To see this, let’s take a look at an example: the curve secp384r1. This is a Weierstrass curve that has the following order:

0xffffffffffffffffffffffffffffffffffffffffffffffffc7634d81f4372ddf581a0db248b0a77aecec196accc52973

Recall that the order is the number of valid points in the elliptic curve group. Because this is a Weierstrass curve, each valid $x$-coordinate corresponds to 2 points, $(x, y)$ and $(x, -y)$, so the number of valid $x$-coordinates is approximately order / 2. Given an arbitrary 384-bit integer, what are the chances that it is a valid $x$-coordinate? The answer is (order / 2) / (2 ** 384), which is approximately 0.5, or 50%.
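The arithmetic can be double-checked in Python using the order quoted above. This sketch counts (order − 1) / 2 valid $x$-coordinates rather than order / 2, since the point at infinity has no coordinates; the difference is irrelevant at this scale. It also glosses over the fact that only integers below the field prime are actual field elements.

```python
# Order of secp384r1, copied from above
order = 0xffffffffffffffffffffffffffffffffffffffffffffffffc7634d81f4372ddf581a0db248b0a77aecec196accc52973

# secp384r1 has cofactor 1 and an odd order, so apart from the point at
# infinity its points come in pairs (x, y) and (x, -y) sharing one x
valid_x_count = (order - 1) // 2

# Chance that a uniformly random 384-bit integer is a valid x-coordinate
probability = valid_x_count / 2**384
print(probability)
```

The result is indistinguishable from 0.5 at double precision, because order differs from $2^{384}$ only around the 190th bit.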

OK, but how does this help with our goal of mapping an arbitrary message to a valid $x$-coordinate? It’s simple: we can append a random byte (or multiple bytes) to the message. We call this extra byte (or bytes) padding. If the resulting padded message does not translate to a valid $x$-coordinate, we choose another random padding and try again, until we find one that works. Given that there’s a 50% chance of finding a valid $x$-coordinate, this method will find one very quickly: on average, this will happen on the first or the second try.
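The “first or second try” estimate can be checked empirically on a small curve. This sketch uses a hypothetical curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{10007}$ (not a curve anyone should use), tests each candidate $x$ with Euler’s criterion (the right-hand side of the curve equation must be zero or a quadratic residue), and draws fresh random candidates as a stand-in for trying new paddings.

```python
import random

# Hypothetical small curve y^2 = x^3 + 2x + 3 over F_p, with p = 10007 (prime)
p, a, b = 10007, 2, 3


def is_valid_x(x: int) -> bool:
    """True if some y satisfies y^2 = x^3 + ax + b (mod p), via Euler's criterion."""
    rhs = (x**3 + a * x + b) % p
    return rhs == 0 or pow(rhs, (p - 1) // 2, p) == 1


# Fraction of field elements that are valid x-coordinates
valid = sum(is_valid_x(x) for x in range(p))
print(f'valid x-coordinates: {valid} / {p} = {valid / p:.3f}')


def tries_until_valid(rng: random.Random) -> int:
    """Draw random candidates (standing in for new paddings) until one is valid."""
    tries = 1
    while not is_valid_x(rng.randrange(p)):
        tries += 1
    return tries


rng = random.Random(1234)
trials = [tries_until_valid(rng) for _ in range(10_000)]
print(f'average tries: {sum(trials) / len(trials):.2f}')
```

Both printed figures should come out close to 0.5 and 2 respectively: with a success probability of about 1/2 per attempt, the number of attempts follows a geometric distribution whose mean is 2.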

Example of how to use padding to obtain a valid elliptic curve point from an arbitrary message.

This operation can be easily reversed: if you have a point $(x, y)$, in order to recover the message that generated it, just take the $x$ coordinate and remove the padding. That’s it!

It’s worth noting that there are some standard curves where all the possible byte strings (of the proper size) can be translated to elliptic curve points, without any random padding needed. For example, with Curve25519, every 32-byte string is a valid elliptic curve point. Another curve like that is Curve448.

It’s also important to note that the padding does not need to be truly random. In the image above I show a padding that is simply a constantly increasing sequence of numbers: 1, 2, 3, … That’s enough to find a valid point.

Putting everything together

We have seen how to map a message to a point and how ElGamal works, so now we have all the elements to write some working code. I’m choosing Python and the ECPy package to work with elliptic curves, which you can install with pip install ecpy.

import random
from ecpy.curves import Curve, Point


def message_to_point(curve: Curve, message: bytes) -> Point:
    # Number of bytes to represent a coordinate of a point
    coordinate_size = curve.size // 8
    # Minimum number of bytes for the padding. We need at least 1 byte so that
    # we can try different values and find a valid point. We also add an extra
    # byte as a delimiter between the message and the padding (see below)
    min_padding_size = 2
    # Maximum number of bytes that we can encode
    max_message_size = coordinate_size - min_padding_size

    if len(message) > max_message_size:
        raise ValueError('Message too long')

    # Add a padding long enough to ensure that the resulting padded message has
    # the same size as a point coordinate. Initially the padding is all 0
    padding_size = coordinate_size - len(message)
    padded_message = bytearray(message) + b'\0' * padding_size

    # Put a delimiter between the message and the padding, so that we can
    # properly remove the padding at decrypt time
    padded_message[len(message)] = 0xff

    while True:
        # Convert the padded message to an integer, which may or may not be a
        # valid x-coordinate
        x = int.from_bytes(padded_message, 'little')
        # Calculate the corresponding y-coordinate (if it exists)
        y = curve.y_recover(x)
        if y is None:
            # x was not a valid coordinate; increment the padding and try again
            padded_message[-1] += 1
        else:
            # x was a valid coordinate; return the point (x, y)
            return Point(x, y, curve)


def encrypt(public_key: Point, message: bytes) -> bytes:
    curve = public_key.curve
    # Map the message to an elliptic curve point
    message_point = message_to_point(curve, message)
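    # Generate a random number t in [1, order - 1] using the operating
    # system's cryptographically secure random number generator
    seed = random.SystemRandom().randrange(1, curve.order)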
    # Calculate c1 and c2 according to the ElGamal algorithm
    c1 = seed * curve.generator
    c2 = seed * public_key + message_point
    # Encode c1 and c2 and return them
    return bytes(curve.encode_point(c1) + curve.encode_point(c2))


def point_to_message(point: Point) -> bytes:
    # Number of bytes to represent a coordinate of a point
    coordinate_size = point.curve.size // 8
    # Convert the x-coordinate of the point to a byte string
    padded_message = point.x.to_bytes(coordinate_size, 'little')
    # Find the padding delimiter
    message_size = padded_message.rfind(0xff)
    # Remove the padding and return the resulting message
    message = padded_message[:message_size]
    return message


def decrypt(curve: Curve, secret_key: int, ciphertext: bytes) -> bytes:
    # Decode c1 and c2 and convert them to elliptic curve points
    c1_bytes = ciphertext[:len(ciphertext) // 2]
    c2_bytes = ciphertext[len(ciphertext) // 2:]
    c1 = curve.decode_point(c1_bytes)
    c2 = curve.decode_point(c2_bytes)

    # Calculate the message point according to the ElGamal algorithm
    message_point = c2 - secret_key * c1
    # Convert the message point to a message and return it
    return point_to_message(message_point)

And here is a usage example:

curve = Curve.get_curve('secp384r1')

secret_key = 0x123456789abcdef
public_key = secret_key * curve.generator

message = 'hello'
print('  Message:', message)

encrypted = encrypt(public_key, message.encode('utf-8'))
print('Encrypted:', encrypted.hex())

decrypted = decrypt(curve, secret_key, encrypted).decode('utf-8')
print('Decrypted:', decrypted)

Which produces the following output:

  Message: hello
Encrypted: 04fa333c6a03994c5bce4627de4447c5cdd358415f8db2745b67836932a0d5e81f19...
Decrypted: hello

Some considerations on padding and security

It’s important to note that padding is a very delicate problem in cryptography. There exist many padding schemes, and not all of them are secure. The padding scheme that I wrote in this article was just for demonstration purposes and may not be the most secure, so don’t use it in production systems. Take a look at OAEP if you’re looking for a modern and secure padding scheme.

Another thing to note is that the decryption method that I wrote does not check whether decryption was successful. If you try to decrypt an invalid ciphertext, or use the wrong key, you won’t get an error but instead a random result, which is not desirable. A good padding scheme like OAEP will instead raise an error if decryption was unsuccessful.

(Receiving an error when decryption is not successful is very important due to the fact that schemes like ElGamal are malleable. Check out my post about authenticated encryption for examples and details about why this is important.)

Cost of elliptic curve encryption

With Elliptic Curve ElGamal, if we are using an n-bit elliptic curve, we can encrypt messages that are at most n bits long (actually less than that, if we’re using padding), and the output is at least 2n bits long (if the resulting points $C_1$ and $C_2$ are encoded using point compression). This means that encryption using Elliptic Curve ElGamal doubles the size of the data that we want to encrypt. It also requires a fair amount of compute resources, because it involves generating a random number and performing 2 point multiplications.
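The size overhead is easy to quantify. This small sketch assumes SEC 1 point encoding (a 1-byte prefix followed by one coordinate for a compressed point, or by both coordinates for an uncompressed one) and the 2-byte padding (delimiter plus counter) of `message_to_point` above; `elgamal_sizes` is a hypothetical helper.

```python
def elgamal_sizes(curve_bits: int, padding_bytes: int = 2) -> dict[str, int]:
    """Byte sizes for one round of Elliptic Curve ElGamal on an n-bit curve."""
    coordinate = (curve_bits + 7) // 8  # bytes per coordinate
    return {
        # Largest message that fits in one x-coordinate after padding
        'max_message': coordinate - padding_bytes,
        # Two points (C1, C2), each as prefix + x
        'ciphertext_compressed': 2 * (1 + coordinate),
        # Two points (C1, C2), each as prefix + x + y
        'ciphertext_uncompressed': 2 * (1 + 2 * coordinate),
    }


print(elgamal_sizes(384))
# {'max_message': 46, 'ciphertext_compressed': 98, 'ciphertext_uncompressed': 194}
```

For secp384r1 this means at most 46 bytes of message in, and 98 bytes (compressed) or 194 bytes (uncompressed, the form whose `04` prefix is visible in the sample output above) out.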

In short, Elliptic Curve ElGamal is expensive both in terms of space and in terms of time and compute power, and this makes it unattractive in applications like TLS or general purpose encryption.

So what can we use Elliptic Curve ElGamal for? We can use it to encrypt symmetric keys, such as AES keys or ChaCha20 keys, and then use these symmetric keys to encrypt our arbitrary data. Symmetric keys are relatively short (ranging from 128 to 256 bits nowadays), so they can be encrypted with one round of Elliptic Curve ElGamal with most curves. It’s worth noting that this is the same approach that we use with RSA encryption: for most applications, we don’t use RSA to encrypt data directly, but rather we use RSA to encrypt symmetric keys which are later used for encrypting data.

These are the reasons why schemes like Elliptic Curve ElGamal, or other methods of encryption with elliptic curves, are not used in practice:

elliptic curve encryption is more expensive than hybrid encryption;
hybrid encryption scales better and is more performant;
elliptic curve key exchange is simpler and has fewer pitfalls than encryption.

In conclusion, there are no practical benefits from elliptic curve encryption compared to hybrid encryption with key agreement, and that’s why we don’t use it. However, the idea that elliptic curves cannot be used for encryption is a myth, and I hope this article will help clarify that confusion.

The curious case of bad blocks on an SSD, and how I got rid of them

andreacorbellini — Thu, 29 Dec 2022 04:00:00 +0000

I recently inherited a laptop that was broken by pouring some hot coffee on it. When I dissected it, it was pretty clear that most of it was unrecoverable: the CPU was completely fried, and its thermal paste had splashed everywhere on the motherboard. (I wish I had taken a picture of it that I could share.) There were, however, a few pieces that looked to be in good shape. One of those components was a SATA Solid State Drive (SSD). I decided to take this SSD and reuse it in my own laptop, maybe as part of my LVM pool.

However, when I plugged the SSD into my laptop and tried to navigate the filesystem, it appeared to be working quite slowly. Opening certain files would sometimes hang indefinitely. Upon inspecting the SMART data and the kernel logs, it was clear that the drive was returning plenty of read errors.

Here is a sample of the kernel logs:

$ dmesg
...
[  860.465707] ata2.00: exception Emask 0x0 SAct 0x8 SErr 0x0 action 0x0
[  860.465726] ata2.00: irq_stat 0x40000008
[  860.465733] ata2.00: failed command: READ FPDMA QUEUED
[  860.465737] ata2.00: cmd 60/08:18:58:c5:28/00:00:00:00:00/40 tag 3 ncq dma 4096 in
[  860.465737]          res 41/40:08:58:c5:28/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[  860.465750] ata2.00: status: { DRDY ERR }
[  860.465754] ata2.00: error: { UNC }
[  860.467010] ata2.00: configured for UDMA/133
[  860.467046] sd 1:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  860.467054] sd 1:0:0:0: [sda] tag#3 Sense Key : Medium Error [current]
[  860.467060] sd 1:0:0:0: [sda] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
[  860.467066] sd 1:0:0:0: [sda] tag#3 CDB: Read(10) 28 00 00 28 c5 58 00 00 08 00
[  860.467069] I/O error, dev sda, sector 2671960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
...
[ 1057.914608] ata2: softreset failed (device not ready)
[ 1057.914623] ata2: hard resetting link
[ 1063.230631] ata2: found unknown device (class 0)
[ 1067.934891] ata2: softreset failed (device not ready)
[ 1067.934911] ata2: hard resetting link
[ 1073.270826] ata2: found unknown device (class 0)
[ 1078.486604] ata2: link is slow to respond, please be patient (ready=0)
[ 1102.970841] ata2: softreset failed (device not ready)
[ 1102.970860] ata2: limiting SATA link speed to 1.5 Gbps
[ 1102.970865] ata2: hard resetting link
[ 1108.034602] ata2: found unknown device (class 0)
[ 1108.194622] ata2: softreset failed (device not ready)
[ 1108.194638] ata2: reset failed, giving up
[ 1108.194642] ata2.00: disable device
[ 1108.194677] ata2: EH complete
[ 1108.194726] sd 1:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=232s
[ 1108.194740] sd 1:0:0:0: [sda] tag#6 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ 1108.194748] I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
...

These logs show that the SSD was returning errors (exceptions) to the operating system, and also that the SSD would sometimes become so slow to respond that the kernel would attempt to reset it (which didn’t really work, I can tell you).

Here is an excerpt of the SMART data:

$ smartctl -a /dev/sda
...
SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   166   001   006    Pre-fail  Always   In_the_past 0
  5 Retired_Block_Count     0x0032   100   100   036    Old_age   Always       -       76
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1740
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2247
100 Total_Erase_Count       0x0032   100   100   000    Old_age   Always       -       7654272
168 Min_Erase_Count         0x0032   253   096   000    Old_age   Always       -       0
169 Max_Erase_Count         0x0032   083   083   000    Old_age   Always       -       181
171 Program_Fail_Count      0x0032   253   253   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   253   253   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   100   100   000    Old_age   Offline      -       14
175 Program_Fail_Count_Chip 0x0032   253   253   000    Old_age   Always       -       0
176 Unused_Rsvd_Blk_Cnt_Tot 0x0032   253   253   000    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   090   090   000    Old_age   Always       -       116
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   000    Old_age   Always       -       399
179 Used_Rsvd_Blk_Cnt_Tot   0x0032   100   100   000    Old_age   Always       -       2460
180 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       2980
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       9919
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       10051
188 Command_Timeout         0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   038   000   000    Old_age   Always       -       38 (Min/Max 16/48)
195 Hardware_ECC_Recovered  0x0032   100   085   000    Old_age   Always       -       715203
196 Reallocated_Event_Count 0x0032   100   100   036    Old_age   Always       -       76
198 Offline_Uncorrectable   0x0032   253   253   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   253   253   000    Old_age   Always       -       0
204 Soft_ECC_Correction     0x000e   100   001   000    Old_age   Always       -       13
212 Phy_Error_Count         0x0032   253   253   000    Old_age   Always       -       0
234 Unknown_SK_hynix_Attrib 0x0032   100   100   000    Old_age   Always       -       32297
241 Total_Writes_GB         0x0032   100   100   000    Old_age   Always       -       3715
242 Total_Reads_GB          0x0032   100   100   000    Old_age   Always       -       3680
250 Read_Retry_Count        0x0032   096   096   000    Old_age   Always       -       176835377
...

This table shows various attributes describing the operational status of the SSD. The meaning of the numeric values is largely vendor-specific, so understanding those numbers exactly is quite a challenge, but what matters is that the numbers in the VALUE column stay higher than those in the THRESH (threshold) column. The WORST column indicates the lowest VALUE that has ever been observed.
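The rule of thumb above can be expressed as a small check. This is a simplified sketch, not smartctl’s own logic: it parses a few rows copied from the table, skips attributes whose threshold is 000 (assumed here to mean the attribute can never fail), and flags an attribute if VALUE is at or below THRESH, or if WORST ever was. The function name is hypothetical.

```python
# A few rows copied from the smartctl output above
SMART_ROWS = """\
  1 Raw_Read_Error_Rate     0x000f   166   001   006    Pre-fail  Always   In_the_past 0
  5 Retired_Block_Count     0x0032   100   100   036    Old_age   Always       -       76
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2247
194 Temperature_Celsius     0x0002   038   000   000    Old_age   Always       -       38 (Min/Max 16/48)
196 Reallocated_Event_Count 0x0032   100   100   036    Old_age   Always       -       76
"""


def check_attributes(table: str) -> list[tuple[str, str]]:
    """Return (attribute name, status) for attributes that are or were failing."""
    problems = []
    for line in table.splitlines():
        fields = line.split()
        name = fields[1]
        # Columns: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...
        value, worst, thresh = (int(f) for f in fields[3:6])
        if thresh == 0:
            continue  # assumption: a zero threshold never trips
        if value <= thresh:
            problems.append((name, 'failing now'))
        elif worst <= thresh:
            problems.append((name, 'failed in the past'))
    return problems


print(check_attributes(SMART_ROWS))
# [('Raw_Read_Error_Rate', 'failed in the past')]
```

On these rows it flags only Raw_Read_Error_Rate, which matches the In_the_past marker in the WHEN_FAILED column.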

To my surprise, despite all the errors and hangs that the SSD was experiencing, the SMART values looked pretty good. Sure, there’s a very low WORST value for Raw_Read_Error_Rate (001, lower than the threshold 006), and there is also an indication that this attribute failed in the past, but besides that everything looked acceptable enough.

Of course the SMART log was recording the read errors as well. Here’s another excerpt from the output:

$ smartctl -a /dev/sda
...
SMART Error Log Version: 1
ATA Error Count: 1875 (device log contains only the most recent five errors)
...

Error 1875 occurred at disk power-on lifetime: 1737 hours (72 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 70 98 31 af 40 40      00:02:32.920  READ FPDMA QUEUED
  47 00 01 30 08 00 a0 a0      00:02:32.920  READ LOG DMA EXT
  47 00 01 30 00 00 a0 a0      00:02:32.920  READ LOG DMA EXT
  47 00 01 00 00 00 a0 a0      00:02:32.920  READ LOG DMA EXT
  ef 10 02 00 00 00 a0 a0      00:02:32.920  SET FEATURES [Enable SATA feature]

...

Given the lack of concrete signs of old age or extensive damage to the SSD, I wondered whether it could be a link problem: maybe I had not inserted the drive correctly, or maybe a pin was dirty. But no: upon inspection I did not find any issue, and after carefully reseating the drive, the problem persisted.

I then ran a SMART self-test; here are the results (from most recent to oldest):

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%      1736         5712
# 2  Short offline       Completed: read failure       90%      1736         5712
# 3  Extended offline    Completed: read failure       90%      1733         50117792
# 4  Extended captive    Interrupted (host reset)      90%      1730         -
# 5  Short captive       Interrupted (host reset)      90%      1730         -

The first two tests (#5 and #4 in the log) were interrupted by Linux, which tried to reset the device while the tests were running. A self-test (as the name suggests) is completely self-contained and does not involve any exchange of data between the SSD and the operating system. The fact that the later self-tests failed with read errors was therefore a sign that this was not a link error, but that the blocks were really damaged.

I therefore decided to give up on trying to fix the SSD, but I still wanted to use it. After all, it was working for the most part: as long as you didn’t access the bad blocks, the SSD would behave fine. So here was my plan: I would format the SSD and create an ext4 filesystem on it using mkfs.ext4 -c, which would scan for bad blocks and exclude them so that they wouldn’t be used. The resulting filesystem would have less storage available than the advertised capacity of the SSD, but that was an acceptable trade-off for me.

And here is the most interesting part: mkfs.ext4 -c discarded all blocks before creating the filesystem. After that, it scanned for bad blocks and, shockingly, it found none!

SMART self-tests also did not report any error:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1740         -
# 2  Short offline       Completed without error       00%      1738         -

All the read errors, exceptions, and hangs that had kept appearing before were gone!

I’m not fully sure how to explain this, but I did some research and the general consensus is that discarding bad blocks won’t recover them. My theory is that, when the coffee was poured on the laptop, a voltage spike caused incorrect values to be written to a few blocks that were in use at that time, causing uncorrectable discrepancies between the data and the error-correcting codes (ECC) of the SSD. Discarding the blocks reset both the data cells and the ECC cells, removing all the inconsistencies.

Do you have a better explanation? Let me know in the comments!

How to use the same DNS for all connections in Ubuntu (and other network privacy tricks)

andreacorbellini — Tue, 28 Apr 2020 06:30:00 +0000

Problem

Currently Ubuntu does not offer an easy way to set up a “global” DNS for all network connections: whenever you connect to a new WiFi network, if you don’t want to use the DNS server provided by the WiFi, you are forced to go to the network settings and manually set your preferred DNS server.

With this brief guide I want to show how you can set up a global DNS to be used for all WiFi and network connections, both old and new. I will also show you how to use DNSSEC, DNS-over-TLS and randomized MAC addresses for all connections.

This guide is written for Ubuntu 20.04, but in general it will work on every distribution using systemd-resolved and NetworkManager.

Step 1: Set up the global DNS in resolved

In Ubuntu (as well as many other distributions), DNS is managed by systemd-resolved. Its configuration is in /etc/systemd/resolved.conf. Open that file and add a DNS= line inside the [Resolve] section listing your preferred DNS servers. For example, if you want to use 1.1.1.1, your resolved.conf should look like this:

[Resolve]
DNS=1.1.1.1 1.0.0.1 2606:4700:4700::1111 2606:4700:4700::1001
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
#Cache=yes
#DNSStubListener=yes
#ReadEtcHosts=yes

Once you are done with the changes, restart systemd-resolved:

sudo systemctl restart systemd-resolved.service

You can check your changes with resolvectl status: you should see your DNS servers on top of the output, under the Global section:

$ resolvectl status
Global
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: opportunistic
      DNSSEC setting: allow-downgrade
    DNSSEC supported: no
  Current DNS Server: 1.1.1.1
         DNS Servers: 1.1.1.1
                      1.0.0.1
                      2606:4700:4700::1111
                      2606:4700:4700::1001
...

However, this alone won’t make the system use that DNS! In fact, the Global DNS of systemd-resolved is just a default that is used whenever no DNS servers are configured for an interface. When you connect to a WiFi network, NetworkManager will ask the access point for a list of DNS servers and will pass that list to systemd-resolved, effectively overriding the settings that we just edited. If you scroll down the output of resolvectl status, you will see the DNS servers added by NetworkManager. We have to tell NetworkManager to stop doing that.

Step 2: Disable DNS processing in NetworkManager

In order for systemd-resolved to consider our global DNS, we need to tell NetworkManager not to provide any DNS information for new connections. Doing that is easy: just create a new file /etc/NetworkManager/conf.d/dns.conf (or any name you like) with this content:

[main]
# do not use the dhcp-provided dns servers, but rather use the global
# ones specified in /etc/systemd/resolved.conf
dns=none
systemd-resolved=false

To apply the settings either restart your computer or run:

sudo systemctl reload NetworkManager.service

Now, when you connect to a network, NetworkManager won’t push the list of DNS servers to systemd-resolved, and only the global ones will be used. If you check resolvectl status, you should see that no DNS server is specified for any interface. If you specified 1.1.1.1 as your DNS server, you can also head over to https://1.1.1.1/help to verify that it has been set up correctly.

DNSSEC and DNS-over-TLS

If you would like to enable DNSSEC and/or DNS-over-TLS, the file to edit is /etc/systemd/resolved.conf. You can add the following options:

DNSSEC=true if you want all queries to be DNSSEC-validated. The default is DNSSEC=allow-downgrade, which attempts to use DNSSEC if it works properly, and falls back to disabling validation otherwise.
DNSOverTLS=true if you want all queries to go through TLS. You can also specify DNSOverTLS=opportunistic to attempt to use TLS if it is supported, and fall back to the plaintext DNS protocol if it’s not.

With those options, my /etc/systemd/resolved.conf looks like this:

[Resolve]
DNS=1.1.1.1 1.0.0.1 2606:4700:4700::1111 2606:4700:4700::1001
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
DNSSEC=true
DNSOverTLS=opportunistic
#Cache=yes
#DNSStubListener=yes
#ReadEtcHosts=yes

Note that I’m using DNSOverTLS=opportunistic because I found that some access points with captive portals don’t work properly with DNSOverTLS=true. Also note that DNSSEC=true may cause some pain, because there are still many misconfigured domain records out there that will make DNSSEC validation fail.

Like before, to apply the changes, run:

sudo systemctl restart systemd-resolved.service

And to verify the changes:

resolvectl status

If you’re using 1.1.1.1, you can also go to https://1.1.1.1/help to verify DNS-over-TLS.
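To check whether a specific answer was actually validated, resolvectl query annotates its output. The sketch below assumes the "Data is authenticated: yes" wording that recent systemd versions print; older versions may phrase it differently, and dnssec_authenticated is a hypothetical helper name.

```shell
# Succeeds if `resolvectl query` output (read on stdin) reports that the
# answer was DNSSEC-authenticated. Assumes the "Data is authenticated: yes"
# wording used by recent systemd versions.
dnssec_authenticated() {
    grep -q 'Data is authenticated: yes'
}

# Usage (example.com is a DNSSEC-signed domain):
#   resolvectl query example.com | dnssec_authenticated && echo "validated"
```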

Random MAC address

NetworkManager supports 3 options to have a random MAC address (also known as “cloned” or “spoofed” MAC address):

wifi.scan-rand-mac-address controls the MAC address used when scanning for WiFi networks. This goes into the [device] section
wifi.cloned-mac-address controls the MAC address for WiFi connections. This goes into the [connection] section
ethernet.cloned-mac-address controls the MAC address for Ethernet connections. This goes into the [connection] section

The first option can take either yes or no. The last two can take various values, but if you want a randomized MAC address you are interested in these two:

random: generate a new random MAC address each time you establish a connection
stable: generate a MAC address that looks random (it’s a hash), but is reused when you connect to the same network again.

random is better if you don’t want to be tracked, but it has the disadvantage that captive portals won’t remember you. stable, instead, lets captive portals remember you, so they won’t show up again whenever you reconnect.

Whichever options you choose, put them into a file /etc/NetworkManager/conf.d/mac.conf (or any other name you like). Mine looks like this:

[device]
# use a random mac address when scanning for wifi networks
wifi.scan-rand-mac-address=yes

[connection]
# use a random mac address when connecting to a network
ethernet.cloned-mac-address=random
wifi.cloned-mac-address=random

To apply the settings either restart your computer or run:

sudo systemctl reload NetworkManager.service

You can test your changes with:

ip link
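The ip link output shows the current MAC address, but not whether it was randomized. One heuristic, sketched below as a hypothetical helper, is to look at the "locally administered" bit (0x02 in the first octet): randomly generated addresses normally have it set, while factory-assigned ones normally don’t. Treat it as a hint rather than proof, since some virtual network cards also use locally administered addresses.

```shell
# Succeeds if the given MAC address has the "locally administered" bit
# (0x02 in the first octet) set, as randomized addresses usually do.
is_locally_administered() {
    [ $(( 0x${1%%:*} & 2 )) -ne 0 ]
}

# Usage (wlp2s0 is a hypothetical interface name; check `ip link` for yours):
#   is_locally_administered "$(cat /sys/class/net/wlp2s0/address)" && echo "randomized"
```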

11 years of Ubuntu membership

andreacorbellini — Sat, 12 May 2018 21:30:00 +0000

It’s been 11 years and 1 month since I was awarded official Ubuntu membership. I will never forget that day: as a kid, I had to write about myself on IRC in front of the Community Council members and answer their questions in a language that was not my native one. I must confess that I was a bit scared that evening, but once I made it, it felt so good. It felt good not just because of the award itself, but because it was the recognition that I had done something that mattered: something useful that other people could benefit from. And for me, that meant a lot.

So much time has passed since then. So many things have changed both in my life and around me, for better or worse. So many that I cannot even enumerate all of them. Nonetheless, deep inside of me, I still feel like that young kid: curious, always ready to experiment, full of hopes and uncertain (but never scared) about the future.

Through the years I received the support of a bunch of people who believed in me, and I thank them all. But if today I feel so hopeful it’s undoubtedly thanks to one person in particular, a person who holds a special place in my life. A big thank you goes to you.

Running Docker Swarm inside LXC (outdated)

andreacorbellini — Wed, 13 Apr 2016 18:00:00 +0000

UPDATE: This article was written in 2016 and refers to a version of Docker Swarm that is now known as “legacy Swarm”. The newer Docker Swarm won’t work in LXC as described in this article.

I’ve been using Docker Swarm inside LXC containers for a while now, and I thought that I could share my experience with you. Due to their nature, LXC containers are pretty lightweight and require very few resources compared to virtual machines. This makes LXC ideal for development and simulation purposes. Running Docker Swarm inside LXC requires a few steps that I’m going to show you in this tutorial.

Before we begin, a quick note: LXC, Docker and Swarm can be configured in many different ways. Here I’m showing just my preferred setup: LXC with AppArmor disabled, Docker with the OverlayFS storage driver, Swarm with etcd discovery. There are many other kinds of configurations that can work under LXC; leave a comment if you want to know more.

Overview:

Create the Swarm Manager container
Modify configuration for the Swarm Manager container
Load the OverlayFS module
Start the container and install Docker
Check if Docker is working
Set up the Swarm Manager
Create the Swarm Agents
Play with the Swarm

Terminology:

the host is the system that will create and start the LXC containers (e.g. your laptop);
the manager is the LXC container that will run the Swarm manager (it’ll run the swarm manage command);
an agent is one of the many LXC containers that will run a Swarm agent node (it’ll run the swarm join command).

To avoid ambiguity, all commands will be prefixed with a prompt such as root@host:~#, root@swarm-manager:~# and root@swarm-agent-1:~#.

Prerequisites:

This tutorial assumes that you have at least a vague idea of what Docker and Docker Swarm are. You should also be familiar with the shell.

This tutorial has been successfully tested on Ubuntu 15.10 (which ships with Docker 1.6) and Ubuntu 16.04 LTS (Docker 1.10), but it may work on other distributions and Docker versions as well.

Step 1: Create the Swarm Manager container

Create a new LXC container with:

root@host:~# lxc-create -t download -n swarm-manager

When prompted, choose your favorite distribution and architecture. I chose ubuntu / xenial / amd64.

lxc-create needs to run as root: unprivileged containers won’t work. We could actually make Docker start inside an unprivileged container, but then we wouldn’t be allowed to create block and character devices, and many Docker containers need this ability.

Step 2: Modify the configuration for the Swarm Manager container

Before starting the LXC container, open the file /var/lib/lxc/swarm-manager/config on the host and add the following configuration to the bottom of the file:

# Distribution configuration
# ...

# Container specific configuration
# ...

# Network configuration
# ...

# Allow running Docker inside LXC
lxc.aa_profile = unconfined
lxc.cap.drop =

The first rule (lxc.aa_profile = unconfined) disables AppArmor confinement. The second one (lxc.cap.drop =) gives all capabilities to the processes in the LXC container.

These two rules may seem harmful from a security standpoint, and in fact they are. However, we must remember that we will be running Docker inside the LXC container. Docker already ships with its own AppArmor profile, and the two rules above are needed precisely to let Docker talk to AppArmor.

So, while Docker itself won’t be confined, Docker containers will be confined, and this is an encouraging fact.
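The lxc.aa_profile key used above comes from the configuration format of the LXC releases this tutorial was tested with. As far as I know, LXC 2.1 renamed it to lxc.apparmor.profile, and LXC 3.0 no longer accepts the old name, so on a newer host the equivalent lines would probably be the following (untested with this legacy Swarm setup):

```
# Allow running Docker inside LXC (key names for LXC 2.1 and later)
lxc.apparmor.profile = unconfined
lxc.cap.drop =
```

Newer LXC releases also ship an lxc-update-config tool intended to convert old-style configuration files.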

Step 3: Load the OverlayFS module

OverlayFS ships with Ubuntu, but its module is not loaded by default. To load it:

root@host:~# modprobe overlay

It is important to do this step before installing Docker. Docker supports various storage drivers, and when Docker is installed for the first time it tries to detect the most appropriate one for the system. If Docker detects that OverlayFS is not loaded, it’ll fall back to the device mapper. There’s nothing wrong with the device mapper, and we could make it work; however, as I said at the beginning, in this tutorial I’m focusing only on OverlayFS.

If you want to load OverlayFS at boot, instead of doing it manually after every reboot, add it to /etc/modules-load.d/modules.conf:

root@host:~# echo overlay >> /etc/modules-load.d/modules.conf
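To confirm that the module is actually available before installing Docker, you can look for overlay in /proc/filesystems. A minimal sketch; overlay_loaded is a hypothetical helper, and its optional path argument exists only to make it easy to test:

```shell
# Succeeds if the kernel currently supports the overlay filesystem, i.e.
# if "overlay" is the last field of a line in /proc/filesystems
# (lines look like "nodev<TAB>overlay"). The optional argument points
# the check at a different file, which is handy for testing.
overlay_loaded() {
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}

# Usage, on the host:
#   overlay_loaded || modprobe overlay
```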

Step 4: Start the container and install Docker

It’s time to see if we did everything right!

root@host:~# lxc-start -n swarm-manager
root@host:~# lxc-attach -n swarm-manager
root@swarm-manager:~# apt update
root@swarm-manager:~# apt install docker.io

Installation should complete without any problem. If you get an error like this:

Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
invoke-rc.d: initscript docker, action "start" failed.
dpkg: error processing package docker.io (--configure):
 subprocess installed post-installation script returned error exit status 1

It means that Docker failed to start. Try checking systemctl status docker as suggested, or run docker daemon manually. You might get an error like this:

root@swarm-manager:~# docker daemon
WARN[0000] devmapper: Udev sync is not supported. This will lead to unexpected behavior, data loss and errors. For more information, see https://docs.docker.com/reference/commandline/daemon/#daemon-storage-driver-option
ERRO[0000] There are no more loopback devices available.
ERRO[0000] [graphdriver] prior storage driver "devicemapper" failed: loopback attach failed
FATA[0000] Error starting daemon: error initializing graphdriver: loopback attach failed

In this case, Docker is using the devicemapper storage driver and is complaining about the lack of loopback devices. If that’s the case, check whether OverlayFS is loaded and reinstall Docker.

Or you might get an error like this:

root@swarm-manager:~# docker daemon
...
FATA[0000] Error starting daemon: AppArmor enabled on system but the docker-default profile could not be loaded.

In this other case, Docker is complaining that it can’t talk to AppArmor. Check the configuration for the LXC container.
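After fixing the problem and reinstalling Docker, you can confirm which storage driver it picked from the "Storage Driver:" line of docker info. A small sketch, assuming that line format (storage_driver is a hypothetical helper name). With the setup in this tutorial the expected value is overlay, matching the storagedriver=overlay label that the agents report in Step 8.

```shell
# Print the storage driver named in `docker info` output (read on stdin).
# Assumes a line such as "Storage Driver: overlay"; newer Docker versions
# indent it under "Server:", so leading spaces are allowed.
storage_driver() {
    awk -F': ' '/^ *Storage Driver:/ { print $2; exit }'
}

# Usage, inside the container:
#   docker info | storage_driver
```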

Step 5: Check if Docker is working

Once you are all set, you should be able to use Docker: try running docker info or docker ps, or launch a container:

root@swarm-manager:~# docker run --rm docker/whalesay cowsay burp!
Unable to find image 'docker/whalesay:latest' locally
latest: Pulling from docker/whalesay
...
Status: Downloaded newer image for docker/whalesay:latest
 _______
< burp! >
 -------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

It appears to be working. By the way, we can also check whether Docker is correctly confining containers: run a Docker container, then check the output of aa-status on the host. You should see a process running with the docker-default profile. For example:

root@swarm-manager:~# docker run --rm ubuntu bash -c 'while true; do sleep 1; echo -n zZ; done'
zZzZzZzZzZzZzZzZ...

# On another shell
root@host:~# aa-status
apparmor module is loaded.
5 profiles are loaded.
5 profiles are in enforce mode.
   /sbin/dhclient
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/NetworkManager/nm-dhcp-helper
   /usr/lib/connman/scripts/dhclient-script
   docker-default
0 profiles are in complain mode.
4 processes have profiles defined.
4 processes are in enforce mode.
   /sbin/dhclient (797)
   /sbin/dhclient (2832)
   docker-default (6956)
   docker-default (6973)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

root@host:~# ps -ef | grep 6956
root      6956  4982  0 17:17 ?        00:00:00 bash -c while true; do sleep 1; echo -n zZ; done
root      6973  6956  0 17:17 ?        00:00:00 sleep 1
root      6982  6808  0 17:17 pts/3    00:00:00 grep --color=auto 6956

Yay! Everything is running as expected: we launched a process inside a Docker container, and that process is running with the docker-default AppArmor profile. Once again: even if LXC is running unconfined, our Docker containers are not.
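With many containers running, scanning the aa-status output by hand gets tedious. The sketch below is a hypothetical helper that assumes the "docker-default (PID)" line format shown above and extracts just the PIDs confined by the docker-default profile, so you can look them up with ps:

```shell
# Print the PIDs that aa-status lists under the docker-default profile.
# Reads aa-status output on stdin and assumes process lines shaped like
# "   docker-default (6956)". Profile-only lines ("   docker-default"
# with no PID) are skipped.
docker_default_pids() {
    awk '$1 == "docker-default" && $2 ~ /^\([0-9]+\)$/ {
        gsub(/[()]/, "", $2)
        print $2
    }'
}

# Usage, on the host:
#   aa-status | docker_default_pids
# For the aa-status output above this prints 6956 and 6973, one per line.
```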

Step 6: Set up the Swarm Manager

That was the hardest part. Now we can proceed with setting up Swarm as we usually would.

As I said at the beginning, Swarm can be configured in many ways. In this tutorial I’ll show how to set it up with etcd discovery. First of all, we need the IP address of the LXC container:

root@swarm-manager:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:3e:8e:cb:43
          inet addr:10.0.3.154  Bcast:10.0.3.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe8e:cb43/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23177 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20859 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:147652946 (147.6 MB)  TX bytes:1455613 (1.4 MB)

10.0.3.154 is my IP address. Let’s start etcd:

root@swarm-manager:~# SWARM_MANAGER_IP=10.0.3.154

root@swarm-manager:~# docker run -d --restart=always --name=etcd -p 4001:4001 -p 2380:2380 -p 2379:2379 \
                            quay.io/coreos/etcd -name etcd0 \
                            -advertise-client-urls http://$SWARM_MANAGER_IP:2379,http://$SWARM_MANAGER_IP:4001 \
                            -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
                            -initial-advertise-peer-urls http://$SWARM_MANAGER_IP:2380 \
                            -listen-peer-urls http://0.0.0.0:2380 \
                            -initial-cluster-token etcd-cluster-1 \
                            -initial-cluster etcd0=http://$SWARM_MANAGER_IP:2380 \
                            -initial-cluster-state new
Unable to find image 'quay.io/coreos/etcd:latest' locally
latest: Pulling from coreos/etcd
...
Status: Downloaded newer image for quay.io/coreos/etcd:latest
e742278a97d2ad3f88658aa871903d20b4094e551969a03aa8332d3876fe5d0d

root@swarm-manager:~# docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                                                NAMES
e742278a97d2        quay.io/coreos/etcd   "/etcd -name etcd0 -a"   32 seconds ago      Up 31 seconds       0.0.0.0:2379-2380->2379-2380/tcp, 0.0.0.0:4001->4001/tcp, 7001/tcp   etcd

Replace 10.0.3.154 with the IP address of your LXC container.

Note that I’ve started etcd with --restart=always, so that etcd is started automatically every time the LXC container starts. With this option, etcd will be started again whenever the Docker daemon restarts, even if you had explicitly stopped it. Drop --restart=always if that’s not what you want.
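Before starting the manager, you may want to check that etcd is answering. As far as I know, etcd releases of that era expose a /health endpoint on the client port that replies with a small JSON document such as {"health": "true"}; treat both the endpoint and the reply format as assumptions that may not hold for your etcd version. The sketch assumes curl is available inside the manager container, and etcd_healthy is a hypothetical helper name.

```shell
# Succeeds if an etcd /health reply (read on stdin) reports "true".
# The endpoint and the JSON shape, e.g. {"health": "true"}, are assumptions
# that depend on the etcd version.
etcd_healthy() {
    grep -q '"health": *"true"'
}

# Usage, on the manager (assumes curl is installed):
#   curl -s "http://$SWARM_MANAGER_IP:2379/health" | etcd_healthy && echo "etcd is up"
```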

Now we can start the Swarm manager:

root@swarm-manager:~# docker run -d --restart=always --name=swarm -p 3375:3375 \
                            swarm manage -H 0.0.0.0:3375 etcd://$SWARM_MANAGER_IP:2379
Unable to find image 'swarm:latest' locally
latest: Pulling from library/swarm
...
Status: Downloaded newer image for swarm:latest
8080c93c544ff92cc2cf682ff0bbc82e0d2dfb01e1f98f202c3a0801d3427330

root@swarm-manager:~# docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                                                NAMES
46b556e73e87        swarm                 "/swarm manage -H 0.0"   3 seconds ago       Up 2 seconds        2375/tcp, 0.0.0.0:3375->3375/tcp                                     swarm
e742278a97d2        quay.io/coreos/etcd   "/etcd -name etcd0 -a"   7 minutes ago       Up 7 minutes        0.0.0.0:2379-2380->2379-2380/tcp, 0.0.0.0:4001->4001/tcp, 7001/tcp   etcd

Our Swarm manager is up and running. We can connect to it and issue a few commands:

root@swarm-manager:~# docker -H localhost:3375 info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 0
Plugins:
 Volume:
 Network:
Kernel Version: 4.4.0-15-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: d39c33295ef3

As you can see there are no nodes connected, as we would expect. Everything looks good.

Step 7: Create the Swarm Agents

Our Swarm manager can’t do anything interesting without agent nodes. Creating new LXC containers for the agents is not much different from what we already did with the manager. To set up new agents in an automatic fashion I’ve created a script, so that you don’t need to repeat the steps manually:

#!/bin/bash

set -eu

SWARM_MANAGER_IP=10.0.3.154
DOWNLOAD_DIST=ubuntu
DOWNLOAD_RELEASE=xenial
DOWNLOAD_ARCH=amd64

for LXC_NAME in "$@"
do
    LXC_PATH="/var/lib/lxc/$LXC_NAME"
    LXC_ROOTFS="$LXC_PATH/rootfs"

    # Create the container.
    lxc-create -t download -n "$LXC_NAME" -- \
        -d "$DOWNLOAD_DIST" -r "$DOWNLOAD_RELEASE" -a "$DOWNLOAD_ARCH"

    cat <<EOF >> "$LXC_PATH/config"
# Allow running Docker inside LXC
lxc.aa_profile = unconfined
lxc.cap.drop =
EOF

    # Start the container and wait for networking to start.
    lxc-start -n "$LXC_NAME"
    sleep 10s

    # Install Docker.
    lxc-attach -n "$LXC_NAME" -- apt-get update
    lxc-attach -n "$LXC_NAME" -- apt-get install -y docker.io

    # Tell Docker to listen on all interfaces.
    sed -i -e 's/^#DOCKER_OPTS=.*$/DOCKER_OPTS="-H 0.0.0.0:2375"/' "$LXC_ROOTFS/etc/default/docker"
    lxc-attach -n "$LXC_NAME" -- systemctl restart docker

    # Join the Swarm.
    SWARM_AGENT_IP="$(lxc-attach -n "$LXC_NAME" -- ifconfig eth0 | grep -Po '(?<=inet addr:)\S+')"
    lxc-attach -n "$LXC_NAME" -- docker run -d --restart=always --name=swarm \
        swarm join --addr="$SWARM_AGENT_IP:2375" "etcd://$SWARM_MANAGER_IP:2379"
done

Be sure to change the values for SWARM_MANAGER_IP, DOWNLOAD_DIST, DOWNLOAD_RELEASE and DOWNLOAD_ARCH to fit your needs.

Save the script as swarm-agent-create and make it executable with chmod +x swarm-agent-create. Thanks to this script, creating 10 new agents is as simple as running one command:

root@host:~# ./swarm-agent-create swarm-agent-{0..9}

Here’s an explanation of what the script does:

It first sets up a new LXC container following steps 1-5 above: it creates a new LXC container (with lxc-create), applies the LXC configuration (the lxc.aa_profile and lxc.cap.drop rules), starts the container and installs Docker.

LXC_PATH="/var/lib/lxc/$LXC_NAME"
LXC_ROOTFS="$LXC_PATH/rootfs"

# Create the container.
lxc-create -t download -n "$LXC_NAME" -- \
    -d "$DOWNLOAD_DIST" -r "$DOWNLOAD_RELEASE" -a "$DOWNLOAD_ARCH"

cat <<EOF >> "$LXC_PATH/config"
# Allow running Docker inside LXC
lxc.aa_profile = unconfined
lxc.cap.drop =
EOF

# Start the container and wait for networking to start.
lxc-start -n "$LXC_NAME"
sleep 10s

# Install Docker.
lxc-attach -n "$LXC_NAME" -- apt-get update
lxc-attach -n "$LXC_NAME" -- apt-get install -y docker.io
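The fixed sleep 10s is only a guess at how long networking takes to come up: on a slow host it may be too short, and on a fast one it wastes time. One possible alternative, sketched below, is a small retry helper (hypothetical, not part of the original script) that keeps trying a command that needs the network, such as resolving archive.ubuntu.com with getent, until it succeeds or the attempts run out.

```shell
# Run a command until it succeeds, trying at most $1 times with a one-second
# pause between attempts. Returns 0 on success, or the command's last exit
# status once the attempts are used up.
retry() {
    local attempts=$1 status=0
    shift
    while :; do
        "$@" && return 0
        status=$?
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] || return "$status"
        sleep 1
    done
}

# Usage, in place of "sleep 10s" (wait until DNS resolution works inside
# the container, which implies networking is up):
#   retry 30 lxc-attach -n "$LXC_NAME" -- getent hosts archive.ubuntu.com > /dev/null
```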

Our Swarm agents need to be reachable by the manager. For this reason, we need to configure Docker inside them to bind to a public interface. To do so, the script sets DOCKER_OPTS="-H 0.0.0.0:2375" in /etc/default/docker (replacing the commented-out #DOCKER_OPTS line) and restarts Docker.

# Tell Docker to listen on all interfaces.
sed -i -e 's/^#DOCKER_OPTS=.*$/DOCKER_OPTS="-H 0.0.0.0:2375"/' "$LXC_ROOTFS/etc/default/docker"
lxc-attach -n "$LXC_NAME" -- systemctl restart docker

Lastly, the script looks up the IP address of the LXC container and launches the Swarm agent:

# Join the Swarm.
SWARM_AGENT_IP="$(lxc-attach -n "$LXC_NAME" -- ifconfig eth0 | grep -Po '(?<=inet addr:)\S+')"
lxc-attach -n "$LXC_NAME" -- docker run -d --restart=always --name=swarm \
    swarm join --addr="$SWARM_AGENT_IP:2375" "etcd://$SWARM_MANAGER_IP:2379"
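The grep -Po '(?<=inet addr:)\S+' pipeline depends on the old net-tools ifconfig output format: newer versions of ifconfig print "inet 10.0.3.154" instead of "inet addr:10.0.3.154", and some images don’t ship ifconfig at all. A sketch of an iproute2-based alternative, assuming the one-line-per-address format of ip -4 -o addr show (first_ipv4 is a hypothetical helper name):

```shell
# Print the first IPv4 address (without the /prefix) from the output of
# `ip -4 -o addr show dev eth0`, read on stdin. Assumes the one-line format
# where the fourth field is "ADDRESS/PREFIX".
first_ipv4() {
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Usage, as a hypothetical replacement for the SWARM_AGENT_IP line:
#   SWARM_AGENT_IP="$(lxc-attach -n "$LXC_NAME" -- ip -4 -o addr show dev eth0 | first_ipv4)"
```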

Step 8: Play with the Swarm

Now, if we check docker info on the Swarm manager, we should see 10 healthy nodes:

root@swarm-manager:~# docker -H localhost:3375 info
Containers: 10
 Running: 10
 Paused: 0
 Stopped: 0
Images: 10
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 10
 swarm-agent-0: 10.0.3.73:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:35Z
 swarm-agent-1: 10.0.3.97:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:31:49Z
 swarm-agent-2: 10.0.3.58:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:31:54Z
 swarm-agent-3: 10.0.3.195:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:03Z
 swarm-agent-4: 10.0.3.235:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:22Z
 swarm-agent-5: 10.0.3.174:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:16Z
 swarm-agent-6: 10.0.3.222:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:21Z
 swarm-agent-7: 10.0.3.140:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:31:43Z
 swarm-agent-8: 10.0.3.95:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:17Z
 swarm-agent-9: 10.0.3.125:2375
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 4.052 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay
  └ Error: (none)
  └ UpdatedAt: 2016-04-13T15:32:30Z
Plugins:
 Volume:
 Network:
Kernel Version: 4.4.0-15-generic
Operating System: linux
Architecture: amd64
CPUs: 40
Total Memory: 40.52 GiB
Name: d39c33295ef3

Let’s try running a command on the Swarm:

root@swarm-manager:~# docker -H localhost:3375 run -i --rm docker/whalesay cowsay 'It works!'
 ___________
< It works! >
 -----------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/
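With ten agents, the docker info output is long enough that a failing node is easy to miss. The sketch below is a hypothetical filter that assumes the node and "└ Status:" line layout shown in the output above, and prints only the nodes whose status is not Healthy:

```shell
# Print the names of Swarm nodes whose status is not "Healthy".
# Reads `docker -H localhost:3375 info` output on stdin and assumes a node
# line such as " swarm-agent-0: 10.0.3.73:2375" followed by indented detail
# lines, one of which ends in "Status: <state>".
unhealthy_nodes() {
    awk '
        /^ [^ ].*:[0-9]+$/ { node = $1; sub(/:$/, "", node) }
        /Status: / && $NF != "Healthy" { print node }
    '
}

# Usage, on the manager (no output means every node is healthy):
#   docker -H localhost:3375 info | unhealthy_nodes
```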

Conclusion

We created a Swarm cluster consisting of one manager and 10 agents, and we kept memory and disk usage low thanks to LXC containers. We also succeeded in confining our Docker containers with AppArmor. Overall, this setup is probably not ideal for a production environment, but it is very useful for simulating clusters on your laptop.

I hope you enjoyed the tutorial. Feel free to leave a comment if you have questions!

When bureaucracy hits the web: the cookie law

andreacorbellini — Tue, 22 Sep 2015 18:35:00 +0000

For a few years now, every first of April I have hoped to read in the news something along the lines of “the cookie law was a joke, sorry for that”. You know, bureaucracy is slow, and it’s reasonable to think that it takes time for them to reveal jokes. Yet many April firsts have passed, and no such announcement has been made. Many missed opportunities for Europe to show its love for progress and its competence with the web.

Being compliant with the EU cookie law is hard. It’s not just a matter of showing a boring banner: it’s a matter of defacing your web pages, writing long privacy policies that nobody will read, and implementing ways to prevent certain cookies from being set.

The truth is: if you, as a webmaster, want to avoid wasting time and avoid headaches, you just have to avoid cookies. This is what I have done with most websites I maintain: I have removed all analytics, all social sharing buttons, all YouTube videos, all comments. This was a sad thing to do, but it was the only thing I could do: I maintain websites for free, mainly as a favor for friends and non-profits I’m involved with; it’s not my day job. Also, I do not want other people to be sued because of mistakes on my part: cookies may be set in the most unexpected situations, and disabling every feature that could potentially set them seems the safest choice.

The only exception is this blog. Here, I use cookies for Google Analytics, for social sharing buttons and for Disqus. I may live without Google Analytics (even though it gives useful insights, such as performance statistics and tips), but I can’t really remove social buttons and Disqus: this is a blog and it wouldn’t make any sense to remove social features and comments.

Being compliant with the EU cookie law has been on my todo list for a while, and I never found the time (nor the desire) to look into it. Today I did. I spent a few hours discovering that Google Analytics is “OK” (in the sense that I do not have to display an ugly banner, nor ask for explicit permission from the user before setting the cookies) and that social buttons and Disqus are “bad” (in the sense that I have to display a banner and ask for explicit consent from the user before setting the cookies). In the end, the only service that I could remove is the least problematic one.

As I said, I really do not want to remove social buttons, Disqus or whatever third-party content I’ll want to display in the future. Therefore, in order to comply with the cookie law, I’m forced to write code, write a privacy policy, and waste another bunch of hours of my time. But not today, as I’ve already had enough sadness and powerlessness.

At least for now, I guess that EU cookie law compliance will stay on my todo list for some more time. If I had worked on compliance instead of writing this rant, I would probably have finished already (but then what’s the point of having a blog if you don’t blog?)

The cookie law wants to be “on the side of the users,” and it is based on noble principles: it wants users to be well-informed about how their data is used and by whom. However, as it is today, it’s against both users and webmasters. Webmasters have to waste their time working on compliance, and users receive a degraded experience due to silly regulations.

I’d like to do what Silktide did and actively protest against the law, but I wouldn’t be so happy if I were sued. I’d like to read “the cookie law was a joke” in the news, but I’m starting to believe that it’s not going to happen any time soon. It seems that accepting the sadness of reality is the only option I’m left with.

End of rant, let’s move on.

Hello Pelican!

andreacorbellini — Sun, 02 Aug 2015 18:55:00 +0000

Today I switched from WordPress.com to Pelican and GitHub Pages.

First off, let me say: almost all URLs that were previously working should still work. Only the feed URLs are broken, and this is not something I can fix. If you were following my blog via a feed reader, you should update to the new feed. Sorry for the inconvenience.

Having said that, I’d like to share with you the motivation that made me move and the details of the migration.

The bad things of WordPress

Now, this isn’t meant to be a rant, so I’ll be pretty concise. WordPress, the content management system, is an excellent platform for blogging: easy to start with, easy to maintain, easy to use. WordPress.com makes things even easier. It also comes with many useful features, like comments and social network integration.

The problem is: you can’t customize things or add features without paying. Of course, this is business, and I do not want to discuss business decisions made at WordPress.com. Besides, I could live fine with most of the major limitations, and I was perfectly aware of these kinds of problems with WordPress.com when I started (after all, this is not the first blog I started).

I actually became upset with WordPress.com while writing the series of blog posts about Elliptic Curve Cryptography. For those articles, I spent a lot of time employing workarounds to overcome WordPress.com limitations. Being used to Vim and its advanced features, I also found the editors (both the old and the new one) a great obstacle to getting things done quickly. I do not want to go into the details of the problems I’m referring to; what matters is that, eventually, I gave up and realized it was time to move on and seek an alternative.

Why Pelican

Pelican is a static site generator. I’ve always thought that a static site had too many limitations for me. But while seeking an alternative to WordPress.com, I realized that many of those limitations were not affecting me in any way. Actually, with a static site I can do everything I want: edit my articles with Vim, render my equations with MathJax, customize my theme, version control my content, and write scripts to post-process my content.

The only bad thing about Pelican is that it does not come with any theme I truly like. I decided to make my own. I’m not entirely satisfied with it, as I feel it is too “anonymous”, but I believe it is fully responsive, fast, readable and offers all the features I want. Perhaps I’ll tweak it a little more to make it more “personal”.

Setting up Pelican and migrating everything required some time, but at least this time I worked on true solutions, not on ugly hacks and workarounds like I did with WordPress. This means that when writing articles I will be able to focus more on content than on other details.

Why not other static site generators

In short: Pelican is written in Python and to my eyes it looked better than the other Python static site generators. I’ll be honest and say that I did not truly evaluate all of the alternatives: I knew list.org switched to Pelican and that made me try Pelican before all other solutions.

Conclusion

In the end I decided to leave WordPress for Pelican hosted on GitHub Pages. I’m pretty satisfied with the result I got. The nature of GitHub Pages prevents me from using HTTP redirects (and therefore the old feed links are broken), however in exchange I’ve got much more freedom, and this is what matters to me.

Let's Encrypt is going to start soon

andreacorbellini — Tue, 16 Jun 2015 18:20:00 +0000

Let’s Encrypt (the free, automated and open certificate authority) has just announced its launch schedule. According to it, certificates will be released to the public starting from the week of September 14, 2015.

Their intermediate certificates, which were generated a few days ago, will be signed by IdenTrust. What this means is that if you browse a web page secured by Let’s Encrypt, you won’t get any scary warning, just the usual green lock.

You will see this...

... not this.

In case you are curious: the root certificate is a 4096-bit RSA key, and the two intermediate certificates are both 2048-bit RSA keys. They are also planning to generate ECDSA keys later this year.

Technical aspects aside, this will be a great opportunity for the entire web. As I have already written, I always dreamed of an encrypted web, and I truly believe that Let’s Encrypt — or at least its approach to the problem — is the way to go.

So, will you get a Let’s Encrypt certificate when the time comes? I will. Not for this blog (I can’t install a certificate here without paying), but for other websites I manage.

Perhaps I’ll also show a “Proudly secured by Let’s Encrypt” badge.

Elliptic Curve Cryptography: breaking security and a comparison with RSA

andreacorbellini — Mon, 08 Jun 2015 13:28:00 +0000

This post is the fourth and last in the series ECC: a gentle introduction.

In the last post we have seen two algorithms, ECDH and ECDSA, and we have seen how the discrete logarithm problem for elliptic curves plays an important role for their security. But, if you remember, we said that we have no mathematical proofs for the complexity of the discrete logarithm problem: we believe it to be “hard”, but we can’t be sure. In the first part of this post, we’ll try to get an idea of how “hard” it is in practice with today’s techniques.

Then, in the second part, we will try to answer the question: why do we need elliptic curve cryptography if RSA (and the other cryptosystems based on modular arithmetic) work well?

Breaking the discrete logarithm problem

We will now see the two most efficient algorithms for computing discrete logarithms on elliptic curves: the baby-step, giant-step algorithm, and Pollard’s rho method.

Before starting, as a reminder, here is what the discrete logarithm problem is about: given two points $P$ and $Q$ find out the integer $x$ that satisfies the equation $Q = xP$. The points belong to a subgroup of an elliptic curve, which has a base point $G$ and whose order is $n$.

Baby-step, giant-step

Before entering the details of the algorithm, a quick consideration: we can always write any integer $x$ as $x = am + b$, where $a$, $m$ and $b$ are three integers (once $m$ is fixed, suitable $a$ and $b$ always exist). For example, we can write $10 = 2 \cdot 3 + 4$.

With this in mind, we can rewrite the equation for the discrete logarithm problem as follows: $$\begin{align*} Q & = xP \\ Q & = (am + b) P \\ Q & = am P + b P \\ Q - am P & = b P \end{align*}$$

The baby-step giant-step is a “meet in the middle” algorithm. Unlike the brute-force attack (which forces us to calculate all the points $xP$ for every $x$ until we find $Q$), we will calculate “few” values for $bP$ and “few” values for $Q - amP$ until we find a correspondence. The algorithm works as follows:

Calculate $m = \left\lceil{\sqrt{n}}\right\rceil$.
For every $b$ in $\{0, \dots, m\}$, calculate $bP$ and store the results in a hash table.
For every $a$ in $\{0, \dots, m\}$:
1. calculate $amP$;
2. calculate $Q - amP$;
3. check the hash table to see whether there exists a point $bP$ such that $Q - amP = bP$;
4. if such a point exists, then we have found $x = am + b$.
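The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's script: it assumes the small curve from the example output in the next section ($y^2 = x^3 + x - 1$ over $\mathbb{F}_{10177}$), assumes the order $n$ of $P$ is already known, and the helper names (`add`, `mul`, `baby_step_giant_step`) are hypothetical.

```python
import math

# Toy curve y^2 = x^3 + a*x + b over F_p (assumed: the small curve from the example output).
# Points are (x, y) tuples; None stands for the point at infinity.
CURVE_P, CURVE_A, CURVE_B = 10177, 1, -1


def on_curve(pt):
    if pt is None:
        return True
    x, y = pt
    return (y * y - x**3 - CURVE_A * x - CURVE_B) % CURVE_P == 0


def add(p1, p2):
    """Elliptic curve point addition (group law) over F_p."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % CURVE_P == 0:
        return None  # P + (-P) = 0
    if p1 == p2:  # point doubling
        lam = (3 * x1 * x1 + CURVE_A) * pow(2 * y1 % CURVE_P, -1, CURVE_P)
    else:  # addition of two distinct points
        lam = (y2 - y1) * pow((x2 - x1) % CURVE_P, -1, CURVE_P)
    x3 = (lam * lam - x1 - x2) % CURVE_P
    return x3, (lam * (x1 - x3) - y1) % CURVE_P


def neg(pt):
    return None if pt is None else (pt[0], -pt[1] % CURVE_P)


def mul(k, pt):
    """Double-and-add: compute k * pt for an integer k >= 0."""
    result, addend = None, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def baby_step_giant_step(P, Q, n):
    """Return x in {0, ..., n - 1} with Q = x * P, where n is the order of P."""
    m = math.isqrt(n - 1) + 1  # m = ceil(sqrt(n))
    # Baby steps: hash table mapping b*P -> b for b in {0, ..., m}
    table = {}
    point = None
    for b in range(m + 1):
        table.setdefault(point, b)
        point = add(point, P)
    # Giant steps: R = Q - a*m*P for a in {0, ..., m}
    minus_mP = neg(mul(m, P))
    R = Q
    for a in range(m + 1):
        if R in table:
            return (a * m + table[R]) % n
        R = add(R, minus_mP)
    raise ValueError("Q is not a multiple of P")
```

With `P = (1, 1)`, `Q = mul(325, P)` and `n` set to the order of `P`, the function should recover 325. The dictionary lookup is what gives the $O(1)$ membership test the complexity argument below relies on.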

As you can see, initially we calculate the points $bP$ with little (i.e. “baby”) increments for the coefficient $b$ ($1P$, $2P$, $3P$, …). Then, in the second part of the algorithm, we calculate the points $amP$ with huge (i.e. “giant”) increments for $am$ ($1mP$, $2mP$, $3mP$, …, where $m$ is a huge number).

The baby-step, giant-step algorithm: initially we calculate few points via small steps and store them in a hash table. Then we perform the giant steps and compare the new points with the points in the hash table. Once a match is found, calculating the discrete logarithm is a matter of rearranging terms.

To understand why this algorithm works, forget for a moment that the points $bP$ are cached and take the equation $Q = amP + bP$. Consider what follows:

When $a = 0$ we are checking whether $Q$ is equal to $bP$, where $b$ is one of the integers from 0 to $m$. This way, we are comparing $Q$ against all points from $0P$ to $mP$.
When $a = 1$ we are checking whether $Q$ is equal to $mP + bP$. We are comparing $Q$ against all points from $mP$ to $2mP$.
When $a = 2$ we are comparing $Q$ against all the points from $2mP$ to $3mP$.
…
When $a = m - 1$, we are comparing $Q$ against all points from $(m - 1)mP$ to $m^2 P = nP$.

In conclusion, we are checking all points from $0P$ to $nP$ (that is, all the possible points) performing at most $2m$ additions and multiplications (exactly $m$ for the baby steps, at most $m$ for the giant steps).

If you consider that a lookup on a hash table takes $O(1)$ time, it’s easy to see that this algorithm has both time and space complexity $O(\sqrt{n})$ (or $O(2^{k / 2})$ if you consider the bit length). It’s still exponential time, but much better than a brute-force attack.

Baby-step giant-step in practice

It may make sense to see what the complexity $O(\sqrt{n})$ means in practice. Let’s take a standardized curve: prime192v1 (aka secp192r1, ansiX9p192r1). This curve has order $n$ = 0xffffffff ffffffff ffffffff 99def836 146bc9b1 b4d22831. The square root of $n$ is approximately 7.922816251426434 · 10²⁸ (almost eighty octillion).

Now imagine storing $\sqrt{n}$ points in a hash table. Suppose that each point requires exactly 32 bytes: our hash table would need approximately 2.5 · 10³⁰ bytes of memory. Looking on the web, it seems that the total world storage capacity is on the order of a zettabyte (10²¹ bytes). This is almost ten orders of magnitude lower than the memory required by our hash table! Even if our points took 1 byte each, we would still be very far from being able to store all of them.
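The arithmetic above is easy to double-check. A small sketch, using the order of prime192v1 given above and the same assumptions as the text (32 bytes per point, about 10²¹ bytes of world storage):

```python
import math

# Order of prime192v1, as given above
n = int("ffffffff ffffffff ffffffff 99def836 146bc9b1 b4d22831".replace(" ", ""), 16)

m = math.isqrt(n - 1) + 1     # number of baby steps, ceil(sqrt(n))
bytes_per_point = 32          # assumption from the text
table_bytes = m * bytes_per_point
world_storage = 10**21        # roughly one zettabyte

print(f"sqrt(n) ~ {m:.3e}")
print(f"table   ~ {table_bytes:.3e} bytes")
print(f"gap     ~ 10^{math.log10(table_bytes / world_storage):.1f}")
```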

This is impressive, and even more so if you consider that prime192v1 is one of the curves with the lowest order. The order of secp521r1 (another standard curve from NIST) is approximately 6.9 · 10¹⁵⁶!

Playing with baby-step giant-step

I made a Python script that computes discrete logarithms using the baby-step giant-step algorithm. Obviously it only works with curves with small orders: don’t try it with secp521r1, unless you want to receive a MemoryError.

It should produce an output like this:

Curve: y^2 = (x^3 + 1x - 1) mod 10177
Curve order: 10331
p = (0x1, 0x1)
q = (0x1a28, 0x8fb)
325 * p = q
log(p, q) = 325
Took 105 steps

Pollard’s ρ

Pollard’s rho is another algorithm for computing discrete logarithms. It has the same asymptotic time complexity $O(\sqrt{n})$ as the baby-step giant-step algorithm, but its space complexity is just $O(1)$. If baby-step giant-step can’t solve discrete logarithms because of the huge memory requirements, will Pollard’s rho make it? Let’s see…

First of all, another reminder of the discrete logarithm problem: given $P$ and $Q$ find $x$ such that $Q = xP$. With Pollard’s rho, we will solve a slightly different problem: given $P$ and $Q$, find the integers $a$, $b$, $A$ and $B$ such that $aP + bQ = AP + BQ$.

Once the four integers are found, we can use the equation $Q = xP$ to find out $x$: $$\begin{align*} aP + bQ & = AP + BQ \\ aP + bxP & = AP + BxP \\ (a + bx) P & = (A + Bx) P \\ (a - A) P & = (B - b) xP \end{align*}$$

Now we can get rid of $P$. But before doing so, remember that our subgroup is cyclic with order $n$, therefore the coefficients used in point multiplication are modulo $n$: $$\begin{align*} a - A & \equiv (B - b) x \pmod{n} \\ x & = (a - A)(B - b)^{-1} \bmod{n} \end{align*}$$
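This last step is plain modular arithmetic. A minimal sketch, assuming $n$ is prime so that $B - b$ is invertible whenever $B \not\equiv b \pmod{n}$; the function name is hypothetical. The numbers in the example are the pairs $(3, 3)$ and $(2, 0)$ from the tortoise-and-hare example later in this post:

```python
def log_from_collision(a, b, A, B, n):
    """Given aP + bQ = AP + BQ in a subgroup of prime order n, return x with Q = xP."""
    if (B - b) % n == 0:
        # The collision gives (a - A) P = 0 and says nothing about x
        raise ValueError("B = b (mod n): degenerate collision")
    return (a - A) * pow((B - b) % n, -1, n) % n


print(log_from_collision(3, 3, 2, 0, 5))  # 3
```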

The principle of operation of Pollard’s rho is simple: we generate a pseudo-random sequence of points $X_1$, $X_2$, … where each $X_i = a_i P + b_i Q$. The sequence can be generated using a pseudo-random function $f$ like this: $$(a_{i + 1}, b_{i + 1}) = f(X_i)$$

That is: the pseudo-random function $f$ takes the latest point $X_i$ in the sequence as the input, and gives the coefficients $a_{i + 1}$ and $b_{i + 1}$ as the output. From there, we can calculate $X_{i + 1} = a_{i + 1} P + b_{i + 1} Q$; we can then input $X_{i + 1}$ into $f$ again and repeat.

It doesn’t really matter how $f$ works internally (although certain functions may yield results faster than others), what matters is that $f$ determines the next point in the sequence based on the previous one, and that all the $a_i$ and $b_i$ coefficients are known by us.

With such an $f$, sooner or later we will see a loop in our sequence. That is, we will see a point $X_j = X_i$.

A visualization of what a cycle in the sequence might look like: we have some initial points ($X_0$, $X_1$, $X_2$), and then the cycle itself, formed by the points $X_3$ to $X_8$. After that, $X_9 = X_3$, $X_{10} = X_4$ and so on.
This picture resembles the Greek letter ρ (rho), hence the name.

The reason why we must see the cycle is simple: the number of points is finite, hence they must repeat sooner or later. Once we see where the cycle is, we can use the equations above to figure out the discrete logarithm.

The problem now is: how do we detect the cycle in an efficient way?

Tortoise and Hare

To detect cycles, we have an efficient method: the tortoise and hare algorithm (also known as Floyd’s cycle-finding algorithm). The picture below shows the principle of operation of the tortoise and hare method, which is at the core of Pollard’s rho.

We have the curve $y^2 \equiv x^3 + 2x + 3 \pmod{97}$ and the points $P = (3, 6)$ and $Q = (80, 87)$. The points belong to a cyclic subgroup of order 5.
We walk a sequence of pairs at different speeds until we find two different pairs $(a, b)$ and $(A, B)$ that produce the same point. In this case, we have found the pairs $(3, 3)$ and $(2, 0)$ that allow us to calculate the logarithm as $x = (3 - 2)(0 - 3)^{-1} \bmod{5} = 3$. And in fact we correctly have $Q = 3P$.

We take two pets, the tortoise and the hare, and make them walk our sequence of points from left to right. The tortoise (the green spot in the picture) is slow and reads each point one by one; the hare (represented in red) is fast and skips a point at every step.

After some time both the tortoise and the hare will have found the same point, but with different coefficient pairs. Or, to express that with equations, the tortoise will have found a pair $(a, b)$ and the hare will have found a pair $(A, B)$ such that $aP + bQ = AP + BQ$.

It’s easy to see that this algorithm requires constant memory ($O(1)$ space complexity). Calculating the asymptotic time complexity is not that easy, but we can build a probabilistic proof that shows how the time complexity is $O(\sqrt{n})$, as we have already said. The proof is based on the “birthday paradox”, which is about the probability of two people having the same birthday, whereas here we are concerned with the probability of two $(a, b)$ pairs yielding the same point.
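Here is a minimal sketch of Pollard’s rho with Floyd’s tortoise and hare, assuming the toy curve and points from the example above ($y^2 \equiv x^3 + 2x + 3 \pmod{97}$, $P = (3, 6)$, $Q = (80, 87)$, $n = 5$). The pseudo-random function $f$ here is an assumption: a common three-way partition on the $x$ coordinate (add $P$, double, or add $Q$), not necessarily the variant used in the author's script. All names are hypothetical.

```python
import random

# Toy curve y^2 = x^3 + 2x + 3 over F_97 from the example above.
# Points are (x, y) tuples; None is the point at infinity.
CURVE_P, CURVE_A = 97, 2


def add(p1, p2):
    """Elliptic curve point addition over F_p."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % CURVE_P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + CURVE_A) * pow(2 * y1 % CURVE_P, -1, CURVE_P)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % CURVE_P, -1, CURVE_P)
    x3 = (lam * lam - x1 - x2) % CURVE_P
    return x3, (lam * (x1 - x3) - y1) % CURVE_P


def mul(k, pt):
    """Double-and-add scalar multiplication, k >= 0."""
    result, addend = None, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def pollard_rho(P, Q, n, seed=0):
    """Return x with Q = x * P, where n is the (prime) order of P."""
    rng = random.Random(seed)

    def step(X, a, b):
        # The next point depends only on X; (a, b) keep track of X = aP + bQ.
        part = 0 if X is None else X[0] % 3
        if part == 0:
            return add(X, P), (a + 1) % n, b
        if part == 1:
            return add(X, X), 2 * a % n, 2 * b % n
        return add(X, Q), a, (b + 1) % n

    while True:
        a0, b0 = rng.randrange(n), rng.randrange(n)
        start = add(mul(a0, P), mul(b0, Q))
        tortoise = hare = (start, a0, b0)
        while True:
            tortoise = step(*tortoise)      # one step at a time
            hare = step(*step(*hare))       # two steps at a time
            if tortoise[0] == hare[0]:
                break
        (_, a, b), (_, A, B) = tortoise, hare
        if (B - b) % n != 0:
            return (a - A) * pow((B - b) % n, -1, n) % n
        # Degenerate collision: restart from another random point
```

With the values above, `pollard_rho((3, 6), (80, 87), 5)` returns 3, matching $Q = 3P$. Only the current tortoise and hare states are kept in memory, which is where the $O(1)$ space complexity comes from.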

Playing with Pollard’s ρ

I’ve built a Python script that computes discrete logarithms using Pollard’s rho. It is not the implementation of the original Pollard’s rho, but a slight variation of it (I’ve used a more efficient method for generating the pseudo-random sequence of pairs). The script contains some useful comments, so read it if you are interested in the details of the algorithm.

This script, like the baby-step giant-step one, works on a tiny curve, and produces the same kind of output.

Pollard’s ρ in practice

We said that baby-step giant-step can’t be used in practice, because of the huge memory requirements. Pollard’s rho, on the other hand, requires very little memory. So, how practical is it?

Certicom launched a challenge in 1998 to compute discrete logarithms on elliptic curves with bit lengths ranging from 109 to 359. As of today, only 109-bit long curves have been successfully broken. The latest successful attempt was made in 2004. Quoting Wikipedia:

The prize was awarded on 8 April 2004 to a group of about 2600 people represented by Chris Monico. They also used a version of a parallelized Pollard rho method, taking 17 months of calendar time.

As we have already said, prime192v1 is one of the “smallest” elliptic curves. We also said that Pollard’s rho has $O(\sqrt{n})$ time complexity. If we used the same technique as Chris Monico (the same algorithm, on the same hardware, with the same number of machines), how long would it take to compute a logarithm on prime192v1? $$17\ \text{months}\ \times \frac{\sqrt{2^{192}}}{\sqrt{2^{109}}} \approx 5 \cdot 10^{13}\ \text{months}$$
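The estimate can be reproduced in a couple of lines, under the same assumption as the text: running time scales like $\sqrt{n} \approx 2^{k/2}$ for a $k$-bit curve, with everything else unchanged.

```python
import math

months_109_bits = 17                                # time reported for the 109-bit challenge
factor = math.sqrt(2**192) / math.sqrt(2**109)      # = 2^(83/2)
months = months_109_bits * factor
print(f"{months:.1e} months (~{months / 12:.1e} years)")
```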

This number is pretty self-explanatory and gives a clear idea of how hard it can be to break a discrete logarithm using such techniques.

Pollard’s ρ vs Baby-step giant-step

I decided to put the baby-step giant-step script and the Pollard’s rho script together with a brute-force script into a fourth script to compare their performances.

This fourth script computes all the logarithms for all the points on the “tiny” curve using different algorithms and reports how much time it took:

Curve order: 10331
Using bruteforce
Computing all logarithms: 100.00% done
Took 2m 31s (5193 steps on average)
Using babygiantstep
Computing all logarithms: 100.00% done
Took 0m 6s (152 steps on average)
Using pollardsrho
Computing all logarithms: 100.00% done
Took 0m 21s (138 steps on average)

As we could expect, the brute-force method is tremendously slow compared to the other two. Baby-step giant-step is the fastest, while Pollard’s rho is more than three times slower than baby-step giant-step (although it uses far less memory and fewer steps on average).

Also look at the number of steps: brute force used 5193 steps on average for computing each logarithm. 5193 is very close to 10331 / 2 (half the curve order). Baby-step giant-step and Pollard’s rho used 152 steps and 138 steps respectively, two numbers in the same ballpark as the square root of 10331 (101.64).

Final consideration

While discussing these algorithms, I have presented many numbers. It’s important to be cautious when reading them: algorithms can be greatly optimized in many ways. Hardware can improve. Specialized hardware can be built.

The fact that an approach seems impractical today does not imply that the approach can’t be improved. Nor does it imply that no other, better approaches exist (remember, once again, that we have no proofs for the complexity of the discrete logarithm problem).

Shor’s algorithm

If today’s techniques are unsuitable, what about tomorrow’s techniques? Well, things are a bit more worrisome: there exists a quantum algorithm capable of computing discrete logarithms in polynomial time: Shor’s algorithm, which has time complexity $O((\log n)^3)$ and space complexity $O(\log n)$.

Quantum computers are still far from being sophisticated enough to run algorithms like Shor’s, but it may already be worth investigating quantum-resistant algorithms now. What we encrypt today might not be safe tomorrow.

ECC and RSA

Now let’s forget about quantum computing, which is still far from being a serious problem. The question I’ll answer now is: why bother with elliptic curves if RSA works well?

A quick answer is given by NIST, which provides a table that compares RSA and ECC key sizes required to achieve the same level of security.

RSA key size (bits)	ECC key size (bits)
1024	160
2048	224
3072	256
7680	384
15360	521

Note that there is no linear relationship between the RSA key sizes and the ECC key sizes (in other words: if we double the RSA key size, we don’t have to double the ECC key size). This table tells us not only that ECC uses less memory, but also that key generation and signing are considerably faster.

But why is it so? The answer is that the fastest known algorithms for computing discrete logarithms over elliptic curves are Pollard’s rho and baby-step giant-step, while in the case of RSA faster algorithms exist. One in particular is the general number field sieve: an algorithm for integer factorization that can also be used to compute discrete logarithms. The general number field sieve is the fastest algorithm for integer factorization to date.

All of this applies to other cryptosystems based on modular arithmetic as well, including DSA, D-H and ElGamal.

Hidden threats of NSA

And now the hard part. So far we have discussed algorithms and mathematics. Now it’s time to discuss people, and things get more complicated.

If you remember, in the last post we said that certain classes of elliptic curves are weak, and to solve the problem of trusting curves from dubious sources we added a random seed to our domain parameters. And if we look at standard curves from NIST we can see that they are all verifiably random.

If we read the Wikipedia page for “nothing up my sleeve”, we can see that:

The random numbers for MD5 come from the sine of integers.
The random numbers for Blowfish come from the first digits of $\pi$.
The random numbers for RC5 come from both $e$ and the golden ratio.

These numbers are random because their digits are uniformly distributed. And they are also unsuspicious, because they have a justification.

Now the question is: where do the random seeds for NIST curves come from? The answer is, sadly: we don’t know. Those seeds have no justification at all.

Is it possible that NIST has discovered a “sufficiently large” class of weak elliptic curves and has tried many possible seeds until they found a vulnerable curve? I can’t answer this question, but it is a legitimate and important one. We know that NIST has succeeded in standardizing at least one vulnerable random number generator (a generator which, oddly enough, is based on elliptic curves). Perhaps they also succeeded in standardizing a set of weak elliptic curves. How do we know? We can’t.

What’s important to understand is that “verifiably random” and “secure” are not synonyms. And no matter how hard the logarithm problem is, or how long our keys are: if our algorithms are broken, there’s nothing we can do.

With respect to this, RSA wins, as it does not require special domain parameters that can be tampered with. RSA (as well as other modular arithmetic systems) may be a good alternative if we can’t trust authorities and if we can’t construct our own domain parameters. And in case you are asking: yes, TLS may use NIST curves. If you check https://google.com, you’ll see that the connection is using ECDHE and ECDSA, with a certificate based on prime256v1 (aka secp256r1).

That’s all!

I hope you have enjoyed this series. My aim was to give you the basic knowledge, terminology and conventions to understand what elliptic curve cryptography is today. If I reached my aim, you should now be able to understand existing ECC-based cryptosystems and to expand your knowledge by reading “not so gentle” documentation. When writing this series, I could have skipped over many details and used simpler terminology, but I felt that by doing so you would not have been able to understand what the web has to offer. I believe I have found a good compromise between simplicity and completeness.

Note though that by reading just this series, you will not be able to implement secure ECC cryptosystems: security requires us to know many subtle but important details. Remember the requirements for Smart’s attack and Sony’s mistake: these are just two examples that should teach you how easy it is to produce insecure algorithms and how easy it is to exploit them.

So, if you are interested in diving deeper into the world of ECC, where to go from here?

First off, so far we have seen Weierstrass curves over prime fields, but you must know that there exist other kinds of curves and fields, in particular:

Koblitz curves over binary fields. Those are elliptic curves in the form $y^2 + xy = x^3 + ax^2 + 1$ (where $a$ is either 0 or 1) over finite fields containing $2^m$ elements (where $m$ is a prime). They allow particularly efficient point additions and scalar multiplications. Examples of standardized Koblitz curves are nistk163, nistk283 and nistk571 (three curves defined over a field of 163, 283 and 571 bits).
Binary curves. They are very similar to Koblitz curves and are in the form $y^2 + xy = x^3 + x^2 + b$ (where $b$ is an integer often generated from a random seed). As the name suggests, binary curves are restricted to binary fields too. Examples of standardized curves are nistb163, nistb283 and nistb571. It must be said that there are growing concerns that both Koblitz and Binary curves may not be as safe as prime curves.
Edwards curves, in the form $x^2 + y^2 = 1 + d x^2 y^2$ (where $d$ is a field element different from 0 and 1). These are particularly interesting not only because point addition and scalar multiplication are fast, but also because the formula for point addition is always the same, in every case ($P \ne Q$, $P = Q$, $P = -Q$, …). This feature reduces the risk of side-channel attacks, where you measure the time used for scalar multiplication and try to guess the scalar coefficient based on the time it took to compute. Edwards curves are relatively new (they were presented in 2007) and no authority such as Certicom or NIST has yet standardized any of them.
Curve25519 and Ed25519 are two particular elliptic curves designed for ECDH and a variant of ECDSA respectively. Like Edwards curves, these two curves are fast and help prevent side-channel attacks. And like Edwards curves, these two curves have not been standardized yet and we can’t find them in any popular software (except OpenSSH, which has supported Ed25519 key pairs since 2014).

If you are interested in the implementation details of ECC, then I suggest you read the sources of OpenSSL and GnuTLS.

Finally, if you are interested in the mathematical details, rather than the security and efficiency of the algorithms, you must know that:

Elliptic curves are algebraic varieties with genus one.
Points at infinity are studied in projective geometry and can be represented using homogeneous coordinates (although most of the features of projective geometry are not needed for elliptic curve cryptography).

And don’t forget to study finite fields and field theory.

These are the keywords that you should look up if you’re interested in the topics.

Now the series is officially concluded. Thank you for all your friendly comments, tweets and mails. Many have asked me if I’m going to write other series on other closely related topics. The answer is: maybe. I accept suggestions, but I can’t promise anything.

Thanks for reading and see you next time!

Elliptic Curve Cryptography: ECDH and ECDSA

andreacorbellini — Sat, 30 May 2015 19:23:00 +0000

This post is the third in the series ECC: a gentle introduction.

In the previous posts, we have seen what an elliptic curve is and we have defined a group law in order to do some math with the points of elliptic curves. Then we have restricted elliptic curves to finite fields of integers modulo a prime. With this restriction, we have seen that the points of elliptic curves generate cyclic subgroups and we have introduced the terms base point, order and cofactor.

Finally, we have seen that scalar multiplication in finite fields is an “easy” problem, while the discrete logarithm problem seems to be “hard”. Now we’ll see how all of this applies to cryptography.

Domain parameters

Our elliptic curve algorithms will work in a cyclic subgroup of an elliptic curve over a finite field. Therefore, our algorithms will need the following parameters:

The prime $p$ that specifies the size of the finite field.
The coefficients $a$ and $b$ of the elliptic curve equation.
The base point $G$ that generates our subgroup.
The order $n$ of the subgroup.
The cofactor $h$ of the subgroup.

In conclusion, the domain parameters for our algorithms are the sextuple $(p, a, b, G, n, h)$.

Random curves

When I said that the discrete logarithm problem was “hard”, I wasn’t entirely right. There are some classes of elliptic curves that are particularly weak and allow the use of special purpose algorithms to solve the discrete logarithm problem efficiently. For example, all the curves that have $p = hn$ (that is, the order of the finite field is equal to the order of the elliptic curve) are vulnerable to Smart’s attack, which can be used to solve discrete logarithms in polynomial time on a classical computer.

Now, suppose that I give you the domain parameters of a curve. There’s the possibility that I’ve discovered a new class of weak curves that nobody else knows about, and probably I have built a “fast” algorithm for computing discrete logarithms on the curve I gave you. How can I convince you of the contrary, i.e. that I’m not aware of any vulnerability? How can I assure you that the curve is “safe” (in the sense that it can’t be used for special purpose attacks by me)?

In an attempt to solve this kind of problem, sometimes we have an additional domain parameter: the seed $S$. This is a random number used to generate the coefficients $a$ and $b$, or the base point $G$, or both. These parameters are generated by computing the hash of the seed $S$. Hashes, as we know, are “easy” to compute, but “hard” to reverse.

A simple sketch of how a random curve is generated from a seed: the hash of a random number is used to calculate different parameters of the curve.

If we wanted to cheat and try to construct a seed from the domain parameters, we would have to solve a "hard" problem: hash inversion.

A curve generated through a seed is said to be verifiably random. The principle of using hashes to generate parameters is known as “nothing up my sleeve”, and is commonly used in cryptography.
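To make the idea concrete, here is a deliberately simplified sketch. It is not the ANSI X9.62 / SECG procedure (which is based on SHA-1 and involves more steps); it assumes a hypothetical scheme where $b = \text{SHA-256}(S) \bmod p$ with $a$ fixed, and uses the secp256k1 prime merely as an example of a large prime. What it shows is the verification side: anyone holding the seed can recompute $b$ and compare.

```python
import hashlib

# Hypothetical toy scheme, NOT the standardized procedure:
# the coefficient b is derived as SHA-256(seed) mod p.


def coefficient_from_seed(seed: bytes, p: int) -> int:
    digest = hashlib.sha256(seed).digest()
    return int.from_bytes(digest, "big") % p


def is_nonsingular(a: int, b: int, p: int) -> bool:
    # y^2 = x^3 + ax + b is a valid elliptic curve only if 4a^3 + 27b^2 != 0 (mod p)
    return (4 * a**3 + 27 * b**2) % p != 0


def verify_seeded_curve(seed: bytes, p: int, a: int, b: int) -> bool:
    # Recompute b from the published seed and compare with the claimed coefficient
    return coefficient_from_seed(seed, p) == b and is_nonsingular(a, b, p)


p = 2**256 - 2**32 - 977          # secp256k1's prime, used here just as an example
seed = b"nothing up my sleeve"    # hypothetical seed
b = coefficient_from_seed(seed, p)
print(verify_seeded_curve(seed, p, 0, b))      # True
print(verify_seeded_curve(seed, p, 0, b + 1))  # False
```

Going the other way (finding a seed whose hash yields a chosen $b$) would require inverting the hash, which is exactly the “hard” problem mentioned above.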

This trick should give some sort of assurance that the curve has not been specially crafted to expose vulnerabilities known to the author. In fact, if I give you a curve together with a seed, it means I was not free to arbitrarily choose the parameters $a$ and $b$, and you should be relatively sure that the curve cannot be used for special purpose attacks by me. The reason why I say “relatively” will be explained in the next post.

A standardized algorithm for generating and checking random curves is described in ANSI X9.62 and is based on SHA-1. If you are curious, you can read the algorithms for generating verifiable random curves on a specification by SECG (look for “Verifiably Random Curves and Base Point Generators”).

I’ve created a tiny Python script that verifies all the random curves currently shipped with OpenSSL. I strongly recommend checking it out!

Elliptic Curve Cryptography

It took us a long time, but finally here we are! Therefore, pure and simple:

The private key is a random integer $d$ chosen from $\{1, \dots, n - 1\}$ (where $n$ is the order of the subgroup).
The public key is the point $H = dG$ (where $G$ is the base point of the subgroup).

You see? If we know $d$ and $G$ (along with the other domain parameters), finding $H$ is “easy”. But if we know $H$ and $G$, finding the private key $d$ is “hard”, because it requires us to solve the discrete logarithm problem.

Now we are going to describe two public-key algorithms based on that: ECDH (Elliptic curve Diffie-Hellman), which is used for encryption, and ECDSA (Elliptic Curve Digital Signature Algorithm), used for digital signing.

Encryption with ECDH

ECDH is a variant of the Diffie-Hellman algorithm for elliptic curves. It is actually a key-agreement protocol, more than an encryption algorithm. This basically means that ECDH defines (to some extent) how keys should be generated and exchanged between parties. How to actually encrypt data using such keys is up to us.

The problem it solves is the following: two parties (the usual Alice and Bob) want to exchange information securely, so that a third party (the Man In the Middle) may intercept them, but may not decode them. This is one of the principles behind TLS, just to give you an example.

Here’s how it works:

First, Alice and Bob generate their own private and public keys. We have the private key $d_A$ and the public key $H_A = d_AG$ for Alice, and the keys $d_B$ and $H_B = d_BG$ for Bob. Note that both Alice and Bob are using the same domain parameters: the same base point $G$ on the same elliptic curve on the same finite field.
Alice and Bob exchange their public keys $H_A$ and $H_B$ over an insecure channel. The Man In the Middle would intercept $H_A$ and $H_B$, but won’t be able to find out either $d_A$ or $d_B$ without solving the discrete logarithm problem.
Alice calculates $S = d_A H_B$ (using her own private key and Bob’s public key), and Bob calculates $S = d_B H_A$ (using his own private key and Alice’s public key). Note that $S$ is the same for both Alice and Bob, in fact: $$S = d_A H_B = d_A (d_B G) = d_B (d_A G) = d_B H_A$$

The Man In the Middle, however, only knows $H_A$ and $H_B$ (together with the other domain parameters) and would not be able to find out the shared secret $S$. This is known as the Diffie-Hellman problem, which can be stated as follows:

Given three points $P$, $aP$ and $bP$, what is the result of $abP$?

Or, equivalently:

Given three integers $k$, $k^x$ and $k^y$, what is the result of $k^{xy}$?

(The latter form is used in the original Diffie-Hellman algorithm, based on modular arithmetic.)
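For comparison, the modular form can be tried with textbook-sized numbers. A minimal sketch; the parameters $p = 23$, $k = 5$ and the secret exponents are toy values chosen for illustration and are far too small for real use:

```python
# Toy parameters: prime modulus p and base k (illustration only)
p, k = 23, 5
x, y = 4, 3                     # Alice's and Bob's secret exponents

k_x = pow(k, x, p)              # Alice publishes k^x mod p
k_y = pow(k, y, p)              # Bob publishes k^y mod p

shared_alice = pow(k_y, x, p)   # (k^y)^x mod p
shared_bob = pow(k_x, y, p)     # (k^x)^y mod p
print(k_x, k_y, shared_alice, shared_bob)  # 4 10 18 18
```

An eavesdropper sees $k$, $k^x$ and $k^y$, but has to solve the Diffie-Hellman problem to obtain $k^{xy} = 18$.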

The Diffie-Hellman key exchange: Alice and Bob can "easily" calculate the shared secret, the Man in the Middle has to solve a "hard" problem.

The principle behind the Diffie-Hellman problem is also explained in a great YouTube video by Khan Academy, which later explains the Diffie-Hellman algorithm applied to modular arithmetic (not to elliptic curves).

The Diffie-Hellman problem for elliptic curves is assumed to be a “hard” problem. It is believed to be as “hard” as the discrete logarithm problem, although no mathematical proofs are available. What we can tell for sure is that it can’t be “harder”, because solving the logarithm problem is a way of solving the Diffie-Hellman problem.

Now that Alice and Bob have obtained the shared secret, they can exchange data with symmetric encryption.

For example, they can use the $x$ coordinate of $S$ as the key to encrypt messages using secure ciphers like AES or 3DES. This is more or less what TLS does; the difference is that TLS concatenates the $x$ coordinate with other numbers related to the connection and then computes a hash of the resulting byte string.
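As a rough illustration of “hash the $x$ coordinate together with connection data”, here is a hypothetical helper. It is not the TLS key schedule: SHA-256, the 32-byte encoding and the context bytes are all assumptions made for this sketch.

```python
import hashlib


def key_from_shared_point(S, context=b"", field_bytes=32):
    """Hypothetical KDF: SHA-256(x coordinate of S || context) -> 32-byte symmetric key.

    S is the shared point (x, y); field_bytes is the byte length of the field prime
    (32 for a 256-bit curve such as secp256k1). Only x is used.
    """
    x_bytes = S[0].to_bytes(field_bytes, "big")
    return hashlib.sha256(x_bytes + context).digest()


S = (12345, 67890)  # placeholder point standing in for the shared secret
key = key_from_shared_point(S, context=b"example connection data")
print(len(key), key.hex())
```

Since Alice and Bob hold the same $S$ (and the same connection data), they derive the same key without ever sending it.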

Playing with ECDH

I’ve created another Python script for computing public/private keys and shared secrets over an elliptic curve.

Unlike all the examples we have seen till now, this script makes use of a standardized curve, rather than a simple curve on a small field. The curve I’ve chosen is secp256k1, from SECG (the “Standards for Efficient Cryptography Group”, founded by Certicom). This same curve is also used by Bitcoin for digital signatures. Here are the domain parameters:

$p$ = 0xffffffff ffffffff ffffffff ffffffff ffffffff ffffffff fffffffe fffffc2f
$a$ = 0
$b$ = 7
$x_G$ = 0x79be667e f9dcbbac 55a06295 ce870b07 029bfcdb 2dce28d9 59f2815b 16f81798
$y_G$ = 0x483ada77 26a3c465 5da4fbfc 0e1108a8 fd17b448 a6855419 9c47d08f fb10d4b8
$n$ = 0xffffffff ffffffff ffffffff fffffffe baaedce6 af48a03b bfd25e8c d0364141
$h$ = 1

(These numbers were taken from OpenSSL source code.)

Of course, you are free to modify the script to use other curves and domain parameters, just be sure to use prime fields and curves in Weierstrass normal form, otherwise the script won’t work.

The script is really simple and includes some of the algorithms we have described so far: point addition, double and add, ECDH. I recommend reading and running it. It will produce an output like this:

Curve: secp256k1
Alice's private key: 0xe32868331fa8ef0138de0de85478346aec5e3912b6029ae71691c384237a3eeb
Alice's public key: (0x86b1aa5120f079594348c67647679e7ac4c365b2c01330db782b0ba611c1d677, 0x5f4376a23eed633657a90f385ba21068ed7e29859a7fab09e953cc5b3e89beba)
Bob's private key: 0xcef147652aa90162e1fff9cf07f2605ea05529ca215a04350a98ecc24aa34342
Bob's public key: (0x4034127647bb7fdab7f1526c7d10be8b28174e2bba35b06ffd8a26fc2c20134a, 0x9e773199edc1ea792b150270ea3317689286c9fe239dd5b9c5cfd9e81b4b632)
Shared secret: (0x3e2ffbc3aa8a2836c1689e55cd169ba638b58a3a18803fcf7de153525b28c3cd, 0x43ca148c92af58ebdb525542488a4fe6397809200fe8c61b41a105449507083)
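For readers who prefer a compact version, here is an independent minimal sketch of the same flow (not the author's script): key pairs, exchange, and shared secret on secp256k1, using the domain parameters listed above. It draws private keys with Python's `secrets` module; the double-and-add loop is not constant time, so treat it as a learning aid only.

```python
import secrets

# secp256k1 domain parameters (the values listed above)
P_FIELD = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
CURVE_A, CURVE_B = 0, 7
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)
N = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141


def on_curve(pt):
    """Check y^2 = x^3 + ax + b (mod p); None is the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - x**3 - CURVE_A * x - CURVE_B) % P_FIELD == 0


def add(p1, p2):
    """Elliptic curve point addition over F_p."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + CURVE_A) * pow(2 * y1 % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    result, addend = None, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def make_keypair():
    d = secrets.randbelow(N - 1) + 1  # private key d in {1, ..., N - 1}
    return d, mul(d, G)               # public key H = dG


d_a, h_a = make_keypair()  # Alice
d_b, h_b = make_keypair()  # Bob
# Only h_a and h_b travel over the insecure channel.
s_alice = mul(d_a, h_b)    # S = d_A * H_B
s_bob = mul(d_b, h_a)      # S = d_B * H_A
assert s_alice == s_bob
print("Shared secret x:", hex(s_alice[0]))
```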

Ephemeral ECDH

Some of you may have heard of ECDHE instead of ECDH. The “E” in ECDHE stands for “Ephemeral” and refers to the fact that the keys exchanged are temporary, rather than static.

ECDHE is used, for example, in TLS, where both the client and the server generate their public-private key pair on the fly, when the connection is established. The keys are then signed using the private key of the TLS certificate (for authentication) and exchanged between the parties.

Signing with ECDSA

The scenario is the following: Alice wants to sign a message with her private key ($d_A$), and Bob wants to validate the signature using Alice’s public key ($H_A$). Nobody but Alice should be able to produce valid signatures. Everyone should be able to check signatures.

Again, Alice and Bob are using the same domain parameters. The algorithm we are going to see is ECDSA, a variant of the Digital Signature Algorithm applied to elliptic curves.

ECDSA works on the hash of the message, rather than on the message itself. The choice of the hash function is up to us, but it goes without saying that a cryptographically secure hash function should be chosen. If the hash is longer than $n$ (the order of the subgroup), it must be truncated so that its bit length is the same as the bit length of $n$. The truncated hash is an integer and will be denoted as $z$.
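In code, truncation means keeping the leftmost bits of the digest. A minimal sketch, assuming SHA-512 as the hash function (any cryptographically secure hash would do) and secp256k1's $n$; `hash_message` is an illustrative name:

```python
import hashlib

# secp256k1 subgroup order (from the domain parameters listed earlier)
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141


def hash_message(message: bytes, n: int) -> int:
    """Hash with SHA-512 and keep only the leftmost n.bit_length() bits."""
    digest = hashlib.sha512(message).digest()
    e = int.from_bytes(digest, "big")
    excess_bits = len(digest) * 8 - n.bit_length()
    if excess_bits > 0:
        e >>= excess_bits  # drop the rightmost (least significant) bits
    return e


z = hash_message(b"Hello!", n)
print(z.bit_length() <= n.bit_length())  # True
```

The result $z$ may still be numerically larger than $n$ (it only has the same bit length); this is harmless, because every later use of $z$ is reduced modulo $n$.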

The algorithm performed by Alice to sign the message works as follows:

Take a random integer $k$ chosen from $\{1, \dots, n - 1\}$ (where $n$ is still the subgroup order).
Calculate the point $P = kG$ (where $G$ is the base point of the subgroup).
Calculate the number $r = x_P \bmod{n}$ (where $x_P$ is the $x$ coordinate of $P$).
If $r = 0$, then choose another $k$ and try again.
Calculate $s = k^{-1} (z + rd_A) \bmod{n}$ (where $d_A$ is Alice’s private key and $k^{-1}$ is the multiplicative inverse of $k$ modulo $n$).
If $s = 0$, then choose another $k$ and try again.

The pair $(r, s)$ is the signature.

Alice signs the hash $z$ using her private key $d_A$ and a random $k$. Bob verifies that the message has been correctly signed using Alice's public key $H_A$.

In plain words, this algorithm first generates a secret ($k$). This secret is hidden in $r$ thanks to point multiplication (which, as we know, is “easy” one way, and “hard” the other way round). $r$ is then bound to the message hash by the equation $s = k^{-1} (z + rd_A) \bmod{n}$.

Note that in order to calculate $s$, we have computed the inverse of $k$ modulo $n$. We have already said in the previous post that this is guaranteed to work only if $n$ is a prime number. If a subgroup has a non-prime order, ECDSA can’t be used. It’s not by chance that almost all standardized curves have a prime order, and those that have a non-prime order are unsuitable for ECDSA.
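Here is a sketch of the signing steps in Python, again over secp256k1. It assumes Python 3.8+ and draws $k$ with the `secrets` module; `point_add`, `scalar_mult` and `sign` are illustrative helpers, not the API of any library. The numbered comments follow the steps in the order listed above.

```python
import secrets

# secp256k1 domain parameters (a = 0, b = 7)
p = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
a = 0
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)


def point_add(P, Q):
    """Add two curve points; None stands for the point at infinity 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                      # P + (-P) = 0
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p    # chord slope
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def sign(z, d):
    """Illustrative ECDSA signing of the truncated hash z with private key d."""
    while True:
        k = secrets.randbelow(n - 1) + 1      # 1. random k in {1, ..., n - 1}
        x_P, _ = scalar_mult(k, G)            # 2. P = kG
        r = x_P % n                           # 3. r = x_P mod n
        if r == 0:
            continue                          # 4. choose another k
        s = pow(k, -1, n) * (z + r * d) % n   # 5. s = k^-1 (z + r d_A) mod n
        if s == 0:
            continue                          # 6. choose another k
        return r, s
```

For any $z$ and any private key $d_A \in \{1, \dots, n - 1\}$, `sign(z, d_A)` returns a pair $(r, s)$ of integers in $\{1, \dots, n - 1\}$. Since $k$ is fresh for every call, signing the same hash twice gives two different signatures.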

Verifying signatures

In order to verify the signature we’ll need Alice’s public key $H_A$, the (truncated) hash $z$ and, obviously, the signature $(r, s)$.

Calculate the integer $u_1 = s^{-1} z \bmod{n}$.
Calculate the integer $u_2 = s^{-1} r \bmod{n}$.
Calculate the point $P = u_1 G + u_2 H_A$.

The signature is valid only if $r = x_P \bmod{n}$.
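The verification steps translate just as directly. This self-contained sketch (Python 3.8+, secp256k1, illustrative helper names) includes a compact signer so that it can be run on its own. The range check on $r$ and $s$ is a standard precaution in ECDSA verification that is not spelled out in the steps above.

```python
import secrets

# secp256k1 domain parameters (a = 0, b = 7)
p = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
a = 0
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)


def point_add(P, Q):
    """Add two curve points; None stands for the point at infinity 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def sign(z, d):
    """Compact ECDSA signer (retries until r != 0 and s != 0)."""
    while True:
        k = secrets.randbelow(n - 1) + 1
        r = scalar_mult(k, G)[0] % n
        s = pow(k, -1, n) * (z + r * d) % n
        if r != 0 and s != 0:
            return r, s


def verify(z, signature, H):
    """Illustrative ECDSA verification of (r, s) against public key H."""
    r, s = signature
    if not (1 <= r < n and 1 <= s < n):
        return False                                        # reject out-of-range values
    w = pow(s, -1, n)
    u1 = z * w % n                                          # u1 = s^-1 z mod n
    u2 = r * w % n                                          # u2 = s^-1 r mod n
    P = point_add(scalar_mult(u1, G), scalar_mult(u2, H))   # P = u1 G + u2 H_A
    return P is not None and P[0] % n == r                  # r = x_P mod n?


d_A = secrets.randbelow(n - 1) + 1     # Alice's private key
H_A = scalar_mult(d_A, G)              # Alice's public key
z = 0x1234567890abcdef                 # stand-in for a truncated hash
signature = sign(z, d_A)
print(verify(z, signature, H_A))       # True
print(verify(z + 1, signature, H_A))   # False, barring a negligible-probability coincidence
```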

Correctness of the algorithm

The logic behind this algorithm may not seem obvious at first sight; however, if we put together all the equations we have written so far, things will become clearer.

Let’s start from $P = u_1 G + u_2 H_A$. We know, from the definition of public key, that $H_A = d_A G$ (where $d_A$ is the private key). We can write: $$\begin{align*} P & = u_1 G + u_2 H_A \\ & = u_1 G + u_2 d_A G \\ & = (u_1 + u_2 d_A) G \end{align*}$$

Using the definitions of $u_1$ and $u_2$, we can write: $$\begin{align*} P & = (u_1 + u_2 d_A) G \\ & = (s^{-1} z + s^{-1} r d_A) G \\ & = s^{-1} (z + r d_A) G \end{align*}$$

Here we have omitted “$\text{mod}\ n$” both for brevity, and because the cyclic subgroup generated by $G$ has order $n$, hence “$\text{mod}\ n$” is superfluous.

Previously, we defined $s = k^{-1} (z + rd_A) \bmod{n}$. Multiplying each side of the equation by $k$ and dividing by $s$, we get: $k = s^{-1} (z + rd_A) \bmod{n}$. Substituting this result in our equation for $P$, we get: $$\begin{align*} P & = s^{-1} (z + r d_A) G \\ & = k G \end{align*}$$

This is the same equation for $P$ we had at step 2 of the signature generation algorithm! When generating signatures and when verifying them, we are calculating the same point $P$, just with a different set of equations. This is why the algorithm works.
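The derivation can also be checked numerically on a toy curve. The sketch below reuses the curve $y^2 \equiv x^3 - x + 3 \pmod{37}$ and the point $G = (2, 3)$ of prime order $n = 7$ from the previous post (far too small to be secure), with arbitrarily chosen $d_A = 3$, $k = 2$ and $z = 5$. It assumes Python 3.8+, and the helper names are illustrative.

```python
# Toy parameters: y^2 = x^3 - x + 3 over F_37, G = (2, 3) of prime order 7
p, a = 37, -1
G, n = (2, 3), 7


def point_add(P, Q):
    """Add two curve points; None stands for the point at infinity 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


d_A, k, z = 3, 2, 5                  # private key, nonce, truncated hash
H_A = scalar_mult(d_A, G)            # public key

# Signing
r = scalar_mult(k, G)[0] % n
s = pow(k, -1, n) * (z + r * d_A) % n

# Verification
w = pow(s, -1, n)
u1, u2 = z * w % n, r * w % n
P = point_add(scalar_mult(u1, G), scalar_mult(u2, H_A))

print(r, s)                    # 2 2
print(u1, u2)                  # 6 1
print(P, scalar_mult(k, G))    # (23, 14) (23, 14)
print((u1 + u2 * d_A) % n)     # 2, i.e. u1 + u2 d_A = k (mod n)
```

The last line is exactly the step $(u_1 + u_2 d_A) G = s^{-1}(z + r d_A) G = kG$ from the derivation.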

Playing with ECDSA

Of course, I’ve created a Python script for signature generation and verification. The code shares some parts with the ECDH script, in particular the domain parameters and the public/private key pair generation algorithm.

Here is the kind of output produced by the script:

Curve: secp256k1
Private key: 0x9f4c9eb899bd86e0e83ecca659602a15b2edb648e2ae4ee4a256b17bb29a1a1e
Public key: (0xabd9791437093d377ca25ea974ddc099eafa3d97c7250d2ea32af6a1556f92a, 0x3fe60f6150b6d87ae8d64b78199b13f26977407c801f233288c97ddc4acca326)

Message: b'Hello!'
Signature: (0xddcb8b5abfe46902f2ac54ab9cd5cf205e359c03fdf66ead1130826f79d45478, 0x551a5b2cd8465db43254df998ba577cb28e1ee73c5530430395e4fba96610151)
Verification: signature matches

Message: b'Hi there!'
Verification: invalid signature

Message: b'Hello!'
Public key: (0xc40572bb38dec72b82b3efb1efc8552588b8774149a32e546fb703021cf3b78a, 0x8c6e5c5a9c1ea4cad778072fe955ed1c6a2a92f516f02cab57e0ba7d0765f8bb)
Verification: invalid signature

As you can see, the script first signs a message (the byte string “Hello!”), then verifies the signature. Afterwards, it tries to verify the same signature against another message (“Hi there!”) and verification fails. Lastly, it tries to verify the signature against the correct message, but using another random public key and verification fails again.

The importance of k

When generating ECDSA signatures, it is important to keep the secret $k$ really secret. If we used the same $k$ for all signatures, or if our random number generator were somewhat predictable, an attacker would be able to find out the private key!

This is the kind of mistake made by Sony a few years ago. Basically, the PlayStation 3 game console can run only games signed by Sony with ECDSA. This way, if I wanted to create a new game for PlayStation 3, I couldn’t distribute it to the public without a signature from Sony. The problem is: all the signatures made by Sony were generated using a static $k$.

(Apparently, Sony’s random number generator was inspired by either XKCD or Dilbert.)

In this situation, we could easily recover Sony’s private key $d_S$ by buying just two signed games, extracting their hashes ($z_1$ and $z_2$) and their signatures ($(r_1, s_1)$ and $(r_2, s_2)$), together with the domain parameters. Here’s how:

First off, note that $r_1 = r_2$ (because $r = x_P \bmod{n}$ and $P = kG$ is the same for both signatures).
Consider that $(s_1 - s_2) \bmod{n} = k^{-1} (z_1 - z_2) \bmod{n}$ (this result comes directly from the equation for $s$).
Now multiply each side of the equation by $k$: $k (s_1 - s_2) \bmod{n} = (z_1 - z_2) \bmod{n}$.
Divide by $(s_1 - s_2)$ to get $k = (z_1 - z_2)(s_1 - s_2)^{-1} \bmod{n}$.

The last equation lets us calculate $k$ using only two hashes and their corresponding signatures. Now we can extract the private key using the equation for $s$: $$s = k^{-1}(z + rd_S) \bmod{n}\ \ \Rightarrow\ \ d_S = r^{-1} (sk - z) \bmod{n}$$

Similar techniques may be employed if $k$ is not static but predictable in some way.
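The attack is easy to reproduce. The sketch below assumes Python 3.8+, secp256k1, and made-up values for the private key $d_S$, the reused nonce $k$ and the two hashes; `sign_with_nonce` is a hypothetical helper that deliberately takes $k$ as a parameter to mimic the static-$k$ flaw. It signs two hashes with the same $k$ and then recovers both $k$ and $d_S$ using only public data.

```python
# secp256k1 domain parameters (a = 0, b = 7)
p = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
a = 0
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)


def point_add(P, Q):
    """Add two curve points; None stands for the point at infinity 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def sign_with_nonce(z, d, k):
    """Hypothetical flawed signer: the caller supplies (and reuses) k."""
    r = scalar_mult(k, G)[0] % n
    s = pow(k, -1, n) * (z + r * d) % n
    return r, s


# Victim side: one private key, one static nonce, two different hashes
d_S = 0x0123456789abcdef_fedcba9876543210
k = 0x7e57ab1e5eed
z1, z2 = 0x1111, 0x2222
r1, s1 = sign_with_nonce(z1, d_S, k)
r2, s2 = sign_with_nonce(z2, d_S, k)
print(r1 == r2)                       # True: same k, same P = kG, same r

# Attacker side: only z1, z2, r1, s1, s2 are used
k_rec = (z1 - z2) * pow((s1 - s2) % n, -1, n) % n    # k = (z1 - z2)(s1 - s2)^-1
d_rec = pow(r1, -1, n) * (s1 * k_rec - z1) % n        # d_S = r^-1 (s k - z)
print(k_rec == k, d_rec == d_S)       # True True
```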

Have a great weekend

I really hope you enjoyed what I’ve written here. As usual, don’t hesitate to leave a comment or send me a poke if you need help with something.

Next week I’ll publish the fourth and last article of this series. It’ll be about techniques for solving discrete logarithms, some important problems of Elliptic Curve cryptography, and how ECC compares with RSA. Don’t miss it!


Elliptic Curve Cryptography: finite fields and discrete logarithms

andreacorbellini — Sat, 23 May 2015 14:08:00 +0000

This post is the second in the series ECC: a gentle introduction.

In the previous post, we have seen how elliptic curves over the real numbers can be used to define a group. Specifically, we have defined a rule for point addition: given three aligned points, their sum is zero ($P + Q + R = 0$). We have derived a geometric method and an algebraic method for computing point additions.

We then introduced scalar multiplication ($nP = P + P + \cdots + P$) and we found out an “easy” algorithm for computing scalar multiplication: double and add.

Now we will restrict our elliptic curves to finite fields, rather than the set of real numbers, and see how things change.

The field of integers modulo p

A finite field is, first of all, a set with a finite number of elements. An example of finite field is the set of integers modulo $p$, where $p$ is a prime number. It is generally denoted as $\mathbb{Z}/p$, $GF(p)$ or $\mathbb{F}_p$. We will use the latter notation.

In fields we have two binary operations: addition (+) and multiplication (·). Both are closed, associative and commutative. For both operations, there exists a unique identity element, and every element has a unique inverse element (for multiplication, every non-zero element). Finally, multiplication is distributive over addition: $x \cdot (y + z) = x \cdot y + x \cdot z$.

The set of integers modulo $p$ consists of all the integers from 0 to $p - 1$. Addition and multiplication work as in modular arithmetic (also known as “clock arithmetic”). Here are a few examples of operations in $\mathbb{F}_{23}$:

Addition: $(18 + 9) \bmod{23} = 4$
Subtraction: $(7 - 14) \bmod{23} = 16$
Multiplication: $4 \cdot 7 \bmod{23} = 5$
Additive inverse: $-5 \bmod{23} = 18$

Indeed: $(5 + (-5)) \bmod{23} = (5 + 18) \bmod{23} = 0$
Multiplicative inverse: $9^{-1} \bmod{23} = 18$

Indeed: $9 \cdot 9^{-1} \bmod{23} = 9 \cdot 18 \bmod{23} = 1$
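These computations map directly onto Python's `%` operator, which (for a positive modulus) always returns a value between 0 and $p - 1$, even for negative operands. The multiplicative inverse uses the three-argument `pow` with exponent -1, which assumes Python 3.8 or later:

```python
p = 23

print((18 + 9) % p)   # 4   addition
print((7 - 14) % p)   # 16  subtraction: the result is already in [0, p)
print(4 * 7 % p)      # 5   multiplication
print(-5 % p)         # 18  additive inverse
print(pow(9, -1, p))  # 18  multiplicative inverse (Python 3.8+)
print(9 * 18 % p)     # 1   check: 9 times its inverse gives 1
```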

If these equations don’t look familiar to you and you need a primer on modular arithmetic, check out Khan Academy.

As we already said, the integers modulo $p$ are a field, and therefore all the properties listed above hold. Note that the requirement for $p$ to be prime is important! The set of integers modulo 4 is not a field: 2 has no multiplicative inverse (i.e. the equation $2 \cdot x \bmod{4} = 1$ has no solutions).

Division modulo p

We will soon define elliptic curves over $\mathbb{F}_p$, but before doing so we need a clear idea of what $x / y$ means in $\mathbb{F}_p$. Simply put: $x / y = x \cdot y^{-1}$, or, in plain words, $x$ over $y$ is equal to $x$ times the multiplicative inverse of $y$. This fact is not surprising, but gives us a basic method to perform division: find the multiplicative inverse of a number and then perform a single multiplication.
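A minimal sketch of division in $\mathbb{F}_p$ along these lines, assuming Python 3.8+ (whose built-in `pow(y, -1, p)` computes the multiplicative inverse); `divide_mod` is an illustrative name:

```python
def divide_mod(x, y, p):
    """Compute x / y in F_p as x * y^-1 mod p (p prime, y not a multiple of p)."""
    return x * pow(y, -1, p) % p   # pow raises ValueError if y has no inverse


q = divide_mod(7, 9, 23)
print(q)            # 11
print(q * 9 % 23)   # 7: multiplying back by y recovers x
```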

Computing the multiplicative inverse can be “easily” done with the extended Euclidean algorithm, which is $O(\log p)$ (or $O(k)$, where $k$ is the bit length of $p$) in the worst case.

We won’t go into the details of the extended Euclidean algorithm, as it is off-topic; however, here’s a working Python implementation:

def extended_euclidean_algorithm(a, b):
    """
    Returns a three-tuple (gcd, x, y) such that
    a * x + b * y == gcd, where gcd is the greatest
    common divisor of a and b.

    This function implements the extended Euclidean
    algorithm and runs in O(log b) in the worst case.
    """
    s, old_s = 0, 1
    t, old_t = 1, 0
    r, old_r = b, a

    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
        old_t, t = t, old_t - quotient * t

    return old_r, old_s, old_t


def inverse_of(n, p):
    """
    Returns the multiplicative inverse of
    n modulo p.

    This function returns an integer m such that
    (n * m) % p == 1.
    """
    gcd, x, y = extended_euclidean_algorithm(n, p)
    assert n * x + p * y == gcd

    if gcd != 1:
        # Either n is a multiple of p (e.g. n = 0), or p is not a prime number.
        raise ValueError(
            '{} has no multiplicative inverse '
            'modulo {}'.format(n, p))
    else:
        return x % p

Elliptic curves in $\mathbb{F}_p$

Now we have all the necessary elements to restrict elliptic curves to $\mathbb{F}_p$. The set of points, which in the previous post was: $$\begin{array}{rcl} \left\{(x, y) \in \mathbb{R}^2 \right. & \left. | \right. & \left. y^2 = x^3 + ax + b, \right. \\ & & \left. 4a^3 + 27b^2 \ne 0\right\}\ \cup\ \left\{0\right\} \end{array}$$ now becomes: $$\begin{array}{rcl} \left\{(x, y) \in (\mathbb{F}_p)^2 \right. & \left. | \right. & \left. y^2 \equiv x^3 + ax + b \pmod{p}, \right. \\ & & \left. 4a^3 + 27b^2 \not\equiv 0 \pmod{p}\right\}\ \cup\ \left\{0\right\} \end{array}$$

where 0 is still the point at infinity, and $a$ and $b$ are two integers in $\mathbb{F}_p$.

The curve $y^2 \equiv x^3 - 7x + 10 \pmod{p}$ with $p = 19, 97, 127, 487$. Note that, for every $x$, there are at most two points. Also note the symmetry about $y = p / 2$.

The curve $y^2 \equiv x^3 \pmod{29}$ is singular and has a cusp at $(0, 0)$. It is not a valid elliptic curve.

What previously was a continuous curve is now a set of disjoint points in the $xy$-plane. But we can prove that, even if we have restricted our domain, elliptic curves in $\mathbb{F}_p$ still form an abelian group.
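The set definition translates directly into a membership test. A sketch with illustrative helper names (`is_nonsingular`, `is_on_curve`, `curve_points`), using `None` for the point at infinity; enumerating every $(x, y)$ pair is only sensible for toy fields like the ones in the figures:

```python
def is_nonsingular(a, b, p):
    """The curve is valid only if 4a^3 + 27b^2 is not congruent to 0 mod p."""
    return (4 * a ** 3 + 27 * b ** 2) % p != 0


def is_on_curve(point, a, b, p):
    """Check y^2 = x^3 + ax + b (mod p); None (the point at infinity) always belongs."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x ** 3 + a * x + b)) % p == 0


def curve_points(a, b, p):
    """All affine points of the curve, by brute force over F_p x F_p."""
    return [(x, y) for x in range(p) for y in range(p)
            if is_on_curve((x, y), a, b, p)]


print(is_on_curve((16, 20), -1, 3, 127))   # True
print(is_nonsingular(0, 0, 29))            # False: y^2 = x^3 is singular
print(len(curve_points(-1, 3, 37)))        # 41 affine points (plus the point at infinity)
```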

Point addition

Clearly, we need to change our definition of addition a bit in order to make it work in $\mathbb{F}_p$. With reals, we said that the sum of three aligned points was zero ($P + Q + R = 0$). We can keep this definition, but what does it mean for three points to be aligned in $\mathbb{F}_p$?

We can say that three points are aligned if there’s a line that connects all of them. Now, of course, lines in $\mathbb{F}_p$ are not the same as lines in $\mathbb{R}$. We can say, informally, that a line in $\mathbb{F}_p$ is the set of points $(x, y)$ that satisfy the equation $ax + by + c \equiv 0 \pmod{p}$ (this is the standard line equation, with the addition of “$(\text{mod}\ p)$”).

Point addition over the curve $y^2 \equiv x^3 - x + 3 \pmod{127}$, with $P = (16, 20)$ and $Q = (41, 120)$. Note how the line $y \equiv 4x + 83 \pmod{127}$ that connects the points "repeats" itself in the plane.

Given that we are in a group, point addition retains the properties we already know:

$Q + 0 = 0 + Q = Q$ (from the definition of identity element).
Given a non-zero point $Q$, the inverse $-Q$ is the point having the same abscissa but opposite ordinate. Or, if you prefer, $-Q = (x_Q, -y_Q \bmod{p})$. For example, if a curve in $\mathbb{F}_{29}$ has a point $Q = (2, 5)$, the inverse is $-Q = (2, -5 \bmod{29}) = (2, 24)$.
Also, $P + (-P) = 0$ (from the definition of inverse element).

Algebraic sum

The equations for calculating point additions are exactly the same as in the previous post, except for the fact that we need to add “$\text{mod}\ p$” at the end of every expression. Therefore, given $P = (x_P, y_P)$, $Q = (x_Q, y_Q)$ and $R = (x_R, y_R)$, we can calculate $P + Q = -R$ as follows: $$\begin{align*} x_R & = (m^2 - x_P - x_Q) \bmod{p} \\ y_R & = [y_P + m(x_R - x_P)] \bmod{p} \\ & = [y_Q + m(x_R - x_Q)] \bmod{p} \end{align*}$$

If $P \ne Q$, then the slope $m$ assumes the form: $$m = (y_P - y_Q)(x_P - x_Q)^{-1} \bmod{p}$$

Else, if $P = Q$, we have: $$m = (3 x_P^2 + a)(2 y_P)^{-1} \bmod{p}$$

It’s not a coincidence that the equations have not changed: in fact, these equations work in every field, finite or infinite (with the exception of $\mathbb{F}_2$ and $\mathbb{F}_3$, which are special cased). Now I feel I have to provide a justification for this fact. The problem is: proofs for the group law generally involve complex mathematical concepts. However, I found a proof from Stefan Friedl that uses only elementary concepts. Read it if you are interested in why these equations work in (almost) every field.

Back to us: we won’t define a geometric method, because there are a few problems with it. For example, in the previous post, we said that to compute $P + P$ we needed to take the tangent to the curve in $P$. But without continuity, the word “tangent” does not make any sense. We can work around this and other problems, but a purely geometric method would be too complicated and not at all practical.

Instead, you can play with the interactive tool I’ve written for computing point additions.
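Alongside the interactive tool, the equations above can also be turned into a short Python function. A minimal sketch, assuming Python 3.8+ for `pow(x, -1, p)`, with `None` standing for the point at infinity 0 and `point_add` as an illustrative name. The first call reproduces the figure's example with $P = (16, 20)$ and $Q = (41, 120)$ on $y^2 \equiv x^3 - x + 3 \pmod{127}$.

```python
def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p (None is the point at infinity)."""
    if P is None:
        return Q                                         # 0 + Q = Q
    if Q is None:
        return P                                         # P + 0 = P
    x_P, y_P = P
    x_Q, y_Q = Q
    if x_P == x_Q and (y_P + y_Q) % p == 0:
        return None                                      # P + (-P) = 0
    if P == Q:
        m = (3 * x_P * x_P + a) * pow(2 * y_P, -1, p) % p
    else:
        m = (y_P - y_Q) * pow((x_P - x_Q) % p, -1, p) % p
    x_R = (m * m - x_P - x_Q) % p
    y_R = (y_P + m * (x_R - x_P)) % p                    # R is the third aligned point
    return (x_R, (-y_R) % p)                             # P + Q = -R


print(point_add((16, 20), (41, 120), -1, 127))   # (86, 81)
print(point_add((3, 6), (3, 6), 2, 97))          # (80, 10)
print(point_add((16, 20), (16, 107), -1, 127))   # None, because (16, 107) = -P
```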

The order of an elliptic curve group

We said that an elliptic curve defined over a finite field has a finite number of points. An important question that we need to answer is: how many points are there exactly?

Firstly, let’s say that the number of points in a group is called the order of the group.

Trying all the possible values for $x$ from 0 to $p - 1$ is not a feasible way to count the points, as it would require $O(p)$ steps, and this is “hard” if $p$ is a large prime.

Luckily, there’s a faster algorithm for computing the order: Schoof’s algorithm. I won’t go into the details of the algorithm; what matters is that it runs in polynomial time, and this is what we need.
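Schoof’s algorithm is too involved for a short example, but for the toy fields used in this post the naive approach works fine. The sketch below (illustrative name `count_points`) counts points in $O(p)$ steps by first tabulating how many square roots each residue has; the total includes the point at infinity.

```python
def count_points(a, b, p):
    """Order of the curve y^2 = x^3 + ax + b over F_p, by brute force."""
    roots = [0] * p                    # roots[v] = number of y with y^2 = v (mod p)
    for y in range(p):
        roots[y * y % p] += 1
    total = 1                          # the point at infinity
    for x in range(p):
        total += roots[(x ** 3 + a * x + b) % p]
    return total


print(count_points(-1, 3, 37))   # 42
```

This is hopeless for cryptographic sizes ($p \approx 2^{256}$), which is exactly why a polynomial-time algorithm matters.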

Scalar multiplication and cyclic subgroups

As with reals, multiplication can be defined as: $$n P = \underbrace{P + P + \cdots + P}_{n\ \text{times}}$$

And, again, we can use the double and add algorithm to perform multiplication in $O(\log n)$ steps (or $O(k)$, where $k$ is the number of bits of $n$). I’ve written an interactive tool for scalar multiplication too.

Multiplication over points for elliptic curves in $\mathbb{F}_p$ has an interesting property. Take the curve $y^2 \equiv x^3 + 2x + 3 \pmod{97}$ and the point $P = (3, 6)$. Now calculate all the multiples of $P$:

The multiples of $P = (3, 6)$ are just five distinct points ($0$, $P$, $2P$, $3P$, $4P$) and they are repeating cyclically. It's easy to spot the similarity between scalar multiplication on elliptic curves and addition in modular arithmetic.

$0P = 0$
$1P = (3, 6)$
$2P = (80, 10)$
$3P = (80, 87)$
$4P = (3, 91)$
$5P = 0$
$6P = (3, 6)$
$7P = (80, 10)$
$8P = (80, 87)$
$9P = (3, 91)$
…

Here we can immediately spot two things: firstly, the multiples of $P$ are just five: the other points of the elliptic curve never appear. Secondly, they are repeating cyclically. We can write:

$5kP = 0$
$(5k + 1)P = P$
$(5k + 2)P = 2P$
$(5k + 3)P = 3P$
$(5k + 4)P = 4P$

for every integer $k$. Note that these five equations can be “compressed” into a single one, thanks to the modulo operator: $kP = (k \bmod{5})P$.
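Double and add makes it easy to reproduce this table. A sketch with illustrative helper names, assuming Python 3.8+ and `None` as the point at infinity:

```python
def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P, a, p):
    """Double and add: compute kP in O(log k) point operations (k >= 0)."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


P = (3, 6)                        # on y^2 = x^3 + 2x + 3 over F_97
for k in range(10):
    print(f"{k}P =", scalar_mult(k, P, 2, 97))
# 0P = None, 1P = (3, 6), 2P = (80, 10), 3P = (80, 87), 4P = (3, 91), 5P = None, ...

print(scalar_mult(1234, P, 2, 97) == scalar_mult(1234 % 5, P, 2, 97))  # True
```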

Not only that, but we can immediately verify that these five points are closed under addition. Which means: however I add $0$, $P$, $2P$, $3P$ or $4P$, the result is always one of these five points. Again, the other points of the elliptic curve never appear in the results.

The same holds for every point, not just for $P = (3, 6)$. In fact, if we take a generic $P$: $$nP + mP = \underbrace{P + \cdots + P}_{n\ \text{times}} + \underbrace{P + \cdots + P}_{m\ \text{times}} = (n + m)P$$

Which means: if we add two multiples of $P$, we obtain a multiple of $P$ (i.e. multiples of $P$ are closed under addition). This is enough to prove that the set of the multiples of $P$ is a cyclic subgroup of the group formed by the elliptic curve.

A “subgroup” is a group which is a subset of another group. A “cyclic subgroup” is a subgroup whose elements repeat cyclically, as we have shown in the previous example. The point $P$ is called the generator or base point of the cyclic subgroup.

Cyclic subgroups are the foundations of ECC and other cryptosystems. We will see why in the next post.

Subgroup order

We can ask ourselves what the order of a subgroup generated by a point $P$ is (or, equivalently, what the order of $P$ is). To answer this question we can’t use Schoof’s algorithm, because that algorithm only works on whole elliptic curves, not on subgroups. Before approaching the problem, we need a few more bits:

So far, we have defined the order as the number of points of a group. This definition is still valid, but within a cyclic subgroup we can give a new, equivalent definition: the order of $P$ is the smallest positive integer $n$ such that $nP = 0$. In fact, if you look at the previous example, our subgroup contained five points, and we had $5P = 0$.
The order of $P$ is linked to the order of the elliptic curve by Lagrange’s theorem, which states that the order of a subgroup is a divisor of the order of the parent group. In other words, if an elliptic curve contains $N$ points and one of its subgroups contains $n$ points, then $n$ is a divisor of $N$.

These two facts together give us a way to find out the order of a subgroup with base point $P$:

Calculate the elliptic curve’s order $N$ using Schoof’s algorithm.
Find out all the divisors of $N$.
For every divisor $n$ of $N$, compute $nP$.
The smallest $n$ such that $nP = 0$ is the order of the subgroup.

For example, the curve $y^2 = x^3 - x + 3$ over the field $\mathbb{F}_{37}$ has order $N = 42$. Its subgroups may have order $n = 1$, $2$, $3$, $6$, $7$, $14$, $21$ or $42$. If we try $P = (2, 3)$ we can see that $P \ne 0$, $2P \ne 0$, …, $7P = 0$, hence the order of $P$ is $n = 7$.

Note that it’s important to take the smallest divisor, not a random one. If we proceeded randomly, we could have taken $n = 14$, which is not the order of the subgroup, but one of its multiples.
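The four steps above can be sketched in Python for toy curves. Assumptions: $N$ is already known (here it is simply taken from the example rather than computed with Schoof’s algorithm), divisors are found by trial division, which is only feasible for small $N$, and the helper names are illustrative.

```python
def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P, a, p):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


def subgroup_order(P, a, p, N):
    """Smallest divisor n of the curve order N such that nP = 0."""
    for n in range(1, N + 1):          # ascending, so the first hit is the smallest
        if N % n == 0 and scalar_mult(n, P, a, p) is None:
            return n
    raise ValueError("P does not seem to belong to a curve of order N")


print(subgroup_order((2, 3), -1, 37, 42))   # 7
```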

Another example: the elliptic curve defined by the equation $y^2 = x^3 - x + 1$ over the field $\mathbb{F}_{29}$ has order $N = 37$, which is a prime. Its subgroups may only have order $n = 1$ or $37$. As you can easily guess, when $n = 1$, the subgroup contains only the point at infinity; when $n = N$, the subgroup contains all the points of the elliptic curve.

Finding a base point

For our ECC algorithms, we want subgroups with a high order. So in general we will choose an elliptic curve, calculate its order ($N$), choose a high divisor as the subgroup order ($n$) and eventually find a suitable base point. That is: we won’t choose a base point and then calculate its order, but we’ll do the opposite: we will first choose an order that looks good enough and then we will hunt for a suitable base point. How do we do that?

Firstly, we need to introduce one more term. Lagrange’s theorem implies that the number $h = N / n$ is always an integer (because $n$ is a divisor of $N$). The number $h$ has a name: it’s the cofactor of the subgroup.

Now consider that for every point of an elliptic curve we have $NP = 0$. This happens because $N$ is a multiple of any candidate $n$. Using the definition of cofactor, we can write: $$n(hP) = 0$$

Now suppose that $n$ is a prime number (for reasons that will be explained in the next post, we prefer prime orders). This equation, written in this form, is telling us that the point $G = hP$ generates a subgroup of order $n$ (except when $G = hP = 0$, in which case the subgroup has order 1).

In the light of this, we can outline the following algorithm:

Calculate the order $N$ of the elliptic curve.
Choose the order $n$ of the subgroup. For the algorithm to work, this number must be prime and must be a divisor of $N$.
Compute the cofactor $h = N / n$.
Choose a random point $P$ on the curve.
Compute $G = hP$.
If $G$ is 0, then go back to step 4. Otherwise we have found a generator of a subgroup with order $n$ and cofactor $h$.

Note that this algorithm only works if $n$ is prime. If $n$ were not prime, then the order of $G$ could be any of the divisors of $n$.
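A sketch of the base point search for a toy curve, with illustrative helper names. It assumes the caller supplies a prime $n$ dividing $N$ (primality is not checked), and it picks random points from a full enumeration of the curve, which is only possible for tiny fields.

```python
import random


def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def scalar_mult(k, P, a, p):
    """Double and add: compute kP for an integer k >= 0."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


def find_base_point(a, b, p, N, n):
    """Return (G, h), where G generates a subgroup of (assumed prime) order n."""
    if N % n != 0:
        raise ValueError("n must divide N")
    h = N // n                                               # cofactor
    points = [(x, y) for x in range(p) for y in range(p)
              if (y * y - (x ** 3 + a * x + b)) % p == 0]    # toy-sized only
    while True:
        P = random.choice(points)                            # random point P
        G = scalar_mult(h, P, a, p)                          # G = hP
        if G is not None:                                    # retry if G = 0
            return G, h


G, h = find_base_point(-1, 3, 37, 42, 7)
print(h)                               # 6
print(scalar_mult(7, G, -1, 37))       # None: G has order 7
```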

Discrete logarithm

As we did when working with continuous elliptic curves, we are now going to discuss the question: if we know $P$ and $Q$, what is $k$ such that $Q = kP$?

This problem, which is known as the discrete logarithm problem for elliptic curves, is believed to be a “hard” problem, in that there is no known polynomial time algorithm that can run on a classical computer. There are, however, no mathematical proofs for this belief.
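To see why the problem is considered hard, consider the most naive algorithm: try every $k$ in turn. The sketch below (illustrative names, toy parameters from the earlier examples) needs up to $n$ point additions. That is fine for $n = 5$, but the cost is exponential in the bit length of $n$, and therefore hopeless for a 256-bit subgroup order.

```python
def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (y1 + m * (x3 - x1)) % p
    return (x3, (-y3) % p)


def brute_force_log(P, Q, a, p, n):
    """Find k in {0, ..., n - 1} with Q = kP by trying every k in turn."""
    R = None                           # R = kP, starting from 0P = 0
    for k in range(n):
        if R == Q:
            return k
        R = point_add(R, P, a, p)
    raise ValueError("Q is not a multiple of P")


print(brute_force_log((3, 6), (80, 87), 2, 97, 5))   # 3, since 3P = (80, 87)
```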

This problem is also analogous to the discrete logarithm problem used with other cryptosystems such as the Digital Signature Algorithm (DSA), the Diffie-Hellman key exchange (D-H) and the ElGamal algorithm; it’s not a coincidence that they have the same name. The difference is that, with those algorithms, we use modular exponentiation instead of scalar multiplication. Their discrete logarithm problem can be stated as follows: if we know $a$ and $b$, what’s $k$ such that $b = a^k \bmod{p}$?

Both these problems are “discrete” because they involve finite sets (more precisely, cyclic subgroups). And they are “logarithms” because they are analogous to ordinary logarithms.

What makes ECC interesting is that, as of today, the discrete logarithm problem for elliptic curves seems to be “harder” than other similar problems used in cryptography. This implies that we need fewer bits for the integer $k$ in order to achieve the same level of security as with other cryptosystems, as we will see in detail in the fourth and last post of this series.

More next week!

Enough for today! I really hope you enjoyed this post. Leave a comment if you didn’t.

Next week’s post will be the third in this series and will be about ECC algorithms: key pair generation, ECDH and ECDSA. That will be one of the most interesting parts of this series. Don’t miss it!


Elliptic Curve Cryptography: a gentle introduction

andreacorbellini — Sun, 17 May 2015 11:24:00 +0000

Those of you who know what public-key cryptography is may have already heard of ECC, ECDH or ECDSA. The first is an acronym for Elliptic Curve Cryptography, the others are names for algorithms based on it.

Today, we can find elliptic curve cryptosystems in TLS, PGP and SSH, which are just three of the main technologies on which the modern web and IT world are based. Not to mention Bitcoin and other cryptocurrencies.

Before ECC became popular, almost all public-key algorithms were based on RSA, DSA, and DH, alternative cryptosystems based on modular arithmetic. RSA and friends are still very important today, and often are used alongside ECC. However, while the magic behind RSA and friends can be easily explained, is widely understood, and rough implementations can be written quite easily, the foundations of ECC are still a mystery to most.

With a series of blog posts I’m going to give you a gentle introduction to the world of elliptic curve cryptography. My aim is not to provide a complete and detailed guide to ECC (the web is full of information on the subject), but to provide a simple overview of what ECC is and why it is considered secure, without losing time on long mathematical proofs or boring implementation details. I will also give helpful examples together with visual interactive tools and scripts to play with.

Specifically, here are the topics I’ll touch on:

Elliptic curves over real numbers and the group law (covered in this blog post)
Elliptic curves over finite fields and the discrete logarithm problem
Key pair generation and two ECC algorithms: ECDH and ECDSA
Algorithms for breaking ECC security, and a comparison with RSA

In order to understand what’s written here, you’ll need to know some basics of set theory, geometry and modular arithmetic, and be familiar with symmetric and asymmetric cryptography. Lastly, you need to have a clear idea of what an “easy” problem is, what a “hard” problem is, and their roles in cryptography.

Ready? Let’s start!

Elliptic Curves

First of all: what is an elliptic curve? Wolfram MathWorld gives an excellent and complete definition. But for our aims, an elliptic curve will simply be the set of points described by the equation: $$y^2 = x^3 + ax + b$$

where $4a^3 + 27b^2 \ne 0$ (this is required to exclude singular curves). The equation above is what is called Weierstrass normal form for elliptic curves.

Different shapes for different elliptic curves ($b = 1$, $a$ varying from 2 to -3).

Types of singularities: on the left, a curve with a cusp ($y^2 = x^3$). On the right, a curve with a self-intersection ($y^2 = x^3 - 3x + 2$). None of them is a valid elliptic curve.
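The non-singularity condition is easy to check mechanically. A tiny sketch (illustrative name `is_singular`, integer coefficients assumed) confirming that both curves in the figure above are singular, while a curve such as $y^2 = x^3 - x + 1$ is not:

```python
def is_singular(a, b):
    """True when y^2 = x^3 + ax + b has a cusp or a self-intersection."""
    return 4 * a ** 3 + 27 * b ** 2 == 0


print(is_singular(0, 0))    # True:  y^2 = x^3 (cusp)
print(is_singular(-3, 2))   # True:  y^2 = x^3 - 3x + 2 (self-intersection)
print(is_singular(-1, 1))   # False: y^2 = x^3 - x + 1
```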

Depending on the values of $a$ and $b$, elliptic curves may assume different shapes on the plane. As can be easily seen and verified, elliptic curves are symmetric about the $x$-axis.

For our aims, we will also need a point at infinity (also known as ideal point) to be part of our curve. From now on, we will denote our point at infinity with the symbol 0 (zero).

If we want to explicitly take into account the point at infinity, we can refine our definition of elliptic curve as follows: $$\left\{ (x, y) \in \mathbb{R}^2\ |\ y^2 = x^3 + ax + b,\ 4 a^3 + 27 b^2 \ne 0 \right\}\ \cup\ \left\{ 0 \right\}$$

Groups

A group in mathematics is a set for which we have defined a binary operation that we call “addition” and indicate with the symbol +. In order for the set $\mathbb{G}$ to be a group, addition must be defined so that it respects the following four properties:

closure: if $a$ and $b$ are members of $\mathbb{G}$, then $a + b$ is a member of $\mathbb{G}$;
associativity: $(a + b) + c = a + (b + c)$;
there exists an identity element 0 such that $a + 0 = 0 + a = a$;
every element has an inverse, that is: for every $a$ there exists $b$ such that $a + b = 0$.

If we add a fifth requirement:

commutativity: $a + b = b + a$,

then the group is called abelian group.

With the usual notion of addition, the set of integer numbers $\mathbb{Z}$ is a group (moreover, it’s an abelian group). The set of natural numbers $\mathbb{N}$ however is not a group, as the fourth property can’t be satisfied.

Groups are nice because, if we can demonstrate that those four properties hold, we get some other properties for free. For example: the identity element is unique; also the inverses are unique, that is: for every $a$ there exists only one $b$ such that $a + b = 0$ (and we can write $b$ as $-a$). Either directly or indirectly, these and other facts about groups will be very important for us later.

The group law for elliptic curves

We can define a group over elliptic curves. Specifically:

the elements of the group are the points of an elliptic curve;
the identity element is the point at infinity 0;
the inverse of a point $P$ is the one symmetric about the $x$-axis;
addition is given by the following rule: given three aligned, non-zero points $P$, $Q$ and $R$, their sum is $P + Q + R = 0$.

The sum of three aligned points is 0.

Note that with the last rule, we only require three aligned points, and three points are aligned regardless of their order. This means that, if $P$, $Q$ and $R$ are aligned, then $P + (Q + R) = Q + (P + R) = R + (P + Q) = \cdots = 0$. This way, we have intuitively proved that our + operator is both associative and commutative: we are in an abelian group.

So far, so great. But how do we actually compute the sum of two arbitrary points?

Geometric addition

Thanks to the fact that we are in an abelian group, we can write $P + Q + R = 0$ as $P + Q = -R$. This equation, in this form, lets us derive a geometric method to compute the sum between two points $P$ and $Q$: if we draw a line passing through $P$ and $Q$, this line will intersect a third point on the curve, $R$ (this is implied by the fact that $P$, $Q$ and $R$ are aligned). If we take the inverse of this point, $-R$, we have found the result of $P + Q$.

Draw the line through $P$ and $Q$. The line intersects a third point $R$. The point symmetric to it, $-R$, is the result of $P + Q$.

This geometric method works but needs some refinement. Particularly, we need to answer a few questions:

What if $P = 0$ or $Q = 0$? Certainly, we can’t draw any line (0 is not on the $xy$-plane). But given that we have defined 0 as the identity element, $P + 0 = P$ and $0 + Q = Q$, for any $P$ and for any $Q$.
What if $P = -Q$? In this case, the line going through the two points is vertical, and does not intersect any third point. But if $P$ is the inverse of $Q$, then we have $P + Q = P + (-P) = 0$ from the definition of inverse.
What if $P = Q$? In this case, there are infinitely many lines passing through the point. Here things start getting a bit more complicated. But consider a point $Q’ \ne P$. What happens if we make $Q’$ approach $P$, getting closer and closer to it?

As the two points become closer together, the line passing through them becomes tangent to the curve.

As $Q’$ tends towards $P$, the line passing through $P$ and $Q’$ becomes tangent to the curve. In light of this, we can say that $P + P = -R$, where $R$ is the point of intersection between the curve and the line tangent to the curve at $P$.

What if $P \ne Q$, but there is no third point $R$? We are in a case very similar to the previous one. In fact, we are in the case where the line passing through $P$ and $Q$ is tangent to the curve.

If our line intersects just two points, then it means that it's tangent to the curve. It's easy to see how the result of the sum becomes symmetric to one of the two points.

Let’s assume that $P$ is the tangency point. In the previous case, we would have written $P + P = -Q$. That equation now becomes $P + Q = -P$. If, on the other hand, $Q$ were the tangency point, the correct equation would have been $P + Q = -Q$.

The geometric method is now complete and covers all cases. With a pencil and a ruler we are able to perform addition involving every point of any elliptic curve. If you want to try, take a look at the HTML5/JavaScript visual tool I’ve built for computing sums on elliptic curves!

Algebraic addition

If we want a computer to perform point addition, we need to turn the geometric method into an algebraic method. Transforming the rules described above into a set of equations may seem straightforward, but actually it can be really tedious because it requires solving cubic equations. For this reason, here I will report only the results.

First, let’s get rid of the most annoying corner cases. We already know that $P + (-P) = 0$, and we also know that $P + 0 = 0 + P = P$. So, in our equations, we will avoid these two cases and we will only consider two non-zero, non-symmetric points $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$.

If $P$ and $Q$ are distinct ($x_P \ne x_Q$), the line through them has slope: $$m = \frac{y_P - y_Q}{x_P - x_Q}$$

The intersection of this line with the elliptic curve is a third point $R = (x_R, y_R)$: $$\begin{align*} x_R & = m^2 - x_P - x_Q \\ y_R & = y_P + m(x_R - x_P) \end{align*}$$

or, equivalently: $$y_R = y_Q + m(x_R - x_Q)$$

Hence $(x_P, y_P) + (x_Q, y_Q) = (x_R, -y_R)$ (pay attention to the signs and remember that $P + Q = -R$).

If we wanted to check whether this result is right, we would have to check whether $R$ belongs to the curve and whether $P$, $Q$ and $R$ are aligned. Checking whether the points are aligned is trivial; checking that $R$ belongs to the curve is not, as we would need to solve a cubic equation, which is not fun at all.

Instead, let’s play with an example: according to our visual tool, given $P = (1, 2)$ and $Q = (3, 4)$ over the curve $y^2 = x^3 - 7x + 10$, their sum is $P + Q = -R = (-3, 2)$. Let’s see if our equations agree: $$\begin{align*} m & = \frac{y_P - y_Q}{x_P - x_Q} = \frac{2 - 4}{1 - 3} = 1 \\ x_R & = m^2 - x_P - x_Q = 1^2 - 1 - 3 = -3 \\ y_R & = y_P + m(x_R - x_P) = 2 + 1 \cdot (-3 - 1) = -2 \\ & = y_Q + m(x_R - x_Q) = 4 + 1 \cdot (-3 - 3) = -2 \end{align*}$$

Yes, this is correct!

Note that these equations work even if one of $P$ or $Q$ is a tangency point. Let’s try with $P = (-1, 4)$ and $Q = (1, 2)$. $$\begin{align*} m & = \frac{y_P - y_Q}{x_P - x_Q} = \frac{4 - 2}{-1 - 1} = -1 \\ x_R & = m^2 - x_P - x_Q = (-1)^2 - (-1) - 1 = 1 \\ y_R & = y_P + m(x_R - x_P) = 4 + (-1) \cdot (1 - (-1)) = 2 \end{align*}$$

We get the result $P + Q = (1, -2)$, which is the same result given by the visual tool.
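The formulas for two points with different $x$ coordinates translate directly into code. A minimal Python sketch (add_distinct is a name made up for this example), using Fraction so that the slope is computed exactly; note that the curve coefficients are not needed in this case:

```python
from fractions import Fraction

def add_distinct(p, q):
    """Sum of two points with different x coordinates (x_P != x_Q).

    m   = (y_P - y_Q) / (x_P - x_Q)
    x_R = m^2 - x_P - x_Q
    y_R = y_P + m (x_R - x_P)
    and the result is -R = (x_R, -y_R).
    """
    (xp, yp), (xq, yq) = p, q
    if xp == xq:
        raise ValueError("x_P == x_Q: use the doubling or inverse rule instead")
    m = Fraction(yp - yq) / (xp - xq)
    xr = m * m - xp - xq
    yr = yp + m * (xr - xp)
    return (xr, -yr)

# The two examples above, on y^2 = x^3 - 7x + 10.
print(add_distinct((1, 2), (3, 4)))   # (Fraction(-3, 1), Fraction(2, 1))
print(add_distinct((-1, 4), (1, 2)))  # (Fraction(1, 1), Fraction(-2, 1))
```

Both results match the ones computed by hand.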

The case $P = Q$ needs to be treated a bit differently: the equations for $x_R$ and $y_R$ are the same, but given that $x_P = x_Q$, we must use a different equation for the slope: $$m = \frac{3 x_P^2 + a}{2 y_P}$$

Note that, as we would expect, this expression for $m$ is the first derivative of: $$y_P = \pm \sqrt{x_P^3 + ax_P + b}$$

To prove the validity of this result, it is enough to check that $R$ belongs to the curve and that the line passing through $P$ and $R$ has only two intersections with the curve. But again, we won’t prove this fact and will instead try an example: $P = Q = (1, 2)$. $$\begin{align*} m & = \frac{3x_P^2 + a}{2 y_P} = \frac{3 \cdot 1^2 - 7}{2 \cdot 2} = -1 \\ x_R & = m^2 - x_P - x_Q = (-1)^2 - 1 - 1 = -1 \\ y_R & = y_P + m(x_R - x_P) = 2 + (-1) \cdot (-1 - 1) = 4 \end{align*}$$

Which gives us $P + P = -R = (-1, -4)$. Correct!
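Putting all the cases together gives a complete addition routine. This is a sketch under the same assumptions as before (None for the point at infinity, exact Fraction coordinates, point_add is a hypothetical name); only the coefficient $a$ is passed, because $b$ never appears in the formulas:

```python
from fractions import Fraction

def point_add(p, q, a):
    """Add two points on y^2 = x^3 + ax + b; None is the point at infinity."""
    if p is None:            # 0 + Q = Q
        return q
    if q is None:            # P + 0 = P
        return p
    (xp, yp), (xq, yq) = p, q
    if xp == xq and yp == -yq:
        return None          # P + (-P) = 0 (also a vertical tangent, y_P = 0)
    if xp == xq:             # P == Q: slope of the tangent line
        m = Fraction(3 * xp * xp + a) / (2 * yp)
    else:                    # distinct x: slope of the line through P and Q
        m = Fraction(yp - yq) / (xp - xq)
    xr = m * m - xp - xq
    yr = yp + m * (xr - xp)
    return (xr, -yr)         # P + Q = -R

A = -7  # curve y^2 = x^3 - 7x + 10
P = (Fraction(1), Fraction(2))
print(point_add(P, P, A))    # (Fraction(-1, 1), Fraction(-4, 1)), i.e. (-1, -4)
```

If $x_P = x_Q$ and the points are not inverses, they must be the same point, which is why the doubling slope is used in that branch.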

Although the procedure to derive them can be really tedious, our equations are pretty compact. This is thanks to the Weierstrass normal form: without it, these equations could have been really long and complicated!

Scalar multiplication

Other than addition, we can define another operation: scalar multiplication, that is: $$nP = \underbrace{P + P + \cdots + P}_{n\ \text{times}}$$

where $n$ is a natural number. I’ve written a visual tool for scalar multiplication too, if you want to play with that.

Written in that form, it may seem that computing $nP$ requires $n$ additions. If $n$ has $k$ binary digits, then our algorithm would be $O(2^k)$, which is not really good. But there exist faster algorithms.

One of them is the double and add algorithm. Its principle of operation can be better explained with an example. Take $n = 151$. Its binary representation is $10010111_2$. This binary representation can be turned into a sum of powers of two: $$\begin{align*} 151 & = 1 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 \\ & = 2^7 + 2^4 + 2^2 + 2^1 + 2^0 \end{align*}$$

(We have taken each binary digit of $n$ and multiplied it by a power of two.)

In view of this, we can write: $$151 \cdot P = 2^7 P + 2^4 P + 2^2 P + 2^1 P + 2^0 P$$
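The exponents in this decomposition can be read directly off the binary digits of $n$. A short Python sketch (set_bit_exponents is a name made up for this example):

```python
def set_bit_exponents(n):
    """Return the exponents i such that 2^i appears in the binary expansion of n."""
    return [i for i in range(n.bit_length()) if (n >> i) & 1]

exponents = set_bit_exponents(151)
print(exponents)                       # [0, 1, 2, 4, 7]
print(sum(2 ** i for i in exponents))  # 151
```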

What the double and add algorithm tells us to do is:

Take $P$.
Double it, so that we get $2P$.
Add $2P$ to $P$ (in order to get the result of $2^1P + 2^0P$).
Double $2P$, so that we get $2^2P$.
Add it to our result (so that we get $2^2P + 2^1P + 2^0P$).
Double $2^2P$ to get $2^3P$.
Don’t perform any addition involving $2^3P$.
Double $2^3P$ to get $2^4P$.
Add it to our result (so that we get $2^4P + 2^2P + 2^1P + 2^0P$).
…

In the end, we can compute $151 \cdot P$ by performing just seven doublings and four additions.
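The same count can be worked out for any $n \ge 1$: with $k$ binary digits we need $k - 1$ doublings to reach $2^{k-1} P$, and combining the terms corresponding to the 1 bits takes one addition fewer than the number of 1 bits. A small Python helper, assuming $n \ge 1$ (the name double_and_add_cost is hypothetical):

```python
def double_and_add_cost(n):
    """Return (doublings, additions) needed to compute nP with double and add.

    Assumes n >= 1. Reaching 2^(k-1) P takes k - 1 doublings, where k is
    the bit length of n; summing the popcount(n) nonzero terms takes
    popcount(n) - 1 additions.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    doublings = n.bit_length() - 1
    additions = bin(n).count("1") - 1
    return doublings, additions

print(double_and_add_cost(151))  # (7, 4)
```

The Python script below starts from 0 and doubles once more after the last bit, so it performs one extra addition and one extra doubling, which does not change its complexity.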

If this is not clear enough, here’s a Python script that implements the algorithm:

def bits(n):
    """
    Generates the binary digits of n, starting
    from the least significant bit.

    bits(151) -> 1, 1, 1, 0, 1, 0, 0, 1
    """
    while n:
        yield n & 1
        n >>= 1

def double_and_add(n, x):
    """
    Returns the result of n * x, computed using
    the double and add algorithm.
    """
    result = 0
    addend = x

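    for bit in bits(n):
        # Add the current power-of-two multiple of x,
        # but only when the corresponding bit of n is set.
        if bit == 1:
            result += addend
        # Double the addend: x, 2x, 4x, 8x, ...
        addend *= 2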

    return result

If doubling and adding are both $O(1)$ operations, then this algorithm is $O(\log n)$ (or $O(k)$ if we consider the bit length), which is pretty good. Surely much better than the initial $O(n)$ algorithm!
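The script above works on integers, but the same loop works on curve points if the result starts from the point at infinity instead of 0 and doubling is done with point addition. A sketch in Python, assuming the hypothetical None-for-infinity representation with Fraction coordinates (point_add and scalar_mult are names made up for this example):

```python
from fractions import Fraction

def point_add(p, q, a):
    """Add points on y^2 = x^3 + ax + b; None is the point at infinity 0."""
    if p is None:
        return q
    if q is None:
        return p
    (xp, yp), (xq, yq) = p, q
    if xp == xq and yp == -yq:
        return None
    if xp == xq:
        m = Fraction(3 * xp * xp + a) / (2 * yp)
    else:
        m = Fraction(yp - yq) / (xp - xq)
    xr = m * m - xp - xq
    return (xr, -(yp + m * (xr - xp)))

def scalar_mult(n, p, a):
    """Compute nP with double and add, scanning bits from the least significant."""
    result = None        # the point at infinity plays the role of 0
    addend = p
    while n:
        if n & 1:
            result = point_add(result, addend, a)
        addend = point_add(addend, addend, a)   # doubling
        n >>= 1
    return result

A = -7  # curve y^2 = x^3 - 7x + 10
P = (Fraction(1), Fraction(2))
print(scalar_mult(3, P, A))  # 3P = (9, -26)
```

Over the rationals the coordinates grow quickly with $n$, so each addition is not really $O(1)$ here; the $O(1)$ assumption becomes realistic once we move to the finite fields of the next post.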

Logarithm

Given $n$ and $P$, we now have at least one polynomial time algorithm for computing $Q = nP$. But what about the other way round? What if we know $Q$ and $P$ and need to find out $n$? This problem is known as the logarithm problem. We call it “logarithm” instead of “division” for conformity with other cryptosystems (where instead of multiplication we have exponentiation).
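The naive way to solve the logarithm problem is to keep adding $P$ to itself until $Q$ shows up. A Python sketch under the same assumptions as the previous examples; max_n is a hypothetical cutoff so that the search always terminates. The running time grows linearly with $n$, that is, exponentially in its number of bits:

```python
from fractions import Fraction

def point_add(p, q, a):
    """Add points on y^2 = x^3 + ax + b; None is the point at infinity 0."""
    if p is None:
        return q
    if q is None:
        return p
    (xp, yp), (xq, yq) = p, q
    if xp == xq and yp == -yq:
        return None
    if xp == xq:
        m = Fraction(3 * xp * xp + a) / (2 * yp)
    else:
        m = Fraction(yp - yq) / (xp - xq)
    xr = m * m - xp - xq
    return (xr, -(yp + m * (xr - xp)))

def brute_force_log(q, p, a, max_n=50):
    """Return the smallest n in [1, max_n] with nP == Q, or None if not found."""
    current = p
    for n in range(1, max_n + 1):
        if current == q:
            return n
        current = point_add(current, p, a)
    return None

A = -7  # curve y^2 = x^3 - 7x + 10
P = (Fraction(1), Fraction(2))
Q = (Fraction(9), Fraction(-26))  # 3P on this curve
print(brute_force_log(Q, P, A))   # 3
```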

I don’t know of any “easy” algorithm for the logarithm problem; however, by playing with multiplication it’s easy to see some patterns. For example, take the curve $y^2 = x^3 - 3x + 1$ and the point $P = (0, 1)$. We can immediately verify that, if $n$ is odd, $nP$ is on the curve on the left semiplane; if $n$ is even, $nP$ is on the curve on the right semiplane. If we experimented more, we could probably find more patterns that eventually could lead us to write an algorithm for computing the logarithm on that curve efficiently.
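The odd/even pattern is easy to observe experimentally. The following Python sketch computes $nP$ by repeated addition and reports on which side each point falls. The cutoff $x < 1$ is a choice made for this sketch: this curve has no points with $x$ between roughly 0.35 and 1.53, so any value in that gap separates the left part from the right part.

```python
from fractions import Fraction

def point_add(p, q, a):
    """Add points on y^2 = x^3 + ax + b; None is the point at infinity 0."""
    if p is None:
        return q
    if q is None:
        return p
    (xp, yp), (xq, yq) = p, q
    if xp == xq and yp == -yq:
        return None
    if xp == xq:
        m = Fraction(3 * xp * xp + a) / (2 * yp)
    else:
        m = Fraction(yp - yq) / (xp - xq)
    xr = m * m - xp - xq
    return (xr, -(yp + m * (xr - xp)))

def multiples(p, a, count):
    """Return [1P, 2P, ..., count * P] computed by repeated addition."""
    result, current = [], None
    for _ in range(count):
        current = point_add(current, p, a)
        result.append(current)
    return result

A = -3  # curve y^2 = x^3 - 3x + 1
P = (Fraction(0), Fraction(1))
for n, (x, y) in enumerate(multiples(P, A, 8), start=1):
    side = "left" if x < 1 else "right"
    print(n, side, float(x))
```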

But there’s a variant of the logarithm problem: the discrete logarithm problem. As we will see in the next post, if we reduce the domain of our elliptic curves, scalar multiplication remains “easy”, while the discrete logarithm becomes a “hard” problem. This duality is the key brick of elliptic curve cryptography.

See you next week

That’s all for today, I hope you enjoyed this post! Next week we will discover finite fields and the discrete logarithm problem, along with examples and tools to play with. If this stuff sounds interesting to you, then stay tuned!


Let's Encrypt: the road towards a better web?

andreacorbellini — Sun, 12 Apr 2015 16:07:00 +0000

I’ve always dreamed of an encrypted web, where HTTPS is the standard and plain HTTP is no more. A web where eavesdropping or manipulating information is not possible, or at least much harder than today.

I remember that I got excited when I first heard of CAcert: “a community-driven Certificate Authority that issues certificates to the public at large for free”. Unfortunately, CAcert’s root certificate never made it into the major web browsers and operating systems. Whatever the reasons, the result is that visiting an HTTPS website with a certificate issued by CAcert produces nothing but a scary warning with a call to leave the site, making CAcert unsuitable for most.

StartCom, on the other hand, has made it into the major browsers. But although its certificates are issued for free, it has never become very widespread. Also, StartCom has been heavily criticized for how it handled the Heartbleed vulnerability, and AFAIK this has driven many customers away.

Let’s Encrypt

Recently, I learned about Let’s Encrypt: a “free, automated, and open” Certificate Authority arriving in mid-2015. There are many important facts that make Let’s Encrypt different from, and better than, all the other Certificate Authorities out there. I’ll let you discover all of them. Probably the most important fact is that Let’s Encrypt has important sponsors, including Mozilla. And this is what matters today, because it gives Let’s Encrypt a chance to be included in at least one major browser.

Let's Encrypt logo.

Another interesting fact about Let’s Encrypt is that its certificates are issued in a way that is both secure and automated at the same time. This gives other (potential) Certificate Authorities the opportunity to adopt the same automated system.

If Let’s Encrypt wins, then everyone will have an easy way to obtain a free HTTPS certificate for their website. The next big step would be increasing the adoption of Let’s Encrypt, and the final step would be deprecating plain HTTP. There are however a few open questions:

What will be the answer from Google, Apple, Microsoft and other major browser/operating systems makers?
What will be the reaction of Verisign and Comodo? (Together, they hold more than 50% of all the certificates currently used on the web.)
Will they declare war on Let’s Encrypt or will they consolidate their efforts on customer services and Extended Validation?
Will the technology behind Let’s Encrypt allow the creation of a new model for certificate management? Will we see web servers and providers with built-in support for it?

I do not have an answer to these questions; time will tell. However, I really hope my dream becomes a reality soon. If you, like me, want Let’s Encrypt to be a success, then please share it and discuss it. Perhaps, one day, we will find ourselves teaching juniors that HTTPS has not always been the standard… :)

Running Ubuntu Snappy inside Docker

andreacorbellini — Wed, 25 Mar 2015 20:46:00 +0000

Many of you may have already heard of Ubuntu Core. For those who haven’t, it’s a minimal Ubuntu version that runs only a few essential services and ships with a new package manager (snappy) that provides transactional updates. Ubuntu Core provides a lightweight base operating system which is fast to deploy and easy to keep up to date. It also uses a nice security model.

All these characteristics make it particularly appealing for the cloud. And, in fact, people are starting to consider it for building their (micro)services architectures. Some weeks ago, a user on Ask Ubuntu asked: Can I run Snappy Ubuntu Core as a guest inside Docker? The problem is that Ubuntu Core does not ship with an official Docker image that we can pull, so we are forced to set it up manually. Here’s how.

Creating the Docker image

Step 1: get the latest Ubuntu Core

As of writing, the latest Ubuntu Core image is alpha 3 and can be downloaded with:

$ wget http://cdimage.ubuntu.com/ubuntu-core/releases/alpha-3/ubuntu-core-WEBDM-alpha-03_amd64-generic.img.xz

(If you browse to cdimage.ubuntu.com, you can also find the signed hashsums.)
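If you prefer to compare the digest from a script rather than by eye, here is a minimal Python sketch. It assumes the hash list uses the usual sha256sum layout (a hex digest followed by the file name, optionally prefixed with *); the list file name you pass in is up to you. It only compares digests: checking the signature on the list itself is a separate step done with GPG.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_digest(lines, filename):
    """Look up filename in sha256sum-style lines ("<digest>  <name>" or "<digest> *<name>")."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0]
    return None
```

For example, after opening the downloaded hash list as sums, sha256_of(name) == find_digest(sums, name) should be True for an intact download.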

The downloaded image is XZ-compressed and we need to extract it:

$ unxz ubuntu-core-WEBDM-alpha-03_amd64-generic.img.xz

Step 2: connect the image using qemu-nbd

The file we have just downloaded and extracted is a raw disk image. The previous version of the image (Alpha 2) was a QCOW2 image (the format used by QEMU). In order to access its contents, we have a few options. Here I’ll show one that works with both raw images and QCOW2 images. The trick consists of using qemu-nbd (a tool from the qemu-utils package):

# qemu-nbd -rc /dev/nbd0 ubuntu-core-WEBDM-alpha-03_amd64-generic.img

This command will create a virtual device named /dev/nbd0, with virtual partitions named /dev/nbd0p1, /dev/nbd0p2, … (if /dev/nbd0 does not exist, load the nbd kernel module first with modprobe nbd). Use fdisk -l /dev/nbd0 to get an idea of what partitions are inside the image.

Step 3: mount the filesystem

The partition we are interested in is /dev/nbd0p3, so we need to mount it:

# mkdir nbd0p3
# mount -r /dev/nbd0p3 nbd0p3

Step 4: create a base Docker image

As suggested in the Docker documentation, creating a base Docker image from a directory is pretty straightforward:

# tar -C nbd0p3 -c . | docker import - ubuntu-core:alpha-3

Our newly created image will now appear when running docker images:

# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu-core         alpha-3             f6df3c0e2d74        5 seconds ago       543.5 MB

Let’s verify if we did a good job:

# docker run ubuntu-core:alpha-3 snappy
Usage:snappy [-h] [-v]
             {info,versions,search,update-versions,update,rollback,install,uninstall,tags,config,build,booted,chroot,framework,fake-version,nap}
             ...

Yes! We have successfully added Ubuntu Core to the available Docker images and we have run our first snappy container!

Installing and running software

Without wasting too many words, here’s how to install and run the xkcd-webserver snappy package inside Docker:

# docker run -p 8000:80 ubuntu-core:alpha-3 /bin/sh -c 'snappy install xkcd-webserver && cd /apps/xkcd-webserver/0.3.1 && ./bin/xkcd-webserver'
WARN: AppArmor not available when processing AppArmor hook
Failed to get D-Bus connection: Operation not permitted
Failed to get D-Bus connection: Operation not permitted

** (process:13): WARNING **: user.vala:637: Can not connect to logind
xkcd-webserver     21 kB     [======================================]    OK
WARNING: failed to connect to dbus: org.freedesktop.DBus.Error.FileNotFound: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Part            Tag   Installed  Available  Fingerprint     Active
xkcd-webserver  edge  0.3.1      -          3a9152b8bff494  *

Now, if you visit http://localhost:8000/ you should see a random XKCD comic.

If you have paid attention, you may have noticed a few warnings about AppArmor, DBus and logind. The reason why you are seeing these warnings is pretty simple: we started neither AppArmor nor DBus nor logind. Now, generally speaking, we could run init inside Docker and fix these and other warnings. However, that’s not what Docker is meant for. So if you want to run AppArmor or similar stuff from inside Docker or LXC, then you should probably consider virtualization.

Dockerfile

Once you have created the base Docker image, you can start creating some Dockerfiles, if you need to. Here’s an example:

FROM ubuntu-core:alpha-3
RUN snappy install xkcd-webserver
EXPOSE 80
CMD cd /apps/xkcd-webserver/0.3.1 && ./bin/xkcd-webserver

This Dockerfile does the same job as the previous command: it installs and runs xkcd-webserver, which listens on port 80 inside the container. In order to use it, first build it:

# docker build -t xkcd-webserver .

Check that it has been correctly built:

# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
xkcd-webserver      latest              260e0116e9e3        3 minutes ago       543.5 MB
ubuntu-core         alpha-3             f6df3c0e2d74        About an hour ago   543.5 MB

Then run it, publishing port 80 of the container on port 8000 of the host (EXPOSE only declares the port; it does not publish it):

# docker run -p 8000:80 xkcd-webserver

Again, you should see a random XKCD comic on http://localhost:8000/.

Conclusion

That’s all folks! I hope you enjoyed this tiny guide, and if you need help, please ask a question on Ask Ubuntu with the ubuntu-core tag, which I’m subscribed to.

Are LXC and Docker secure?

andreacorbellini — Fri, 20 Feb 2015 16:36:00 +0000

Since its initial release in 2008, LXC has become widespread among servers. Today, it is becoming the preferred deployment strategy in many contexts, also thanks to Docker and, more recently, LXD.

LXC and Docker are used not only to achieve modular architecture design, but also as a way to run untrusted code in an isolated environment.

We can agree that the LXC and Docker ecosystems are great and work well, but there’s an important question that I believe everyone should ask, but too few people are asking: are LXC and Docker secure?

A system is as safe as its weakest component.

In order to answer this question, I won’t go deep into the details of what LXC and Docker are. The web is full of information on namespaces and cgroups. Rather, I’d like to show what LXC and Docker can do, what they cannot do, and what their default configuration allows them to do. My hope is to provide a quick checklist for those who want to go with LXC/Docker, but are unsure about what they need to pay attention to.

What LXC and Docker can do

As we all know, LXC confines processes mainly thanks to two Linux kernel features: namespaces and cgroups. These provide ways to control and limit access to resources such as memory or the filesystem. So, for example, you can limit the bandwidth used by processes inside a container, you can limit the CPU scheduling priority, and so on.

As is well known, processes inside an LXC guest cannot:

directly interact with the host processes, or with other LXC containers;
access the root filesystem, unless configured otherwise;
access special devices (block devices, network interfaces, …), unless configured otherwise;
mount arbitrary filesystems;
execute special ioctls, special syscalls or special interrupts that would affect the behavior of the host.

And at the same time, processes inside an LXC guest can find an environment that is perfectly suitable to run a working operating system: I can run init, I can read from /proc, I can access the internet.

This is most of what LXC can do, and it’s also what you get by default. Docker (when used with the LXC backend) is a wrapper around LXC that provides utilities for easy deployment and management of the containers, so everything that applies to LXC, applies to Docker too.

If this sounds great, then beware: there are some things you should know…

You need a security context

LXC is somewhat incomplete. What I mean is that some parts of special filesystems like procfs or sysfs are not faked. For example, as of now, I can successfully change the value of host’s /proc/sys/kernel/panic or /sys/class/thermal/cooling_device0/cur_state.

The reason why LXC is “incomplete” doesn’t really matter (it’s actually the kernel that is incomplete, but anyhow…). What matters is that certain nasty actions can be forbidden, not by LXC itself, but by an AppArmor/SELinux profile that blocks read and write access to certain /proc and /sys components. The AppArmor rules have shipped with Ubuntu since 12.10 (Quantal), and have been included upstream since early 2014, together with the SELinux rules.

Therefore, a security context like AppArmor or SELinux is required to run LXC safely. Without it, the root user inside a guest can take control of the host.

Check that AppArmor or SELinux is running and is configured properly. If you want to go with Grsecurity, then remember to configure it manually.

Limit resource consumption

LXC offers ways to limit resource usage, but no special restrictions are put in place by default. You have to configure them by yourself.

With the default configuration, I can run fork bombs, request huge memory maps, keep all CPUs busy, and generate heavy I/O loads. All of this without special privileges. Remember this when running untrusted code.

To limit resource consumption in LXC, open the configuration file for your container and set the lxc.cgroup.<controller>.<parameter> values you need.

For example, if you want to limit the container memory usage to 512 MiB, set lxc.cgroup.memory.limit_in_bytes = 512M. Note that a container with only that option, once it exceeds the 512 MiB cap, will start using swap without limits. If this is not what you want, then also set lxc.cgroup.memory.memsw.limit_in_bytes = 512M (memsw counts memory and swap together). Note that to use both options you may need to add cgroup_enable=memory and swapaccount=1 to the kernel command line.
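If you manage many containers, you may want to generate these lines instead of writing them by hand. A minimal Python sketch, assuming the cgroup v1 key names discussed above and the cgroup convention that the K, M and G suffixes are powers of 1024; memory_limit_lines is a hypothetical helper, not part of LXC. Setting both keys to the same value means the container gets no swap on top of its memory cap.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(size):
    """Convert sizes like '512M' to bytes, using 1024-based suffixes."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)

def memory_limit_lines(size, include_swap=True):
    """Return LXC config lines capping memory (and optionally memory + swap)."""
    value = to_bytes(size)
    lines = [f"lxc.cgroup.memory.limit_in_bytes = {value}"]
    if include_swap:
        lines.append(f"lxc.cgroup.memory.memsw.limit_in_bytes = {value}")
    return lines

print("\n".join(memory_limit_lines("512M")))
```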

To have an overview of all possible options, check out Red Hat’s documentation or the Kernel documentation.

With Docker, the story is similar: just use --lxc-conf from the command line to set LXC’s options.

Limit disk usage

Something that LXC cannot do is limit mass storage usage. Luckily, LXC integrates nicely with LVM (and btrfs, and ZFS, and OverlayFS), and you can use that to easily limit disk usage. You can, for example, create a logical volume for each of your guests, and give that volume a limited size, so that space usage inside a guest cannot grow indefinitely.

The same holds for Docker.

Pay attention to /dev/random

Processes inside LXC guests, by default, can read from /dev/random and can consume the entropy of the host. This may cause trouble if you need large amounts of randomness (to generate keys or whatever).

If this is something that you don’t want, then configure LXC so that it denies access to the character devices 1:8 (random) and 1:9 (urandom). Denying access to the path /dev/random is not enough, as mknod is allowed inside guests.
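The device numbers can be read from the host instead of being hard-coded, and the deny rules use the devices cgroup syntax (type, major:minor, access). A Python sketch; device_numbers and deny_lines are hypothetical helpers, and the rwm access string denies read, write and mknod:

```python
import os
import stat

def device_numbers(path):
    """Return (major, minor) of a character device node such as /dev/random."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)

def deny_lines(numbers):
    """LXC config lines denying read, write and mknod on the given char devices."""
    return [f"lxc.cgroup.devices.deny = c {major}:{minor} rwm" for major, minor in numbers]

for line in deny_lines([(1, 8), (1, 9)]):  # random, urandom
    print(line)
```

On a typical Linux host, device_numbers("/dev/random") should give (1, 8), matching the numbers above.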

Note however that doing so may break many applications inside the LXC guest that need randomness. Maybe consider using a different machine for processes that require randomness for security purposes.

Use unprivileged containers

Containers can be run by an unprivileged user. This means that UID 0 of the guest does not map to UID 0 of the host, and many potential security holes simply can’t be exploited. Unfortunately, Docker does not support unprivileged containers yet.

However, if Docker is not a requirement and you can do well with LXC, start experimenting with unprivileged containers and consider using them in production.
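The idea behind unprivileged containers is a range mapping from container UIDs to host UIDs. A rough Python sketch of that translation; the mapping of container UIDs 0 to 65535 onto host UIDs starting at 100000 is a hypothetical example, and the LXC configuration key that declares such ranges has changed name between LXC versions, so check the documentation for yours.

```python
def map_uid(guest_uid, id_map):
    """Translate a UID inside the container to the UID seen by the host.

    id_map is a list of (guest_start, host_start, count) ranges.
    Returns None if the guest UID is not mapped.
    """
    for guest_start, host_start, count in id_map:
        if guest_start <= guest_uid < guest_start + count:
            return host_start + (guest_uid - guest_start)
    return None

# Hypothetical mapping: container UIDs 0..65535 -> host UIDs 100000..165535.
ID_MAP = [(0, 100000, 65536)]
print(map_uid(0, ID_MAP))     # 100000: root in the guest is an ordinary UID on the host
print(map_uid(1000, ID_MAP))  # 101000
```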

Programs like Apache will complain that they are unable to change their ulimit (because setting the ulimit is a privilege of the real root user). If you need to run programs that require special privileges, either configure them so that they do not complain, or consider using capabilities (but do not abuse them, and be cautious, or you risk introducing more problems than the ones you are trying to solve!)

Conclusion

LXC, Docker and the entire ecosystem around them can be considered quite mature and stable. They’re surely production ready, and, if the right configuration is put in place, it can be pretty difficult to cause trouble for the host.

However, whether they can be considered secure or not is up to you: what are you using containers for? Who are you giving access to? What privileges are you giving, what actions are you restricting?

Always remember what LXC and Docker do by default, and what they do not do, especially when you use them to run untrusted code. Those that I have listed may only be a few of the problems that LXC, Docker and friends may expose. Remember to carefully review your configuration before opening the doors to others.

Prime numbers and universe factories

andreacorbellini — Sun, 15 Feb 2015 16:54:00 +0000

I’m an XKCD fan, and I read it regularly. There’s a comic that I particularly enjoyed: Pi Equals.

The comic Pi Equals, from XKCD.com (CC-BY-NC 2.5).

Well, it appears that Randall was right in that there’s a help message hidden somewhere. And I just found it in a prime number:

245178888024581899558766786108789912235672909204719666025638877624752119760547413887830514281649480308707369249

That number corresponds to the ASCII encoding of this message:

help!! i'm trapped in a universe factory!!!!!!

Apparently, universe factory workers speak English and write ASCII. Nice coincidence, huh?
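Converting between text and numbers takes only a couple of lines of Python. This sketch assumes the bytes are read as a single big-endian integer (the first character is the most significant byte), which is consistent with the number above; non-ASCII bytes are shown as replacement characters instead of raising an error, which is handy when decoding arbitrary numbers.

```python
def text_to_int(text):
    """Interpret the ASCII bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("ascii"), "big")

def int_to_text(n):
    """Inverse of text_to_int: turn an integer back into ASCII text."""
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big").decode("ascii", errors="replace")

N = 245178888024581899558766786108789912235672909204719666025638877624752119760547413887830514281649480308707369249
print(int_to_text(N))
```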

The discovery

Yesterday I was playing with the two illegal primes listed on Wikipedia. I was already aware of them, but I had never decoded them till yesterday. While doing so I wondered: how many prime numbers can be directly mapped to an executable file? Also, how many prime numbers can be directly mapped to plain English texts? Perhaps, while digging through prime numbers, could we find something like the Iliad or a fully working operating system?

Well, while asking myself those highly philosophical questions, Randall’s comic quickly came to my mind, and I decided to start looking for help requests hidden in primes. You can’t imagine how many of them I found!

At first I tried looking for all prime numbers corresponding to strings starting with HELP! I'M TRAPPED IN A UNIVERSE FACTORY!, with an arbitrary suffix. I found many of them, but I wasn’t satisfied with the result: I wanted something that was purely English/ASCII, without any garbage. Therefore I tried appending hashtags like #help or #universe, but could not find any interesting combination that was also a prime number (apparently, use of Twitter is forbidden inside universe factories).

So I decided to change approach: I looked for all primes corresponding to HELP, followed by a variable number of exclamation marks, followed by I'M TRAPPED IN A UNIVERSE FACTORY, followed by other exclamation marks. I could not find anything.

But then I tried with a lower case string, and… I found lots of such primes!

help i'm trapped in a universe factory!!!!!!!
help! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!! i'm trapped in a universe factory!!!!!!
help!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!! i'm trapped in a universe factory!!!!
help!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!! i'm trapped in a universe factory!
help!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!
...

I picked the one I liked most and verified its primality with Wolfram|Alpha and numberempire.com.

I’m not 100% sure that all the others are primes, as I used the Fermat primality test. However, I’m impressed by what I found. Now I can’t stop wondering how much literature, physics or technology could be hidden in prime numbers, in plain English and UTF-8 encoded. :D
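A search like this one can be sketched in a few lines of Python. This is not the script used for the post: the candidate pattern, the ranges and the bases (2, 3, 5, 7) are assumptions made for illustration, and messages are encoded as big-endian ASCII integers. The Fermat test answers either "definitely composite" or "probably prime": Carmichael numbers such as 29341 = 13 × 37 × 61 pass it for every base they are coprime with, which is why a proper check is still needed afterwards.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7)):
    """Fermat test: False means composite, True means 'probably prime'."""
    if n < 2:
        return False
    for b in bases:
        if n % b == 0:
            return n == b
        if pow(b, n - 1, n) != 1:
            return False
    return True

def candidates(max_before, max_after):
    """Yield messages with a varying number of '!' before and after the text."""
    for i in range(max_before + 1):
        for j in range(max_after + 1):
            yield "help" + "!" * i + " i'm trapped in a universe factory" + "!" * j

def search(max_before, max_after):
    """Return the candidates whose ASCII big-endian integer passes the Fermat test."""
    found = []
    for message in candidates(max_before, max_after):
        n = int.from_bytes(message.encode("ascii"), "big")
        if is_probable_prime(n):
            found.append(message)
    return found

for message in search(3, 10):
    print(message)
```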

(Obviously, I’m perfectly aware of what’s happening here, but I thought this was a nice fact to share. It could also be a nice number to print on a shirt.)

Dear universe factory worker, I’m going to rescue you, sooner or later. Just tell me how.

New blog, again

andreacorbellini — Sun, 15 Feb 2015 12:23:00 +0000

This must be the third blog I’ve started from scratch. But this time, I’m making a serious commitment: I’m going to write here regularly.

Wish me luck!