<p><strong>Running Docker Swarm inside LXC</strong></p>
<p>Andrea Corbellini, Wed, 13 Apr 2016 18:00:00 +0000, <a href="http://andrea.corbellini.name/2016/04/13/docker-swarm-inside-lxc/">http://andrea.corbellini.name/2016/04/13/docker-swarm-inside-lxc/</a></p>
<p>I've been using <a href="https://docs.docker.com/swarm/">Docker Swarm</a> inside <a href="https://linuxcontainers.org/lxc/introduction/">LXC</a> containers for a while now, and I thought that I could share my experience with you. Because they share the host's kernel, LXC containers are lightweight and require far fewer resources than virtual machines. This makes LXC ideal for development and simulation purposes. Running Docker Swarm inside LXC requires a few steps that I'm going to show you in this tutorial.</p>
<p>Before we begin, a quick premise: LXC, Docker and Swarm can be configured in many different ways. Here I'm showing just my preferred setup: LXC with AppArmor disabled, Docker with the OverlayFS storage driver, Swarm with etcd discovery. Many other kinds of configuration can work under LXC. Leave a comment if you want to know more.</p>
<p><strong>Overview:</strong></p>
<ol>
<li><a href="#step-1">Create the Swarm Manager container</a></li>
<li><a href="#step-2">Modify the configuration for the Swarm Manager container</a></li>
<li><a href="#step-3">Load the OverlayFS module</a></li>
<li><a href="#step-4">Start the container and install Docker</a></li>
<li><a href="#step-5">Check if Docker is working</a></li>
<li><a href="#step-6">Set up the Swarm Manager</a></li>
<li><a href="#step-7">Create the Swarm Agents</a></li>
<li><a href="#step-8">Play with the Swarm</a></li>
</ol>
<p><strong>Terminology:</strong></p>
<ul>
<li>the <em>host</em> is the system that will create and start the LXC containers (e.g. your laptop);</li>
<li>the <em>manager</em> is the LXC container that will run the Swarm manager (it'll run the <code>swarm manage</code> command);</li>
<li>an <em>agent</em> is one of the many LXC containers that will run a Swarm agent node (it'll run the <code>swarm join</code> command);</li>
</ul>
<p>To avoid ambiguity, all commands will be prefixed with a prompt such as <code>root@host:~#</code>, <code>root@swarm-manager:~#</code> and <code>root@swarm-agent-1:~#</code>.</p>
<p><strong>Prerequisites:</strong></p>
<p>This tutorial assumes that you have at least a vague idea of what Docker and Docker Swarm are. You should also be familiar with the shell.</p>
<p>This tutorial has been successfully tested on Ubuntu 15.10 (which ships with Docker 1.6) and Ubuntu 16.04 LTS (Docker 1.10), but it may work on other distributions and Docker versions as well.</p>
<h2 id="step-1">Step 1: Create the Swarm Manager container</h2>
<p>Create a new LXC container with:</p>
<div class="highlight"><pre><span></span><span class="gp">root@host:~#</span> lxc-create -t download -n swarm-manager
</pre></div>
<p>When prompted, choose your favorite distribution and architecture. I chose <code>ubuntu</code> / <code>xenial</code> / <code>amd64</code>.</p>
<p><code>lxc-create</code> needs to run as root: <a href="https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/">unprivileged containers</a> won't work. We could actually make Docker start inside an unprivileged container, but then we wouldn't be allowed to create block and character devices, and many Docker containers need this ability.</p>
<h2 id="step-2">Step 2: Modify the configuration for the Swarm Manager container</h2>
<p>Before starting the LXC container, open the file <code>/var/lib/lxc/swarm-manager/config</code> on the host and add the following configuration to the bottom of the file:</p>
<div class="highlight"><pre><span></span><span class="c1"># Distribution configuration</span>
<span class="c1"># ...</span>
<span class="c1"># Container specific configuration</span>
<span class="c1"># ...</span>
<span class="c1"># Network configuration</span>
<span class="c1"># ...</span>
<span class="c1"># Allow running Docker inside LXC</span>
lxc.aa_profile <span class="o">=</span> unconfined
lxc.cap.drop <span class="o">=</span>
</pre></div>
<p>The first rule (<code>lxc.aa_profile = unconfined</code>) disables AppArmor confinement. The second one (<code>lxc.cap.drop =</code>) gives all capabilities to the processes in the LXC container.</p>
<p>These two rules may seem harmful from a security standpoint, and in fact they are. However, we must remember that we will be running Docker inside the LXC container. Docker already ships with its own AppArmor profile, and the two rules above are needed precisely so that Docker can talk to AppArmor.</p>
<p>So, while Docker itself won't be confined, <strong>Docker containers will be confined</strong>, and this is an encouraging fact.</p>
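<p>If you prefer not to edit the file by hand, the change can be scripted. The following is a minimal sketch, not part of the original setup: the function name <code>allow_docker_in_lxc</code> is made up for illustration, and it only checks for an existing <code>lxc.aa_profile</code> line before appending, so it can be run more than once without duplicating the rules.</p>

```shell
# Append the two Docker-related rules to an LXC config file.
# Hypothetical helper; usage: allow_docker_in_lxc /var/lib/lxc/NAME/config
allow_docker_in_lxc() {
    config_file="$1"
    # Do nothing if an AppArmor profile rule is already present,
    # so that running the function twice does not duplicate the rules.
    if grep -q '^lxc\.aa_profile[[:space:]]*=' "$config_file"; then
        return 0
    fi
    cat >> "$config_file" <<'EOF'

# Allow running Docker inside LXC
lxc.aa_profile = unconfined
lxc.cap.drop =
EOF
}
```

<p>On the host, it would be called as <code>allow_docker_in_lxc /var/lib/lxc/swarm-manager/config</code> before starting the container.</p>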
<h2 id="step-3">Step 3: Load the OverlayFS module</h2>
<p>OverlayFS is shipped with Ubuntu, but not enabled by default. To enable it:</p>
<div class="highlight"><pre><span></span><span class="gp">root@host:~#</span> modprobe overlay
</pre></div>
<p>It is important to do this step before installing Docker. Docker supports various storage drivers, and when it is installed for the first time it tries to detect the most appropriate one for the system. If Docker detects that OverlayFS is not loaded, it'll fall back to the device mapper. There's nothing wrong with the device mapper, and we could make it work. However, as I said at the beginning, in this tutorial I'm focusing only on OverlayFS.</p>
<p>If you want to load OverlayFS at boot, instead of doing it manually after every reboot, add it to <code>/etc/modules-load.d/modules.conf</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">root@host:~#</span> <span class="nb">echo</span> overlay >> /etc/modules-load.d/modules.conf
</pre></div>
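<p>To confirm that the module is actually available before installing Docker, you can look for <code>overlay</code> in <code>/proc/filesystems</code>. This is a small sketch under that assumption; the function name <code>overlay_available</code> is hypothetical, and the optional argument exists only so that the check can be exercised against a sample file.</p>

```shell
# Succeed if the given filesystem list (default: /proc/filesystems)
# contains an "overlay" entry. Hypothetical helper for illustration.
overlay_available() {
    fs_list="${1:-/proc/filesystems}"
    # Entries look like "nodev<TAB>overlay"; compare the last field exactly,
    # so that names merely containing "overlay" do not match.
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "$fs_list"
}
```

<p>A typical use on the host would be <code>overlay_available || modprobe overlay</code>.</p>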
<h2 id="step-4">Step 4: Start the container and install Docker</h2>
<p>It's time to see if we did everything right!</p>
<div class="highlight"><pre><span></span><span class="gp">root@host:~#</span> lxc-start -n swarm-manager
<span class="gp">root@host:~#</span> lxc-attach -n swarm-manager
<span class="gp">root@swarm-manager:~#</span> apt update
<span class="gp">root@swarm-manager:~#</span> apt install docker.io
</pre></div>
<p>Installation should complete without any problem. If you get an error like this:</p>
<div class="highlight"><pre><span></span>Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
invoke-rc.d: initscript docker, action "start" failed.
dpkg: error processing package docker.io (--configure):
subprocess installed post-installation script returned error exit status 1
</pre></div>
<p>It means that Docker failed to start. Try checking <code>systemctl status docker</code> as suggested, or run <code>docker daemon</code> manually. You might get an error like this:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker daemon
<span class="go">WARN[0000] devmapper: Udev sync is not supported. This will lead to unexpected behavior, data loss and errors. For more information, see https://docs.docker.com/reference/commandline/daemon/#daemon-storage-driver-option</span>
<span class="go">ERRO[0000] There are no more loopback devices available.</span>
<span class="go">ERRO[0000] [graphdriver] prior storage driver "devicemapper" failed: loopback attach failed</span>
<span class="go">FATA[0000] Error starting daemon: error initializing graphdriver: loopback attach failed</span>
</pre></div>
<p>In this case, Docker is using the devicemapper storage driver and is complaining about the lack of loopback devices. Check whether OverlayFS is loaded on the host and reinstall Docker.</p>
<p>Or you might get an error like this:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker daemon
<span class="go">...</span>
<span class="go">FATA[0000] Error starting daemon: AppArmor enabled on system but the docker-default profile could not be loaded.</span>
</pre></div>
<p>In this other case, Docker is complaining about the fact that it can't talk to AppArmor. Check the configuration for the LXC container.</p>
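<p>The two failure modes above can be told apart mechanically from the daemon's output. The sketch below is only an illustration: the function name <code>diagnose_docker_start</code> and the hint wording are made up, and it matches nothing more than the error strings quoted in this step.</p>

```shell
# Read Docker daemon output on stdin and print a one-line hint.
# Hypothetical helper; it only knows the two errors discussed in this step.
diagnose_docker_start() {
    log="$(cat)"
    case "$log" in
        *'no more loopback devices'*|*'loopback attach failed'*)
            # devicemapper was selected because OverlayFS was not detected.
            echo "storage: load the overlay module on the host, then reinstall Docker" ;;
        *'docker-default profile could not be loaded'*)
            # Docker could not talk to AppArmor from inside the LXC container.
            echo "apparmor: check lxc.aa_profile and lxc.cap.drop in the LXC config" ;;
        *)
            echo "unknown: see systemctl status docker" ;;
    esac
}
```

<p>For example, <code>journalctl -u docker --no-pager | diagnose_docker_start</code> inside the container would print one of the three hints.</p>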
<h2 id="step-5">Step 5: Check if Docker is working</h2>
<p>Once you are all set, you should be able to use Docker: try running <code>docker info</code>, <code>docker ps</code> or launch a container:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker run --rm docker/whalesay cowsay burp!
<span class="go">Unable to find image 'docker/whalesay:latest' locally</span>
<span class="go">latest: Pulling from docker/whalesay</span>
<span class="go">...</span>
<span class="go">Status: Downloaded newer image for docker/whalesay:latest</span>
<span class="go"> _______</span>
<span class="go">< burp! ></span>
<span class="go"> -------</span>
<span class="go"> \</span>
<span class="go"> \</span>
<span class="go"> \</span>
<span class="gp"> #</span><span class="c1"># .</span>
<span class="gp"> #</span><span class="c1"># ## ## ==</span>
<span class="gp"> #</span><span class="c1"># ## ## ## ===</span>
<span class="go"> /""""""""""""""""___/ ===</span>
<span class="go"> ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ / ===- ~~~</span>
<span class="go"> \______ o __/</span>
<span class="go"> \ \ __/</span>
<span class="go"> \____\______/</span>
</pre></div>
<p>It appears to be working. By the way, we can check whether Docker is correctly confining containers. Try running a Docker container and check on the host the output of <code>aa-status</code>: you should see a process running with the <code>docker-default</code> profile. For example:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker run --rm ubuntu bash -c <span class="s1">'while true; do sleep 1; echo -n zZ; done'</span>
<span class="go">zZzZzZzZzZzZzZzZ...</span>
<span class="gp">#</span> On another shell
<span class="gp">root@host:~#</span> aa-status
<span class="go">apparmor module is loaded.</span>
<span class="go">5 profiles are loaded.</span>
<span class="go">5 profiles are in enforce mode.</span>
<span class="go"> /sbin/dhclient</span>
<span class="go"> /usr/lib/NetworkManager/nm-dhcp-client.action</span>
<span class="go"> /usr/lib/NetworkManager/nm-dhcp-helper</span>
<span class="go"> /usr/lib/connman/scripts/dhclient-script</span>
<span class="go"> docker-default</span>
<span class="go">0 profiles are in complain mode.</span>
<span class="go">4 processes have profiles defined.</span>
<span class="go">4 processes are in enforce mode.</span>
<span class="go"> /sbin/dhclient (797)</span>
<span class="go"> /sbin/dhclient (2832)</span>
<span class="go"> docker-default (6956)</span>
<span class="go"> docker-default (6973)</span>
<span class="go">0 processes are in complain mode.</span>
<span class="go">0 processes are unconfined but have a profile defined.</span>
<span class="gp">root@host:~#</span> ps -ef <span class="p">|</span> grep 6956
<span class="go">root 6956 4982 0 17:17 ? 00:00:00 bash -c while true; do sleep 1; echo -n zZ; done</span>
<span class="go">root 6973 6956 0 17:17 ? 00:00:00 sleep 1</span>
<span class="go">root 6982 6808 0 17:17 pts/3 00:00:00 grep --color=auto 6956</span>
</pre></div>
<p>Yay! Everything is running as expected: we launched a process inside a Docker container, and that process is running with the <code>docker-default</code> AppArmor profile. Once again: even if LXC is running unconfined, our Docker containers are not.</p>
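<p>If you want to repeat this check without reading the whole report, the PIDs confined by <code>docker-default</code> can be filtered out of it. This is a sketch that assumes the <code>aa-status</code> output format shown above (profile name followed by the PID in parentheses); other AppArmor versions may print process lines differently. The function name <code>docker_confined_pids</code> is hypothetical.</p>

```shell
# Read aa-status output on stdin and print the PIDs of processes that run
# under the docker-default profile, one per line.
# Lines naming the profile without a "(PID)" suffix are ignored.
docker_confined_pids() {
    sed -n 's/^[[:space:]]*docker-default (\([0-9][0-9]*\))[[:space:]]*$/\1/p'
}
```

<p>On the host, <code>aa-status | docker_confined_pids</code> would list the confined processes, and each PID can then be inspected with <code>ps</code> as shown above.</p>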
<h2 id="step-6">Step 6: Set up the Swarm Manager</h2>
<p>That was the hardest part. Now we can proceed to set up Swarm as we would usually do.</p>
<p>As I said at the beginning, Swarm can be configured in many ways. In this tutorial I'll show how to set it up with etcd discovery. First of all, we need the IP address of the LXC container:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> ifconfig eth0
<span class="go">eth0 Link encap:Ethernet HWaddr 00:16:3e:8e:cb:43</span>
<span class="go"> inet addr:10.0.3.154 Bcast:10.0.3.255 Mask:255.255.255.0</span>
<span class="go"> inet6 addr: fe80::216:3eff:fe8e:cb43/64 Scope:Link</span>
<span class="go"> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</span>
<span class="go"> RX packets:23177 errors:0 dropped:0 overruns:0 frame:0</span>
<span class="go"> TX packets:20859 errors:0 dropped:0 overruns:0 carrier:0</span>
<span class="go"> collisions:0 txqueuelen:1000</span>
<span class="go"> RX bytes:147652946 (147.6 MB) TX bytes:1455613 (1.4 MB)</span>
</pre></div>
<p><code>10.0.3.154</code> is my IP address. Let's start etcd:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> <span class="nv">SWARM_MANAGER_IP</span><span class="o">=</span>10.0.3.154
<span class="gp">root@swarm-manager:~#</span> docker run -d --restart<span class="o">=</span>always --name<span class="o">=</span>etcd -p 4001:4001 -p 2380:2380 -p 2379:2379 <span class="se">\</span>
<span class="go"> quay.io/coreos/etcd -name etcd0 \</span>
<span class="go"> -advertise-client-urls http://$SWARM_MANAGER_IP:2379,http://$SWARM_MANAGER_IP:4001 \</span>
<span class="go"> -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \</span>
<span class="go"> -initial-advertise-peer-urls http://$SWARM_MANAGER_IP:2380 \</span>
<span class="go"> -listen-peer-urls http://0.0.0.0:2380 \</span>
<span class="go"> -initial-cluster-token etcd-cluster-1 \</span>
<span class="go"> -initial-cluster etcd0=http://$SWARM_MANAGER_IP:2380 \</span>
<span class="go"> -initial-cluster-state new</span>
<span class="go">Unable to find image 'quay.io/coreos/etcd:latest' locally</span>
<span class="go">latest: Pulling from coreos/etcd</span>
<span class="go">...</span>
<span class="go">Status: Downloaded newer image for quay.io/coreos/etcd:latest</span>
<span class="go">e742278a97d2ad3f88658aa871903d20b4094e551969a03aa8332d3876fe5d0d</span>
<span class="gp">root@swarm-manager:~#</span> docker ps
<span class="go">CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES</span>
<span class="go">e742278a97d2 quay.io/coreos/etcd "/etcd -name etcd0 -a" 32 seconds ago Up 31 seconds 0.0.0.0:2379-2380->2379-2380/tcp, 0.0.0.0:4001->4001/tcp, 7001/tcp etcd</span>
</pre></div>
<p>Replace <code>10.0.3.154</code> with the IP address of your LXC container.</p>
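<p>Instead of copying the address by hand, it can be extracted from the output of the <code>ip</code> tool. The sketch below assumes that <code>ip</code> (from iproute2) is installed in the container and that the interface is called <code>eth0</code>; the function name <code>first_ipv4</code> is made up for illustration.</p>

```shell
# Read `ip -4 -o addr show` output on stdin and print the first IPv4 address
# without its prefix length (for example 10.0.3.154 instead of 10.0.3.154/24).
first_ipv4() {
    awk '{
        # The address is the field right after the literal word "inet".
        for (i = 1; i < NF; i++) {
            if ($i == "inet") {
                split($(i + 1), parts, "/")
                print parts[1]
                exit
            }
        }
    }'
}
```

<p>With it, the variable could be set with <code>SWARM_MANAGER_IP="$(ip -4 -o addr show dev eth0 | first_ipv4)"</code>.</p>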
<p>Note that I've started etcd with <code>--restart=always</code>, so that etcd is started automatically every time the LXC container starts. With this option, etcd will come back when the Docker daemon restarts even if you had explicitly stopped it. Drop <code>--restart=always</code> if that's not what you want.</p>
<p>Now we can start the Swarm manager:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker run -d --restart<span class="o">=</span>always --name<span class="o">=</span>swarm -p 3375:3375 <span class="se">\</span>
<span class="go"> swarm manage -H 0.0.0.0:3375 etcd://$SWARM_MANAGER_IP:2379</span>
<span class="go">Unable to find image 'swarm:latest' locally</span>
<span class="go">latest: Pulling from library/swarm</span>
<span class="go">...</span>
<span class="go">Status: Downloaded newer image for swarm:latest</span>
<span class="go">8080c93c544ff92cc2cf682ff0bbc82e0d2dfb01e1f98f202c3a0801d3427330</span>
<span class="gp">root@swarm-manager:~#</span> docker ps
<span class="go">CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES</span>
<span class="go">46b556e73e87 swarm "/swarm manage -H 0.0" 3 seconds ago Up 2 seconds 2375/tcp, 0.0.0.0:3375->3375/tcp swarm</span>
<span class="go">e742278a97d2 quay.io/coreos/etcd "/etcd -name etcd0 -a" 7 minutes ago Up 7 minutes 0.0.0.0:2379-2380->2379-2380/tcp, 0.0.0.0:4001->4001/tcp, 7001/tcp etcd</span>
</pre></div>
<p>Our Swarm manager is up and running. We can connect to it and issue a few commands:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker -H localhost:3375 info
<span class="go">Containers: 0</span>
<span class="go"> Running: 0</span>
<span class="go"> Paused: 0</span>
<span class="go"> Stopped: 0</span>
<span class="go">Images: 0</span>
<span class="go">Server Version: swarm/1.1.3</span>
<span class="go">Role: primary</span>
<span class="go">Strategy: spread</span>
<span class="go">Filters: health, port, dependency, affinity, constraint</span>
<span class="go">Nodes: 0</span>
<span class="go">Plugins:</span>
<span class="go"> Volume:</span>
<span class="go"> Network:</span>
<span class="go">Kernel Version: 4.4.0-15-generic</span>
<span class="go">Operating System: linux</span>
<span class="go">Architecture: amd64</span>
<span class="go">CPUs: 0</span>
<span class="go">Total Memory: 0 B</span>
<span class="go">Name: d39c33295ef3</span>
</pre></div>
<p>As you can see there are no nodes connected, as we would expect. Everything looks good.</p>
<h2 id="step-7">Step 7: Create the Swarm Agents</h2>
<p>Our Swarm manager can't do anything interesting without agent nodes. Creating new LXC containers for the agents is not much different from what we already did with the manager. To set up new agents in an automatic fashion I've created a script, so that you don't need to repeat the steps manually:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nb">set</span> -eu
<span class="nv">SWARM_MANAGER_IP</span><span class="o">=</span>10.0.3.154
<span class="nv">DOWNLOAD_DIST</span><span class="o">=</span>ubuntu
<span class="nv">DOWNLOAD_RELEASE</span><span class="o">=</span>xenial
<span class="nv">DOWNLOAD_ARCH</span><span class="o">=</span>amd64
<span class="k">for</span> LXC_NAME in <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
<span class="k">do</span>
<span class="nv">LXC_PATH</span><span class="o">=</span><span class="s2">"/var/lib/lxc/</span><span class="nv">$LXC_NAME</span><span class="s2">"</span>
<span class="nv">LXC_ROOTFS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$LXC_PATH</span><span class="s2">/rootfs"</span>
<span class="c1"># Create the container.</span>
lxc-create -t download -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- <span class="se">\</span>
-d <span class="s2">"</span><span class="nv">$DOWNLOAD_DIST</span><span class="s2">"</span> -r <span class="s2">"</span><span class="nv">$DOWNLOAD_RELEASE</span><span class="s2">"</span> -a <span class="s2">"</span><span class="nv">$DOWNLOAD_ARCH</span><span class="s2">"</span>
cat <span class="s"><<EOF >> "$LXC_PATH/config"</span>
<span class="s"># Allow running Docker inside LXC</span>
<span class="s">lxc.aa_profile = unconfined</span>
<span class="s">lxc.cap.drop =</span>
<span class="s">EOF</span>
<span class="c1"># Start the container and wait for networking to start.</span>
lxc-start -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span>
sleep 10s
<span class="c1"># Install Docker.</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- apt-get update
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- apt-get install -y docker.io
<span class="c1"># Tell Docker to listen on all interfaces.</span>
sed -i -e <span class="s1">'s/^#DOCKER_OPTS=.*$/DOCKER_OPTS="-H 0.0.0.0:2375"/'</span> <span class="s2">"</span><span class="nv">$LXC_ROOTFS</span><span class="s2">/etc/default/docker"</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- systemctl restart docker
<span class="c1"># Join the Swarm.</span>
<span class="nv">SWARM_AGENT_IP</span><span class="o">=</span><span class="s2">"</span><span class="k">$(</span>lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- ifconfig eth0 <span class="p">|</span> grep -Po <span class="s1">'(?<=inet addr:)\S+'</span><span class="k">)</span><span class="s2">"</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- docker run -d --restart<span class="o">=</span>always --name<span class="o">=</span>swarm <span class="se">\</span>
swarm join --addr<span class="o">=</span><span class="s2">"</span><span class="nv">$SWARM_AGENT_IP</span><span class="s2">:2375"</span> <span class="s2">"etcd://</span><span class="nv">$SWARM_MANAGER_IP</span><span class="s2">:2379"</span>
<span class="k">done</span>
</pre></div>
<p>Be sure to change the values for <code>SWARM_MANAGER_IP</code>, <code>DOWNLOAD_DIST</code>, <code>DOWNLOAD_RELEASE</code> and <code>DOWNLOAD_ARCH</code> to fit your needs.</p>
<p>Thanks to this script, creating 10 new agents is as simple as running one command:</p>
<div class="highlight"><pre><span></span><span class="gp">root@host:~#</span> ./swarm-agent-create swarm-agent-<span class="o">{</span>0..9<span class="o">}</span>
</pre></div>
<p>Here's an explanation of what the script does:</p>
<ul>
<li>
<p>It first sets up a new LXC container following steps 1-5 above, that is: create a new LXC container (with <code>lxc-create</code>), apply the LXC configuration (<code>lxc.aa_profile</code> and <code>lxc.cap.drop</code> rules), start the container and install Docker.</p>
<div class="highlight"><pre><span></span><span class="nv">LXC_PATH</span><span class="o">=</span><span class="s2">"/var/lib/lxc/</span><span class="nv">$LXC_NAME</span><span class="s2">"</span>
<span class="nv">LXC_ROOTFS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$LXC_PATH</span><span class="s2">/rootfs"</span>
<span class="c1"># Create the container.</span>
lxc-create -t download -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- <span class="se">\</span>
-d <span class="s2">"</span><span class="nv">$DOWNLOAD_DIST</span><span class="s2">"</span> -r <span class="s2">"</span><span class="nv">$DOWNLOAD_RELEASE</span><span class="s2">"</span> -a <span class="s2">"</span><span class="nv">$DOWNLOAD_ARCH</span><span class="s2">"</span>
cat <span class="s"><<EOF >> "$LXC_PATH/config"</span>
<span class="s"># Allow running Docker inside LXC</span>
<span class="s">lxc.aa_profile = unconfined</span>
<span class="s">lxc.cap.drop =</span>
<span class="s">EOF</span>
<span class="c1"># Start the container and wait for networking to start.</span>
lxc-start -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span>
sleep 10s
<span class="c1"># Install Docker.</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- apt-get update
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- apt-get install -y docker.io
</pre></div>
</li>
<li>
<p>Our Swarm agents need to be reachable by the manager. For this reason we need to configure them so that they bind to a public interface. To do so, the script adds <code>DOCKER_OPTS="-H 0.0.0.0:2375"</code> and restarts Docker.</p>
<div class="highlight"><pre><span></span><span class="c1"># Tell Docker to listen on all interfaces.</span>
sed -i -e <span class="s1">'s/^#DOCKER_OPTS=.*$/DOCKER_OPTS="-H 0.0.0.0:2375"/'</span> <span class="s2">"</span><span class="nv">$LXC_ROOTFS</span><span class="s2">/etc/default/docker"</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- systemctl restart docker
</pre></div>
</li>
<li>
<p>Lastly, the script looks up the IP address of the LXC container and launches the Swarm agent.</p>
<div class="highlight"><pre><span></span><span class="c1"># Join the Swarm.</span>
<span class="nv">SWARM_AGENT_IP</span><span class="o">=</span><span class="s2">"</span><span class="k">$(</span>lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- ifconfig eth0 <span class="p">|</span> grep -Po <span class="s1">'(?<=inet addr:)\S+'</span><span class="k">)</span><span class="s2">"</span>
lxc-attach -n <span class="s2">"</span><span class="nv">$LXC_NAME</span><span class="s2">"</span> -- docker run -d --restart<span class="o">=</span>always --name<span class="o">=</span>swarm <span class="se">\</span>
swarm join --addr<span class="o">=</span><span class="s2">"</span><span class="nv">$SWARM_AGENT_IP</span><span class="s2">:2375"</span> <span class="s2">"etcd://</span><span class="nv">$SWARM_MANAGER_IP</span><span class="s2">:2379"</span>
</pre></div>
</li>
</ul>
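<p>The fixed <code>sleep 10s</code> in the script is a simple way to wait for networking, but it can be too short on a busy host. One possible alternative is to poll until the container reports an IP address. This is a sketch, not what the script above does: <code>retry_until</code> and <code>container_has_ip</code> are hypothetical names, and the second function assumes that <code>lxc-info</code> accepts <code>-i</code> (print IP addresses) together with <code>-H</code> (raw values only).</p>

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts="$1"
    delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # Running the command inside "if" keeps "set -e" from aborting on failure.
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Succeed once the named container reports at least one IP address.
# Assumes the -i and -H options of lxc-info.
container_has_ip() {
    [ -n "$(lxc-info -n "$1" -iH 2>/dev/null)" ]
}
```

<p>In the script, <code>sleep 10s</code> could then be replaced with <code>retry_until 30 1 container_has_ip "$LXC_NAME"</code>, which waits up to about 30 seconds and stops the script if the container never gets an address.</p>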
<h2 id="step-8">Step 8: Play with the Swarm</h2>
<p>Now, if we check <code>docker info</code> on the Swarm manager, we should see 10 healthy nodes:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker -H localhost:3375 info
<span class="go">Containers: 10</span>
<span class="go"> Running: 10</span>
<span class="go"> Paused: 0</span>
<span class="go"> Stopped: 0</span>
<span class="go">Images: 10</span>
<span class="go">Server Version: swarm/1.1.3</span>
<span class="go">Role: primary</span>
<span class="go">Strategy: spread</span>
<span class="go">Filters: health, port, dependency, affinity, constraint</span>
<span class="go">Nodes: 10</span>
<span class="go"> swarm-agent-0: 10.0.3.73:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:35Z</span>
<span class="go"> swarm-agent-1: 10.0.3.97:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:31:49Z</span>
<span class="go"> swarm-agent-2: 10.0.3.58:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:31:54Z</span>
<span class="go"> swarm-agent-3: 10.0.3.195:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:03Z</span>
<span class="go"> swarm-agent-4: 10.0.3.235:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:22Z</span>
<span class="go"> swarm-agent-5: 10.0.3.174:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:16Z</span>
<span class="go"> swarm-agent-6: 10.0.3.222:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:21Z</span>
<span class="go"> swarm-agent-7: 10.0.3.140:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:31:43Z</span>
<span class="go"> swarm-agent-8: 10.0.3.95:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:17Z</span>
<span class="go"> swarm-agent-9: 10.0.3.125:2375</span>
<span class="go"> └ Status: Healthy</span>
<span class="go"> └ Containers: 1</span>
<span class="go"> └ Reserved CPUs: 0 / 4</span>
<span class="go"> └ Reserved Memory: 0 B / 4.052 GiB</span>
<span class="go"> └ Labels: executiondriver=native-0.2, kernelversion=4.4.0-15-generic, operatingsystem=Ubuntu 16.04, storagedriver=overlay</span>
<span class="go"> └ Error: (none)</span>
<span class="go"> └ UpdatedAt: 2016-04-13T15:32:30Z</span>
<span class="go">Plugins:</span>
<span class="go"> Volume:</span>
<span class="go"> Network:</span>
<span class="go">Kernel Version: 4.4.0-15-generic</span>
<span class="go">Operating System: linux</span>
<span class="go">Architecture: amd64</span>
<span class="go">CPUs: 40</span>
<span class="go">Total Memory: 40.52 GiB</span>
<span class="go">Name: d39c33295ef3</span>
</pre></div>
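<p>With many agents, reading this report by eye gets tedious. The following sketch summarizes it, assuming the <code>docker info</code> layout shown above (a <code>Nodes: N</code> line and one <code>Status: Healthy</code> line per healthy node). The function name <code>swarm_all_healthy</code> is made up for illustration.</p>

```shell
# Read Swarm `docker info` output on stdin, print "H/N nodes healthy" and
# succeed only if there is at least one node and all nodes are healthy.
swarm_all_healthy() {
    awk '
        /^[[:space:]]*Nodes:/ { total = $2 }
        /Status: Healthy/     { healthy++ }
        END {
            printf "%d/%d nodes healthy\n", healthy, total
            exit !(total > 0 && healthy == total)
        }'
}
```

<p>On the manager, it could be used as <code>docker -H localhost:3375 info | swarm_all_healthy</code>.</p>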
<p>Let's try running a command on the Swarm:</p>
<div class="highlight"><pre><span></span><span class="gp">root@swarm-manager:~#</span> docker -H localhost:3375 run -i --rm docker/whalesay cowsay <span class="s1">'It works!'</span>
<span class="go"> ___________</span>
<span class="go">< It works! ></span>
<span class="go"> -----------</span>
<span class="go"> \</span>
<span class="go"> \</span>
<span class="go"> \</span>
<span class="gp"> #</span><span class="c1"># .</span>
<span class="gp"> #</span><span class="c1"># ## ## ==</span>
<span class="gp"> #</span><span class="c1"># ## ## ## ===</span>
<span class="go"> /""""""""""""""""___/ ===</span>
<span class="go"> ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ / ===- ~~~</span>
<span class="go"> \______ o __/</span>
<span class="go"> \ \ __/</span>
<span class="go"> \____\______/</span>
</pre></div>
<h2>Conclusion</h2>
<p>We created a Swarm cluster consisting of one manager and 10 agents, and we kept memory and disk usage low thanks to LXC containers. We also succeeded in confining our Docker containers with AppArmor. Overall, this setup is probably not ideal for use in a production environment, but it is very useful for simulating clusters on your laptop.</p>
<p>I hope you enjoyed the tutorial. Feel free to leave a comment if you have questions!</p>andreacorbelliniWed, 13 Apr 2016 18:00:00 +0000tag:andrea.corbellini.name,2016-04-13:2016/04/13/docker-swarm-inside-lxc/dockerswarmlxccontainersdistributed-computingWhen bureaucracy hits the web: the cookie lawhttp://andrea.corbellini.name/2015/09/22/cookie-law/<p>For a few years now, every first of April I hoped to read between the news something on the lines of "the cookie law was a joke, sorry for that". You know, bureaucracy is slow, and it's reasonable to think that it takes time for them to reveal jokes. Yet, many firsts of April have passed, and no such announcement has been made. Many missed opportunities for Europe to show their love for progress and their competence with the web.</p>
<p>Being compliant with the EU cookie law is hard to do. It's not just a matter of showing a boring banner, it's a matter of defacing your web pages, writing long privacy policies that nobody will read, implementing ways to prevent certain cookies from being set.</p>
<p>The truth is: if you, as a webmaster, want to avoid wasting time and avoid headaches, you just have to avoid cookies. This is what I have done with most websites I maintain: <strong>I have removed all analytics, all social sharing buttons, all YouTube videos, all comments</strong>. This was a sad thing to do, but it was the only thing I could do: I maintain websites for free, mainly as a favor for friends and non-profits I'm involved with; it's not my day job. Also, I do not want other people being sued because of mistakes on my side: cookies may be set in the most unexpected situations and disabling every feature that could potentially set them seems the safest choice.</p>
<p>The only exception is this blog. Here, I use cookies for Google Analytics, for social sharing buttons and for Disqus. I may live without Google Analytics (even though it gives useful insights, such as performance statistics and tips), but I can't really remove social buttons and Disqus: this is a blog and it wouldn't make any sense to remove social features and comments.</p>
<p>Being compliant with the EU cookie law has been on my todo list for a while, and I never found the time (nor the desire) to look into it. Today I did. I spent a few hours of my time to discover that <strong>Google Analytics is "OK"</strong> (in the sense that I do not have to display an ugly banner, nor have to ask for explicit permission from the user before setting the cookies) and to discover that <strong>social buttons and Disqus are "bad"</strong> (in the sense that I have to display a banner and ask for explicit consent from the user <em>before</em> setting the cookies). In the end, the only service that I could remove is the least problematic one.</p>
<p>As I said, I really do not want to remove social buttons, Disqus or whatever third-party content I'll want to display in the future. Therefore, in order to comply with the cookie law, I'm forced to write code, write a privacy policy, waste another bunch of hours of my time. But not today, as I've already had enough sense of sadness and impotence.</p>
<p>At least for now, I guess that the EU cookie law compliance will stay on my todo list for some more time. Probably if I worked on compliance instead of writing this rant, I could have already finished (but then what's the point of having a blog if you don't blog?)</p>
<p>The cookie law wants to be "on the side of the users," and it is based on noble principles: it wants users to be well-informed about how their data is used and by whom. However, as it is today, it's against both users and webmasters. <strong>Webmasters have to lose their time working on compliance, and users receive a degraded experience due to silly regulations.</strong></p>
<p>I'd like to do <a href="http://nocookielaw.com/">what Silktide did</a>: actively protesting against the law, but I wouldn't be so happy if I were sued. I'd like to read "the cookie law was a joke" in the news, but I'm starting to believe that it's not going to happen any time soon. It seems that accepting the sadness of the reality is the only option I'm left with.</p>
<p>End of rant, let's move on.</p>andreacorbelliniTue, 22 Sep 2015 18:35:00 +0000tag:andrea.corbellini.name,2015-09-22:2015/09/22/cookie-law/blogcookie-lawHello Pelican!http://andrea.corbellini.name/2015/08/02/hello-pelican/<p>Today I switched from WordPress.com to <a href="http://getpelican.com/">Pelican</a> and <a href="https://pages.github.com/">GitHub Pages</a>.</p>
<p>First off, let me say: almost all URLs that were previously working should still work. Only the feed URLs are broken, and this is not something I can fix. If you were following my blog via a feed reader, you should update to the new feed. Sorry for the inconvenience.</p>
<p>Having said that, I'd like to share with you the motivation that made me move and the details of the migration.</p>
<h2>The bad things of WordPress</h2>
<p>Now, this isn't meant to be a rant, so I'll be pretty concise. WordPress, the content management system, is an excellent platform for blogging. Easy to start with, easy to maintain, easy to use. WordPress.com makes things even easier. It also comes with many useful features, like comments and social network integration.</p>
<p>The problem is: you can't customize things or add features without paying. Of course, this is business, and I do not want to discuss business decisions made at WordPress.com. Not only that, but I could live fine with most of the major limitations. Also, I was perfectly conscious of these kinds of problems with WordPress.com when I started (after all, this is not <a href="http://andrea.corbellini.name/2015/02/15/new-blog-again/">the first blog I started</a>).</p>
<p>I actually became upset with WordPress.com when writing the series of blog posts about <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">Elliptic Curve Cryptography</a>. When writing these articles, I spent a lot of time employing workarounds to overcome WordPress.com limitations. Being used to Vim and its advanced features, I also found the editors (both the old and the new one) to be a great obstacle to getting things done quickly. I do not want to go into the details of the problems I'm referring to; what matters is that, eventually, I gave up and realized it was time to move on and look for an alternative.</p>
<h2>Why Pelican</h2>
<p>Pelican is a static site generator. I've always thought that a static site had too many limitations for me. But while seeking an alternative to WordPress.com, I realized that many of those limitations were not affecting me in any way. Actually, with a static site I can do everything I want: edit my articles with Vim, render my equations with MathJax, customize my theme, version control my content, write scripts to post process my content.</p>
<p>The only bad thing about Pelican is that it does not come with any theme I truly like. I decided to make my own. I'm not entirely satisfied with it, as I feel it is too "anonymous", but I believe it is fully responsive, fast, readable and offers all the features I want. Perhaps I'll tweak it a little more to make it more "personal".</p>
<p>Setting up Pelican and migrating everything required some time, but at least this time I worked on true solutions, not on ugly hacks and workarounds like I did with WordPress. This implies that when writing articles I will be able to focus more on content than other details.</p>
<h2>Why not other static site generators</h2>
<p>In short: Pelican is written in Python and to my eyes it looked better than the other Python static site generators. I'll be honest and say that I did not truly evaluate all of the alternatives: I knew <a href="...">list.org</a> switched to Pelican and that made me try Pelican before all other solutions.</p>
<h2>Conclusion</h2>
<p>In the end I decided to leave WordPress for Pelican hosted on GitHub Pages. I'm pretty satisfied with the result I got. The nature of GitHub Pages prevents me from using HTTP redirects (and therefore the old feed links are broken), however in exchange I've got much more freedom, and this is what matters to me.</p>andreacorbelliniSun, 02 Aug 2015 18:55:00 +0000tag:andrea.corbellini.name,2015-08-02:2015/08/02/hello-pelican/blogpelicanwordpressLet's Encrypt is going to start soonhttp://andrea.corbellini.name/2015/06/16/lets-encrypt-is-going-to-start-soon/<p><a href="https://letsencrypt.org/">Let's Encrypt</a> (the free, automated and open certificate authority) has just <a href="https://letsencrypt.org/2015/06/16/lets-encrypt-launch-schedule.html">announced its launch schedule</a>. According to it, certificates will be released to the public starting from the <strong>week of September 14, 2015</strong>.</p>
<p>Their intermediate certificates, which <a href="https://letsencrypt.org/2015/06/04/isrg-ca-certs.html">were generated a few days ago</a>, will be signed by <a href="https://www.identrustssl.com/">IdenTrust</a>. What this means is that if you browse a web page secured by Let's Encrypt, you won't get any scary message, but the usual green lock.</p>
<figure>
<img src="http://andrea.corbellini.name/images/green-lock.png" alt="Green lock" width="612" height="188">
<figcaption><strong>You will see this...</strong></figcaption>
</figure>
<figure>
<img src="http://andrea.corbellini.name/images/red-lock.png" alt="Red lock" width="612" height="300">
<figcaption><strong>... not this.</strong></figcaption>
</figure>
<p>In case you are curious: the root certificate is a 4096-bit RSA key, the two intermediate certificates are both 2048-bit RSA keys. They are also <a href="https://letsencrypt.org/certificates/">planning to generate ECDSA keys later this year</a>.</p>
<p>Technical aspects aside, this will be a great opportunity for the entire web. As I have <a href="http://andrea.corbellini.name/2015/04/12/lets-encrypt-the-road-towards-a-better-web/">already written</a>, I always dreamed of an encrypted web, and I truly believe that Let's Encrypt — or at least its approach to the problem — is the way to go.</p>
<p>So, will you get a Let's Encrypt certificate when the time comes? I will. Not for this blog (I can't install a certificate without paying), but for other websites I manage.</p>
<p>Perhaps I'll also show a "Proudly secured by Let's Encrypt" badge.</p>andreacorbelliniTue, 16 Jun 2015 18:20:00 +0000tag:andrea.corbellini.name,2015-06-16:2015/06/16/lets-encrypt-is-going-to-start-soon/ecdsalet's encryptrsasecuritytlsElliptic Curve Cryptography: breaking security and a comparison with RSAhttp://andrea.corbellini.name/2015/06/08/elliptic-curve-cryptography-breaking-security-and-a-comparison-with-rsa/<p><strong>This post is the fourth and last in the series <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">ECC: a gentle introduction</a>.</strong></p>
<p>In the <a href="http://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/">last post</a> we have seen two algorithms, ECDH and ECDSA, and we have seen how the discrete logarithm problem for elliptic curves plays an important role for their security. But, if you remember, we said that <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#discrete-logarithm">we have no mathematical proofs</a> for the complexity of the discrete logarithm problem: we believe it to be "hard", but we can't be sure. In the first part of this post, we'll try to get an idea of how "hard" it is in practice with today's techniques.</p>
<p>Then, in the second part, we will try to answer the question: why do we need elliptic curve cryptography if RSA (and the other cryptosystems based on modular arithmetic) work well?</p>
<h1>Breaking the discrete logarithm problem</h1>
<p>We will now see the two most efficient algorithms for computing discrete logarithms on elliptic curves: the baby-step, giant-step algorithm, and Pollard's rho method.</p>
<p>Before starting, as a reminder, here is what the discrete logarithm problem is about: <strong>given two points $P$ and $Q$ find out the integer $x$ that satisfies the equation $Q = xP$</strong>. The points belong to a subgroup of an elliptic curve, which has a base point $G$ and whose order is $n$.</p>
<h2>Baby-step, giant-step</h2>
<p>Before entering the details of the algorithm, a quick consideration: we can always write any integer $x$ as <strong>$x = am + b$</strong>, where $a$, $m$ and $b$ are three arbitrary integers. For example, we can write $10 = 2 \cdot 3 + 4$.</p>
<p>With this in mind, we can rewrite the equation for the discrete logarithm problem as follows:
$$\begin{array}{rl}
Q & = xP \\
Q & = (am + b) P \\
Q & = am P + b P \\
Q - am P & = b P
\end{array}$$</p>
<p>The baby-step giant-step is a "meet in the middle" algorithm. Contrary to the brute-force attack (which forces us to calculate all the points $xP$ for every $x$ until we find $Q$), we will calculate "few" values for $bP$ and "few" values for $Q - amP$ until we find a correspondence. The algorithm works as follows:</p>
<ol>
<li>Calculate $m = \left\lceil{\sqrt{n}}\right\rceil$</li>
<li>For every $b$ in $\{0, \dots, m\}$, calculate $bP$ and store the results in a hash table.</li>
<li>For every $a$ in $\{0, \dots, m\}$:<ol>
<li>calculate $amP$;</li>
<li>calculate $Q - amP$;</li>
<li>check the hash table to see whether there exists a point $bP$ such that $Q - amP = bP$;</li>
<li>if such a point exists, then we have found $x = am + b$.</li>
</ol>
</li>
</ol>
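<p>The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the script linked later in this post: the helper names <code>add</code>, <code>mul</code> and <code>bsgs</code> are made up for the example, points are plain <code>(x, y)</code> tuples, <code>None</code> stands for the point at infinity, and <code>pow(k, -1, p)</code> needs Python 3.8 or later. The baby steps stop at $m - 1$ rather than $m$, which is still enough to cover every $x$ below $m^2$.</p>

```python
from math import isqrt

def add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = 0 (also covers doubling a point with y = 0)
    if P == Q:
        s = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # chord slope
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P, a, p):
    """Compute kP with double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P, a, p)
        P = add(P, P, a, p)
        k >>= 1
    return R

def bsgs(P, Q, n, a, p):
    """Return x such that Q = xP, where n is the order of P (None if not found)."""
    m = isqrt(n - 1) + 1  # ceil(sqrt(n))
    # Baby steps: remember bP -> b for b = 0, 1, ..., m - 1.
    table = {}
    R = None
    for b in range(m):
        table.setdefault(R, b)
        R = add(R, P, a, p)
    # R is now mP; every giant step subtracts mP from the running point.
    minus_mP = None if R is None else (R[0], -R[1] % p)
    R = Q
    for i in range(m):
        if R in table:  # Q - i*mP == bP, hence x = i*m + b
            return i * m + table[R]
        R = add(R, minus_mP, a, p)
    return None

# Curve y^2 = x^3 + 2x + 3 (mod 97), subgroup of order 5 generated by P.
P, Q = (3, 6), (80, 87)
print(bsgs(P, Q, n=5, a=2, p=97))  # → 3
```

<p>The hash table is an ordinary Python <code>dict</code> keyed by the point itself, which is what makes each giant-step lookup a constant-time operation.</p>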
<p>As you can see, initially we calculate the points $bP$ with little (i.e. <strong>"baby"</strong>) increments for the coefficient $b$ ($1P$, $2P$, $3P$, ...). Then, in the second part of the algorithm, we calculate the points $amP$ with huge (i.e. <strong>"giant"</strong>) increments for $am$ ($1mP$, $2mP$, $3mP$, ..., where $m$ is a huge number).</p>
<figure>
<img src="http://andrea.corbellini.name/images/baby-step-giant-step.gif" alt="Baby-step, giant-step" width="310" height="346">
<figcaption>The baby-step, giant-step algorithm: initially we calculate few points via small steps and store them in a hash table. Then we perform the giant steps and compare the new points with the points in the hash table. Once a match is found, calculating the discrete logarithm is a matter of rearranging terms.</figcaption>
</figure>
<p>To understand why this algorithm works, forget for a moment that the points $bP$ are cached and take the equation $Q = amP + bP$. Consider the following:</p>
<ul>
<li>When $a = 0$ we are checking whether $Q$ is equal to $bP$, where $b$ is one of the integers from 0 to $m$. This way, we are comparing $Q$ against all points from $0P$ to $mP$.</li>
<li>When $a = 1$ we are checking whether $Q$ is equal to $mP + bP$. We are comparing $Q$ against all points from $mP$ to $2mP$.</li>
<li>When $a = 2$ we are comparing $Q$ against all the points from $2mP$ to $3mP$.</li>
<li>...</li>
<li>When $a = m - 1$, we are comparing $Q$ against all points from $(m - 1)mP$ to $m^2 P$, and $m^2 \ge n$ because $m = \left\lceil{\sqrt{n}}\right\rceil$.</li>
</ul>
<p>In conclusion, <strong>we are checking all points from $0P$ to $nP$</strong> (that is, all the possible points) <strong>performing at most $2m$ additions and multiplications</strong> (exactly $m$ for the baby steps, at most $m$ for the giant steps).</p>
<p>If you consider that a lookup on a hash table takes $O(1)$ time, it's easy to see that this algorithm has both <strong>time and space complexity $O(\sqrt{n})$</strong> (or <strong>$O(2^{k / 2})$</strong> if you consider the bit length). It's still exponential time, but much better than a brute-force attack.</p>
<h3>Baby-step giant-step in practice</h3>
<p>It may make sense to see what the complexity $O(\sqrt{n})$ means in practice. Let's take a standardized curve: <code>prime192v1</code> (aka <code>secp192r1</code>, <code>ansiX9p192r1</code>). This curve has order $n$ = 0xffffffff ffffffff ffffffff 99def836 146bc9b1 b4d22831. The square root of $n$ is approximately 7.922816251426434 · 10<sup>28</sup> (almost <strong>eighty octillions</strong>).</p>
<p>Now imagine storing $\sqrt{n}$ points in a hash table. Suppose that each point requires exactly 32 bytes: <strong>our hash table would need approximately 2.5 · 10<sup>30</sup> bytes of memory</strong>. <a href="http://www.csc.com/big_data/flxwd/83638-big_data_just_beginning_to_explode_interactive_infographic">Looking on the web</a>, it seems that the total world storage capacity is in the order of the zettabyte (10<sup>21</sup> bytes). This is almost <strong>ten orders of magnitude</strong> lower than the memory required by our hash table! Even if our points took 1 byte each, we would be still very far from being able to store all of them.</p>
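<p>The arithmetic behind these figures is easy to reproduce. The snippet below is just a back-of-the-envelope check: the 32 bytes per point and the one-zettabyte world capacity are the assumptions made in the text, and the variable names are arbitrary.</p>

```python
from math import isqrt

n = 0xffffffffffffffffffffffff99def836146bc9b1b4d22831  # order of prime192v1
entries = isqrt(n)           # roughly sqrt(n) points go into the hash table
table_bytes = 32 * entries   # assumption from the text: 32 bytes per point
world_bytes = 10**21         # one zettabyte, the capacity quoted in the text

print(f"points to store: {entries:.3e}")
print(f"bytes needed:    {table_bytes:.3e}")
print(f"ratio to world storage: {table_bytes / world_bytes:.1e}")
```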
<p>This is impressive, and is even more impressive if you consider that <code>prime192v1</code> is one of the curves with the lowest order. The order of <code>secp521r1</code> (another standard curve from NIST) is approximately 6.9 · 10<sup>156</sup>!</p>
<h3>Playing with baby-step giant-step</h3>
<p>I made <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/babygiantstep.py">a Python script</a> that computes discrete logarithms using the baby-step giant-step algorithm. Obviously it only works with curves with small orders: don't try it with <code>secp521r1</code>, unless you want to receive a <code>MemoryError</code>.</p>
<p>It should produce an output like this:</p>
<div class="highlight"><pre><span></span><span class="n">Curve</span><span class="o">:</span> <span class="n">y</span><span class="o">^</span><span class="mi">2</span> <span class="o">=</span> <span class="o">(</span><span class="n">x</span><span class="o">^</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">1</span><span class="n">x</span> <span class="o">-</span> <span class="mi">1</span><span class="o">)</span> <span class="n">mod</span> <span class="mi">10177</span>
<span class="n">Curve</span> <span class="n">order</span><span class="o">:</span> <span class="mi">10331</span>
<span class="n">p</span> <span class="o">=</span> <span class="o">(</span><span class="mh">0x1</span><span class="o">,</span> <span class="mh">0x1</span><span class="o">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="o">(</span><span class="mh">0x1a28</span><span class="o">,</span> <span class="mh">0x8fb</span><span class="o">)</span>
<span class="mi">325</span> <span class="o">*</span> <span class="n">p</span> <span class="o">=</span> <span class="n">q</span>
<span class="n">log</span><span class="o">(</span><span class="n">p</span><span class="o">,</span> <span class="n">q</span><span class="o">)</span> <span class="o">=</span> <span class="mi">325</span>
<span class="n">Took</span> <span class="mi">105</span> <span class="n">steps</span>
</pre></div>
<h2>Pollard's ρ</h2>
<p>Pollard's rho is another algorithm for computing discrete logarithms. It has the same asymptotic time complexity $O(\sqrt{n})$ as the baby-step giant-step algorithm, but its space complexity is just $O(1)$ (it only needs to remember a constant number of points and coefficients). If baby-step giant-step can't solve discrete logarithms because of the huge memory requirements, will Pollard's rho make it? Let's see...</p>
<p>First of all, another reminder of the discrete logarithm problem: given $P$ and $Q$ find $x$ such that $Q = xP$. With Pollard's rho, we will solve a slightly different problem: given $P$ and $Q$, <strong>find the integers $a$, $b$, $A$ and $B$ such that $aP + bQ = AP + BQ$</strong>.</p>
<p>Once the four integers are found, we can use the equation $Q = xP$ to find out $x$:
$$\begin{array}{rl}
aP + bQ & = AP + BQ \\
aP + bxP & = AP + BxP \\
(a + bx) P & = (A + Bx) P \\
(a - A) P & = (B - b) xP
\end{array}$$</p>
<p>Now we can get rid of $P$. But before doing so, remember that our subgroup is cyclic with order $n$, therefore the coefficients used in point multiplication are modulo $n$:
$$\begin{array}{rl}
a - A & \equiv (B - b) x \pmod{n} \\
x & = (a - A)(B - b)^{-1} \bmod{n}
\end{array}$$</p>
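<p>As a quick sanity check of the last formula, here is a tiny Python sketch. The function name <code>solve_log</code> is invented for the example, and the sample values are the ones that appear in the tortoise and hare figure of this post: the pairs $(3, 3)$ and $(2, 0)$ in a subgroup of order 5. Note that the inverse only exists when $B - b$ is coprime with $n$.</p>

```python
def solve_log(a, b, A, B, n):
    """Return x = (a - A) * (B - b)^-1 mod n, given aP + bQ = AP + BQ."""
    denominator = (B - b) % n
    if denominator == 0:
        raise ValueError("B - b is 0 modulo n: this pair of pairs is useless")
    # pow(k, -1, n) is the modular inverse (Python 3.8+); it raises
    # ValueError if k and n are not coprime.
    return (a - A) * pow(denominator, -1, n) % n

print(solve_log(3, 3, 2, 0, 5))  # → 3
```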
<p>The principle of operation of Pollard's rho is simple: we define a <strong>pseudo-random sequence of $(a, b)$ pairs</strong>. This sequence of pairs can be used to generate the sequence of points $aP + bQ$. Because both $P$ and $Q$ are elements of the same cyclic subgroup, <strong>the sequence of points $aP + bQ$ is cyclic too</strong>.</p>
<p>This means that if we walk our pseudo-random sequence of $(a, b)$ pairs, sooner or later we will detect a cycle. That is: <strong>we will find a pair $(a, b)$ and another distinct pair $(A, B)$ such that $aP + bQ = AP + BQ$</strong>. Same points, distinct pairs: we can apply the equation above to find the logarithm.</p>
<p>The problem is: how do we detect the cycle in an efficient way?</p>
<h3>Tortoise and Hare</h3>
<p>In order to detect our cycle, we could try all the possible values for $a$ and $b$ using a <a href="http://en.wikipedia.org/wiki/Pairing_function">pairing function</a>, but given that there are $n^2$ such pairs, our algorithm would be $O(n^2)$, much worse than a brute-force attack.</p>
<p>But there exists a faster method: the <strong>tortoise and hare algorithm</strong> (also known as Floyd's cycle-finding algorithm). The picture below shows the principle of operation of the tortoise and hare method, which is at the core of Pollard's rho.</p>
<figure>
<img src="http://andrea.corbellini.name/images/tortoise-hare.gif" alt="Tortoise and Hare" width="650" height="101">
<figcaption>We have the curve $y^2 \equiv x^3 + 2x + 3 \pmod{97}$ and the points $P = (3, 6)$ and $Q = (80, 87)$. The points belong to a cyclic subgroup of order 5.<br>We walk a sequence of pairs at different speeds until we find two different pairs $(a, b)$ and $(A, B)$ that produce the same point. In this case, we have found the pairs $(3, 3)$ and $(2, 0)$ that allow us to calculate the logarithm as $x = (3 - 2)(0 - 3)^{-1} \bmod{5} = 3$. And in fact we correctly have $Q = 3P$.</figcaption>
</figure>
<p>Basically, we take our pseudo-random sequence of $(a, b)$ pairs, together with the corresponding sequence of $aP + bQ$ points. The sequence of $(a, b)$ pairs may or may not be cyclic, but the sequence of points is, because both $P$ and $Q$ were generated from the same base point, and from the properties of subgroups we know that we can't "escape" from the subgroup using just scalar multiplication and addition.</p>
<p>Now we take our two pets, the tortoise and the hare, and make them walk our sequence from left to right. <strong>The tortoise</strong> (the green spot in the picture) is slow and <strong>reads each point one by one</strong>; <strong>the hare</strong> (represented in red) is fast and <strong>skips a point at every step</strong>.</p>
<p>After some time both the tortoise and the hare will have found the same point, but with different coefficient pairs. Or, to express that with equations, the tortoise will have found a pair $(a, b)$ and the hare will have found a pair $(A, B)$ such that $aP + bQ = AP + BQ$.</p>
<p>If our random sequence is defined through an algorithm (as opposed to being stored statically), it's easy to see how this principle of operation requires just <strong>$O(\log n)$ space</strong>: we only keep a constant number of points and coefficients, each of them $O(\log n)$ bits long. Calculating the asymptotic time complexity is not that easy, but we can build a probabilistic proof that shows how <strong>the time complexity is $O(\sqrt{n})$</strong>, as we have already said.</p>
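<p>To make the idea concrete, here is a minimal Python sketch of Pollard's rho with the tortoise and the hare. It is not the script discussed in the next section: it uses the classic three-way partition of the points to generate the walk, the names <code>add</code>, <code>mul</code> and <code>pollards_rho</code> are invented for the example, and it assumes that the subgroup order $n$ is prime, so that $B - b$ is invertible whenever it is non-zero. The starting pairs and the retry limit are arbitrary choices.</p>

```python
def add(P, Q, a, p):
    """Add two points of y^2 = x^3 + ax + b over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        s = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        s = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P, a, p):
    """Compute kP with double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P, a, p)
        P = add(P, P, a, p)
        k >>= 1
    return R

def pollards_rho(P, Q, n, a, p, max_attempts=20):
    """Return x such that Q = xP, where n is the (prime) order of P."""
    def step(X, ca, cb):
        # The next move depends only on the current point X = ca*P + cb*Q,
        # so the sequence of points is forced to enter a cycle.
        part = 0 if X is None else X[0] % 3
        if part == 0:
            return add(X, P, a, p), (ca + 1) % n, cb
        if part == 1:
            return add(X, X, a, p), 2 * ca % n, 2 * cb % n
        return add(X, Q, a, p), ca, (cb + 1) % n

    for attempt in range(1, max_attempts + 1):
        ca, cb = attempt, attempt + 1  # arbitrary starting pair, changed on retry
        start = add(mul(ca, P, a, p), mul(cb, Q, a, p), a, p)
        tortoise = hare = (start, ca, cb)
        while True:
            tortoise = step(*tortoise)   # one step at a time
            hare = step(*step(*hare))    # two steps at a time
            if tortoise[0] == hare[0]:   # same point reached
                break
        (_, ca, cb), (_, cA, cB) = tortoise, hare
        if (cB - cb) % n:  # distinct pairs: we can solve for x
            return (ca - cA) * pow((cB - cb) % n, -1, n) % n
    raise RuntimeError("no usable collision found")

# The "tiny" curve y^2 = x^3 + x - 1 (mod 10177), whose order is 10331.
a, p, n = 1, 10177, 10331
P = (1, 1)
Q = mul(325, P, a, p)
x = pollards_rho(P, Q, n, a, p)
print(x, mul(x, P, a, p) == Q)
```

<p>Only the two walkers (a point and two coefficients each) are kept in memory, no matter how long the walk gets.</p>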
<h3>Playing with Pollard's ρ</h3>
<p>I've built <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/pollardsrho.py">a Python script</a> that computes discrete logarithms using Pollard's rho. It is not the implementation of the original Pollard's rho, but a slight variation of it (I've used a more efficient method for generating the pseudo-random sequence of pairs). The script contains some useful comments, so read it if you are interested in the details of the algorithm.</p>
<p>This script, like the baby-step giant-step one, works on a tiny curve, and produces the same kind of output.</p>
<h3>Pollard's ρ in practice</h3>
<p>We said that baby-step giant-step can't be used in practice, because of the huge memory requirements. Pollard's rho, on the other hand, requires very little memory. So, how practical is it?</p>
<p><strong>Certicom launched a <a href="https://www.certicom.com/index.php/the-certicom-ecc-challenge">challenge</a> in 1998</strong> to compute discrete logarithms on elliptic curves with bit lengths ranging from 109 to 359. As of today, <strong>only 109-bit long curves</strong> have been successfully broken. The latest successful attempt was made in 2004. Quoting <a href="http://en.wikipedia.org/wiki/Discrete_logarithm_records">Wikipedia</a>:</p>
<blockquote>
<p>The prize was awarded on 8 April 2004 to a group of about 2600 people represented by Chris Monico. They also used a version of a parallelized Pollard rho method, taking 17 months of calendar time.</p>
</blockquote>
<p>As we have already said, <code>prime192v1</code> is one of the "smallest" elliptic curves. We also said that Pollard's rho has $O(\sqrt{n})$ time complexity. If we used the same technique as Chris Monico (the same algorithm, on the same hardware, with the same number of machines), how long would it take to compute a logarithm on <code>prime192v1</code>?
$$17\ \text{months}\ \times \frac{\sqrt{2^{192}}}{\sqrt{2^{109}}} \approx 5 \cdot 10^{13}\ \text{months}$$</p>
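<p>The same estimate, spelled out in Python. This is only the extrapolation above under its stated assumptions (same algorithm, hardware and number of machines); the variable names are arbitrary.</p>

```python
months_for_109_bits = 17

# sqrt(2^192) / sqrt(2^109) = 2^((192 - 109) / 2)
slowdown = 2 ** ((192 - 109) / 2)

months_for_192_bits = months_for_109_bits * slowdown
print(f"{months_for_192_bits:.1e} months")
print(f"{months_for_192_bits / 12:.1e} years")
```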
<p>This number is pretty self-explanatory and gives a clear idea of how hard it can be to break a discrete logarithm using such techniques.</p>
<h2>Pollard's ρ vs Baby-step giant-step</h2>
<p>I decided to put the <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/babygiantstep.py">baby-step giant-step script</a> and the <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/pollardsrho.py">Pollard's rho script</a> together with a <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/bruteforce.py">brute-force script</a> into a <a href="https://github.com/andreacorbellini/ecc/blob/master/logs/comparelogs.py">fourth script</a> to compare their performances.</p>
<p>This fourth script computes all the logarithms for all the points on the "tiny" curve using different algorithms and reports how much time each of them took:</p>
<div class="highlight"><pre><span></span>Curve order: 10331
Using bruteforce
Computing all logarithms: 100.00% done
Took 2m 31s (5193 steps on average)
Using babygiantstep
Computing all logarithms: 100.00% done
Took 0m 6s (152 steps on average)
Using pollardsrho
Computing all logarithms: 100.00% done
Took 0m 21s (138 steps on average)
</pre></div>
<p>As we could expect, the brute-force method is tremendously slow compared to the other two. Baby-step giant-step is the fastest, while Pollard's rho is more than three times slower than baby-step giant-step (although it uses far less memory and fewer steps on average).</p>
<p>Also look at the number of steps: brute force used 5193 steps on average for computing each logarithm. 5193 is very near to 10331 / 2 (half the curve order). Baby-step giant-step and Pollard's rho used 152 steps and 138 steps respectively, two numbers of the same order as the square root of 10331 (101.64).</p>
<h2>Final consideration</h2>
<p>While discussing these algorithms, I have presented many numbers. It's important to be cautious when reading them: algorithms can be greatly optimized in many ways. Hardware can improve. Specialized hardware can be built.</p>
<p>The fact that an approach seems impractical today does not imply that the approach can't be improved. It also does not imply that other, better approaches do not exist (remember, once again, that we have no proofs for the complexity of the discrete logarithm problem).</p>
<h1>Shor's algorithm</h1>
<p>If today's techniques are unsuitable, what about tomorrow's techniques? Well, things are a bit more worrisome: there exists a <strong><a href="https://en.wikipedia.org/wiki/Quantum_algorithm">quantum algorithm</a> capable of computing discrete logarithms in polynomial time: <a href="https://en.wikipedia.org/wiki/Shor%27s_algorithm">Shor's algorithm</a></strong>, which has time complexity $O((\log n)^3)$ and space complexity $O(\log n)$.</p>
<p>The efficiency of quantum algorithms lies in state superposition. On classical computers, the memory cells (i.e. the bits) may be either 1 or 0. There are no intermediate states between the two. On the other hand, the memory cells of quantum computers (known as qubits) are subject to the uncertainty principle: they do not have a truly defined state until they are measured. State superposition does not mean that each qubit may be 0 and 1 at the same time (as is often said on the web); it means that when we measure the qubit, we have a certain probability of observing 0, and another probability of observing 1. Quantum algorithms work by modifying the probability of each qubit.</p>
<p>This peculiarity implies that with a limited number of qubits, we can deal with lots of possible inputs at the same time. So, for example, we can tell a quantum computer that there's a number $x$ uniformly distributed between 0 and $n - 1$. This requires just $\log n$ qubits instead of $n \log n$ bits. Then, we can tell the quantum computer to perform scalar multiplication $xP$. This will result in the superposition of states given by all the points from $0P$ to $(n - 1)P$ — that is, if we measured our qubits now, we would obtain one of the points from $0P$ to $(n - 1)P$ with probability $1 / n$.</p>
<p>This was to give you an idea of how powerful state superposition is. Shor's algorithm does not work exactly this way, it is actually more complicated. What makes it complicated is that, while we can "simulate" $n$ states at the same time, at some point we have to reduce these many states to just a few ones, because we want as output a single answer, not many (i.e. we want to know one single logarithm, not many probable wrong logarithms).</p>
<h1>ECC and RSA</h1>
<p>Now let's forget about quantum computing, which is still far from being a serious problem. The question I'll answer now is: <strong>why bother with elliptic curves if RSA works well?</strong></p>
<p>A quick answer is given by NIST, which provides <a href="https://www.nsa.gov/business/programs/elliptic_curve.shtml">a table that compares RSA and ECC key sizes</a> required to achieve the same level of security.</p>
<table class="table">
<thead>
<tr><th>RSA key size (bits)</th><th>ECC key size (bits)</th></tr>
</thead>
<tbody>
<tr><td>1024</td><td>160</td></tr>
<tr><td>2048</td><td>224</td></tr>
<tr><td>3072</td><td>256</td></tr>
<tr><td>7680</td><td>384</td></tr>
<tr><td>15360</td><td>521</td></tr>
</tbody>
</table>
<p>Note that there is no linear relationship between the RSA key sizes and the ECC key sizes (in other words: if we double the RSA key size, we don't have to double the ECC key size). This table tells us not only that ECC uses less memory, but also that key generation and signing are considerably faster.</p>
<p>But why is it so? The answer is that the fastest known algorithms for computing discrete logarithms over elliptic curves are Pollard's rho and baby-step giant-step, while in the case of RSA we have faster algorithms. One in particular is the <strong><a href="https://en.wikipedia.org/wiki/General_number_field_sieve">general number field sieve</a></strong>: an algorithm for integer factorization that can be used to compute discrete logarithms. The general number field sieve is the fastest algorithm for integer factorization to date.</p>
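<p>To get a feeling for the gap, we can compare rough work estimates for the two attacks. The sketch below is only an approximation and not the way the table above was produced: for RSA it uses the usual heuristic running time of the general number field sieve, $\exp\left(\sqrt[3]{64/9}\,(\ln N)^{1/3}(\ln \ln N)^{2/3}\right)$, ignoring the $o(1)$ term, and for elliptic curves it uses the $\sqrt{n}$ cost of Pollard's rho. The function names are made up for the example.</p>

```python
from math import exp, log

def gnfs_bits(modulus_bits):
    """Approximate log2 of the GNFS cost for a modulus of the given bit length."""
    ln_n = modulus_bits * log(2)
    c = (64 / 9) ** (1 / 3)
    ln_cost = c * ln_n ** (1 / 3) * log(ln_n) ** (2 / 3)
    return ln_cost / log(2)

def rho_bits(curve_bits):
    """log2 of sqrt(2^k): the cost of Pollard's rho on a k-bit curve."""
    return curve_bits / 2

# Key size pairs taken from the table above.
for rsa, ecc in [(1024, 160), (2048, 224), (3072, 256), (7680, 384), (15360, 521)]:
    print(f"RSA {rsa:>5}: ~2^{gnfs_bits(rsa):.0f}    ECC {ecc:>3}: ~2^{rho_bits(ecc):.0f}")
```

<p>The estimate for RSA grows much more slowly than the key size, which is why RSA keys have to become so large to keep up with comparatively short ECC keys.</p>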
<p>All of this applies to other cryptosystems based on modular arithmetic as well, including DSA, D-H and ElGamal.</p>
<h1>Hidden threats of NSA</h1>
<p>And now the hard part. So far we have discussed algorithms and mathematics. Now it's time to discuss people, and things get more complicated.</p>
<p>If you remember, in the last post we said that certain classes of elliptic curves are weak, and to solve the problem of trusting curves from dubious sources we added a random seed to our domain parameters. And if we look at standard curves from NIST we can see that they are all verifiably random.</p>
<p>If we read the Wikipedia page for "<a href="http://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number">nothing up my sleeve</a>", we can see that:</p>
<ul>
<li>The random numbers for MD5 come from the sine of integers.</li>
<li>The random numbers for Blowfish come from the first digits of $\pi$.</li>
<li>The random numbers for RC5 come from both $e$ and the golden ratio.</li>
</ul>
<p>These numbers are random because their digits are uniformly distributed. And they are also unsuspicious, because they have a justification.</p>
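<p>The MD5 case is easy to verify by hand. MD5 defines its 64 round constants as the integer part of $2^{32} \cdot |\sin(i)|$ for $i$ from 1 to 64 (with $i$ in radians), so anybody can regenerate them. A minimal sketch, with a made-up function name:</p>

```python
from math import floor, sin

def md5_constants():
    """Regenerate the 64 MD5 round constants from the sine of the integers 1..64."""
    return [floor(abs(sin(i + 1)) * 2**32) for i in range(64)]

constants = md5_constants()
print(hex(constants[0]))   # → 0xd76aa478
print(hex(constants[63]))  # → 0xeb86d391
```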
<p>Now the question is: <strong>where do the random seeds for NIST curves come from?</strong> The answer is, sadly: we don't know. Those seeds have no justification at all.</p>
<p><strong>Is it possible that NIST has discovered a "sufficiently large" class of weak elliptic curves and has tried many possible seeds until they found a vulnerable curve?</strong> I can't answer this question, but it is a legitimate and important one. We know that NIST has succeeded in standardizing at least one <a href="http://en.wikipedia.org/wiki/Dual_EC_DRBG">vulnerable random number generator</a> (a generator which, oddly enough, is based on elliptic curves). Perhaps they also succeeded in standardizing a set of weak elliptic curves. How do we know? We can't.</p>
<p>What's important to understand is that "verifiably random" and "secure" are not synonyms. And it doesn't matter how hard the logarithm problem is, or how long our keys are: if our algorithms are broken, there's nothing we can do.</p>
<p>With respect to this, RSA wins, as it does not require special domain parameters that can be tampered with. RSA (as well as other modular arithmetic systems) may be a good alternative if we can't trust authorities and if we can't construct our own domain parameters. And in case you are asking: yes, TLS may use NIST curves. If you check <a href="https://google.com/">https://google.com</a>, you'll see that the connection is using ECDHE and ECDSA, with a certificate based on <code>prime256v1</code> (aka <code>secp256r1</code>).</p>
<h1>That's all!</h1>
<p>I hope you have enjoyed this series. My aim was to give you the basic knowledge, terminology and conventions to understand what elliptic curve cryptography is today. If I reached my aim, you should now be able to understand existing ECC-based cryptosystems and to expand your knowledge by reading "not so gentle" documentation. When writing this series, I could have skipped over many details and used a simpler terminology, but I felt that by doing so you would not have been able to understand what the web has to offer. I believe I have found a good compromise between simplicity and completeness.</p>
<p>Note though that by reading just this series, you are not able to implement secure ECC cryptosystems: security requires us to know many subtle but important details. Remember the <a href="http://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/#random-curves">requirements for Smart's attack</a> and <a href="http://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/#ecdsa-k">Sony's mistake</a>: these are just two examples that should teach you how easy it is to produce insecure algorithms and how easy it is to exploit them.</p>
<p>So, if you are interested in diving deeper into the world of ECC, where to go from here?</p>
<p>First off, so far we have seen Weierstrass curves over prime fields, but you must know that there exist other kinds of curves and fields, in particular:</p>
<ul>
<li><strong>Koblitz curves over binary fields.</strong> Those are elliptic curves in the form $y^2 + xy = x^3 + ax^2 + 1$ (where $a$ is either 0 or 1) over finite fields containing $2^m$ elements (where $m$ is a prime). They allow particularly efficient point additions and scalar multiplications.
Examples of standardized Koblitz curves are <code>nistk163</code>, <code>nistk283</code> and <code>nistk571</code> (three curves defined over a field of 163, 283 and 571 bits).</li>
<li><strong>Binary curves.</strong> They are very similar to Koblitz curves and are in the form $y^2 + xy = x^3 + x^2 + b$ (where $b$ is an integer often generated from a random seed). As the name suggests, binary curves are restricted to binary fields too. Examples of standardized curves are <code>nistb163</code>, <code>nistb283</code> and <code>nistb571</code>.
It must be said that there are growing concerns that both Koblitz and binary curves may not be as safe as prime curves.</li>
<li><strong>Edwards curves</strong>, in the form $x^2 + y^2 = 1 + d x^2 y^2$ (where $d$ is neither 0 nor 1). These are particularly interesting not only because point addition and scalar multiplication are fast, but also because the formula for point addition is always the same, in any case ($P \ne Q$, $P = Q$, $P = -Q$, ...). This feature reduces the possibility of side-channel attacks, where you measure the time used for scalar multiplication and try to guess the scalar coefficient based on the time it took to compute.
Edwards curves are relatively new (they were presented in 2007) and no authority such as Certicom or NIST has yet standardized any of them.</li>
<li><strong>Curve25519</strong> and <strong>Ed25519</strong> are two particular elliptic curves designed for ECDH and a variant of ECDSA respectively. Like Edwards curves, these two curves are fast and help prevent side-channel attacks. And like Edwards curves, these two curves have not been standardized yet and we can't find them in any popular software (except OpenSSH, which has supported Ed25519 key pairs since 2014).</li>
</ul>
<p>If you are interested in the implementation details of ECC, then I suggest you read the sources of <strong>OpenSSL</strong> and <strong>GnuTLS</strong>.</p>
<p>Finally, if you are interested in the mathematical details, rather than the security and efficiency of the algorithms, you must know that:</p>
<ul>
<li>Elliptic curves are <strong>algebraic varieties with genus one</strong>.</li>
<li>Points at infinity are studied in <strong>projective geometry</strong> and can be represented using <strong>homogeneous coordinates</strong> (although most of the features of projective geometry are not needed for elliptic curve cryptography).</li>
</ul>
<p>And don't forget to study <strong>finite fields</strong> and <strong>field theory</strong>.</p>
<p>These are the keywords that you should look up if you're interested in the topics.</p>
<p>Now the series is officially concluded. Thank you for all your friendly comments, tweets and mails. Many have asked me if I'm going to write other series on other closely related topics. The answer is: maybe. I accept suggestions, but I can't promise anything.</p>
<p>Thanks for reading and see you next time!</p>andreacorbelliniMon, 08 Jun 2015 13:28:00 +0000tag:andrea.corbellini.name,2015-06-08:2015/06/08/elliptic-curve-cryptography-breaking-security-and-a-comparison-with-rsa/dhdsaeccecdhecdheecdsarsasecurityElliptic Curve Cryptography: ECDH and ECDSAhttp://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/<p><strong>This post is the third in the series <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">ECC: a gentle introduction</a>.</strong></p>
<p>In the previous posts, we have seen <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#elliptic-curves">what an elliptic curve is</a> and we have defined a <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#group-law">group law</a> in order to do some math with the points of elliptic curves. Then we have <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/">restricted elliptic curves to finite fields of integers modulo a prime</a>. With this restriction, we have seen that the points of elliptic curves generate <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#scalar-multiplication">cyclic subgroups</a> and we have introduced the terms <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#base-point">base point</a>, <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#subgroup-order">order</a> and <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#cofactor">cofactor</a>.</p>
<p>Finally, we have seen that <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#scalar-multiplication">scalar multiplication in finite fields</a> is an "easy" problem, while the <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#discrete-logarithm">discrete logarithm problem</a> seems to be "hard". Now we'll see how all of this applies to cryptography.</p>
<h2>Domain parameters</h2>
<p>Our elliptic curve algorithms will work in a cyclic subgroup of an elliptic curve over a finite field. Therefore, our algorithms will need the following parameters:</p>
<ul>
<li>The <strong>prime $p$</strong> that specifies the size of the finite field.</li>
<li>The <strong>coefficients $a$ and $b$</strong> of the elliptic curve equation.</li>
<li>The <strong>base point $G$</strong> that generates our subgroup.</li>
<li>The <strong>order $n$</strong> of the subgroup.</li>
<li>The <strong>cofactor $h$</strong> of the subgroup.</li>
</ul>
<p>In conclusion, the <strong>domain parameters</strong> for our algorithms are the <strong>sextuple $(p, a, b, G, n, h)$</strong>.</p>
<h3 id="random-curves">Random curves</h3>
<p>When I said that the discrete logarithm problem was "hard", I wasn't entirely right. There are <strong>some classes of elliptic curves that are particularly weak</strong> and allow the use of special purpose algorithms to solve the discrete logarithm problem efficiently. For example, all the curves that have $p = hn$ (that is, the order of the finite field is equal to the order of the elliptic curve) are vulnerable to <a href="http://interact.sagemath.org/edu/2010/414/projects/novotney.pdf">Smart's attack</a>, which can be used to solve discrete logarithms in polynomial time on a classical computer.</p>
<p>Now, suppose that I give you the domain parameters of a curve. There's the possibility that I've discovered a new class of weak curves that nobody knows, and probably I have built a "fast" algorithm for computing discrete logarithms on the curve I gave you. How can I convince you of the contrary, i.e. that I'm not aware of any vulnerability? <strong>How can I assure you that the curve is "safe" (in the sense that it can't be used for special purpose attacks by me)?</strong></p>
<p>In an attempt to solve this kind of problem, sometimes we have an additional domain parameter: the <strong>seed $S$</strong>. This is a random number used to generate the coefficients $a$ and $b$, or the base point $G$, or both. These parameters are generated by computing the hash of the seed $S$. Hashes, as we know, are "easy" to compute, but "hard" to reverse.</p>
<figure>
<img src="http://andrea.corbellini.name/images/random-parameters-generation.png" alt="Random curve generation" width="500" height="74">
<figcaption>A simple sketch of how a random curve is generated from a seed: the hash of a random number is used to calculate different parameters of the curve.</figcaption>
</figure>
<figure>
<img src="http://andrea.corbellini.name/images/seed-inversion.png" alt="Building a seed from a hash" width="359" height="76">
<figcaption>If we wanted to cheat and try to construct a seed from the domain parameters, we would have to solve a "hard" problem: hash inversion.</figcaption>
</figure>
<p>A curve generated through a seed is said to be <strong>verifiably random</strong>. The principle of using hashes to generate parameters is known as "<a href="http://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number">nothing up my sleeve</a>", and is commonly used in cryptography.</p>
<p>This trick should give some sort of assurance that <strong>the curve has not been specially crafted to expose vulnerabilities known to the author</strong>. In fact, if I give you a curve together with a seed, it means I was not free to arbitrarily choose the parameters $a$ and $b$, and you should be relatively sure that the curve cannot be used for special purpose attacks by me. The reason why I say "relatively" will be explained in the next post.</p>
<p>A standardized algorithm for generating and checking random curves is described in ANSI X9.62 and is based on <a href="https://en.wikipedia.org/wiki/SHA-1">SHA-1</a>. If you are curious, you can read the algorithms for generating verifiable random curves in <a href="http://www.secg.org/sec1-v2.pdf">a specification by SECG</a> (look for "Verifiably Random Curves and Base Point Generators").</p>
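<p>To make the principle concrete, here is a deliberately simplified sketch of deriving a parameter from a seed and checking it afterwards. This is <em>not</em> the ANSI X9.62 procedure: the function names, the prime and the "hash the seed and reduce modulo $p$" rule are all assumptions made up for illustration.</p>

```python
import hashlib

# An arbitrary prime picked for illustration only (not a standard curve prime).
p = 2**127 - 1


def derive_b(seed: bytes, p: int) -> int:
    """Toy rule: b is the SHA-1 hash of the seed, reduced modulo p."""
    digest = hashlib.sha1(seed).digest()
    return int.from_bytes(digest, "big") % p


def is_verifiably_random(seed: bytes, p: int, b: int) -> bool:
    """Anyone holding the seed can recompute b and compare."""
    return derive_b(seed, p) == b


seed = b"seed-1"
b = derive_b(seed, p)
print("b =", hex(b))
print("verified:", is_verifiably_random(seed, p, b))
```

<p>The point of the sketch is the asymmetry: going from the seed to $b$ is one hash computation, while choosing $b$ first and then looking for a matching seed would require inverting the hash.</p>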
<p>I've created a <strong><a href="https://github.com/andreacorbellini/ecc/blob/master/scripts/verifyrandom.py">tiny Python script</a> that verifies all the random curves currently <a href="https://github.com/openssl/openssl/blob/81fc390/crypto/ec/ec_curve.c">shipped with OpenSSL</a></strong>. I strongly recommend you to check it out!</p>
<h2>Elliptic Curve Cryptography</h2>
<p>It took us a long time, but finally here we are! Therefore, pure and simple:</p>
<ol>
<li>The <strong>private key</strong> is a random integer $d$ chosen from $\{1, \dots, n - 1\}$ (where $n$ is the order of the subgroup).</li>
<li>The <strong>public key</strong> is the point $H = dG$ (where $G$ is the base point of the subgroup).</li>
</ol>
<p>You see? If we know $d$ and $G$ (along with the other domain parameters), finding $H$ is "easy". But if we know $H$ and $G$, <strong>finding the private key $d$ is "hard", because it requires us to solve the discrete logarithm problem</strong>.</p>
<p>Now we are going to describe two public-key algorithms based on that: ECDH (Elliptic Curve Diffie-Hellman), which is used for encryption, and ECDSA (Elliptic Curve Digital Signature Algorithm), used for digital signing.</p>
<h3>Encryption with ECDH</h3>
<p>ECDH is a variant of the <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">Diffie-Hellman algorithm</a> for elliptic curves. It is actually a <a href="https://en.wikipedia.org/wiki/Key-agreement_protocol">key-agreement protocol</a>, more than an encryption algorithm. This basically means that ECDH defines (to some extent) how keys should be generated and exchanged between parties. How to actually encrypt data using such keys is up to us.</p>
<p>The problem it solves is the following: two parties (the usual <a href="http://en.wikipedia.org/wiki/Alice_and_Bob">Alice and Bob</a>) want to exchange information securely, so that a third party (the <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack">Man In the Middle</a>) may intercept it, but may not decode it. This is one of the principles behind TLS, just to give you an example.</p>
<p>Here's how it works:</p>
<ol>
<li>
<p>First, <strong>Alice and Bob generate their own private and public keys</strong>. We have the private key $d_A$ and the public key $H_A = d_AG$ for Alice, and the keys $d_B$ and $H_B = d_BG$ for Bob. Note that both Alice and Bob are using the same domain parameters: the same base point $G$ on the same elliptic curve on the same finite field.</p>
</li>
<li>
<p><strong>Alice and Bob exchange their public keys $H_A$ and $H_B$ over an insecure channel</strong>. The Man In the Middle would intercept $H_A$ and $H_B$, but won't be able to find out either $d_A$ or $d_B$ without solving the discrete logarithm problem.</p>
</li>
<li>
<p><strong>Alice calculates $S = d_A H_B$</strong> (using her own private key and Bob's public key), <strong>and Bob calculates $S = d_B H_A$</strong> (using his own private key and Alice's public key). Note that $S$ is the same for both Alice and Bob, in fact:
$$S = d_A H_B = d_A (d_B G) = d_B (d_A G) = d_B H_A$$</p>
</li>
</ol>
<p>The Man In the Middle, however, only knows $H_A$ and $H_B$ (together with the other domain parameters) and would not be able to find out the <strong>shared secret $S$</strong>. This is known as the Diffie-Hellman problem, which can be stated as follows:</p>
<blockquote>
<p>Given three points $P$, $aP$ and $bP$, what is the result of $abP$?</p>
</blockquote>
<p>Or, equivalently:</p>
<blockquote>
<p>Given three integers $k$, $k^x$ and $k^y$, what is the result of $k^{xy}$?</p>
</blockquote>
<p>(The latter form is used in the original Diffie-Hellman algorithm, based on modular arithmetic.)</p>
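<p>The modular form is easy to try with tiny numbers. The values below ($p = 23$, $k = 5$ and the two secret exponents) are arbitrary choices of mine, far too small to be secure, and only meant to show that both parties end up with the same $k^{xy}$:</p>

```python
# Toy Diffie-Hellman over the integers modulo a small prime.
p = 23          # public modulus (illustrative, far too small for real use)
k = 5           # public base

x = 4           # Alice's secret exponent
y = 3           # Bob's secret exponent

k_x = pow(k, x, p)   # Alice publishes k^x mod p
k_y = pow(k, y, p)   # Bob publishes k^y mod p

# Each party raises the other's public value to its own secret exponent.
shared_alice = pow(k_y, x, p)
shared_bob = pow(k_x, y, p)

print(k_x, k_y, shared_alice, shared_bob)
```

<p>An eavesdropper sees only <code>k</code>, <code>k_x</code> and <code>k_y</code>; with numbers this small brute force is trivial, which is exactly why real deployments use very large parameters.</p>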
<figure>
<img src="http://andrea.corbellini.name/images/ecdh.png" alt="ECDH" width="468" height="196">
<figcaption>The Diffie-Hellman key exchange: Alice and Bob can "easily" calculate the shared secret, the Man in the Middle has to solve a "hard" problem.</figcaption>
</figure>
<p>The principle behind the Diffie-Hellman problem is also explained in a great <a href="https://www.youtube.com/watch?v=YEBfamv-_do#t=02m37s">YouTube video by Khan Academy</a>, which later explains the Diffie-Hellman algorithm applied to modular arithmetic (not to elliptic curves).</p>
<p>The Diffie-Hellman problem for elliptic curves is assumed to be a "hard" problem. It is believed to be as "hard" as the discrete logarithm problem, although no mathematical proofs are available. What we can tell for sure is that it can't be "harder", because solving the logarithm problem is a way of solving the Diffie-Hellman problem.</p>
<p><strong>Now that Alice and Bob have obtained the shared secret, they can exchange data with symmetric encryption.</strong></p>
<p>For example, they can use the $x$ coordinate of $S$ as the key to encrypt messages using secure ciphers like <a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a> or <a href="https://en.wikipedia.org/wiki/Triple_DES">3DES</a>. This is more or less what TLS does, the difference is that TLS concatenates the $x$ coordinate with other numbers relative to the connection and then computes a hash of the resulting byte string.</p>
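<p>Here is a minimal sketch of the three ECDH steps above. It assumes the standard <code>secp256k1</code> parameters and Python 3.8 or later (for <code>pow(x, -1, p)</code>); the helper names <code>add</code>, <code>mul</code> and <code>make_keypair</code> are mine. It is not constant-time and is not meant for real use.</p>

```python
import secrets

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over F_p).
P = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
A = 0
N = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)


def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p1 = -p2
    if p1 == p2:
        m = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P   # tangent slope
    else:
        m = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P        # chord slope
    x3 = (m * m - x1 - x2) % P
    y3 = (m * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, point):
    """Scalar multiplication with double and add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def make_keypair():
    """Private key d in {1, ..., n - 1}, public key H = dG."""
    d = secrets.randbelow(N - 1) + 1
    return d, mul(d, G)


d_a, h_a = make_keypair()    # Alice
d_b, h_b = make_keypair()    # Bob

s_a = mul(d_a, h_b)          # Alice: S = d_A * H_B
s_b = mul(d_b, h_a)          # Bob:   S = d_B * H_A
print("shared secrets match:", s_a == s_b)
```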
<h4>Playing with ECDH</h4>
<p>I've created <strong><a href="https://github.com/andreacorbellini/ecc/blob/master/scripts/ecdhe.py">another Python script</a> for computing public/private keys and shared secrets over an elliptic curve</strong>.</p>
<p>Unlike all the examples we have seen till now, this script makes use of a standardized curve, rather than a simple curve on a small field. The curve I've chosen is <code>secp256k1</code>, from <a href="http://www.secg.org/">SECG</a> (the "Standards for Efficient Cryptography Group", founded by <a href="https://www.certicom.com/">Certicom</a>). <a href="https://en.bitcoin.it/wiki/Secp256k1">This same curve is also used by Bitcoin</a> for digital signatures. Here are the domain parameters:</p>
<ul>
<li>$p$ = 0xffffffff ffffffff ffffffff ffffffff ffffffff ffffffff fffffffe fffffc2f</li>
<li>$a$ = 0</li>
<li>$b$ = 7</li>
<li>$x_G$ = 0x79be667e f9dcbbac 55a06295 ce870b07 029bfcdb 2dce28d9 59f2815b 16f81798</li>
<li>$y_G$ = 0x483ada77 26a3c465 5da4fbfc 0e1108a8 fd17b448 a6855419 9c47d08f fb10d4b8</li>
<li>$n$ = 0xffffffff ffffffff ffffffff fffffffe baaedce6 af48a03b bfd25e8c d0364141</li>
<li>$h$ = 1</li>
</ul>
<p>(These numbers were taken from <a href="https://github.com/openssl/openssl/blob/81fc390/crypto/ec/ec_curve.c#L766">OpenSSL source code</a>.)</p>
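<p>As a quick sanity check, we can confirm in a few lines that the base point satisfies the curve equation $y^2 = x^3 + ax + b$ modulo $p$. The variable names are my own; the numbers are the ones listed above.</p>

```python
# secp256k1 domain parameters, as listed above.
p = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
a = 0
b = 7
x_g = 0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798
y_g = 0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
h = 1

# G is on the curve if y^2 - (x^3 + ax + b) is a multiple of p.
on_curve = (y_g * y_g - (x_g ** 3 + a * x_g + b)) % p == 0
print("G is on the curve:", on_curve)
print("field size in bits:", p.bit_length())
print("subgroup order in bits:", n.bit_length())
```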
<p>Of course, you are free to modify the script to use other curves and domain parameters, just be sure to use prime fields and curves in Weierstrass normal form, otherwise the script won't work.</p>
<p>The script is really simple and includes some of the algorithms we have described so far: point addition, double and add, ECDH. I recommend you to read and run it. It will produce an output like this:</p>
<div class="highlight"><pre><span></span><span class="n">Curve</span><span class="o">:</span> <span class="n">secp256k1</span>
<span class="n">Alice</span><span class="s1">'s private key: 0xe32868331fa8ef0138de0de85478346aec5e3912b6029ae71691c384237a3eeb</span>
<span class="s1">Alice'</span><span class="n">s</span> <span class="kd">public</span> <span class="n">key</span><span class="o">:</span> <span class="o">(</span><span class="mh">0x86b1aa5120f079594348c67647679e7ac4c365b2c01330db782b0ba611c1d677</span><span class="o">,</span> <span class="mh">0x5f4376a23eed633657a90f385ba21068ed7e29859a7fab09e953cc5b3e89beba</span><span class="o">)</span>
<span class="n">Bob</span><span class="s1">'s private key: 0xcef147652aa90162e1fff9cf07f2605ea05529ca215a04350a98ecc24aa34342</span>
<span class="s1">Bob'</span><span class="n">s</span> <span class="kd">public</span> <span class="n">key</span><span class="o">:</span> <span class="o">(</span><span class="mh">0x4034127647bb7fdab7f1526c7d10be8b28174e2bba35b06ffd8a26fc2c20134a</span><span class="o">,</span> <span class="mh">0x9e773199edc1ea792b150270ea3317689286c9fe239dd5b9c5cfd9e81b4b632</span><span class="o">)</span>
<span class="n">Shared</span> <span class="n">secret</span><span class="o">:</span> <span class="o">(</span><span class="mh">0x3e2ffbc3aa8a2836c1689e55cd169ba638b58a3a18803fcf7de153525b28c3cd</span><span class="o">,</span> <span class="mh">0x43ca148c92af58ebdb525542488a4fe6397809200fe8c61b41a105449507083</span><span class="o">)</span>
</pre></div>
<h4>Ephemeral ECDH</h4>
<p>Some of you may have heard of ECDHE instead of ECDH. The "E" in ECDHE stands for "Ephemeral" and refers to the fact that the <strong>keys exchanged are temporary</strong>, rather than static.</p>
<p>ECDHE is used, for example, in TLS, where both the client and the server generate their public-private key pair on the fly, when the connection is established. The keys are then signed with the TLS certificate (for authentication) and exchanged between the parties.</p>
<h3>Signing with ECDSA</h3>
<p>The scenario is the following: <strong>Alice wants to sign a message with her private key</strong> ($d_A$), and <strong>Bob wants to validate the signature using Alice's public key</strong> ($H_A$). Nobody but Alice should be able to produce valid signatures. Everyone should be able to check signatures.</p>
<p>Again, Alice and Bob are using the same domain parameters. The algorithm we are going to see is ECDSA, a variant of the <a href="https://en.wikipedia.org/wiki/Digital_Signature_Algorithm">Digital Signature Algorithm</a> applied to elliptic curves.</p>
<p>ECDSA works on the hash of the message, rather than on the message itself. The choice of the hash function is up to us, but it should be obvious that a <a href="http://en.wikipedia.org/wiki/Cryptographic_hash_function">cryptographically-secure hash function</a> should be chosen. <strong>The hash of the message ought to be truncated</strong> so that the bit length of the hash is the same as the bit length of $n$ (the order of the subgroup). <strong>The truncated hash is an integer and will be denoted as $z$.</strong></p>
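<p>As a sketch of the truncation step, the function below hashes a message with SHA-512 and keeps only as many bits as $n$ has. I'm assuming the usual convention of keeping the leftmost bits of the digest; the function name and the choice of SHA-512 are mine, picked so that the truncation actually does something with a 256-bit $n$.</p>

```python
import hashlib

# Order of the secp256k1 subgroup (256 bits).
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141


def truncated_hash(message: bytes, n: int) -> int:
    """Hash the message and keep the leftmost n.bit_length() bits as an integer."""
    digest = hashlib.sha512(message).digest()
    z = int.from_bytes(digest, "big")
    excess_bits = len(digest) * 8 - n.bit_length()
    if excess_bits > 0:
        z >>= excess_bits        # drop the rightmost bits
    return z


z = truncated_hash(b"Hello!", n)
print("z =", hex(z))
print("bit length of z:", z.bit_length())
```

<p>Note that $z$ can still be numerically larger than $n$: only the bit lengths are matched, and the arithmetic modulo $n$ in the signing equations takes care of the rest.</p>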
<p>The algorithm performed by Alice to sign the message works as follows:</p>
<ol>
<li>Take a <strong>random integer $k$</strong> chosen from $\{1, \dots, n - 1\}$ (where $n$ is still the subgroup order).</li>
<li>Calculate the point <strong>$P = kG$</strong> (where $G$ is the base point of the subgroup).</li>
<li>Calculate the number <strong>$r = x_P \bmod{n}$</strong> (where $x_P$ is the $x$ coordinate of $P$).</li>
<li>If $r = 0$, then choose another $k$ and try again.</li>
<li>Calculate <strong>$s = k^{-1} (z + rd_A) \bmod{n}$</strong> (where $d_A$ is Alice's private key and $k^{-1}$ is the multiplicative inverse of $k$ modulo $n$).</li>
<li>If $s = 0$, then choose another $k$ and try again.</li>
</ol>
<p>The pair <strong>$(r, s)$ is the signature</strong>.</p>
<figure>
<img src="http://andrea.corbellini.name/images/ecdsa.png" alt="ECDSA" width="514" height="255">
<figcaption>Alice signs the hash $z$ using her private key $d_A$ and a random $k$. Bob verifies that the message has been correctly signed using Alice's public key $H_A$.</figcaption>
</figure>
<p>In plain words, this algorithm first generates a secret ($k$). This secret is hidden in $r$ thanks to point multiplication (that, as we know, is "easy" one way, and "hard" the other way round). $r$ is then bound to the message hash by the equation $s = k^{-1} (z + rd_A) \bmod{n}$.</p>
<p>Note that in order to calculate $s$, we have computed the inverse of $k$ modulo $n$. We have <a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/#p-must-be-prime">already said in the previous post</a> that this is guaranteed to work only if $n$ is a prime number. <strong>If a subgroup has a non-prime order, ECDSA can't be used.</strong> It's not by chance that almost all standardized curves have a prime order, and those that have a non-prime order are unsuitable for ECDSA.</p>
<h4>Verifying signatures</h4>
<p>In order to verify the signature we'll need Alice's public key $H_A$, the (truncated) hash $z$ and, obviously, the signature $(r, s)$.</p>
<ol>
<li>Calculate the integer $u_1 = s^{-1} z \bmod{n}$.</li>
<li>Calculate the integer $u_2 = s^{-1} r \bmod{n}$.</li>
<li>Calculate the point $P = u_1 G + u_2 H_A$.</li>
</ol>
<p>The signature is valid only if $r = x_P \bmod{n}$.</p>
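<p>The signing and verification steps can be sketched as follows. This assumes <code>secp256k1</code>, SHA-256 (whose 256-bit digest already matches the bit length of $n$, so no truncation is needed) and Python 3.8 or later. The helper names are mine, and the range check on $r$ and $s$ in <code>verify</code> is an extra precaution that is not among the steps listed above. It is not constant-time and is not meant for real use.</p>

```python
import hashlib
import secrets

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over F_p).
P = 0xffffffff_ffffffff_ffffffff_ffffffff_ffffffff_ffffffff_fffffffe_fffffc2f
N = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141
G = (0x79be667e_f9dcbbac_55a06295_ce870b07_029bfcdb_2dce28d9_59f2815b_16f81798,
     0x483ada77_26a3c465_5da4fbfc_0e1108a8_fd17b448_a6855419_9c47d08f_fb10d4b8)


def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        m = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # a = 0 on this curve
    else:
        m = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (m * m - x1 - x2) % P
    return (x3, (m * (x1 - x3) - y1) % P)


def mul(k, point):
    """Scalar multiplication with double and add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def hash_to_int(message):
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(d, message):
    z = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1          # step 1: fresh random k
        r = mul(k, G)[0] % N                      # steps 2-3: r = x_P mod n
        if r == 0:
            continue                              # step 4
        s = pow(k, -1, N) * (z + r * d) % N       # step 5
        if s == 0:
            continue                              # step 6
        return r, s


def verify(h, message, signature):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False                              # extra precaution
    z = hash_to_int(message)
    s_inv = pow(s, -1, N)
    u1 = s_inv * z % N
    u2 = s_inv * r % N
    point = add(mul(u1, G), mul(u2, h))
    return point is not None and point[0] % N == r


d = secrets.randbelow(N - 1) + 1                  # private key
h = mul(d, G)                                     # public key
sig = sign(d, b"Hello!")
print("valid:", verify(h, b"Hello!", sig))
print("tampered message:", verify(h, b"Hi there!", sig))
```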
<h3>Correctness of the algorithm</h3>
<p>The logic behind this algorithm may not seem obvious at first sight; however, if we put together all the equations we have written so far, things will be clearer.</p>
<p>Let's start from $P = u_1 G + u_2 H_A$. We know, from the definition of public key, that $H_A = d_A G$ (where $d_A$ is the private key). We can write:
$$\begin{array}{rl}
P & = u_1 G + u_2 H_A \\
& = u_1 G + u_2 d_A G \\
& = (u_1 + u_2 d_A) G
\end{array}$$</p>
<p>Using the definitions of $u_1$ and $u_2$, we can write:
$$\begin{array}{rl}
P & = (u_1 + u_2 d_A) G \\
& = (s^{-1} z + s^{-1} r d_A) G \\
& = s^{-1} (z + r d_A) G
\end{array}$$</p>
<p>Here we have omitted "$\text{mod}\ n$" both for brevity, and because the cyclic subgroup generated by $G$ has order $n$, hence "$\text{mod}\ n$" is superfluous.</p>
<p>Previously, we defined $s = k^{-1} (z + rd_A) \bmod{n}$. Multiplying each side of the equation by $k$ and dividing by $s$, we get: $k = s^{-1} (z + rd_A) \bmod{n}$. Substituting this result in our equation for $P$, we get:
$$\begin{array}{rl}
P & = s^{-1} (z + r d_A) G \\
& = k G
\end{array}$$</p>
<p><strong>This is the same equation for $P$ we had at step 2 of the signature generation algorithm!</strong> When generating signatures and when verifying them, we are calculating the same point $P$, just with a different set of equations. This is why the algorithm works.</p>
<h4>Playing with ECDSA</h4>
<p>Of course, I've created <strong><a href="https://github.com/andreacorbellini/ecc/blob/master/scripts/ecdsa.py">a Python script</a> for signature generation and verification</strong>. The code shares some parts with the ECDH script, in particular the domain parameters and the public/private key pair generation algorithm.</p>
<p>Here is the kind of output produced by the script:</p>
<div class="highlight"><pre><span></span><span class="n">Curve</span><span class="o">:</span> <span class="n">secp256k1</span>
<span class="n">Private</span> <span class="n">key</span><span class="o">:</span> <span class="mh">0x9f4c9eb899bd86e0e83ecca659602a15b2edb648e2ae4ee4a256b17bb29a1a1e</span>
<span class="n">Public</span> <span class="n">key</span><span class="o">:</span> <span class="o">(</span><span class="mh">0xabd9791437093d377ca25ea974ddc099eafa3d97c7250d2ea32af6a1556f92a</span><span class="o">,</span> <span class="mh">0x3fe60f6150b6d87ae8d64b78199b13f26977407c801f233288c97ddc4acca326</span><span class="o">)</span>
<span class="n">Message</span><span class="o">:</span> <span class="n">b</span><span class="s1">'Hello!'</span>
<span class="n">Signature</span><span class="o">:</span> <span class="o">(</span><span class="mh">0xddcb8b5abfe46902f2ac54ab9cd5cf205e359c03fdf66ead1130826f79d45478</span><span class="o">,</span> <span class="mh">0x551a5b2cd8465db43254df998ba577cb28e1ee73c5530430395e4fba96610151</span><span class="o">)</span>
<span class="n">Verification</span><span class="o">:</span> <span class="n">signature</span> <span class="n">matches</span>
<span class="n">Message</span><span class="o">:</span> <span class="n">b</span><span class="s1">'Hi there!'</span>
<span class="n">Verification</span><span class="o">:</span> <span class="n">invalid</span> <span class="n">signature</span>
<span class="n">Message</span><span class="o">:</span> <span class="n">b</span><span class="s1">'Hello!'</span>
<span class="n">Public</span> <span class="n">key</span><span class="o">:</span> <span class="o">(</span><span class="mh">0xc40572bb38dec72b82b3efb1efc8552588b8774149a32e546fb703021cf3b78a</span><span class="o">,</span> <span class="mh">0x8c6e5c5a9c1ea4cad778072fe955ed1c6a2a92f516f02cab57e0ba7d0765f8bb</span><span class="o">)</span>
<span class="n">Verification</span><span class="o">:</span> <span class="n">invalid</span> <span class="n">signature</span>
</pre></div>
<p>As you can see, the script first signs a message (the byte string "Hello!"), then verifies the signature. Afterwards, it tries to verify the same signature against another message ("Hi there!") and verification fails. Lastly, it tries to verify the signature against the correct message, but using another random public key and verification fails again.</p>
<h3 id="ecdsa-k">The importance of <em>k</em></h3>
<p>When generating ECDSA signatures, it is important to keep the secret $k$ really secret. If we used the same $k$ for all signatures, or if our random number generator were somewhat predictable, <strong>an attacker would be able to find out the private key</strong>!</p>
<p><a href="http://www.bbc.com/news/technology-12116051">This is the kind of mistake made by Sony a few years ago.</a> Basically, the PlayStation 3 game console can run only games signed by Sony with ECDSA. This way, if I wanted to create a new game for PlayStation 3, I couldn't distribute it to the public without a signature from Sony. The problem is: all the signatures made by Sony were generated using a static $k$.</p>
<p>(Apparently, Sony's random number generator was inspired by either <a href="http://xkcd.com/221/">XKCD</a> or <a href="http://dilbert.com/strip/2001-10-25">Dilbert</a>.)</p>
<p>In this situation, we could easily recover Sony's private key $d_S$ by buying just two signed games, extracting their hashes ($z_1$ and $z_2$) and their signatures ($(r_1, s_1)$ and $(r_2, s_2)$), together with the domain parameters. Here's how:</p>
<ul>
<li>First off, note that $r_1 = r_2$ (because $r = x_P \bmod{n}$ and $P = kG$ is the same for both signatures).</li>
<li>Consider that $(s_1 - s_2) \bmod{n} = k^{-1} (z_1 - z_2) \bmod{n}$ (this result comes directly from the equation for $s$).</li>
<li>Now multiply each side of the equation by $k$: $k (s_1 - s_2) \bmod{n} = (z_1 - z_2) \bmod{n}$.</li>
<li>Divide by $(s_1 - s_2)$ to get $k = (z_1 - z_2)(s_1 - s_2)^{-1} \bmod{n}$.</li>
</ul>
<p>The last equation lets us calculate $k$ using only two hashes and their corresponding signatures. Now we can extract the private key using the equation for $s$:
$$s = k^{-1}(z + rd_S) \bmod{n}\ \ \Rightarrow\ \ d_S = r^{-1} (sk - z) \bmod{n}$$</p>
<p>Similar techniques may be employed if $k$ is not static but predictable in some way.</p>
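<p>The recovery can be simulated with plain modular arithmetic. In this sketch the "victim" signs two messages with the same $k$, and the attacker recovers $k$ and then the private key from public data only. Since the algebra never touches the curve itself, I use a random nonzero number as a stand-in for $r = x_P \bmod{n}$; that shortcut, the variable names and the use of SHA-256 are my assumptions.</p>

```python
import hashlib
import secrets

# Order of the secp256k1 subgroup (a prime, so every nonzero value is invertible).
n = 0xffffffff_ffffffff_ffffffff_fffffffe_baaedce6_af48a03b_bfd25e8c_d0364141


def hash_to_int(message):
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


# --- Victim side: one private key, one static k reused for every signature ---
d_s = secrets.randbelow(n - 1) + 1      # private key (unknown to the attacker)
k = secrets.randbelow(n - 1) + 1        # the static "random" number
r = secrets.randbelow(n - 1) + 1        # stand-in for x_P mod n; same for both signatures

z1 = hash_to_int(b"first game")
z2 = hash_to_int(b"second game")
s1 = pow(k, -1, n) * (z1 + r * d_s) % n
s2 = pow(k, -1, n) * (z2 + r * d_s) % n

# --- Attacker side: knows only z1, z2, r, s1, s2 and n ---
k_recovered = (z1 - z2) * pow((s1 - s2) % n, -1, n) % n
d_recovered = pow(r, -1, n) * (s1 * k_recovered - z1) % n

print("k recovered:", k_recovered == k)
print("private key recovered:", d_recovered == d_s)
```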
<h2>Have a great weekend</h2>
<p>I really hope you enjoyed what I've written here. As usual, don't hesitate to leave a comment or send me a poke if you need help with something.</p>
<p>Next week I'll publish the fourth and last article of this series. It'll be about techniques for solving discrete logarithms, some important problems of Elliptic Curve cryptography, and how ECC compares with RSA. Don't miss it!</p>
<p><strong><a href="http://andrea.corbellini.name/2015/06/08/elliptic-curve-cryptography-breaking-security-and-a-comparison-with-rsa/">Read the next post of the series »</a></strong></p>andreacorbelliniSat, 30 May 2015 19:23:00 +0000tag:andrea.corbellini.name,2015-05-30:2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/dhdsaeccecdhecdheecdsasecuritytlsElliptic Curve Cryptography: finite fields and discrete logarithmshttp://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/<p><strong>This post is the second in the series <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">ECC: a gentle introduction</a>.</strong></p>
<p>In the <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">previous post</a>, we have seen how elliptic curves over the real numbers can be used to define a group. Specifically, we have defined a rule for <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#group-law">point addition</a>: given three aligned points, their sum is zero ($P + Q + R = 0$). We have derived a <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#geometric-addition">geometric method</a> and an <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#algebraic-addition">algebraic method</a> for computing point additions.</p>
<p>We then introduced <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#scalar-multiplication">scalar multiplication</a> ($nP = P + P + \cdots + P$) and we found out an "easy" algorithm for computing scalar multiplication: <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#double-and-add">double and add</a>.</p>
<p><strong>Now we will restrict our elliptic curves to finite fields</strong>, rather than the set of real numbers, and see how things change.</p>
<h2>The field of integers modulo <em>p</em></h2>
<p>A finite field is, first of all, a set with a finite number of elements. An example of a finite field is the set of integers modulo $p$, where $p$ is a prime number. It is generally denoted as $\mathbb{Z}/p$, $GF(p)$ or $\mathbb{F}_p$. We will use the latter notation.</p>
<p>In fields we have two binary operations: addition (+) and multiplication (·). Both are closed, associative and commutative. For both operations, there exists a unique identity element, and every element has a unique inverse (with the exception of 0, which has no multiplicative inverse). Finally, multiplication is distributive over addition: $x \cdot (y + z) = x \cdot y + x \cdot z$.</p>
<p>The set of <strong>integers modulo $p$ consists of all the integers from 0 to $p - 1$</strong>. Addition and multiplication work as in <a href="http://en.wikipedia.org/wiki/Modular_arithmetic">modular arithmetic</a> (also known as "clock arithmetic"). Here are a few examples of operations in $\mathbb{F}_{23}$:</p>
<ul>
<li>Addition: $(18 + 9) \bmod{23} = 4$</li>
<li>Subtraction: $(7 - 14) \bmod{23} = 16$</li>
<li>Multiplication: $4 \cdot 7 \bmod{23} = 5$</li>
<li>
<p>Additive inverse: $-5 \bmod{23} = 18$</p>
<p>Indeed: $(5 + (-5)) \bmod{23} = (5 + 18) \bmod{23} = 0$</p>
</li>
<li>
<p>Multiplicative inverse: $9^{-1} \bmod{23} = 18$</p>
<p>Indeed: $9 \cdot 9^{-1} \bmod{23} = 9 \cdot 18 \bmod{23} = 1$</p>
</li>
</ul>
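<p>The same examples can be checked quickly in Python. This is only a small sketch: the variable names are mine, and the three-argument <code>pow</code> with a negative exponent needs Python 3.8 or later. Python's <code>%</code> operator always returns a non-negative result for a positive modulus, which is exactly what we want here.</p>

```python
p = 23

addition = (18 + 9) % p                  # 4
subtraction = (7 - 14) % p               # 16
multiplication = (4 * 7) % p             # 5
additive_inverse = -5 % p                # 18
multiplicative_inverse = pow(9, -1, p)   # 18

# Sanity checks: an element combined with its inverse gives the identity.
print((5 + additive_inverse) % p)        # 0
print((9 * multiplicative_inverse) % p)  # 1
```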
<p>If these equations don't look familiar to you and you need a primer on modular arithmetic, check out <a href="https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/what-is-modular-arithmetic">Khan Academy</a>.</p>
<p>As we already said, the integers modulo $p$ are a field, and therefore all the properties listed above hold. <span id="p-must-be-prime">Note that the requirement for $p$ to be prime is important!</span> The set of integers modulo 4 is not a field: 2 has no multiplicative inverse (i.e. the equation $2 \cdot x \bmod{4} = 1$ has no solutions).</p>
<h3>Division modulo <em>p</em></h3>
<p>We will soon define elliptic curves over $\mathbb{F}_p$, but before doing so we need a clear idea of what $x / y$ means in $\mathbb{F}_p$. Simply put: $x / y = x \cdot y^{-1}$, or, in plain words, $x$ over $y$ is equal to $x$ times the multiplicative inverse of $y$. This fact is not surprising, but gives us a basic method to perform division: <strong>find the multiplicative inverse of a number and then perform a single multiplication</strong>.</p>
<p>Computing the multiplicative inverse can be "easily" done with the <strong><a href="http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">extended Euclidean algorithm</a></strong>, which takes $O(\log p)$ division steps (or $O(k)$, where $k$ is the bit length of $p$) in the worst case.</p>
<p>We won't go into the details of the extended Euclidean algorithm, as it is off-topic, but here's a working Python implementation:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">extended_euclidean_algorithm</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="sd">    """</span>
<span class="sd">    Returns a three-tuple (gcd, x, y) such that</span>
<span class="sd">    a * x + b * y == gcd, where gcd is the greatest</span>
<span class="sd">    common divisor of a and b.</span>
<span class="sd">    This function implements the extended Euclidean</span>
<span class="sd">    algorithm and runs in O(log b) in the worst case.</span>
<span class="sd">    """</span>
    <span class="n">s</span><span class="p">,</span> <span class="n">old_s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="n">t</span><span class="p">,</span> <span class="n">old_t</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span>
    <span class="n">r</span><span class="p">,</span> <span class="n">old_r</span> <span class="o">=</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span>
    <span class="k">while</span> <span class="n">r</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">quotient</span> <span class="o">=</span> <span class="n">old_r</span> <span class="o">//</span> <span class="n">r</span>
        <span class="n">old_r</span><span class="p">,</span> <span class="n">r</span> <span class="o">=</span> <span class="n">r</span><span class="p">,</span> <span class="n">old_r</span> <span class="o">-</span> <span class="n">quotient</span> <span class="o">*</span> <span class="n">r</span>
        <span class="n">old_s</span><span class="p">,</span> <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="p">,</span> <span class="n">old_s</span> <span class="o">-</span> <span class="n">quotient</span> <span class="o">*</span> <span class="n">s</span>
        <span class="n">old_t</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">,</span> <span class="n">old_t</span> <span class="o">-</span> <span class="n">quotient</span> <span class="o">*</span> <span class="n">t</span>
    <span class="k">return</span> <span class="n">old_r</span><span class="p">,</span> <span class="n">old_s</span><span class="p">,</span> <span class="n">old_t</span>


<span class="k">def</span> <span class="nf">inverse_of</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span>
<span class="sd">    """</span>
<span class="sd">    Returns the multiplicative inverse of</span>
<span class="sd">    n modulo p.</span>
<span class="sd">    This function returns an integer m such that</span>
<span class="sd">    (n * m) % p == 1.</span>
<span class="sd">    """</span>
    <span class="n">gcd</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">extended_euclidean_algorithm</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
    <span class="k">assert</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="n">p</span> <span class="o">*</span> <span class="n">y</span><span class="p">)</span> <span class="o">%</span> <span class="n">p</span> <span class="o">==</span> <span class="n">gcd</span>
    <span class="k">if</span> <span class="n">gcd</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="c1"># Either n is 0, or p is not a prime number.</span>
        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span>
            <span class="s1">'{} has no multiplicative inverse '</span>
            <span class="s1">'modulo {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">x</span> <span class="o">%</span> <span class="n">p</span>
</pre></div>
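<p>To make the "invert, then multiply" recipe concrete, here is a minimal sketch of division in $\mathbb{F}_p$. The helper name <code>divide</code> is hypothetical, and for brevity it relies on Python's built-in <code>pow(y, -1, p)</code> (available from Python 3.8) instead of a hand-written extended Euclidean algorithm.</p>

```python
def divide(x, y, p):
    """Return x / y in F_p, i.e. x times the multiplicative inverse of y."""
    if y % p == 0:
        raise ZeroDivisionError('division by zero in F_{}'.format(p))
    # pow(y, -1, p) raises ValueError if y has no inverse modulo p,
    # which can only happen when p is not prime.
    return x * pow(y, -1, p) % p


q = divide(7, 14, 23)
print(q)            # 12
print(q * 14 % 23)  # 7, so multiplying back by 14 gives the original number
```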
<h2>Elliptic curves in $\mathbb{F}_p$</h2>
<p>Now we have all the necessary elements to restrict elliptic curves to $\mathbb{F}_p$. The set of points, which in the <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#elliptic-curves">previous post</a> was:
$$\begin{array}{rcl}
\left\{(x, y) \in \mathbb{R}^2 \right. & \left. | \right. & \left. y^2 = x^3 + ax + b, \right. \\
& & \left. 4a^3 + 27b^2 \ne 0\right\}\ \cup\ \left\{0\right\}
\end{array}$$
now becomes:
$$\begin{array}{rcl}
\left\{(x, y) \in (\mathbb{F}_p)^2 \right. & \left. | \right. & \left. y^2 \equiv x^3 + ax + b \pmod{p}, \right. \\
& & \left. 4a^3 + 27b^2 \not\equiv 0 \pmod{p}\right\}\ \cup\ \left\{0\right\}
\end{array}$$</p>
<p>where 0 is still the point at infinity, and $a$ and $b$ are two integers in $\mathbb{F}_p$.</p>
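<p>The definition above translates almost literally into code. The following is a small sketch, not a library API: the function names are hypothetical, and I'm assuming that the point at infinity is represented by <code>None</code> and every other point by an <code>(x, y)</code> tuple.</p>

```python
def is_nonsingular(a, b, p):
    """Check the condition 4a^3 + 27b^2 != 0 (mod p)."""
    return (4 * a**3 + 27 * b**2) % p != 0


def is_on_curve(point, a, b, p):
    """Check whether a point satisfies y^2 = x^3 + ax + b (mod p)."""
    if point is None:
        # None stands for the point at infinity, which is always on the curve.
        return True
    x, y = point
    return (y * y - (x**3 + a * x + b)) % p == 0


# The curve y^2 = x^3 - x + 3 (mod 127) and one of its points.
print(is_nonsingular(-1, 3, 127))         # True
print(is_on_curve((16, 20), -1, 3, 127))  # True
# y^2 = x^3 (mod 29) fails the non-singularity check.
print(is_nonsingular(0, 0, 29))           # False
```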
<figure>
<img src="http://andrea.corbellini.name/images/elliptic-curves-mod-p.png" alt="Elliptic curves in Fp" width="608" height="608">
<figcaption>The curve $y^2 \equiv x^3 - 7x + 10 \pmod{p}$ with $p = 19, 97, 127, 487$. Note that, for every $x$, there are at most two points. Also note the symmetry about $y = p / 2$.</figcaption>
</figure>
<figure>
<img src="http://andrea.corbellini.name/images/singular-mod-p.png" alt="Singular curve in Fp" width="300" height="300">
<figcaption>The curve $y^2 \equiv x^3 \pmod{29}$ is singular and has a triple point in $(0, 0)$. It is not a valid elliptic curve.</figcaption>
</figure>
<p>What previously was a continuous curve is now a set of disjoint points in the $xy$-plane. But we can prove that, even if we have restricted our domain, <strong>elliptic curves in $\mathbb{F}_p$ still form an abelian group</strong>.</p>
<h2>Point addition</h2>
<p>Clearly, we need to change our definition of addition a bit in order to make it work in $\mathbb{F}_p$. With reals, we said that the sum of three aligned points was zero ($P + Q + R = 0$). We can keep this definition, but what does it mean for three points to be aligned in $\mathbb{F}_p$?</p>
<p>We can say that <strong>three points are aligned if there's a line that connects all of them</strong>. Now, of course, lines in $\mathbb{F}_p$ are not the same as lines in $\mathbb{R}$. We can say, informally, that a line in $\mathbb{F}_p$ is the set of points $(x, y)$ that satisfy the equation $ax + by + c \equiv 0 \pmod{p}$ (this is the standard line equation, with the addition of "$(\text{mod}\ p)$").</p>
<figure>
<img src="http://andrea.corbellini.name/images/point-addition-mod-p.png" alt="Point addition for elliptic curves in Z/p" width="523" height="528">
<figcaption>Point addition over the curve $y^2 \equiv x^3 - x + 3 \pmod{127}$, with $P = (16, 20)$ and $Q = (41, 120)$. Note how the line $y \equiv 4x + 83 \pmod{127}$ that connects the points "repeats" itself in the plane.</figcaption>
</figure>
<p>Given that we are in a group, point addition retains the properties we already know:</p>
<ul>
<li>$Q + 0 = 0 + Q = Q$ (from the definition of identity element).</li>
<li>Given a non-zero point $Q$, the inverse $-Q$ is the point having the same abscissa but opposite ordinate. Or, if you prefer, $-Q = (x_Q, -y_Q \bmod{p})$.
For example, if a curve in $\mathbb{F}_{29}$ has a point $Q = (2, 5)$, the inverse is $-Q = (2, -5 \bmod{29}) = (2, 24)$.</li>
<li>Also, $P + (-P) = 0$ (from the definition of inverse element).</li>
</ul>
<h2>Algebraic sum</h2>
<p><strong>The equations for calculating point additions are exactly the same as in the previous post</strong>, except for the fact that we need to add "$\text{mod}\ p$" at the end of every expression. Therefore, given $P = (x_P, y_P)$, $Q = (x_Q, y_Q)$ and $R = (x_R, y_R)$, we can calculate $P + Q = -R$ as follows:
$$\begin{array}{rcl}
x_R & = & (m^2 - x_P - x_Q) \bmod{p} \\
y_R & = & [y_P + m(x_R - x_P)] \bmod{p} \\
& = & [y_Q + m(x_R - x_Q)] \bmod{p}
\end{array}$$</p>
<p>If $P \ne Q$, the slope $m$ assumes the form:
$$m = (y_P - y_Q)(x_P - x_Q)^{-1} \bmod{p}$$</p>
<p>Else, if $P = Q$, we have:
$$m = (3 x_P^2 + a)(2 y_P)^{-1} \bmod{p}$$</p>
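<p>Here is a minimal sketch of how these equations might look in Python. The function name <code>add</code> and the use of <code>None</code> for the point at infinity are my own conventions, and <code>pow(..., -1, p)</code> (Python 3.8+) stands in for the modular inverse. Coordinates are assumed to be already reduced modulo $p$.</p>

```python
def add(P, Q, a, p):
    """Return P + Q on the curve y^2 = x^3 + ax + b over F_p.

    Points are (x, y) tuples; None is the point at infinity.
    """
    if P is None:
        return Q  # 0 + Q = Q
    if Q is None:
        return P  # P + 0 = P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and (yp + yq) % p == 0:
        # P + (-P) = 0; this also covers doubling a point with y = 0.
        return None
    if P == Q:
        m = (3 * xp * xp + a) * pow(2 * yp % p, -1, p) % p
    else:
        m = (yp - yq) * pow((xp - xq) % p, -1, p) % p
    xr = (m * m - xp - xq) % p
    yr = (yp + m * (xr - xp)) % p
    # (xr, yr) is the third aligned point R; the sum is its inverse, -R.
    return (xr, -yr % p)


# The example from the figure: y^2 = x^3 - x + 3 (mod 127).
print(add((16, 20), (41, 120), -1, 127))  # (86, 81)
```

<p>Note that $b$ does not appear in the formulas: it only matters for deciding whether a point lies on the curve in the first place.</p>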
<p>It's not a coincidence that the equations have not changed: in fact, these equations work in every field, finite or infinite (with the exception of $\mathbb{F}_2$ and $\mathbb{F}_3$, which are special-cased). Now I feel I have to provide a justification for this fact. The problem is: proofs for the group law generally involve complex mathematical concepts. However, I found a <a href="http://math.rice.edu/~friedl/papers/AAELLIPTIC.PDF">proof from Stefan Friedl</a> that uses only elementary concepts. Read it if you are interested in why these equations work in (almost) every field.</p>
<p>Back to us: we won't define a geometric method, because there are a few problems with that. For example, in the previous post, we said that to compute $P + P$ we needed to take the tangent to the curve in $P$. But without continuity, the word "tangent" does not make any sense. We can work around this and other problems, but a purely geometric method would just be too complicated and not practical at all.</p>
<p>Instead, you can play with the <strong><a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/modk-add.html">interactive tool</a> I've written for computing point additions</strong>.</p>
<h2>The order of an elliptic curve group</h2>
<p>We said that an elliptic curve defined over a finite field has a finite number of points. An important question that we need to answer is: <strong>how many points are there exactly?</strong></p>
<p>Firstly, let's say that the number of points in a group is called the <strong>order of the group</strong>.</p>
<p>Trying all the possible values for $x$ from 0 to $p - 1$ is not a feasible way to count the points, as it would require $O(p)$ steps, and this is "hard" if $p$ is a large prime.</p>
<p>Luckily, there's a faster algorithm for computing the order: <a href="https://en.wikipedia.org/wiki/Schoof%27s_algorithm">Schoof's algorithm</a>. I won't go into the details of the algorithm; what matters is that it runs in polynomial time, and this is what we need.</p>
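<p>For the tiny primes used in the examples of this post, the naive $O(p)$ count is still perfectly fine, and it's handy for double-checking results. The sketch below is that brute-force count, <em>not</em> Schoof's algorithm; the function name is hypothetical.</p>

```python
def count_points(a, b, p):
    """Count the points of y^2 = x^3 + ax + b over F_p by brute force."""
    # For each residue, count how many y in F_p square to it.
    square_roots = {}
    for y in range(p):
        square = y * y % p
        square_roots[square] = square_roots.get(square, 0) + 1
    # Start from 1 to account for the point at infinity.
    total = 1
    for x in range(p):
        total += square_roots.get((x**3 + a * x + b) % p, 0)
    return total


print(count_points(-1, 3, 37))  # 42
```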
<h2 id="scalar-multiplication">Scalar multiplication and cyclic subgroups</h2>
<p>As with reals, multiplication can be defined as:
$$n P = \underbrace{P + P + \cdots + P}_{n\ \text{times}}$$</p>
<p>And, again, we can use the <a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/#double-and-add">double and add algorithm</a> to perform multiplication in $O(\log n)$ steps (or $O(k)$, where $k$ is the number of bits of $n$). I've written an <strong><a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/modk-mul.html">interactive tool</a> for scalar multiplication</strong> too.</p>
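<p>As a rough sketch, double and add over $\mathbb{F}_p$ could be written as follows. The helper <code>add</code> implements the algebraic sum described earlier and is included only so that the snippet runs on its own; both function names, and the use of <code>None</code> for the point at infinity, are my own conventions.</p>

```python
def add(P, Q, a, p):
    """Point addition over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and (yp + yq) % p == 0:
        return None
    if P == Q:
        m = (3 * xp * xp + a) * pow(2 * yp % p, -1, p) % p
    else:
        m = (yp - yq) * pow((xp - xq) % p, -1, p) % p
    xr = (m * m - xp - xq) % p
    return (xr, -(yp + m * (xr - xp)) % p)


def scalar_mult(k, P, a, p):
    """Compute kP (for k >= 0) with the double and add algorithm."""
    result = None  # 0P = 0
    addend = P
    while k > 0:
        if k & 1:
            # The current bit of k is set: add the current power-of-two multiple.
            result = add(result, addend, a, p)
        addend = add(addend, addend, a, p)  # double
        k >>= 1
    return result


# A few multiples of P = (3, 6) on y^2 = x^3 + 2x + 3 (mod 97).
for k in range(6):
    print(k, scalar_mult(k, (3, 6), 2, 97))
```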
<p>Multiplication over points for elliptic curves in $\mathbb{F}_p$ has an interesting property. Take the curve $y^2 \equiv x^3 + 2x + 3 \pmod{97}$ and the point $P = (3, 6)$. Now <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/modk-mul.html">calculate</a> all the multiples of $P$:</p>
<figure>
<img src="http://andrea.corbellini.name/images/cyclic-subgroup.png" alt="Cyclic subgroup" width="322" height="255">
<figcaption>The multiples of $P = (3, 6)$ are just five distinct points ($0$, $P$, $2P$, $3P$, $4P$) and they are repeating cyclically. It's easy to spot the similarity between scalar multiplication on elliptic curves and addition in modular arithmetic.</figcaption>
</figure>
<ul>
<li>$0P = 0$</li>
<li>$1P = (3, 6)$</li>
<li>$2P = (80, 10)$</li>
<li>$3P = (80, 87)$</li>
<li>$4P = (3, 91)$</li>
<li>$5P = 0$</li>
<li>$6P = (3, 6)$</li>
<li>$7P = (80, 10)$</li>
<li>$8P = (80, 87)$</li>
<li>$9P = (3, 91)$</li>
<li>...</li>
</ul>
<p>Here we can immediately spot two things: firstly, the multiples of $P$ are just five: the other points of the elliptic curve never appear. Secondly, they are <strong>repeating cyclically</strong>. We can write:</p>
<ul>
<li>$5kP = 0$</li>
<li>$(5k + 1)P = P$</li>
<li>$(5k + 2)P = 2P$</li>
<li>$(5k + 3)P = 3P$</li>
<li>$(5k + 4)P = 4P$</li>
</ul>
<p>for every integer $k$. Note that these five equations can be "compressed" into a single one, thanks to the modulo operator: $kP = (k \bmod{5})P$.</p>
<p>Not only that, but we can immediately verify that <strong>these five points are closed under addition</strong>. Which means: no matter how I add $0$, $P$, $2P$, $3P$ or $4P$, the result is always one of these five points. Again, the other points of the elliptic curve never appear in the results.</p>
<p>The same holds for every point, not just for $P = (3, 6)$. In fact, if we take a generic $P$:
$$nP + mP = \underbrace{P + \cdots + P}_{n\ \text{times}} + \underbrace{P + \cdots + P}_{m\ \text{times}} = (n + m)P$$</p>
<p>Which means: <strong>if we add two multiples of $P$, we obtain a multiple of $P$</strong> (i.e. multiples of $P$ are closed under addition). This is enough to <a href="https://en.wikipedia.org/wiki/Subgroup#Basic_properties_of_subgroups">prove</a> that <strong>the set of the multiples of $P$ is a cyclic subgroup</strong> of the group formed by the elliptic curve.</p>
<p>A "subgroup" is a group which is a subset of another group. A "cyclic subgroup" is a subgroup whose elements repeat cyclically, as we have shown in the previous example. <span id="base-point">The point $P$ is called <strong>generator</strong> or <strong>base point</strong> of the cyclic subgroup</span>.</p>
<p>Cyclic subgroups are the foundations of ECC and other cryptosystems. We will see why in the next post.</p>
<h3 id="subgroup-order">Subgroup order</h3>
<p>We can ask ourselves <strong>what the order of a subgroup generated by a point $P$ is</strong> (or, equivalently, what the order of $P$ is). To answer this question we can't use Schoof's algorithm, because that algorithm only works on whole elliptic curves, not on subgroups. Before approaching the problem, we need a few more bits:</p>
<ul>
<li>So far, we have defined the order as the number of points of a group. This definition is still valid, but within a cyclic subgroup we can give a new, equivalent definition: <strong>the order of $P$ is the smallest positive integer $n$ such that $nP = 0$</strong>.
In fact, if you look at the previous example, our subgroup contained five points, and we had $5P = 0$.</li>
<li>The order of $P$ is linked to the order of the elliptic curve by <a href="https://en.wikipedia.org/wiki/Lagrange%27s_theorem_(group_theory)">Lagrange's theorem</a>, which states that <strong>the order of a subgroup is a divisor of the order of the parent group</strong>.
In other words, if an elliptic curve contains $N$ points and one of its subgroups contains $n$ points, then $n$ is a divisor of $N$.</li>
</ul>
<p>These two pieces of information together give us a way to find out the order of a subgroup with base point $P$:</p>
<ol>
<li>Calculate the elliptic curve's order $N$ using Schoof's algorithm.</li>
<li>Find out all the divisors of $N$.</li>
<li>For every divisor $n$ of $N$, compute $nP$.</li>
<li>The smallest $n$ such that $nP = 0$ is the order of the subgroup.</li>
</ol>
<p>For example, the curve $y^2 = x^3 - x + 3$ over the field $\mathbb{F}_{37}$ has order $N = 42$. Its subgroups may have order $n = 1$, $2$, $3$, $6$, $7$, $14$, $21$ or $42$. If <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/modk-mul.html?a=-1&b=3&p=37&px=2&py=3">we try $P = (2, 3)$</a> we can see that $P \ne 0$, $2P \ne 0$, ..., $7P = 0$, hence the order of $P$ is $n = 7$.</p>
<p>Note that <strong>it's important to take the smallest divisor, not a random one</strong>. If we proceeded randomly, we could have taken $n = 14$, which is not the order of the subgroup, but one of its multiples.</p>
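<p>The procedure can be sketched in Python as follows. The curve order $N$ is taken as an input (in practice it would come from Schoof's algorithm), and the divisors are found by trial division, which is only reasonable for toy values. The helpers <code>add</code> and <code>scalar_mult</code> are repeated so that the snippet is self-contained; all names are hypothetical.</p>

```python
def add(P, Q, a, p):
    """Point addition over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and (yp + yq) % p == 0:
        return None
    if P == Q:
        m = (3 * xp * xp + a) * pow(2 * yp % p, -1, p) % p
    else:
        m = (yp - yq) * pow((xp - xq) % p, -1, p) % p
    xr = (m * m - xp - xq) % p
    return (xr, -(yp + m * (xr - xp)) % p)


def scalar_mult(k, P, a, p):
    """Compute kP with double and add."""
    result, addend = None, P
    while k > 0:
        if k & 1:
            result = add(result, addend, a, p)
        addend = add(addend, addend, a, p)
        k >>= 1
    return result


def point_order(P, N, a, p):
    """Return the order of P, given the order N of the whole curve."""
    # Divisors are visited in increasing order, so the first hit is the smallest.
    for n in range(1, N + 1):
        if N % n == 0 and scalar_mult(n, P, a, p) is None:
            return n
    raise ValueError('N is not a multiple of the order of P')


print(point_order((2, 3), 42, a=-1, p=37))  # 7
```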
<p>Another example: the elliptic curve defined by the equation $y^2 = x^3 - x + 1$ over the field $\mathbb{F}_{29}$ has order $N = 37$, which is a prime. Its subgroups may only have order $n = 1$ or $37$. As you can easily guess, when $n = 1$, the subgroup contains only the point at infinity; when $n = N$, the subgroup contains all the points of the elliptic curve.</p>
<h3>Finding a base point</h3>
<p>For our ECC algorithms, we want subgroups with a high order. So in general we will choose an elliptic curve, calculate its order ($N$), choose a high divisor as the subgroup order ($n$) and finally find a suitable base point. That is: we won't choose a base point and then calculate its order, but we'll do the opposite: we will first choose an order that looks good enough and then we will hunt for a suitable base point. How do we do that?</p>
<p><span id="cofactor">Firstly, we need to introduce one more term. Lagrange's theorem implies that the number <strong>$h = N / n$ is always an integer</strong> (because $n$ is a divisor of $N$). The number $h$ has a name: it's the <strong>cofactor of the subgroup</strong>.</span></p>
<p>Now consider that for every point of an elliptic curve we have $NP = 0$. This happens because $N$ is a multiple of any candidate $n$. Using the definition of cofactor, we can write:
$$n(hP) = 0$$</p>
<p>Now suppose that $n$ is a prime number (for reasons that will be explained in the next post, we prefer prime orders). This equation, written in this form, is telling us that the point $G = hP$ generates a subgroup of order $n$ (except when $G = hP = 0$, in which case the subgroup has order 1).</p>
<p>In the light of this, we can outline the following algorithm:</p>
<ol>
<li>Calculate the order $N$ of the elliptic curve.</li>
<li>Choose the order $n$ of the subgroup. For the algorithm to work, this number must be prime and must be a divisor of $N$.</li>
<li>Compute the cofactor $h = N / n$.</li>
<li>Choose a random point $P$ on the curve.</li>
<li>Compute $G = hP$.</li>
<li>If $G$ is 0, then go back to step 4. Otherwise we have found a generator of a subgroup with order $n$ and cofactor $h$.</li>
</ol>
<p>Note that this algorithm only works if $n$ is prime. If $n$ weren't prime, then the order of $G$ could be one of the proper divisors of $n$.</p>
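<p>Here is a minimal sketch of the algorithm above, using the toy curve $y^2 = x^3 - x + 3$ over $\mathbb{F}_{37}$ ($N = 42$) and the prime $n = 7$. The random point is picked by trial and error, which is only viable for tiny fields and does not pick points with exactly uniform probability; the function names are hypothetical, and <code>add</code> and <code>scalar_mult</code> are repeated so that the snippet runs on its own.</p>

```python
import random


def add(P, Q, a, p):
    """Point addition over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and (yp + yq) % p == 0:
        return None
    if P == Q:
        m = (3 * xp * xp + a) * pow(2 * yp % p, -1, p) % p
    else:
        m = (yp - yq) * pow((xp - xq) % p, -1, p) % p
    xr = (m * m - xp - xq) % p
    return (xr, -(yp + m * (xr - xp)) % p)


def scalar_mult(k, P, a, p):
    """Compute kP with double and add."""
    result, addend = None, P
    while k > 0:
        if k & 1:
            result = add(result, addend, a, p)
        addend = add(addend, addend, a, p)
        k >>= 1
    return result


def random_point(a, b, p):
    """Pick a random point (other than 0) by trial and error."""
    while True:
        x = random.randrange(p)
        rhs = (x**3 + a * x + b) % p
        ys = [y for y in range(p) if y * y % p == rhs]
        if ys:
            return (x, random.choice(ys))


def find_base_point(N, n, a, b, p):
    """Return a generator of a subgroup of prime order n (n must divide N)."""
    h = N // n  # step 3: the cofactor
    while True:
        P = random_point(a, b, p)    # step 4
        G = scalar_mult(h, P, a, p)  # step 5
        if G is not None:            # step 6
            return G


G = find_base_point(42, 7, a=-1, b=3, p=37)
print(G, scalar_mult(7, G, -1, 37))  # some point G, followed by None (7G = 0)
```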
<h2 id="discrete-logarithm">Discrete logarithm</h2>
<p>As we did when working with continuous elliptic curves, we are now going to discuss the question: <strong>if we know $P$ and $Q$, what is $k$ such that $Q = kP$?</strong></p>
<p>This problem, which is known as the <strong>discrete logarithm problem</strong> for elliptic curves, is believed to be a "hard" problem, in that there is no known polynomial time algorithm that can run on a classical computer. There are, however, no mathematical proofs for this belief.</p>
<p>This problem is also analogous to the discrete logarithm problem used with other cryptosystems such as the Digital Signature Algorithm (DSA), the Diffie-Hellman key exchange (D-H) and the ElGamal algorithm; it's not a coincidence that they have the same name. The difference is that, with those algorithms, we use modular exponentiation instead of scalar multiplication. Their discrete logarithm problem can be stated as follows: if we know $a$ and $b$, what's $k$ such that $b = a^k \bmod{p}$?</p>
<p>Both these problems are "discrete" because they involve finite sets (more precisely, cyclic subgroups). And they are "logarithms" because they are analogous to ordinary logarithms.</p>
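<p>To get a feeling for the problem, here is a brute-force sketch that simply tries $k = 0, 1, 2, \dots$ until it finds $Q = kP$. It takes a number of steps proportional to the order of $P$, so it is only usable on toy subgroups like the one from the earlier example; the function names are hypothetical and <code>add</code> is repeated to keep the snippet self-contained.</p>

```python
def add(P, Q, a, p):
    """Point addition over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and (yp + yq) % p == 0:
        return None
    if P == Q:
        m = (3 * xp * xp + a) * pow(2 * yp % p, -1, p) % p
    else:
        m = (yp - yq) * pow((xp - xq) % p, -1, p) % p
    xr = (m * m - xp - xq) % p
    return (xr, -(yp + m * (xr - xp)) % p)


def brute_force_log(P, Q, a, p):
    """Return the smallest k >= 0 such that Q = kP, trying every k in turn."""
    R, k = None, 0  # R always holds kP
    while True:
        if R == Q:
            return k
        R = add(R, P, a, p)
        k += 1
        if R is None:
            # We are back at 0: every multiple of P has been tried.
            raise ValueError('Q is not a multiple of P')


# On y^2 = x^3 + 2x + 3 (mod 97), with P = (3, 6) and Q = (80, 87):
print(brute_force_log((3, 6), (80, 87), 2, 97))  # 3
```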
<p>What makes ECC interesting is that, as of today, the discrete logarithm problem for elliptic curves seems to be "harder" than other similar problems used in cryptography. This implies that we need fewer bits for the integer $k$ in order to achieve the same level of security as with other cryptosystems, as we will see in detail in the fourth and last post of this series.</p>
<h2>More next week!</h2>
<p>Enough for today! I really hope you enjoyed this post. Leave a comment if you didn't.</p>
<p>Next week's post will be the third in this series and will be about ECC algorithms: key pair generation, ECDH and ECDSA. That will be one of the most interesting parts of this series. Don't miss it!</p>
<p><strong><a href="http://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/">Read the next post of the series »</a></strong></p>andreacorbelliniSat, 23 May 2015 14:08:00 +0000tag:andrea.corbellini.name,2015-05-23:2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/eccmathsecurityElliptic Curve Cryptography: a gentle introductionhttp://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/<p>Those of you who know what public-key cryptography is may have already heard of <strong>ECC</strong>, <strong>ECDH</strong> or <strong>ECDSA</strong>. The first is an acronym for Elliptic Curve Cryptography, the others are names for algorithms based on it.</p>
<p>Today, we can find elliptic curve cryptosystems in <a href="https://tools.ietf.org/html/rfc4492">TLS</a>, <a href="https://tools.ietf.org/html/rfc6637">PGP</a> and <a href="https://tools.ietf.org/html/rfc5656">SSH</a>, which are just three of the main technologies on which the modern web and IT world are based. Not to mention <a href="https://en.bitcoin.it/wiki/Secp256k1">Bitcoin</a> and other cryptocurrencies.</p>
<p>Before ECC became popular, almost all public-key algorithms were based on RSA, DSA, and DH, alternative cryptosystems based on modular arithmetic. RSA and friends are still very important today, and are often used alongside ECC. However, while the magic behind RSA and friends can be easily explained, is widely understood, and <a href="http://code.activestate.com/recipes/578838-rsa-a-simple-and-easy-to-read-implementation/">rough implementations can be written quite easily</a>, the foundations of ECC are still a mystery to most.</p>
<p>With a series of blog posts I'm going to give you a gentle introduction to the world of elliptic curve cryptography. My aim is not to provide a complete and detailed guide to ECC (the web is full of information on the subject), but to provide <strong>a simple overview of what ECC is and why it is considered secure</strong>, without losing time on long mathematical proofs or boring implementation details. I will also give <strong>helpful examples together with visual interactive tools and scripts to play with</strong>.</p>
<p>Specifically, here are the topics I'll touch:</p>
<ol>
<li><strong><a href="http://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/">Elliptic curves over real numbers and the group law</a></strong> (covered in this blog post)</li>
<li><strong><a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/">Elliptic curves over finite fields and the discrete logarithm problem</a></strong></li>
<li><strong><a href="http://andrea.corbellini.name/2015/05/30/elliptic-curve-cryptography-ecdh-and-ecdsa/">Key pair generation and two ECC algorithms: ECDH and ECDSA</a></strong></li>
<li><strong><a href="http://andrea.corbellini.name/2015/06/08/elliptic-curve-cryptography-breaking-security-and-a-comparison-with-rsa/">Algorithms for breaking ECC security, and a comparison with RSA</a></strong></li>
</ol>
<p>In order to understand what's written here, you'll need to know the basics of set theory, geometry and modular arithmetic, and be familiar with symmetric and asymmetric cryptography. Lastly, you need to have a clear idea of what an "easy" problem is, what a "hard" problem is, and their roles in cryptography.</p>
<p>Ready? Let's start!</p>
<h2 id="elliptic-curves">Elliptic Curves</h2>
<p>First of all: what is an elliptic curve? Wolfram MathWorld gives an excellent and complete <a href="http://mathworld.wolfram.com/EllipticCurve.html">definition</a>. But for our aims, an elliptic curve will simply be <strong>the set of points described by the equation</strong>:
$$y^2 = x^3 + ax + b$$</p>
<p>where $4a^3 + 27b^2 \ne 0$ (this is required to exclude <a href="https://en.wikipedia.org/wiki/Singularity_(mathematics)">singular curves</a>). The equation above is what is called <em>Weierstrass normal form</em> for elliptic curves.</p>
<figure>
<img src="http://andrea.corbellini.name/images/curves.png" alt="Different shapes for different elliptic curves" width="440" height="450">
<figcaption>Different shapes for different elliptic curves ($b = 1$, $a$ varying from 2 to -3).</figcaption>
</figure>
<figure>
<img src="http://andrea.corbellini.name/images/singularities.png" alt="Types of singularities" width="300" height="220">
<figcaption>Types of singularities: on the left, a curve with a cusp ($y^2 = x^3$). On the right, a curve with a self-intersection ($y^2 = x^3 - 3x + 2$). Neither of them is a valid elliptic curve.</figcaption>
</figure>
<p>Depending on the values of $a$ and $b$, elliptic curves may assume different shapes on the plane. As can be easily seen and verified, elliptic curves are symmetric about the $x$-axis.</p>
<p>For our aims, <strong>we will also need a <a href="https://en.wikipedia.org/wiki/Point_at_infinity">point at infinity</a></strong> (also known as ideal point) to be part of our curve. From now on, we will denote our point at infinity with the symbol 0 (zero).</p>
<p>If we want to explicitly take into account the point at infinity, we can refine our definition of elliptic curve as follows:
$$\left\{ (x, y) \in \mathbb{R}^2\ |\ y^2 = x^3 + ax + b,\ 4 a^3 + 27 b^2 \ne 0 \right\}\ \cup\ \left\{ 0 \right\}$$</p>
<h2>Groups</h2>
<p>A group in mathematics is a set for which we have defined a binary operation that we call "addition" and indicate with the symbol +. In order for the set $\mathbb{G}$ to be a group, addition must be defined so that it respects the following four properties:</p>
<ol>
<li><strong>closure:</strong> if $a$ and $b$ are members of $\mathbb{G}$, then $a + b$ is a member of $\mathbb{G}$;</li>
<li><strong>associativity:</strong> $(a + b) + c = a + (b + c)$;</li>
<li>there exists an <strong>identity element</strong> 0 such that $a + 0 = 0 + a = a$;</li>
<li>every element has an <strong>inverse</strong>, that is: for every $a$ there exists $b$ such that $a + b = 0$.</li>
</ol>
<p>If we add a fifth requirement:</p>
<ol start="5">
<li><strong>commutativity:</strong> $a + b = b + a$,</li>
</ol>
<p>then the group is called an <em>abelian group</em>.</p>
<p>With the usual notion of addition, the set of integer numbers $\mathbb{Z}$ is a group (moreover, it's an abelian group). The set of natural numbers $\mathbb{N}$ however is not a group, as the fourth property can't be satisfied.</p>
<p>Groups are nice because, if we can demonstrate that those four properties hold, we get some other properties for free. For example: <strong>the identity element is unique</strong>; also the <strong>inverses are unique</strong>, that is: for every $a$ there exists only one $b$ such that $a + b = 0$ (and we can write $b$ as $-a$). Either directly or indirectly, these and other facts about groups will be very important for us later.</p>
<h2 id="group-law">The group law for elliptic curves</h2>
<p>We can define a group over elliptic curves. Specifically:</p>
<ul>
<li>the elements of the group are the points of an elliptic curve;</li>
<li>the <strong>identity element</strong> is the point at infinity 0;</li>
<li>the <strong>inverse</strong> of a point $P$ is the one symmetric about the $x$-axis;</li>
<li><strong>addition</strong> is given by the following rule: <strong>given three aligned, non-zero points $P$, $Q$ and $R$, their sum is $P + Q + R = 0$</strong>.</li>
</ul>
<figure>
<img src="http://andrea.corbellini.name/images/three-aligned-points.png" alt="Three aligned points" width="300" height="300">
<figcaption>The sum of three aligned points is 0.</figcaption>
</figure>
<p>Note that with the last rule, we only require three aligned points, and three points are aligned without respect to order. This means that, if $P$, $Q$ and $R$ are aligned, then $P + (Q + R) = Q + (P + R) = R + (P + Q) = \cdots = 0$. This way, we have intuitively proved that <strong>our + operator is both associative and commutative: we are in an abelian group</strong>.</p>
<p>So far, so great. But how do we actually compute the sum of two arbitrary points?</p>
<h2 id="geometric-addition">Geometric addition</h2>
<p>Thanks to the fact that we are in an abelian group, we can write $P + Q + R = 0$ as $P + Q = -R$. This equation, in this form, lets us derive a geometric method to compute the sum of two points $P$ and $Q$: <strong>if we draw a line passing through $P$ and $Q$, this line will intersect the curve at a third point, $R$</strong> (this is implied by the fact that $P$, $Q$ and $R$ are aligned). <strong>If we take the inverse of this point, $-R$, we have found the result of $P + Q$</strong>.</p>
<figure>
<img src="http://andrea.corbellini.name/images/point-addition.png" alt="Point addition" width="287" height="300">
<figcaption>Draw the line through $P$ and $Q$. The line intersects a third point $R$. The point symmetric to it, $-R$, is the result of $P + Q$.</figcaption>
</figure>
<p>This geometric method works but needs some refinement. Particularly, we need to answer a few questions:</p>
<ul>
<li><strong>What if $P = 0$ or $Q = 0$?</strong> Certainly, we can't draw any line (0 is not on the $xy$-plane). But given that we have defined 0 as the identity element, $P + 0 = P$ and $0 + Q = Q$, for any $P$ and for any $Q$.</li>
<li><strong>What if $P = -Q$?</strong> In this case, the line going through the two points is vertical, and does not intersect any third point. But if $P$ is the inverse of $Q$, then we have $P + Q = P + (-P) = 0$ from the definition of inverse.</li>
<li><strong>What if $P = Q$?</strong> In this case, there are infinitely many lines passing through the point. Here things start getting a bit more complicated. But consider a point $Q' \ne P$. What happens if we make $Q'$ approach $P$, getting closer and closer to it?
<figure>
<img src="http://andrea.corbellini.name/images/animation-point-doubling.gif" width="300" height="300" alt="The result of P + Q as Q is approaching P">
<figcaption>As the two points become closer together, the line passing through them becomes tangent to the curve.</figcaption>
</figure>
As $Q'$ tends towards $P$, the line passing through $P$ and $Q'$ becomes tangent to the curve. In light of this we can say that $P + P = -R$, where $R$ is the point of intersection between the curve and the line tangent to the curve at $P$.</li>
<li><strong>What if $P \ne Q$, but there is no third point $R$?</strong> We are in a case very similar to the previous one. In fact, we are in the case where the line passing through $P$ and $Q$ is tangent to the curve.
<figure>
<img src="http://andrea.corbellini.name/images/animation-tangent-line.gif" alt="The result of P + Q as Q is approaching P" width="300" height="300">
<figcaption>If our line intersects just two points, then it means that it's tangent to the curve. It's easy to see how the result of the sum becomes symmetric to one of the two points.</figcaption>
</figure>
Let's assume that $P$ is the tangency point. In the previous case, we would have written $P + P = -Q$. That equation now becomes $P + Q = -P$. If, on the other hand, $Q$ were the tangency point, the correct equation would have been $P + Q = -Q$.</li>
</ul>
<p>The geometric method is now complete and covers all cases. With a pencil and a ruler we are able to perform addition involving every point of any elliptic curve. If you want to try, <strong>take a look at the <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-add.html">HTML5/JavaScript visual tool</a> I've built for computing sums on elliptic curves!</strong></p>
<h2 id="algebraic-addition">Algebraic addition</h2>
<p>If we want a computer to perform point addition, we need to turn the geometric method into an algebraic method. Transforming the rules described above into a set of equations may seem straightforward, but actually it can be really tedious because it requires solving cubic equations. For this reason, here I will report only the results.</p>
<p>First, let's get rid of the most annoying corner cases. We already know that $P + (-P) = 0$, and we also know that $P + 0 = 0 + P = P$. So, in our equations, we will avoid these two cases and we will only consider <strong>two non-zero, non-symmetric points $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$</strong>.</p>
<p><strong>If $P$ and $Q$ are distinct</strong> ($x_P \ne x_Q$), the line through them has <strong>slope</strong>:
$$m = \frac{y_P - y_Q}{x_P - x_Q}$$</p>
<p>The <strong>intersection</strong> of this line with the elliptic curve is a third point $R = (x_R, y_R)$:
$$\begin{array}{rcl}
x_R & = & m^2 - x_P - x_Q \\
y_R & = & y_P + m(x_R - x_P)
\end{array}$$</p>
<p>or, equivalently:
$$y_R = y_Q + m(x_R - x_Q)$$</p>
<p>Hence $(x_P, y_P) + (x_Q, y_Q) = (x_R, -y_R)$ (pay attention to the signs and remember that $P + Q = -R$).</p>
<p>If we wanted to check whether this result is right, we would have to check whether $R$ belongs to the curve and whether $P$, $Q$ and $R$ are aligned. Checking whether the points are aligned is trivial; checking that $R$ belongs to the curve is not, as we would need to solve a cubic equation, which is not fun at all.</p>
<p>Instead, let's play with an example: according to our <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-add.html">visual tool</a>, given $P = (1, 2)$ and $Q = (3, 4)$ over the curve $y^2 = x^3 - 7x + 10$, their sum is $P + Q = -R = (-3, 2)$. Let's see if our equations agree:
$$\begin{array}{rcl}
m & = & \frac{y_P - y_Q}{x_P - x_Q} = \frac{2 - 4}{1 - 3} = 1 \\
x_R & = & m^2 - x_P - x_Q = 1^2 - 1 - 3 = -3 \\
y_R & = & y_P + m(x_R - x_P) = 2 + 1 \cdot (-3 - 1) = -2 \\
& = & y_Q + m(x_R - x_Q) = 4 + 1 \cdot (-3 - 3) = -2
\end{array}$$</p>
<p>Yes, this is correct!</p>
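<p>For specific numeric points like these, the two checks mentioned earlier (does $R$ belong to the curve, and are $P$, $Q$ and $R$ aligned) reduce to plain arithmetic. Here is a minimal sketch; the helper names <code>on_curve</code> and <code>aligned</code> are made up for this illustration and are not part of the visual tool:</p>

```python
# Curve used in the example above: y^2 = x^3 - 7x + 10
A, B = -7, 10


def on_curve(point, a=A, b=B):
    """Return True if point = (x, y) satisfies y^2 = x^3 + ax + b."""
    x, y = point
    return y * y == x ** 3 + a * x + b


def aligned(p, q, r):
    """Return True if the three points lie on one straight line.

    Uses the cross product of the vectors PQ and PR, which is zero
    exactly when the points are collinear (no division needed).
    """
    (xp, yp), (xq, yq), (xr, yr) = p, q, r
    return (xq - xp) * (yr - yp) == (xr - xp) * (yq - yp)


P, Q, R = (1, 2), (3, 4), (-3, -2)
print(on_curve(P), on_curve(Q), on_curve(R), aligned(P, Q, R))
```

<p>Integer coordinates keep the comparisons exact; with floating point values a small tolerance would be needed instead of <code>==</code>.</p>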
<p>Note that these equations work even if <strong>one of $P$ or $Q$ is a tangency point</strong>. Let's try with $P = (-1, 4)$ and $Q = (1, 2)$.
$$\begin{array}{rcl}
m & = & \frac{y_P - y_Q}{x_P - x_Q} = \frac{4 - 2}{-1 - 1} = -1 \\
x_R & = & m^2 - x_P - x_Q = (-1)^2 - (-1) - 1 = 1 \\
y_R & = & y_P + m(x_R - x_P) = 4 + (-1) \cdot (1 - (-1)) = 2
\end{array}$$</p>
<p>We get the result $P + Q = (1, -2)$, which is the same result given by the <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-add.html?px=-1&py=4&qx=1&qy=2">visual tool</a>.</p>
<p><strong>The case $P = Q$ needs to be treated a bit differently</strong>: the equations for $x_R$ and $y_R$ are the same, but given that $x_P = x_Q$, we must use a different equation for the <strong>slope</strong>:
$$m = \frac{3 x_P^2 + a}{2 y_P}$$</p>
<p>Note that, as we would expect, this expression for $m$ is the first derivative of:
$$y_P = \pm \sqrt{x_P^3 + ax_P + b}$$</p>
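<p>As a quick sketch of where that slope comes from, one option is to differentiate the curve equation $y^2 = x^3 + ax + b$ implicitly with respect to $x$, which avoids dealing with the square root:
$$2y \frac{dy}{dx} = 3x^2 + a \quad \Longrightarrow \quad \frac{dy}{dx} = \frac{3x^2 + a}{2y}$$</p>
<p>Evaluating at $(x_P, y_P)$ gives the expression for $m$. This requires $y_P \ne 0$; if $y_P = 0$ the tangent is vertical, $P = -P$, and we are back in the case $P + (-P) = 0$ that we set aside earlier.</p>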
<p>To prove the validity of this result it is enough to check that $R$ belongs to the curve and that the line passing through $P$ and $R$ has only two intersections with the curve. But again, we don't prove this fact, and instead try with an example: $P = Q = (1, 2)$.
$$\begin{array}{rcl}
m & = & \frac{3x_P^2 + a}{2 y_P} = \frac{3 \cdot 1^2 - 7}{2 \cdot 2} = -1 \\
x_R & = & m^2 - x_P - x_Q = (-1)^2 - 1 - 1 = -1 \\
y_R & = & y_P + m(x_R - x_P) = 2 + (-1) \cdot (-1 - 1) = 4
\end{array}$$</p>
<p>Which gives us $P + P = -R = (-1, -4)$. <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-add.html?px=1&py=2&qx=1&qy=2">Correct</a>!</p>
<p>Although the procedure to derive them can be really tedious, our equations are pretty compact. This is thanks to Weierstrass normal form: without it, these equations could have been really long and complicated!</p>
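<p>To make the case analysis concrete, here is a minimal Python sketch of the addition rules above. It is an illustration under a few assumptions of mine, not the code behind the visual tool: the function name <code>add</code> is made up, <code>None</code> stands for the point at infinity 0, and coordinates are expected to be integers or <code>Fraction</code>s so that the arithmetic stays exact.</p>

```python
from fractions import Fraction


def add(p, q, a):
    """Add two points of the curve y^2 = x^3 + ax + b.

    Points are (x, y) tuples; None represents the point at infinity 0.
    The coefficient b is not needed by the formulas.
    """
    if p is None:          # 0 + Q = Q
        return q
    if q is None:          # P + 0 = P
        return p

    xp, yp = p
    xq, yq = q

    if xp == xq and yp == -yq:
        return None        # P + (-P) = 0 (also covers doubling when y = 0)

    if xp == xq:
        # P = Q: slope of the tangent line
        m = Fraction(3 * xp * xp + a, 2 * yp)
    else:
        # P != Q: slope of the line through the two points
        m = Fraction(yp - yq, xp - xq)

    xr = m * m - xp - xq
    yr = yp + m * (xr - xp)
    return (xr, -yr)       # P + Q = -R


a = -7  # curve y^2 = x^3 - 7x + 10
print(add((1, 2), (3, 4), a))
print(add((-1, 4), (1, 2), a))
print(add((1, 2), (1, 2), a))
```

<p>The three calls correspond to the three worked examples in this section. The results are printed as <code>Fraction</code> objects, but they compare equal to the integer pairs computed by hand.</p>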
<h2 id="scalar-multiplication">Scalar multiplication</h2>
<p>Other than addition, we can define another operation: <strong>scalar multiplication</strong>, that is:
$$nP = \underbrace{P + P + \cdots + P}_{n\ \text{times}}$$</p>
<p>where $n$ is a natural number. I've written a <strong><a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-mul.html">visual tool</a> for scalar multiplication</strong> too, if you want to play with that.</p>
<p>Written in that form, it may seem that computing $nP$ requires $n$ additions. If $n$ has $k$ binary digits, then our algorithm would be $O(2^k)$, which is not really good. But there exist faster algorithms.</p>
<p>One of them is the <span id="double-and-add"><strong>double and add</strong></span> algorithm. Its principle of operation can be better explained with an example. Take $n = 151$. Its binary representation is $10010111_2$. This binary representation can be turned into a sum of powers of two:
$$\begin{array}{rcl}
151 & = & 1 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 \\
& = & 2^7 + 2^4 + 2^2 + 2^1 + 2^0
\end{array}$$</p>
<p>(We have taken each binary digit of $n$ and multiplied it by a power of two.)</p>
<p>In view of this, we can write:
$$151 \cdot P = 2^7 P + 2^4 P + 2^2 P + 2^1 P + 2^0 P$$</p>
<p>What the double and add algorithm tells us to do is:</p>
<ul>
<li>Take $P$.</li>
<li><em>Double</em> it, so that we get $2P$.</li>
<li><em>Add</em> $2P$ to $P$ (in order to get the result of $2^1P + 2^0P$).</li>
<li><em>Double</em> $2P$, so that we get $2^2P$.</li>
<li><em>Add</em> it to our result (so that we get $2^2P + 2^1P + 2^0P$).</li>
<li><em>Double</em> $2^2P$ to get $2^3P$.</li>
<li>Don't perform any addition involving $2^3P$.</li>
<li><em>Double</em> $2^3P$ to get $2^4P$.</li>
<li><em>Add</em> it to our result (so that we get $2^4P + 2^2P + 2^1P + 2^0P$).</li>
<li>...</li>
</ul>
<p>In the end, we can compute $151 \cdot P$ performing just seven doublings and four additions.</p>
<p>If this is not clear enough, here's a Python script that implements the algorithm:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">bits</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Generates the binary digits of n, starting</span>
<span class="sd">    from the least significant bit.</span>

<span class="sd">    bits(151) -> 1, 1, 1, 0, 1, 0, 0, 1</span>
<span class="sd">    """</span>
    <span class="k">while</span> <span class="n">n</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">n</span> <span class="o">&</span> <span class="mi">1</span>
        <span class="n">n</span> <span class="o">>>=</span> <span class="mi">1</span>


<span class="k">def</span> <span class="nf">double_and_add</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Returns the result of n * x, computed using</span>
<span class="sd">    the double and add algorithm.</span>
<span class="sd">    """</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">addend</span> <span class="o">=</span> <span class="n">x</span>

    <span class="k">for</span> <span class="n">bit</span> <span class="ow">in</span> <span class="n">bits</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">bit</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="n">result</span> <span class="o">+=</span> <span class="n">addend</span>
        <span class="n">addend</span> <span class="o">*=</span> <span class="mi">2</span>

    <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p>If doubling and adding are both $O(1)$ operations, then <strong>this algorithm is $O(\log n)$</strong> (or $O(k)$ if we consider the bit length), which is pretty good. Surely much better than the initial $O(n)$ algorithm!</p>
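<p>The script above works on plain numbers. As a hedged sketch of how the same idea carries over to curve points, the version below replaces <code>+=</code> with point addition and <code>*= 2</code> with point doubling. The names <code>add</code> and <code>scalar_mult</code> are mine, <code>None</code> stands for the point at infinity, and exact <code>Fraction</code> arithmetic is assumed:</p>

```python
from fractions import Fraction


def add(p, q, a):
    """Point addition on y^2 = x^3 + ax + b; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    xp, yp = p
    xq, yq = q
    if xp == xq and yp == -yq:
        return None
    if xp == xq:
        m = Fraction(3 * xp * xp + a, 2 * yp)   # tangent slope (doubling)
    else:
        m = Fraction(yp - yq, xp - xq)          # chord slope
    xr = m * m - xp - xq
    yr = yp + m * (xr - xp)
    return (xr, -yr)


def scalar_mult(n, p, a):
    """Compute nP with double and add, for a non-negative integer n."""
    result = None      # the identity element plays the role of 0
    addend = p         # holds P, 2P, 4P, 8P, ...
    while n:
        if n & 1:
            result = add(result, addend, a)
        addend = add(addend, addend, a)
        n >>= 1
    return result


a = -7  # curve y^2 = x^3 - 7x + 10
print(scalar_mult(2, (1, 2), a))
print(scalar_mult(3, (1, 2), a))
```

<p>Over the rationals the numerators and denominators grow quickly with $n$, so in this setting doubling and adding are not really $O(1)$; the sketch is only meant to show the control flow.</p>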
<h2>Logarithm</h2>
<p>Given $n$ and $P$, we now have at least one polynomial time algorithm for computing $Q = nP$. But what about the other way round? <strong>What if we know $Q$ and $P$ and need to find out $n$</strong>? This problem is known as the <strong>logarithm problem</strong>. We call it "logarithm" instead of "division" for conformity with other cryptosystems (where instead of multiplication we have exponentiation).</p>
<p>I don't know of any "easy" algorithm for the logarithm problem; however, <a href="https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/reals-mul.html?a=-3&b=1&px=0&py=1">playing with multiplication</a> it's easy to see some patterns. For example, take the curve $y^2 = x^3 - 3x + 1$ and the point $P = (0, 1)$. We can immediately verify that, if $n$ is odd, $nP$ is on the curve in the left half-plane; if $n$ is even, $nP$ is on the curve in the right half-plane. If we experimented more, we could probably find more patterns that eventually could lead us to write an algorithm for computing the logarithm on that curve efficiently.</p>
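<p>For comparison, the naive way to attack the logarithm problem is to try $n = 1, 2, 3, \dots$ until $nP = Q$. The sketch below does exactly that; <code>brute_force_log</code> and its <code>limit</code> parameter are hypothetical names chosen for this illustration, and <code>add</code> is the same exact-arithmetic point addition assumed earlier, repeated here so the snippet is self-contained:</p>

```python
from fractions import Fraction


def add(p, q, a):
    """Point addition on y^2 = x^3 + ax + b; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    xp, yp = p
    xq, yq = q
    if xp == xq and yp == -yq:
        return None
    if xp == xq:
        m = Fraction(3 * xp * xp + a, 2 * yp)
    else:
        m = Fraction(yp - yq, xp - xq)
    xr = m * m - xp - xq
    yr = yp + m * (xr - xp)
    return (xr, -yr)


def brute_force_log(q, p, a, limit=1000):
    """Return the smallest n in 1..limit such that nP == Q, or None."""
    current = None
    for n in range(1, limit + 1):
        current = add(current, p, a)   # current is now nP
        if current == q:
            return n
    return None


a = -3                  # curve y^2 = x^3 - 3x + 1
P = (0, 1)
Q = (Fraction(9, 4), Fraction(19, 8))   # this is 2P
print(brute_force_log(Q, P, a))

# Print the x-coordinate of nP for small n, to look at the pattern
# described in the text.
current = None
for n in range(1, 7):
    current = add(current, P, a)
    print(n, float(current[0]))
```

<p>This search needs up to $n$ additions, so it is exponential in the bit length of $n$, in contrast with double and add.</p>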
<p>But there's a variant of the logarithm problem: the <em>discrete</em> logarithm problem. As we will see in the next post, if we reduce the domain of our elliptic curves, <strong>scalar multiplication remains "easy", while the discrete logarithm becomes a "hard" problem</strong>. This duality is the key brick of elliptic curve cryptography.</p>
<h2>See you next week</h2>
<p>That's all for today, I hope you enjoyed this post! Next week we will discover <strong>finite fields</strong> and the <strong><em>discrete</em> logarithm problem</strong>, along with examples and tools to play with. If this stuff sounds interesting to you, then stay tuned!</p>
<p><strong><a href="http://andrea.corbellini.name/2015/05/23/elliptic-curve-cryptography-finite-fields-and-discrete-logarithms/">Read the next post of the series »</a></strong></p>andreacorbelliniSun, 17 May 2015 11:24:00 +0000tag:andrea.corbellini.name,2015-05-17:2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/bitcoindhdsaeccmathpgprsasecuritysshtlswebLet's Encrypt: the road towards a better web?http://andrea.corbellini.name/2015/04/12/lets-encrypt-the-road-towards-a-better-web/<p>I've always dreamed of an encrypted web, where HTTPS is the standard and plain HTTP is no more. A web where eavesdropping or manipulating information is not possible, or at least much harder than today.</p>
<p>I remember that I got excited when I first heard of <strong><a href="http://www.cacert.org/">CAcert</a>: "a community-driven Certificate Authority that issues certificates to the public at large for free"</strong>. Unfortunately, CAcert's root certificate never made it into the major web browsers and operating systems. Whatever the reasons, the result is that visiting an HTTPS website with a certificate released by CAcert produces nothing but a <a href="https://cacert.org/">scary warning with a call to leave the site</a>, making CAcert unsuitable for most.</p>
<p><a href="https://www.startssl.com/">StartCom</a>, on the other hand, has made it into the major browsers. But although its certificates are issued for free, it has never become very widespread. Also, StartCom <a href="https://news.ycombinator.com/item?id=7557764">has</a> <a href="https://www.techdirt.com/articles/20140409/11442426859/shameful-security-startcom-charges-people-to-revoke-ssl-certs-vulnerable-to-heartbleed.shtml">been</a> <a href="https://twitter.com/startssl/status/453631038883758080">heavily</a> <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=994033">criticized</a> for how the Heartbleed vulnerability was handled, and AFAIK this has driven many customers away.</p>
<h2>Let's Encrypt</h2>
<p>Recently, I learned about <strong><a href="https://letsencrypt.org/">Let's Encrypt</a>: a "free, automated, and open" Certificate Authority</strong> arriving in mid-2015. There are many important facts that make Let's Encrypt different from, and better than, all the other Certificate Authorities out there. I'll let you discover all of them. Probably, the most important fact is that Let's Encrypt has <strong><a href="https://letsencrypt.org/sponsors/">important sponsors</a>, including Mozilla</strong>. And this is what matters today, because it gives Let's Encrypt a chance to be included in at least one major browser.</p>
<figure>
<a href="https://letsencrypt.org/"><img src="http://andrea.corbellini.name/images/letsencrypt-logo-horizontal.png" alt="Let's Encrypt" width="519" height="124"></a>
<figcaption>Let's Encrypt logo.</figcaption>
</figure>
<p>Another interesting fact about Let's Encrypt is that its <strong>certificates are released in <a href="https://letsencrypt.org/howitworks/technology/">a way that is both secure and automated</a> at the same time</strong>. This gives the opportunity for other (potential) Certificate Authorities to adopt the same automated system.</p>
<p>If Let's Encrypt wins, then everyone will have an easy way to obtain a free HTTPS certificate for their website. The next big step would be making Let's Encrypt increase in adoption and the final step would be deprecating plain HTTP. There are however a few open questions:</p>
<ul>
<li>What will be the answer from Google, Apple, Microsoft and other major browser and operating system makers?</li>
<li>What will be the reaction of VeriSign and Comodo? (Together they hold <a href="http://w3techs.com/technologies/overview/ssl_certificate/all">more than 50%</a> of all the certificates currently used on the web.)</li>
<li>Will they declare war on Let's Encrypt or will they consolidate their efforts on customer services and Extended Validation?</li>
<li>Will the technology behind Let's Encrypt allow the creation of a new model for certificate management? Will we see web servers and providers with built-in support for it?</li>
</ul>
<p>I do not have an answer to these questions; time will tell. However, I really hope my dream becomes a reality soon. If you, like me, want Let's Encrypt to be a success, then please <strong>share and discuss</strong> it. Perhaps, one day, we will find ourselves teaching juniors that HTTPS has not always been the standard... :)</p>andreacorbelliniSun, 12 Apr 2015 16:07:00 +0000tag:andrea.corbellini.name,2015-04-12:2015/04/12/lets-encrypt-the-road-towards-a-better-web/securitytlsweblet's encryptRunning Ubuntu Snappy inside Dockerhttp://andrea.corbellini.name/2015/03/25/running-ubuntu-snappy-inside-docker/<p>Many of you may have already heard of <a href="https://developer.ubuntu.com/en/snappy/">Ubuntu Core</a>. For those who haven't, it's a minimal Ubuntu version that runs only a few essential services and ships with a new package manager (snappy) that provides <em>transactional</em> updates. Ubuntu Core provides a lightweight base operating system which is fast to deploy and easy to keep up to date. It also uses a nice <a href="https://wiki.ubuntu.com/SecurityTeam/Specifications/SnappyConfinement">security model</a>.</p>
<p>All these characteristics make it particularly appealing for the cloud. And, in fact, people are starting to consider it for building their (micro)services architectures. Some weeks ago, a user on Ask Ubuntu asked: <a href="http://askubuntu.com/questions/566736/can-i-run-snappy-ubuntu-core-as-a-guest-inside-docker/577248">Can I run Snappy Ubuntu Core as a guest inside Docker?</a> The problem is that Ubuntu Core does not ship with an official Docker image that we can pull, so we are forced to set it up manually. Here's how.</p>
<h2>Creating the Docker image</h2>
<h3>Step 1: get the latest Ubuntu Core</h3>
<p>At the time of writing, the latest Ubuntu Core image is alpha 3 and can be downloaded with:</p>
<div class="highlight"><pre><span></span>$ wget http://cdimage.ubuntu.com/ubuntu-core/releases/alpha-3/ubuntu-core-WEBDM-alpha-03_amd64-generic.img.xz
</pre></div>
<p>(If you browse to <a href="http://cdimage.ubuntu.com/ubuntu-core/releases/alpha-3/">cdimage.ubuntu.com</a>, you can also find the signed hashsums.)</p>
<p>The downloaded image is XZ-compressed and we need to extract it:</p>
<div class="highlight"><pre><span></span>$ unxz ubuntu-core-WEBDM-alpha-03_amd64-generic.img.xz
</pre></div>
<h3>Step 2: connect the image using qemu-nbd</h3>
<p>The file we have just downloaded and extracted is a filesystem dump. The previous version of the image (Alpha 2) was a QCOW2 image (the format used by QEMU). In order to access its contents, we have a few options. Here I'll show one that works with both filesystem dumps and QCOW2 images. The trick consists in using <code>qemu-nbd</code> (a tool from the <a href="https://apps.ubuntu.com/cat/applications/qemu-utils/">qemu-utils</a> package):</p>
<div class="highlight"><pre><span></span># qemu-nbd -rc /dev/nbd0 ubuntu-core-WEBDM-alpha-03_amd64-generic.img
</pre></div>
<p>This command will create a virtual device named <code>/dev/nbd0</code>, with virtual partitions named <code>/dev/nbd0p1</code>, <code>/dev/nbd0p2</code>, ... Use <code>fdisk -l /dev/nbd0</code> to get an idea of what partitions are inside the image.</p>
<h3>Step 3: mount the filesystem</h3>
<p>The partition we are interested in is <code>/dev/nbd0p3</code>, so we need to mount it:</p>
<div class="highlight"><pre><span></span># mkdir nbd0p3
# mount -r /dev/nbd0p3 nbd0p3
</pre></div>
<h3>Step 4: create a base Docker image</h3>
<p>As suggested on the <a href="https://docs.docker.com/articles/baseimages/">Docker documentation</a>, creating a base Docker image from a directory is pretty straightforward:</p>
<div class="highlight"><pre><span></span># tar -C nbd0p3 -c . | docker import - ubuntu-core:alpha-3
</pre></div>
<p>Our newly created image will now appear when running <code>docker images</code>:</p>
<div class="highlight"><pre><span></span># docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
ubuntu-core alpha-3 f6df3c0e2d74 5 seconds ago 543.5 MB
</pre></div>
<p>Let's verify if we did a good job:</p>
<div class="highlight"><pre><span></span># docker run ubuntu-core:alpha-3 snappy
Usage:snappy [-h] [-v]
{info,versions,search,update-versions,update,rollback,install,uninstall,tags,config,build,booted,chroot,framework,fake-version,nap}
...
</pre></div>
<p>Yes! We have successfully added Ubuntu Core to the available Docker images and we have run our first snappy container!</p>
<h2>Installing and running software</h2>
<p>Without wasting too many words, here's how to install and run the <code>xkcd-webserver</code> snappy package inside docker:</p>
<div class="highlight"><pre><span></span># docker run -p 8000:80 ubuntu-core:alpha-3 /bin/sh -c 'snappy install xkcd-webserver && cd /apps/xkcd-webserver/0.3.1 && ./bin/xkcd-webserver'
WARN: AppArmor not available when processing AppArmor hook
Failed to get D-Bus connection: Operation not permitted
Failed to get D-Bus connection: Operation not permitted
** (process:13): WARNING **: user.vala:637: Can not connect to logind
xkcd-webserver 21 kB [======================================] OK
WARNING: failed to connect to dbus: org.freedesktop.DBus.Error.FileNotFound: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Part Tag Installed Available Fingerprint Active
xkcd-webserver edge 0.3.1 - 3a9152b8bff494 *
</pre></div>
<p>Now, if you visit http://localhost:8000/ you should see a random XKCD comic.</p>
<p>If you have paid attention, you may have noticed a few warnings about AppArmor, DBus and logind. The reason why you are seeing these warnings is pretty simple: we started neither AppArmor nor DBus nor logind. Now, generally speaking, we could run init inside Docker and fix these and other warnings. However, that's not what Docker is meant for. So if you want to run AppArmor or similar stuff <em>from inside</em> Docker or LXC, then you should probably consider virtualization.</p>
<h2>Dockerfile</h2>
<p>Once you have created the base Docker image, you can start creating some <code>Dockerfile</code>s, if you need to. Here's an example:</p>
<div class="highlight"><pre><span></span>FROM ubuntu-core:alpha-3
RUN snappy install xkcd-webserver
EXPOSE 80
CMD cd /apps/xkcd-webserver/0.3.1 && ./bin/xkcd-webserver
</pre></div>
<p>This <code>Dockerfile</code> does the same job as the previous command: it installs <code>xkcd-webserver</code> and runs it, listening on port 80 inside the container. In order to use it, first build it:</p>
<div class="highlight"><pre><span></span># docker build -t xkcd-webserver .
</pre></div>
<p>Check that it has been correctly installed:</p>
<div class="highlight"><pre><span></span># docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
xkcd-webserver latest 260e0116e9e3 3 minutes ago 543.5 MB
ubuntu-core alpha-3 f6df3c0e2d74 About an hour ago 543.5 MB
</pre></div>
<p>Then run it, publishing port 80 of the container on port 8000 of the host:</p>
<div class="highlight"><pre><span></span># docker run -p 8000:80 xkcd-webserver
</pre></div>
<p>Again, you should see a random XKCD comic on <a href="http://localhost:8000/">http://localhost:8000/</a>.</p>
<h2>Conclusion</h2>
<p>That's all folks! I hope you enjoyed this tiny guide, and if you need help, please ask a question on Ask Ubuntu with the <a href="http://askubuntu.com/questions/tagged/ubuntu-core">ubuntu-core tag</a>, which I'm subscribed to.</p>andreacorbelliniWed, 25 Mar 2015 20:46:00 +0000tag:andrea.corbellini.name,2015-03-25:2015/03/25/running-ubuntu-snappy-inside-docker/dockersnappyubuntuubuntu corexkcdAre LXC and Docker secure?http://andrea.corbellini.name/2015/02/20/are-lxc-and-docker-secure/<p>Since its initial release in 2008, LXC has become widespread among servers. Today, it is becoming the preferred deployment strategy in many contexts, also thanks to Docker and, more recently, LXD.</p>
<p>LXC and Docker are used not only to achieve modular architecture design, but also as a way to run untrusted code in an isolated environment.</p>
<p>We can agree that the LXC and Docker ecosystems are great and work well, but there's an important question that I believe everyone should ask, but too few people are asking: <strong>are LXC and Docker secure?</strong></p>
<figure>
<img src="http://andrea.corbellini.name/images/broken-chain.jpg" alt="Broken Chain">
<figcaption>A system is as safe as its weakest component.</figcaption>
</figure>
<p>In order to answer this question, I won't go deep into the details of what LXC and Docker are. The web is full of information on <a href="http://en.wikipedia.org/wiki/Cgroups#NAMESPACE-ISOLATION">namespaces</a> and <a href="http://en.wikipedia.org/wiki/Cgroups">cgroups</a>. Rather, I'd like to show what LXC and Docker can do, what they cannot do, and what their default configuration allows them to do. My hope is to provide a quick checklist for those who want to go with LXC/Docker, but are unsure on what they need to pay attention to.</p>
<h2>What LXC and Docker can do</h2>
<p>As we all know, LXC confines processes mainly thanks to two Linux kernel features: namespaces and cgroups. These provide ways to control and limit access to resources such as memory or the filesystem. So, for example, you can limit the bandwidth used by processes inside a container, you can limit the priority of the CPU scheduler, and so on.</p>
<p>As is well known, processes inside an LXC guest cannot:</p>
<ul>
<li>directly interact with the host processes, or with other LXC containers;</li>
<li>access the root filesystem, unless configured otherwise;</li>
<li>access special devices (block devices, network interfaces, ...), unless configured otherwise;</li>
<li>mount arbitrary filesystems;</li>
<li>execute special <code>ioctl</code>s, special syscalls or special interrupts that would affect the behavior of the host.</li>
</ul>
<p>And at the same time, processes inside an LXC guest can find an environment that is perfectly suitable to run a working operating system: I can run init, I can read from <code>/proc</code>, I can access the internet.</p>
<p>This is most of what LXC can do, and it's also what you get by default. Docker (when used with the LXC backend) is a wrapper around LXC that provides utilities for easy deployment and management of the containers, so <strong>everything that applies to LXC, applies to Docker too</strong>.</p>
<p>If this sounds great, then beware that there are a few things you should know...</p>
<h2>You need a security context</h2>
<p>LXC is somewhat incomplete. What I mean is that some parts of special filesystems like procfs or sysfs are not faked. For example, as of now, I can successfully change the value of host's <code>/proc/sys/kernel/panic</code> or <code>/sys/class/thermal/cooling_device0/cur_state</code>.</p>
<p>The reason why LXC is "incomplete" doesn't really matter (it's actually the kernel that is incomplete, but anyhow...). What matters is that certain nasty actions can be forbidden, not by LXC itself, but by an AppArmor/SELinux profile that blocks read and write access to certain <code>/proc</code> and <code>/sys</code> components. The AppArmor rules have shipped in Ubuntu since 12.10 (Quantal), and have been included upstream since early 2014, together with the SELinux rules.</p>
<p>Therefore, <strong>a security context like AppArmor or SELinux is required to run LXC safely</strong>. Without it, the root user inside a guest can take control of the host.</p>
<p>Check that AppArmor or SELinux are running and are configured properly. If you want to go with Grsecurity, then remember to configure it manually.</p>
<h2>Limit resource consumption</h2>
<p>LXC offers ways to limit resource usage, but no special restrictions are put in place by default. <strong>You have to configure them by yourself.</strong></p>
<p>With the default configuration, I can run fork-bombs, request huge memory maps, keep all CPUs busy and generate heavy I/O load. All of this without special privileges. Remember this when running untrusted code.</p>
<figure>
<img src="http://andrea.corbellini.name/images/memory-usage.png" alt="Uncontrolled memory consumption">
</figure>
<p>To limit resource consumption in LXC, open the configuration file for your container and set the <code>lxc.cgroup.&lt;system&gt;</code> values you need.</p>
<p>For example, if you want to limit the container memory usage to 512 MiB, set <code>lxc.cgroup.memory.limit_in_bytes = 512M</code>. Note that with only that option set, a container that exceeds the 512 MiB cap will start using swap without limits. If this is not what you want, then also set <code>lxc.cgroup.memory.memsw.limit_in_bytes = 512M</code>, which caps memory and swap combined. Note that to use both options you may need to add <code>cgroup_enable=memory</code> and <code>swapaccount=1</code> to the kernel command line.</p>
<p>To have an overview of all possible options, check out <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html">Red Hat's documentation</a> or the <a href="https://www.kernel.org/doc/Documentation/cgroups/">Kernel documentation</a>.</p>
<p>With Docker, the story is similar: just use <code>--lxc-conf</code> from the command line to set LXC's options.</p>
<h2>Limit disk usage</h2>
<p>Something that LXC cannot do is limit mass storage usage. Luckily, <strong><a href="https://www.stgraber.org/2013/12/27/lxc-1-0-container-storage/">LXC integrates nicely with LVM</a></strong> (and btrfs, and zfs, and overlayfs), and you can use that to easily limit disk usage. You can, for example, create a logical volume for each of your guests, and give that volume a limited size, so that space usage inside a guest cannot grow indefinitely.</p>
<p>The same <a href="http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/">holds for Docker</a>.</p>
<h2>Pay attention to <code>/dev/random</code></h2>
<p><strong>Processes inside LXC guests</strong>, by default, can read from <code>/dev/random</code> and <strong>can consume the entropy of the host</strong>. This may cause trouble if you need large amounts of randomness (to generate keys, for example).</p>
<p>If this is something that you don't want, then configure LXC so that it <a href="https://wiki.archlinux.org/index.php/Linux_Containers#Cgroups_device_configuration">denies access to the character devices</a> <code>1:8</code> (random) and <code>1:9</code> (urandom). Denying access to the path <code>/dev/random</code> is not enough, as <code>mknod</code> is allowed inside guests.</p>
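<p>A minimal sketch of such a rule set, assuming the cgroup v1 devices controller is available to LXC. The device numbers are the ones named above; <code>rwm</code> denies read, write and <code>mknod</code>.</p>

```text
# Deny access to /dev/random (character device 1:8)
lxc.cgroup.devices.deny = c 1:8 rwm
# Deny access to /dev/urandom (character device 1:9)
lxc.cgroup.devices.deny = c 1:9 rwm
```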
<p>Note however that doing so may break many applications inside the LXC guest that need randomness. Maybe consider using a different machine for processes that require randomness for security purposes.</p>
<h2>Use unprivileged containers</h2>
<p><strong>Containers can be <a href="https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/">run from an unprivileged user</a></strong>. This means UID 0 of the guest does not map to UID 0 of the host, and many potential security holes simply can't be exploited. Unfortunately, <a href="https://github.com/docker/docker/issues/2918">Docker has no support for unprivileged containers</a> yet.</p>
<p>However, if Docker is not a requirement and you can do well with LXC, start experimenting with unprivileged containers and consider using them in production.</p>
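<p>For reference, the UID and GID mapping of an unprivileged container is declared in its configuration. The fragment below is only a sketch in LXC 1.0 syntax: the range starting at 100000 is an assumption and has to match what is allocated to your user in <code>/etc/subuid</code> and <code>/etc/subgid</code>.</p>

```text
# Map guest UIDs 0-65535 to host UIDs 100000-165535 (assumed range)
lxc.id_map = u 0 100000 65536
# Map guest GIDs 0-65535 to host GIDs 100000-165535 (assumed range)
lxc.id_map = g 0 100000 65536
```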
<p>Programs like Apache will complain that they're unable to change their ulimit (because setting the ulimit is a privilege of the real root user). If you need to run programs that require special privileges, either configure them so that they do not complain, or consider using <a href="http://linux.die.net/man/7/capabilities">capabilities</a> (but do not abuse them, and be cautious, or you risk introducing more problems than the ones you are trying to solve!)</p>
<h2>Conclusion</h2>
<p>LXC, Docker and the entire ecosystem around them can be considered quite mature and stable. They're surely production ready, and, if the right configuration is put in place, it can be pretty difficult to cause trouble to the host.</p>
<p>However, whether they can be considered secure or not is up to you: <strong>what are you using containers for? Who are you giving access to? What privileges are you giving, what actions are you restricting?</strong></p>
<p>Always remember what LXC and Docker do by default, and what they do not do, especially when you use them to run untrusted code. The issues I have listed may be only a few of the problems that LXC, Docker and friends can expose. Remember to carefully review your configuration before opening the doors to others.</p>
<h2>Further reading</h2>
<p>If you liked this article, you'll find these ones interesting too:</p>
<ul>
<li><a href="http://blog.docker.com/2013/08/containers-docker-how-secure-are-they/">Containers &amp; Docker: how secure are they?</a>, from the Docker blog.</li>
<li>Stéphane Graber's <a href="https://www.stgraber.org/2014/01/01/lxc-1-0-security-features/">Security features</a> from his <a href="https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-series/">LXC 1.0: Blog post series</a>.</li>
</ul>andreacorbelliniFri, 20 Feb 2015 16:36:00 +0000tag:andrea.corbellini.name,2015-02-20:2015/02/20/are-lxc-and-docker-secure/dockerlxcsecurityPrime numbers and universe factorieshttp://andrea.corbellini.name/2015/02/15/prime-numbers-and-universe-factories/<p>I'm an XKCD fan, and I read it regularly. There's a comic that I particularly enjoyed: <a href="http://xkcd.com/10/">Pi Equals</a>.</p>
<figure>
<a href="http://xkcd.com/10/"><img src="http://imgs.xkcd.com/comics/pi.jpg" width="469" height="247" alt="Pi Equals"></a>
<figcaption>The comic <a href="http://xkcd.com/10/" title="Pi Equals">Pi Equals</a>, from XKCD.com (CC-BY-NC 2.5).</figcaption>
</figure>
<p>Well, it appears that Randall was right in that there's a help message hidden somewhere. And I just found it in a prime number:</p>
<div class="highlight"><pre><span></span>245178888024581899558766786108789912235672909204719666025638877624752119760547413887830514281649480308707369249
</pre></div>
<p>That number corresponds to the ASCII encoding of this message:</p>
<div class="highlight"><pre><span></span>help!! i'm trapped in a universe factory!!!!!!
</pre></div>
<p>Apparently, universe factory workers speak English and write ASCII. Nice coincidence, huh?</p>
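<p>For the curious, here is a minimal sketch of the mapping between text and number. The article does not spell out the byte order, so treating the ASCII bytes as one big-endian integer is an assumption here, and the function names are made up for illustration.</p>

```python
# Assumption: the text is read as ASCII bytes forming one big-endian integer.
MESSAGE = "help!! i'm trapped in a universe factory!!!!!!"


def encode_message(text: str) -> int:
    """Interpret the ASCII bytes of `text` as a big-endian integer."""
    return int.from_bytes(text.encode("ascii"), "big")


def decode_message(number: int) -> str:
    """Turn an integer back into the ASCII text it encodes."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the number
    return number.to_bytes(length, "big").decode("ascii")


if __name__ == "__main__":
    number = encode_message(MESSAGE)
    print(number)                  # the decimal form of the message
    print(decode_message(number))  # round trip back to the text
```

<p>Decoding works the same way in reverse, so any candidate prime can be turned back into text to check what it says.</p>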
<h2>The discovery</h2>
<p>Yesterday I was playing with the two <a href="https://en.wikipedia.org/wiki/Illegal_prime">illegal primes</a> listed on Wikipedia. I was already aware of them, but I had never decoded them until yesterday. While doing so I wondered: how many prime numbers can be directly mapped to an executable file? Also, how many prime numbers can be directly mapped to plain English texts? Perhaps, by digging through prime numbers, could we find something like the Iliad or a fully working operating system?</p>
<p>Well, while asking myself those highly philosophical questions, Randall's comic quickly came to my mind, and I decided to start looking for help requests hidden in primes. You can't imagine how many of them I found!</p>
<p>At first I tried looking for all prime numbers corresponding to strings starting with <code>HELP! I'M TRAPPED IN A UNIVERSE FACTORY!</code>, with an arbitrary suffix. I found many of them, but I wasn't satisfied with the result: I wanted something that was purely English/ASCII, without any garbage. Therefore I tried appending hashtags like <code>#help</code> or <code>#universe</code>, but could not find any interesting combination that was also a prime number (apparently, use of Twitter is forbidden inside universe factories).</p>
<p>So I decided to change approach: I looked for all primes corresponding to <code>HELP</code>, followed by a variable number of exclamation marks, followed by <code>I'M TRAPPED IN A UNIVERSE FACTORY</code>, followed by other exclamation marks. I could not find anything.</p>
<p>But then I tried with a lower case string, and... I found lots of such primes!</p>
<div class="highlight"><pre><span></span>help i'm trapped in a universe factory!!!!!!!
help! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!! i'm trapped in a universe factory!!!!!!
help!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!! i'm trapped in a universe factory!!!!
help!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!! i'm trapped in a universe factory!
help!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
help!!!!!!!!!!!!!!!!!!!!!!!!!!! i'm trapped in a universe factory!!!!!!!
...
</pre></div>
<p>I picked the one I liked most and verified its primality with <a href="http://www.wolframalpha.com/input/?i=is+245178888024581899558766786108789912235672909204719666025638877624752119760547413887830514281649480308707369249+prime%3F">Wolfram|Alpha</a> and <a href="http://www.numberempire.com/primenumbers.php">numberempire.com</a>.</p>
<p>I'm not 100% sure that all the others are primes, as I used the <a href="https://en.wikipedia.org/wiki/Fermat_primality_test">Fermat primality test</a>, which is probabilistic. However, I'm impressed by what I found. Now I can't stop wondering how much literature, physics or technology could be hidden in prime numbers, in plain English and UTF-8 encoded. :D</p>
<p>(Obviously, I'm perfectly aware of what's happening here, but I thought this was a nice fact to share. It could also be a nice number to print on a shirt.)</p>
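<p>The search itself can be sketched roughly as follows. This is not the original script: the helper names, the number of test rounds and the upper bound on exclamation marks are all assumptions, and the big-endian ASCII encoding is assumed as well. A number that fails the Fermat test is certainly composite, while one that passes is only probably prime, which is why the results deserve a second check.</p>

```python
import random


def to_int(text):
    """Read the ASCII bytes of `text` as a big-endian integer (assumed encoding)."""
    return int.from_bytes(text.encode("ascii"), "big")


def fermat_probably_prime(n, rounds=20, rng=None):
    """Fermat test: False means composite, True means probably prime."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    rng = rng or random.Random()
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)   # random base in [2, n - 2]
        if pow(a, n - 1, n) != 1:     # Fermat's little theorem violated
            return False
    return True


def candidates(max_marks=30):
    """Yield 'help' + i marks + the plea + j marks, for i, j up to max_marks."""
    for i in range(max_marks + 1):
        for j in range(max_marks + 1):
            yield "help" + "!" * i + " i'm trapped in a universe factory" + "!" * j


if __name__ == "__main__":
    for text in candidates():
        if fermat_probably_prime(to_int(text)):
            print(text)
```

<p>Composite numbers such as Carmichael numbers can fool this test for many bases, so anything it reports is best confirmed with a stronger test or an external tool.</p>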
<p><strong>Dear universe factory worker, I'm going to rescue you, sooner or later. Just tell me how.</strong></p>andreacorbelliniSun, 15 Feb 2015 16:54:00 +0000tag:andrea.corbellini.name,2015-02-15:2015/02/15/prime-numbers-and-universe-factories/funmathNew blog, againhttp://andrea.corbellini.name/2015/02/15/new-blog-again/<p>This must be the third blog I start from scratch. But this time, I'm making a serious commitment: I'm going to write here regularly.</p>
<p>Wish me luck!</p>andreacorbelliniSun, 15 Feb 2015 12:23:00 +0000tag:andrea.corbellini.name,2015-02-15:2015/02/15/new-blog-again/miscblog