Create an EC2 AMI with Ansible

by Sebastien Mirolo on Thu, 30 Mar 2017

Time to boot EC2 instances has significantly improved over the years, but a system update and configuration still takes on the order of tens of minutes. As a result, we always create a fully configured base image that is then instantiated as necessary. Of course, we use Ansible to set up and register that AMI.

Finding a suitable base image

For us mere developers, it is turtles all the way down. We need to select a base AMI already registered with AWS to build our application base image from.

A lot goes into picking a suitable base, not least the quirks of the instance type and region we will run the image on.

One way to select a base image is to list the suitable images in the region we are interested in and pick the appropriate ami-id. For example:

$ cat ~/.aws/config
[default]
region = us-west-2

$ cat ~/.aws/credentials
[default]
aws_access_key_id = ***
aws_secret_access_key = ***

$ aws ec2 describe-images --filters "Name=virtualization-type,Values=hvm"

An alternative is to look up the ami-id for the latest stock image of your favorite distribution, for example Fedora Cloud.
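
To automate that lookup, the describe-images command can be narrowed down to a single ami-id. What follows is a minimal sketch, not something taken from our playbooks: the function name latest_ami is made up for illustration, and the owner account id and image name pattern are values you would have to look up for your distribution.

```shell
# latest_ami OWNER NAME_PATTERN
# Prints the ami-id of the most recently created HVM image whose name
# matches NAME_PATTERN among the images owned by OWNER.
# OWNER and NAME_PATTERN are placeholders, not values from this post.
latest_ami() {
    local owner="$1" name_pattern="$2"
    aws ec2 describe-images \
        --owners "$owner" \
        --filters "Name=virtualization-type,Values=hvm" \
                  "Name=name,Values=$name_pattern" \
        --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
        --output text
}
```

Called as latest_ami OWNER_ID 'NAME_PATTERN', it prints one ami-id, or None when nothing matches the filters.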

Configure the instance

We use a cloud-init script to configure the instance. The Ansible files look something like this:

$ cat playbooks/group_vars/all
# Variables to connect to AWS
aws_account: ****                 # AWS accountID (used in S3 bucket policies)
aws_region: us-west-2             # AWS region where resources are allocated

# Variables to create EC2 instances
key_name: ****                    # Key used to first ssh into an instance
aws_zone: us-west-2b              # EBS/EC2 must be in the same zone.
ami_id: ami-3dea475d              # Base image (Fedora 25 HVM GP2)

$ cat playbooks/aws-create-images.yml
- name: Create and configure EC2 instances to be used as AMIs
  hosts: localhost
  connection: local
  gather_facts: False

  roles:
    - create_base_image

$ cat playbooks/roles/create_base_image/tasks/main.yml
- name: Create EC2 instance to setup front-end web server
  local_action:
    module: ec2
    key_name: "{{key_name}}"
    group: "default"
    instance_profile_name: "default-profile"
    instance_type: t2.micro
    image: "{{ami_id}}"
    region: "{{aws_region}}"
    zone: "{{aws_zone}}"
    user_data: "{{lookup('template', '../templates/cloud-init-script.j2')}}"
    wait: yes
  register: web_base

$ cat playbooks/roles/create_base_image/templates/cloud-init-script.j2
#!/bin/bash

set -x
set -e

/usr/bin/dnf -y update
/usr/bin/dnf -y install docker awscli
sudo systemctl enable docker.service

With the code above it is then possible to start an EC2 instance and run a configuration script through cloud-init.

$ ansible-playbook playbooks/aws-create-images.yml

The issue here is that if we run the commands to register the AMI before the setup of the instance is complete, we will have an inconsistent image. We need to wait until cloud-init is done before moving forward.

wait_for, and register image

As we are debugging the Ansible playbooks, it is often useful to log into the instance and tail the cloud-init output.

$ cat /var/lib/cloud/instances/*instance-id*/scripts/part-001
...
$ sudo tail -f /var/log/cloud-init-output.log
...
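
While logged in, it can also help to check whether cloud-init itself considers the boot finished. A minimal sketch, assuming cloud-init writes its usual boot-finished marker under /var/lib/cloud/instance once every stage, user-data scripts included, has returned. The function name and the optional directory argument are ours, for illustration only.

```shell
# cloud_init_finished [CLOUD_DIR]
# Succeeds when the boot-finished marker exists in CLOUD_DIR.
# CLOUD_DIR defaults to /var/lib/cloud/instance, which is an assumption
# about where the installed cloud-init version keeps its state.
cloud_init_finished() {
    [ -f "${1:-/var/lib/cloud/instance}/boot-finished" ]
}
```

For example, cloud_init_finished && echo "cloud-init is done" prints the message only once the marker is present.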

With a way to see what is going on, it is time to think about how to wait for cloud-init to complete in the Ansible playbook. A first idea is to touch a file at the end of the cloud-init script and have Ansible wait for it to appear.

$ cat playbooks/roles/create_base_image/templates/cloud-init-script.j2
...
sudo -u fedora touch /home/fedora/.web-done

$ cat playbooks/roles/create_base_image/tasks/main.yml
...
- name: Wait for configuration of EC2 instance completed
  wait_for:
    delay: 300
    path: /home/fedora/.web-done
...

The wait_for directive does not work here because our playbook executes in the context of the local host and not the remote EC2 instance. Ansible therefore waits for the file /home/fedora/.web-done to appear on the local host. A bit ugly, yet it can be solved by running an ssh command:

- name: Wait for configuration of EC2 instance completed
  local_action:
    module: command
      ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i {{lookup('env', 'HOME')}}/.ssh/{{key_name}} fedora@{{web_base.instances[0].public_ip}} sh -c "'while [ ! -f /home/fedora/.web-done ]; do sleep 30; done'"

Aside from stumbling on issue #14655, and having to add AddKeysToAgent yes to ~/.ssh/config on the latest macOS Sierra so that we do not get prompted for the passphrase to decrypt the key every time, this works.
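
One caveat with a bare while loop: the cloud-init script runs with set -e, so if a dnf command fails the marker file is never created and the loop never exits. A minimal sketch of a polling helper with a timeout follows. The name wait_until and its arguments are hypothetical, and KEY and HOST in the example are placeholders.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds until it succeeds.
# Returns 0 as soon as COMMAND succeeds, 1 once at least TIMEOUT
# seconds have been spent sleeping without success.
#
# Example (KEY and HOST are placeholders):
#   wait_until 900 30 ssh -i ~/.ssh/KEY fedora@HOST \
#       test -f /home/fedora/.web-done
wait_until() {
    local timeout="$1" interval="$2" waited=0
    shift 2
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A non-zero exit status makes the calling command task fail instead of hanging, which is easier to notice in a playbook run.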

We can do better though. Assuming the ssh port is filtered in the security group but the http port is not, we can run a small web server and use the Ansible wait_for module in a straightforward manner.

$ cat playbooks/roles/create_base_image/templates/cloud-init-script.j2
...
# Ansible will be waiting for this server to respond
# before it continues with registering the AMI.
cd /home/fedora
sudo -u fedora echo "DONE" > index.html
/usr/bin/python3 -m http.server 80

$ cat playbooks/roles/create_base_image/tasks/main.yml
...
  wait_for:
    delay: 540  # Yeah, it takes about 8min to update and install packages.
    host: "{{web_base.instances[0].public_ip}}"
    port: 80
    state: started
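
wait_for only tells us that port 80 accepts connections. When debugging by hand, a quick way to confirm that the server answering is the one started at the end of the cloud-init script is to fetch the page and look for the DONE marker. A minimal sketch, assuming curl is installed on the workstation; the function name config_is_done is hypothetical.

```shell
# config_is_done URL
# Succeeds when the page at URL contains a line that is exactly DONE,
# which is what the cloud-init script writes to index.html.
config_is_done() {
    curl --fail --silent --max-time 5 "$1" | grep -q '^DONE$'
}
```

For example, config_is_done http://PUBLIC_IP/ && echo "ready", where PUBLIC_IP stands for the address of the instance.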

Register image and clean up

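Once we have a way to wait for cloud-init to complete, registering the AMI and cleaning up is straightforward. The web_base_device_id variable used below is not set in the excerpts shown here; it is assumed to hold the id of the instance registered as web_base earlier.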

$ cat playbooks/roles/create_base_image/tasks/main.yml
...
- name: Register AMI for front-end web servers
  local_action:
    module: ec2_ami
    region: "{{aws_region}}"
    instance_id: "{{web_base_device_id}}"
    name: "web-{{web_base_device_id}}"
    description: "Front-end web reverse proxy"
    wait: yes
  register: web_ami

# Records the ami-id in the set of dynamic variables for other playbooks.
- lineinfile: "dest=group_vars/dynamic regexp='^web_ami_id:' line='web_ami_id: {{web_ami.image_id}}'"

- name: Delete EC2 instance configured to create front-end web server AMI
  local_action:
    module: ec2
    region: "{{aws_region}}"
    instance_id: "{{web_base_device_id}}"
    state: absent
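
For debugging outside of Ansible, roughly the same three steps can be run with the AWS CLI. This is a sketch under assumptions, not a replacement for the playbook: the function name register_ami is ours, and the description simply mirrors the task above.

```shell
# register_ami INSTANCE_ID NAME
# Creates an AMI from INSTANCE_ID, waits until the image is available,
# terminates the instance, then prints the new ami-id.
register_ami() {
    local instance_id="$1" name="$2" ami_id
    ami_id=$(aws ec2 create-image \
        --instance-id "$instance_id" \
        --name "$name" \
        --description "Front-end web reverse proxy" \
        --query ImageId --output text) || return 1
    # Block until the image is usable, like "wait: yes" in the ec2_ami task.
    aws ec2 wait image-available --image-ids "$ami_id" || return 1
    # Terminate the instance only after the image is available.
    aws ec2 terminate-instances --instance-ids "$instance_id" \
        > /dev/null || return 1
    echo "$ami_id"
}
```

Because each step returns early on failure, the instance is left running whenever the image could not be created, so it can still be inspected.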

A full working playbook can be found in DjaoDjin open source projects hosted on GitHub.

More to read

If you are looking for related posts, Deploying on EC2 with Ansible and PostgreSQL, encrypted EBS volume and Key Management Service are good reads.

More technical posts are also available on the DjaoDjin blog. Fellow entrepreneurs may also be interested in the business lessons we learned running a subscription hosting platform.