Setting up BuildBot with Windows Slave on AWS EC2

Not long ago I set up a BuildBot instance for wxHaskell and thought about sharing my experience with configuring all the elements of the system. Johan Tibell’s commentary on the State of Haskell, 2011 survey, and his call for Windows bots to test Hackage libraries, prompted me to finally put my notes together into a tutorial of sorts. There is nothing Haskell-specific here, so if you are looking to automate Windows builds of apps written in any other language, this guide should provide all the info you need.

The Setup

BuildBot is a build server designed specifically for automating builds of software that targets multiple platforms. The central server (master) sends build requests to multiple slaves, each of which can be configured differently. The server receives the build results from the slaves, aggregates them and displays them via a web interface. While the server should be accessible at all times, the slaves are only required when there is something to build.

In order for the build to be repeatable and to avoid random errors introduced by unrelated configuration changes, it would be nice for the slaves to be dedicated machines with a known initial configuration. This sounds like an ideal setting for using virtual servers created from a disk image prepared specifically for the build. Thanks to services like Amazon EC2 it is relatively easy and inexpensive to set up such a box.

Helpfully, BuildBot supports so-called “latent slaves”, which are started by the server when there is a build to run, and are stopped when there is no more work to do. Furthermore, there is an implementation of a latent slave that works with EC2.

What we will need

For the BuildBot master, we’ll need a server with shell access, which is permanently online and where we can open a port to non-HTTP traffic (this is required for accepting slave connections). I’ve used WebFaction shared hosting with a dedicated IP (an additional $5 per month), which is required for the aforementioned port.

We’ll set up the slave on Amazon EC2. I’m not going to cover how to set up an account there as the process is straightforward and there are numerous resources on how to do this. Hosting of the disk image (AMI) will set us back a couple of dollars per month; on top of that we’ll need to pay for the time our slave is running. For typical builds that take less than an hour, with one build every day running on a micro instance, this will add another couple of dollars.
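As a rough sanity check of the running cost, here is a small back-of-the-envelope calculation. It is only a sketch: the function name is made up for this example, the hourly rate passed in at the bottom is an assumed figure rather than a quoted price (check the current EC2 price list), and it assumes that every build starts a fresh instance and that every started hour is billed in full. Amounts are kept in cents to avoid floating point rounding.

```python
import math


def monthly_instance_cost_cents(builds_per_day, hours_per_build,
                                rate_cents_per_hour, days=30):
    """Estimate the monthly cost (in cents) of running a latent slave.

    Assumes each build starts a fresh instance and that every started
    hour is billed as a full hour.
    """
    billed_hours_per_build = int(math.ceil(hours_per_build))
    return builds_per_day * billed_hours_per_build * rate_cents_per_hour * days


# one build a day, under an hour each, at an ASSUMED rate of 3 cents per hour
print(monthly_instance_cost_cents(1, 0.75, 3))
```

The storage cost of the AMI is not included here; it is charged separately, whether or not any builds run.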

The Master

All of the steps below happen on the always-online server. They assume that python on our server runs Python version 2.7.

Virtualenv

Download the tarball from [http://pypi.python.org/pypi/virtualenv/] and untar it to a temporary directory. In there run python setup.py install --dry-run. If it complains about missing directories, create them. Continue running with --dry-run until it passes, then run without --dry-run. Once done, we should end up with the virtualenv script in ~/bin. We can then get rid of the tarball and the temporary directory.

BuildBot

Follow the BuildBot tutorial. I installed BuildBot in ~/opt/buildbot instead of the suggested ~/tmp/buildbot; other than that I followed it to the letter.

How to open the BuildBot website to the outside world depends on how our server is set up. For WebFaction it involved creating a custom web application via the control panel. This web application was assigned a port number, which then needed to be set in BuildBot’s config by editing ~/opt/buildbot/sandbox/master/master.cfg and changing the http_port in c['status'].append(html.WebStatus(http_port=8010, authz=authz_cfg)) to the assigned port.

Finally, we might want to set up a cron job to automatically start the master should it die for any reason; here’s the script I used:

#!/bin/bash

# starts the buildbot master if it's not running already

HOME=/home/mmakowski

ps -ef | grep 'buildbot start master' | grep -vq grep
RUNNING=$?
if [ $RUNNING -ne 0 ]
then
    cd $HOME/opt/buildbot
    source sandbox/bin/activate
    cd sandbox
    buildbot start master
fi

Then run crontab -e and add the entry:

0       *       *       *       *       /home/mmakowski/bin/buildbot-start-master-if-not-running &

which will ensure that the script will run once every hour.

The Slave

If we have a Windows box handy we might want to test the slave setup on it before playing with an AWS instance, which is billed by the hour.

Log on to the AWS Management Console and set up a Windows instance using one of the Windows AMIs provided by Amazon (see the EC2 Getting Started guide for step-by-step instructions). The instructions that follow assume a 64-bit OS; if ours is not, 32-bit versions of the relevant software should be used instead of the ones mentioned. After a couple of minutes we will be able to retrieve the administrator password and connect using Remote Desktop Connection (if on Windows) or rdesktop (on Linux or Mac).

Once logged on, we need to change the administrator password to something we’ll remember and install the following software:

On top of that we will need to install any other software required to build the project we’re setting the slave up for (VCS client, compiler, libraries etc.).

Once it is done, we can follow the slave part of the BuildBot tutorial to make sure that the slave can communicate with the master and can handle build requests.
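If the slave fails to connect, it can help to first check that the master’s slave port is reachable from the slave machine at all. Below is a minimal sketch using only the Python standard library. The function name and the host name in the comment are placeholders; 9989 is the slave port used in the tutorial’s sample master.cfg, so adjust it if yours differs.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # refused, unreachable, unresolvable host name or timed out
        return False
    conn.close()
    return True


# Example, run on the slave ("buildmaster.example.com" is a placeholder):
#     port_is_open("buildmaster.example.com", 9989)
```

A False result points at the firewall, the hosting configuration or the port number rather than at the BuildBot slave itself.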

Note that so far the slave we set up is a regular (non-latent) one, i.e. the master will assume that it is always online and available to handle build requests. We’ll address this shortly; for now it’s important to make sure that the builds work as expected on our newly set up server.

Making it Latent

The Slave

After we are happy that the master can invoke builds, one final piece of slave configuration is ensuring that the slave process starts up automatically after the server is brought online; on Windows this is done using services. Making the slave run as a service will require a piece of custom Python code. I’ve used the script by Ira Pfeifer, modified slightly to work with a recent version of BuildBot. Here it is, for your convenience; make sure to update the environment variables set in the script to contain everything your build needs:

""" buildbot_slave_service.py

Original Author: Ira Pfeifer
Email: ipfeifer -dot- tech -at- gmail
"""

import sys
import os
import subprocess
import win32serviceutil
import win32service
import win32event
import win32api
import time

from buildslave.scripts import runner
from buildslave.scripts.startup import start

# change paths as appropriate
slavepath = "c:\\buildbot\\slave-wxhaskell"
homepath = "\\"
os.environ['HOMEDRIVE'] = "C:"
os.environ['HOMEPATH'] = homepath
os.environ['PATH'] = r"C:\Python27;C:\Python27\Scripts;C:\Program Files (x86)\Haskell\bin;C:\Program Files (x86)\Haskell Platform\2011.2.0.1\lib\extralibs\bin;C:\Program Files (x86)\Haskell Platform\2011.2.0.1\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;c:\program files (x86)\wx-config;c:\program files (x86)\Darcs;C:\Users\Administrator\AppData\Roaming\cabal\bin"
os.environ['WXWIN'] = r"C:\wxWidgets2.8"
os.environ['WXCFG'] = r"gcc_dll\mswu"

class BuildSlaveService(win32serviceutil.ServiceFramework):
    _svc_name_ = "BuildBot_Slave_wxHaskell"
    _svc_display_name_ = "BuildBot Slave wxHaskell"
    _svc_description_ = "Buildbot slave based in " + slavepath

    def __init__(self, args):
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.stop_event = win32event.CreateEvent(None, 0, 0, None)

    def SvcDoRun(self):
        # The service starts a subprocess that will run the actual buildbot
        # so that it can be stopped by simply killing off the subprocess.
        self.child = subprocess.Popen(["python", __file__, "start"])
        # block here until SvcStop signals the stop event
        win32event.WaitForSingleObject(self.stop_event, win32event.INFINITE)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        # 1 is the PROCESS_TERMINATE access right
        handle = win32api.OpenProcess(1, 0, self.child.pid)
        # returns exit code - wrap w/ error handling?
        win32api.TerminateProcess(handle, 0)
        win32api.CloseHandle(handle)
        # wake up SvcDoRun so that the service can finish
        win32event.SetEvent(self.stop_event)

    # a system shutdown is handled in the same way as a regular stop
    SvcShutdown = SvcStop

def start_buildslave():
    config = runner.Options()
    config.parseOptions(['start', slavepath])
    so = config.subOptions
    start(so)

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == "start":
        start_buildslave()
    else:
        win32serviceutil.HandleCommandLine(BuildSlaveService)

Put this somewhere on the server and run buildbot_slave_service.py install. Then go to the Services application and change the startup to run automatically (delayed start), as Administrator.

After this is done we are ready to take a snapshot of the server’s hard drive, so that we can start clones later on. Once the AMI appears as available in the AMIs section of the EC2 console we will terminate the instance we’ve just configured.
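The snapshot can be taken from the AWS Management Console, but the same steps could also be scripted. The following is only a sketch, assuming the boto library (2.x) and its EC2Connection methods create_image, get_all_images and terminate_instances. The function name, the instance id, the image name and the polling interval are made up for illustration, and credentials are left to boto’s usual configuration.

```python
import time


def snapshot_and_terminate(conn, instance_id, image_name, poll_seconds=30):
    """Create an AMI from an instance, wait for it, then terminate the instance.

    `conn` is expected to behave like a boto 2.x EC2Connection.
    """
    ami_id = conn.create_image(instance_id, image_name)
    while True:
        image = conn.get_all_images(image_ids=[ami_id])[0]
        if image.state == "available":
            break
        if image.state == "failed":
            raise RuntimeError("creating image %s failed" % ami_id)
        time.sleep(poll_seconds)
    # only get rid of the instance once the image is safely available
    conn.terminate_instances(instance_ids=[instance_id])
    return ami_id


# Example (hypothetical ids; assumes boto 2.x is installed and configured):
#     import boto.ec2
#     conn = boto.ec2.connect_to_region("eu-west-1")
#     print(snapshot_and_terminate(conn, "i-12345678", "buildslave-image"))
```

The returned AMI id is the value the master configuration will need later on.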

The Master

We’ll need one additional Python library to enable the BuildBot master to control EC2 instances: boto. On the master server, cd to ~/opt/buildbot, activate the sandbox (. sandbox/bin/activate) and run easy_install boto.

It’s now time to tell the master to treat our slave as latent. The setup is described in the BuildBot docs, and it boils down to updating c['slaves'] like this:

####### BUILDSLAVES

# The 'slaves' list defines the set of recognized buildslaves. Each element is
# a BuildSlave object, specifying a unique slave name and password.  The same
# slave name and password must be configured on the slave.
from buildbot.buildslave import BuildSlave
from buildbot.ec2buildslave import EC2LatentBuildSlave
c['slaves'] = [EC2LatentBuildSlave('<slave name>', '<slave password>', 't1.micro',
                                   region=u'eu-west-1',
                                   ami='ami-12345678',
                                   identifier='<key identifier>',
                                   secret_identifier='<secret key>')]

Where:

Finally, in BuildBot 0.8.4 there is an issue with EC2 instances not being terminated. If left unfixed it can lead to each build’s disk image being left active indefinitely and being charged to our account! The bug should be fixed in BuildBot 0.8.5; in the meantime we can patch it by hand, by replacing instance.stop() with instance.terminate() in ~/opt/buildbot/sandbox/lib/python2.7/site-packages/buildbot-0.8.4p2-py2.7.egg/buildbot/ec2buildslave.py.
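The replacement can be done in any text editor. For those who prefer to script it, here is a minimal sketch; the function name and the .orig backup suffix are my own choices for this example, not part of BuildBot.

```python
import shutil


def patch_stop_to_terminate(path):
    """Replace instance.stop() with instance.terminate() in the given file.

    Keeps a copy of the original next to it (with a .orig suffix) and
    returns the number of replacements made.
    """
    with open(path) as source:
        text = source.read()
    count = text.count("instance.stop()")
    if count:
        shutil.copy(path, path + ".orig")
        with open(path, "w") as target:
            target.write(text.replace("instance.stop()",
                                      "instance.terminate()"))
    return count


# Example, with the path from the paragraph above expanded by hand:
#     patch_stop_to_terminate("/home/<user>/opt/buildbot/sandbox/lib/"
#                             "python2.7/site-packages/"
#                             "buildbot-0.8.4p2-py2.7.egg/buildbot/"
#                             "ec2buildslave.py")
```

A return value of 0 means nothing was changed, which is worth double-checking before relying on the instances being terminated.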

Once this is done, a buildbot restart master should provide us with a BuildBot process running with the final setup and starting up the EC2 slave when required.

Fixing the Slave

If it turns out that our slave configuration needs an update, here is what we do:

Debugging the Master

While the source code of BuildBot might be a bit difficult to follow for someone not versed in Twisted, I found the manhole interface invaluable in tracking down what turned out to be a trivial config issue.
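The manhole is switched on in master.cfg. The fragment below is a minimal sketch, assuming the manhole.PasswordManhole class described in the BuildBot 0.8.x manual; the port, user name and password are placeholders, and c is the configuration dictionary already defined at the top of master.cfg.

```python
# fragment of master.cfg, not a standalone script
from buildbot import manhole

# listen on localhost only; port, user name and password are placeholders
c['manhole'] = manhole.PasswordManhole("tcp:1234:interface=127.0.0.1",
                                       "admin", "passwd")
```

With this in place and the master restarted, ssh -p 1234 admin@localhost on the master server should give us a Python prompt inside the running master process. Since the manhole gives full access to the master, it is best left disabled when not needed.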

Gotchas

Two things I got bitten by and which took a little bit to investigate and fix are:

Hopefully you won’t come across any more nasty surprises. Happy building!

15/08/2011