Not long ago I set up a BuildBot instance for wxHaskell and thought about sharing my experience with configuring all the elements of the system. Johan Tibell’s commentary on the State of Haskell, 2011 survey, and his call for Windows bots to test Hackage libraries, prompted me to finally put my notes together into a tutorial of sorts. There is nothing Haskell-specific here, so if you are looking to automate Windows builds of apps written in any other language this guide should provide all the info you need.
BuildBot is a build server designed specifically for automating builds of software that targets multiple platforms. The central server (master) sends build requests to multiple slaves, each of which can be configured differently. The server receives the build results from the slaves, aggregates them and displays them via a web interface. While the server should be accessible at all times, the slaves are only required when there is something to build.
In order for the build to be repeatable and to avoid random errors introduced by unrelated configuration changes, it would be nice for the slaves to be dedicated machines with a known initial configuration. This sounds like an ideal setting for virtual servers created from a disk image prepared specifically for the build. Thanks to services like Amazon EC2 it is relatively easy and inexpensive to set up such a box.
Helpfully, BuildBot supports so-called “latent slaves”, which are started by the server when there is a build to run, and are stopped when there is no more work to do. Furthermore, there is an implementation of a latent slave that works with EC2.
What we will need
For the BuildBot master, we’ll need a server with shell access, which is permanently online and where we can open a port to non-HTTP traffic (this is required for accepting slave connections). I’ve used a WebFaction shared hosting account with a dedicated IP (an additional $5 per month), which is required for the aforementioned port.
We’ll set up the slave on Amazon EC2. I’m not going to cover how to set up an account there, as the process is straightforward and there are numerous resources on how to do this. Hosting the disk image (AMI) will set us back a couple of dollars per month; on top of that we’ll pay for the time our slave is running. For one build a day, each taking less than an hour on a micro instance, that adds another couple of dollars.
All of the steps below happen on the always-online server. They assume that python on our server runs Python version 2.7.
Download the tarball from [http://pypi.python.org/pypi/virtualenv/] and untar it to a temporary directory. In there run python setup.py install --dry-run. If it complains about missing directories, create them. Keep running with --dry-run until it passes, then run without --dry-run. Once done, we should end up with a virtualenv script in ~/bin. We can then get rid of the tarball and the temporary directory.
Follow the BuildBot tutorial. I installed BuildBot in ~/opt/buildbot instead of the suggested ~/tmp/buildbot; other than that I followed it to the letter.
Opening the BuildBot website to the outside world depends on how our server is set up. For WebFaction it involved creating a custom web application via the control panel. This web application got assigned a port number, which then needed to be set in BuildBot’s config by editing ~/opt/buildbot/sandbox/master/master.cfg and changing the http_port value in c['status'].append(html.WebStatus(http_port=8010, authz=authz_cfg)) to the assigned port.
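After restarting the master, it can be handy to confirm that something is actually listening on the assigned port before debugging the hosting panel. Below is a minimal sketch of such a check; port_is_open is a hypothetical helper name, and the host and port in the usage comment are placeholders for our own server and assigned port.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable or timed out: nothing usable is listening.
        return False
    conn.close()
    return True


# Example use (placeholder values, substitute the real host and port):
#   port_is_open("localhost", 8010)
```

The same check works for the slave port, which is the one that needs the dedicated IP.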
Finally, we might want to set up a cron job to automatically start the master should it die for any reason; here’s the script I used:
    #!/bin/bash
    # starts the buildbot master if it's not running already
    HOME=/home/mmakowski
    ps -ef | grep 'buildbot start master' | grep -vq grep
    RUNNING=$?
    if [ $RUNNING -ne 0 ]
    then
        cd $HOME/opt/buildbot
        source sandbox/bin/activate
        cd sandbox
        buildbot start master
    fi

Run crontab -e and add the entry:
0 * * * * /home/mmakowski/bin/buildbot-start-master-if-not-running &
which will ensure that the script will run once every hour.
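The script above detects the master by grepping the process list. An alternative, sketched here in Python, is to look at the pid file. This is a different technique from the one I used, and it assumes the master leaves a twistd.pid file in its base directory (check that this holds for your installation). master_is_running is a hypothetical helper name.

```python
import errno
import os


def master_is_running(pidfile):
    """Return True if pidfile names a process that currently exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        # No pid file, or its contents are not a number.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it kills nothing
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


# Example use (assumed location of the pid file):
#   master_is_running(os.path.expanduser("~/opt/buildbot/sandbox/master/twistd.pid"))
```

A stale pid file whose number has been reused by an unrelated process would fool this check, so it is a convenience rather than a guarantee.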
If we have a Windows box handy we might want to test the slave setup on it before playing with an AWS instance, which is billed by the hour.
Log on to the AWS Management Console and set up a Windows instance using one of the Windows AMIs provided by Amazon (see the EC2 Getting Started guide for step-by-step instructions). I’ll assume we used a 64-bit OS in the instructions that follow; if not, 32-bit versions of the relevant software should be used instead of the ones mentioned. After a couple of minutes we will be able to retrieve the administrator password and connect using Remote Desktop Connection (if on Windows) or rdesktop (on Linux or Mac).
Once logged on, we need to change the administrator password to something we’ll remember and install the following software:
- Python 2.7 (64 bit)
- Using easy_install set up
- unzip BuildBot Slave to a temporary directory and run setup.py install in there
On top of that we will need to install any other software required to build the project we’re setting the slave up for (VCS client, compiler, libraries etc.)
Once it is done, we can follow the slave part of the BuildBot tutorial to make sure that the slave can communicate with the master and can handle build requests.
Note that so far the slave we set up is a regular (non-latent) one, i.e. the master will assume that it is always online and available to handle build requests. We’ll address this shortly; for now it’s important to make sure that the builds work as expected on our newly set up server.
Making it Latent
After we are happy that the master can invoke builds, one final piece of slave configuration is ensuring that the slave process starts up automatically after the server is brought online; on Windows this is done using services. Making the slave run as a service requires a piece of custom Python code. I’ve used the script by Ira Pfeifer, modified slightly to work with a recent version of BuildBot. Here it is, for your convenience; make sure to update the environment variables set in the script to contain everything your build needs:

    """
    buildbot_slave_service.py

    Original Author: Ira Pfeifer
    Email: ipfeifer -dot- tech -at- gmail
    """
    import sys
    import os
    import subprocess

    import win32serviceutil
    import win32service
    import win32event
    import win32api

    from buildslave.scripts import runner
    from buildslave.scripts.startup import start

    # change paths as appropriate
    slavepath = "c:\\buildbot\\slave-wxhaskell"
    homepath = "\\"

    os.environ['HOMEDRIVE'] = "C:"
    os.environ['HOMEPATH'] = homepath
    os.environ['PATH'] = r"C:\Python27;C:\Python27\Scripts;C:\Program Files (x86)\Haskell\bin;C:\Program Files (x86)\Haskell Platform\2011.2.0.1\lib\extralibs\bin;C:\Program Files (x86)\Haskell Platform\2011.2.0.1\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;c:\program files (x86)\wx-config;c:\program files (x86)\Darcs;C:\Users\Administrator\AppData\Roaming\cabal\bin"
    os.environ['WXWIN'] = r"C:\wxWidgets2.8"
    os.environ['WXCFG'] = r"gcc_dll\mswu"


    class BuildSlaveService(win32serviceutil.ServiceFramework):
        _svc_name_ = "BuildBot_Slave_wxHaskell"
        _svc_display_name_ = "BuildBot Slave wxHaskell"
        _svc_description_ = "Buildbot slave based in " + slavepath

        def __init__(self, args):
            win32serviceutil.ServiceFramework.__init__(self, args)
            self.stop_event = win32event.CreateEvent(None, 0, 0, None)

        def SvcDoRun(self):
            # The service starts a subprocess that will run the actual buildbot
            # so that it can be stopped by simply killing off the subprocess.
            self.child = subprocess.Popen(["python", __file__, "start"])
            # Block here until SvcStop signals the stop event.
            win32event.WaitForSingleObject(self.stop_event, win32event.INFINITE)

        def SvcStop(self):
            self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
            # 1 is the PROCESS_TERMINATE access right
            handle = win32api.OpenProcess(1, 0, self.child.pid)
            # returns exit code - wrap w/ error handling?
            win32api.TerminateProcess(handle, 0)
            win32api.CloseHandle(handle)
            # Wake up SvcDoRun so that the service can finish stopping.
            win32event.SetEvent(self.stop_event)

        SvcShutdown = SvcStop


    def start_buildslave():
        config = runner.Options()
        config.parseOptions(['start', slavepath])
        so = config.subOptions
        start(so)


    if __name__ == '__main__':
        # The service re-invokes this file with "start" to run the slave itself.
        if len(sys.argv) > 1 and sys.argv[1] == "start":
            start_buildslave()
        else:
            win32serviceutil.HandleCommandLine(BuildSlaveService)

Put this somewhere on the server and run buildbot_slave_service.py install. Then go to the Services application and change the startup to run automatically (delayed start), as Administrator.
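If clicking through the Services application gets tedious when rebuilding the image, the delayed start setting can also be applied from a script. The sketch below builds the sc config command for that; delayed_start_command and set_delayed_start are hypothetical helper names, and the service name in the usage comment is the one defined in the script above. It does not cover the "run as Administrator" part, which still has to be set separately.

```python
import subprocess


def delayed_start_command(service_name):
    """Build the sc.exe arguments that switch a service to delayed automatic start."""
    # sc.exe expects the option name ("start=") and its value as separate tokens.
    return ["sc", "config", service_name, "start=", "delayed-auto"]


def set_delayed_start(service_name):
    """Run the command on the Windows box; needs an elevated prompt."""
    return subprocess.call(delayed_start_command(service_name))


# Example use on the slave:
#   set_delayed_start("BuildBot_Slave_wxHaskell")
```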
After this is done we are ready to take a snapshot of the server’s hard drive, so that we can start clones later on. Once the AMI appears as available in the AMIs section of the EC2 console we will terminate the instance we’ve just configured.
We’ll need one additional Python library to enable the BuildBot master to control EC2 instances: boto. On the master server, cd to ~/opt/buildbot, activate the sandbox (. sandbox/bin/activate) and install boto into it.
It’s now time to tell the master to treat our slave as latent. The setup is described in the BuildBot docs, and it boils down to updating c['slaves'] like this:
    ####### BUILDSLAVES

    # The 'slaves' list defines the set of recognized buildslaves. Each element is
    # a BuildSlave object, specifying a unique slave name and password. The same
    # slave name and password must be configured on the slave.
    from buildbot.buildslave import BuildSlave
    from buildbot.ec2buildslave import EC2LatentBuildSlave

    c['slaves'] = [EC2LatentBuildSlave('<slave name>', '<slave password>', 't1.micro',
                                       region=u'eu-west-1',
                                       ami='ami-12345678',
                                       identifier='<key identifier>',
                                       secret_identifier='<secret key>')]

- <slave name> and <slave password> are the same as the ones we used so far
- t1.micro is the type of EC2 instance; change it to a bigger one if that’s what you require for your build (check the prices before doing that though)
- region should be the region where you set up your AMI (can be seen in the AWS console)
- ami-12345678 is the id of the AMI we have created from the slave instance
- <key identifier> and <secret key> can be generated and obtained from the Security Credentials section of your AWS account.
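Since master.cfg often ends up in version control or gets pasted into bug reports, we may prefer not to write the AWS keys into it directly. One way to do this, sketched below, is to keep them in a separate two-line file and read them at config load time. load_aws_credentials and the file path are hypothetical; this is a plain Python helper, not a BuildBot feature.

```python
def load_aws_credentials(path):
    """Read the key identifier (line 1) and the secret key (line 2) from a text file."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected two non-empty lines in %s" % path)
    return lines[0], lines[1]


# Example use in master.cfg (hypothetical file location):
#   identifier, secret = load_aws_credentials("/home/mmakowski/.buildbot-aws")
#   c['slaves'] = [EC2LatentBuildSlave('<slave name>', '<slave password>', 't1.micro',
#                                      region=u'eu-west-1', ami='ami-12345678',
#                                      identifier=identifier,
#                                      secret_identifier=secret)]
```

The credentials file should be readable only by the user running the master.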
Finally, in BuildBot 0.8.4 there is an issue with EC2 instances not being terminated. If left unfixed it can lead to each build’s disk image being left active indefinitely and charged to our account! The bug should be fixed in BuildBot 0.8.5; in the meantime we can patch it by hand, by replacing the instance.stop() call in the EC2 latent slave code with one that terminates the instance.
Once this is done, a buildbot restart master should provide us with a BuildBot process running with the final setup and starting up the EC2 slave when required.
Fixing the Slave
If it turns out that our slave configuration needs an update, here is what we do:
- in AWS console, create a new instance using our slave AMI
- connect to the instance using Remote Desktop Connection/rdesktop and make the necessary config changes
- create a new AMI from the reconfigured instance and then terminate the instance
- update the master config with the new AMI id
Debugging the Master
While the source code of BuildBot might be a bit difficult to follow for someone not versed in Twisted, I found the manhole interface invaluable in tracking down what turned out to be a trivial config issue.
Two things that bit me, and that took a while to investigate and fix, are:
- AWS instance reboot on startup. The Windows box is rebooted by AWS immediately after being started; apparently this is required for some TCP/IP-related reconfiguration to take effect. As a result it is crucial that our BuildBot slave service is set up with delayed start. Otherwise the service will start and report to the master; it can even start running the build, when suddenly the server will reboot. After the reboot the service will come back up and try to connect again, but the master will reject this connection as unsolicited.
- The instance.stop() issue mentioned earlier. Strangely, I would swear that my setup had been working for about a week without any patches, with instances being terminated; the “orphaned” stopped instances seemed to have started appearing suddenly and without any changes in either the slave or the server code or config. I can only assume I must have been seeing things, since instance.stop() has been there all the time and I’d never expect it to terminate an instance.
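If orphaned stopped instances have already piled up, they can be cleaned out in one go rather than by clicking through the console. The sketch below assumes conn is a boto EC2 connection (for example one obtained with boto.ec2.connect_to_region) and relies on its get_all_instances and terminate_instances calls as I know them from boto 2; check them against the boto version you have installed. terminate_stopped_instances is a hypothetical helper name.

```python
def terminate_stopped_instances(conn, ami_id):
    """Terminate every stopped instance launched from ami_id and return their ids."""
    # Ask EC2 only for instances of our slave image that are in the stopped state.
    reservations = conn.get_all_instances(
        filters={"image-id": ami_id, "instance-state-name": "stopped"})
    ids = [inst.id for res in reservations for inst in res.instances]
    if ids:
        conn.terminate_instances(instance_ids=ids)
    return ids


# Example use (placeholder AMI id from the config above):
#   terminate_stopped_instances(conn, "ami-12345678")
```

Filtering on the AMI id keeps the cleanup away from any other stopped instances in the same account.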
Hopefully you won’t come across any more nasty surprises. Happy building!