Spark Summit 2014

Jul 01 2014

I presented yesterday at Spark Summit 2014. This was my first time presenting at a tech conference but I feel it went okay.

Overall the conference was interesting, especially seeing how much momentum Spark has been getting since the last summit in 2013. The announcement of Databricks Cloud is very exciting, especially since we’ve been running Spark in production for the past 8 months at my job and have been looking for a service like this to streamline some of our ops.

Slides from my talk below:

Setting up a Multi-VM environment in Vagrant

Sep 08 2012

Running a complete development environment (web servers, database, etc.) has typically meant installing everything on the same box. But this doesn’t accurately model production, where each service runs on a separate box, often with different dependencies. Vagrant provides an easy way to set up a group of related development VMs.

To demonstrate this, I’ll set up a simple environment containing:

  • Two node.js application servers
  • An nginx load balancer

I’m going to use the trivial example application from the node.js homepage, but the basic principle is the same with other applications.

Besides setting up all the Vagrant stuff I’m also going to want some way of automatically configuring the VMs. For that I’ll use Chef.

Single node application server

To start off let’s create a single node.js application server running our app. First, create a cookbook for our node application:

knife cookbook create nodeapp

This will create a default cookbook structure for me to work with. I’ll add my node application as a file resource:

vi chef/cookbooks/nodeapp/files/default/app.js
var http = require('http');
var os   = require('os');
var hostname = os.hostname();

// Reply to every request with this machine's hostname
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello from ' + hostname + '\n');
}).listen(1337);
console.log('Server running on port 1337');

And reference this in the default recipe:

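vi chef/cookbooks/nodeapp/recipes/default.rb
include_recipe 'node'

# Copy the application script onto the VM
cookbook_file '/usr/local/bin/app.js' do
  action :create
  source 'app.js'
end

# Run the script as a service using the node cookbook
node_server 'nodeapp' do
  script '/usr/local/bin/app.js'
  action :start
end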

This uses the ‘node’ cookbook, which takes care of installing node and npm and lets you set up an app to start on boot.

I found that the node cookbook obtained via knife cookbook site install node (as of 9/8/2012) has a few issues that prevent it from working on my Ubuntu VM. I’ve created a fork here that works around these issues.

Besides the node cookbook we’ll need to satisfy some other dependencies:

knife cookbook site install apt
knife cookbook site install build-essential
knife cookbook site install git
knife cookbook site install ntp
knife cookbook site install ubuntu

So altogether you should have the following cookbooks:

$ ls chef/cookbooks/

To use this in a Vagrant box we’re going to include this recipe in an ‘application_server’ role:

vi chef/roles/application_server.rb
name "application_server"
description "node.js application server"
run_list "recipe[ntp]", "recipe[ubuntu]", "recipe[nodeapp]"

Before we get too far along let’s make sure we can set up a single application server running our node app.

Create a simple Vagrantfile that uses the application_server role we created above. We’ll also need to configure port forwarding of port 1337:

Vagrant::Config.run do |config|
  config.vm.box = "precise32"
  config.vm.forward_port 1337, 1337
  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "chef/cookbooks"
    chef.roles_path = "chef/roles"
    chef.data_bags_path = "chef/data_bags"
    chef.add_role "application_server"
  end
end

Run vagrant up and in a few minutes your VM should be ready to use. The recipe is set up to build node.js from source, so provisioning can take a while. If you get any errors running vagrant, check the output to figure out what went wrong.

Once this is up you should be able to connect from the host machine to localhost:1337 and get a response.

$ curl localhost:1337
Hello from precise32

Multiple application servers

So far so good, but now let’s extend this to create multiple application servers. To do so we’ll need to change the Vagrantfile to set up multiple VMs, one for each application server we want to run.

Let’s also set the hostnames for these so we can distinguish between them, and configure host-only networking with static IP addresses.

Since the Vagrantfile is written in Ruby I can define these values in a hash to avoid duplicating config settings for each VM I need to set up:

Vagrant::Config.run do |config|
  # VM name => static host-only IP address
  app_servers = { :app1 => '',
                  :app2 => ''
                }

  app_servers.each do |app_server_name, app_server_ip|
    config.vm.define app_server_name do |app_config|
      app_config.vm.box = "precise32"
      app_config.vm.host_name = app_server_name.to_s
      app_config.vm.network :hostonly, app_server_ip
      app_config.vm.provision :chef_solo do |chef|
        chef.cookbooks_path = "chef/cookbooks"
        chef.roles_path = "chef/roles"
        chef.data_bags_path = "chef/data_bags"
        chef.add_role "application_server"
      end
    end
  end
end
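
The hash-driven loop can be tried outside Vagrant. The sketch below is plain Ruby rather than a Vagrantfile. The addresses are placeholders from the reserved documentation range (not the values used in this post), and vm_settings is a hypothetical helper name used only for illustration.

```ruby
# Plain-Ruby sketch of the hash-driven loop; no Vagrant required.
# The IP addresses are placeholders, not the values used in the post.
APP_SERVERS = {
  :app1 => '192.0.2.11',
  :app2 => '192.0.2.12'
}

# Hypothetical helper: collect the per-VM settings that the Vagrantfile
# loop would apply inside each config.vm.define block.
def vm_settings(servers)
  servers.map do |name, ip|
    { :host_name => name.to_s, :ip => ip, :role => 'application_server' }
  end
end

vm_settings(APP_SERVERS).each do |vm|
  puts "#{vm[:host_name]} -> #{vm[:ip]} (#{vm[:role]})"
end
```

Adding a third application server then means adding one entry to the hash rather than copying a whole block of settings.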

Now when you run vagrant up both VMs will start up. If you still have VMs running from earlier you may need to vagrant destroy them first. Once they’re up you can test each box individually like so:

$ curl
Hello from app1
$ curl
Hello from app2

Since we switched to host-only networking we no longer need to forward individual ports, because all ports are open. This may not match what we’d see in production, in which case we’d want to set up some firewall rules to only allow traffic through specific ports. But for now we’ll just proceed with all ports open.

Load balancer

Having multiple application servers is great but we still need to set up a load balancer in front of them. There are multiple choices we could use for this but let’s go with nginx.

We’ll need to grab some more Chef cookbooks:

knife cookbook site install nginx
knife cookbook site install bluepill
knife cookbook site install runit
knife cookbook site install yum
knife cookbook site install ohai

And create a new cookbook containing our configuration for the load balancer:

knife cookbook create loadbalancer

Edit the default recipe in the loadbalancer cookbook so that it includes nginx and creates the default config file. The template also notifies the nginx service to restart whenever the config file changes:

vi chef/cookbooks/loadbalancer/recipes/default.rb
include_recipe "nginx"

template '/etc/nginx/sites-available/default' do
  source 'loadbalancer.conf.erb'
  # Make the upstream IP list available to the template
  variables(
    :upstream_servers => node[:loadbalancer][:upstream_servers]
  )
  notifies :restart, resources(:service => "nginx")
end

Put the loadbalancer.conf.erb template in the templates/default directory. The IP addresses of the upstream app servers will come from attributes that we define in our Vagrantfile.

vi chef/cookbooks/loadbalancer/templates/default/loadbalancer.conf.erb
upstream appcluster {
  <% @upstream_servers.each do |ip_address| -%>
  server <%= ip_address %>;
  <% end -%>
}

server {
  listen 80;
  server_name load_balancer_test;

  location / {
    proxy_pass http://appcluster;
  }
}
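
To preview what the upstream block expands to without provisioning a VM, the template can be rendered with Ruby’s standard ERB library. This is only a sketch: it uses stdlib ERB rather than Chef’s own template renderer, assumes Ruby 2.6 or later for the trim_mode keyword, TemplatePreview is a hypothetical class, and the addresses are placeholders.

```ruby
require 'erb'

# Same upstream block as loadbalancer.conf.erb.
TEMPLATE = <<~'CONF'
  upstream appcluster {
    <% @upstream_servers.each do |ip_address| -%>
    server <%= ip_address %>;
    <% end -%>
  }
CONF

# Hypothetical stand-in for the context a template is rendered in:
# it only exposes @upstream_servers as an instance variable.
class TemplatePreview
  def initialize(upstream_servers)
    @upstream_servers = upstream_servers
  end

  def render(template)
    # trim_mode '-' honours the "-%>" markers used in the template
    ERB.new(template, trim_mode: '-').result(binding)
  end
end

# Placeholder addresses, not the ones from the Vagrantfile.
puts TemplatePreview.new(['192.0.2.11', '192.0.2.12']).render(TEMPLATE)
```

The output should contain one server line per address in the list, which is what nginx will round-robin across.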

With this all in place we can create a role using our recipes:

vi chef/roles/load_balancer.rb
name "load_balancer"
description "load balancer using nginx"
run_list "recipe[ntp]", "recipe[ubuntu]", "recipe[loadbalancer]"

And update the Vagrantfile to create a new VM from this role as well as pass in the upstream server IP addresses:

Vagrant::Config.run do |config|
  # Define and configure application servers
  app_servers = { :app1 => '',
                  :app2 => ''
                }

  app_servers.each do |app_server_name, app_server_ip|
    config.vm.define app_server_name do |app_config|
      app_config.vm.box = "precise32"
      app_config.vm.host_name = app_server_name.to_s
      app_config.vm.network :hostonly, app_server_ip
      app_config.vm.provision :chef_solo do |chef|
        chef.cookbooks_path = "chef/cookbooks"
        chef.roles_path = "chef/roles"
        chef.data_bags_path = "chef/data_bags"
        chef.add_role "application_server"
      end
    end
  end

  # Configure load balancer
  config.vm.define :load_balancer do |load_balancer_config|
    load_balancer_config.vm.box = "precise32"
    load_balancer_config.vm.host_name = "loadbalancer"
    load_balancer_config.vm.network :hostonly, ""
    load_balancer_config.vm.provision :chef_solo do |chef|
      chef.cookbooks_path = "chef/cookbooks"
      chef.roles_path = "chef/roles"
      chef.data_bags_path = "chef/data_bags"
      chef.add_role "load_balancer"
      # Attributes read by the loadbalancer recipe
      chef.json = {
        'loadbalancer' => {
          'upstream_servers' => ['','']
        }
      }
    end
  end
end
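
One possible refinement, since the Vagrantfile is plain Ruby: the upstream list could be derived from the app_servers hash instead of being typed a second time. The sketch below shows the idea in isolation. The load_balancer_attributes helper is hypothetical and the addresses are placeholders, not the values from this post.

```ruby
# Placeholder addresses; the post's real values are not reproduced here.
app_servers = {
  :app1 => '192.0.2.11',
  :app2 => '192.0.2.12'
}

# Hypothetical helper: build the attributes hash that would be assigned
# to chef.json, reusing the addresses instead of repeating them by hand.
def load_balancer_attributes(servers)
  { 'loadbalancer' => { 'upstream_servers' => servers.values } }
end

attributes = load_balancer_attributes(app_servers)
puts attributes.inspect
```

With this approach, adding an application server to the hash would also add it to the load balancer’s upstream list.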


If you still have your app servers running you only need to run vagrant up load_balancer to bring up the new node. Or you can just run vagrant up to bring up everything.

Now you should be able to test against the new load balancer node and observe that you rotate between each application server:

$ curl
Hello from app1
$ curl
Hello from app2

If you bring down an app server with vagrant destroy app1 and test the load balancer again you should still get a response from the remaining app server (the first time the response may take longer depending on the timeout value you configure in nginx).
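
The rotation and failover behaviour described above can be pictured with a toy model. This is only an illustration of round-robin selection that skips servers known to be down; it is not how nginx is implemented, and the RoundRobin class is hypothetical.

```ruby
# Toy model of round-robin selection that skips servers marked as down.
# It illustrates the observed behaviour only; nginx works differently inside.
class RoundRobin
  def initialize(servers)
    @servers = servers
    @down = []
    @index = 0
  end

  def mark_down(server)
    @down << server
  end

  # Return the next healthy server, or nil if none are available.
  def next_server
    @servers.length.times do
      candidate = @servers[@index % @servers.length]
      @index += 1
      return candidate unless @down.include?(candidate)
    end
    nil
  end
end

pool = RoundRobin.new(%w[app1 app2])
puts pool.next_server   # app1
puts pool.next_server   # app2
pool.mark_down('app1')  # simulate vagrant destroy app1
puts pool.next_server   # app2
```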

You can check out all the configuration used in this post in my GitHub repository.

Setting up Chef and Vagrant on Windows 7 under Cygwin

Jul 15 2012

At a previous job we were restricted to Windows PCs only. This made setting up certain development tools more annoying than it should be.

Recently I started using Vagrant and Chef. Both come with native Windows MSIs, but using them as-is means working in the Windows command shell. With a few additional steps I was able to get them running under Cygwin using the native MSIs.

First: Chef!

  1. Download and install the native Windows installer

  2. If you don’t have it already, install git under Cygwin.  Use the Cygwin installer for this and select the ‘git’ package.

  3. The main issue I ran into after this point was that Chef uses a version of Ruby native to Windows, which means that running the various Chef scripts under Cygwin fails because the native Ruby program doesn’t understand Cygwin-style paths. This is easy enough to work around by creating a few aliases in your bash profile. Change the paths below to match wherever you installed Chef:

alias knife='/cygdrive/c/opscode/chef/embedded/bin/ruby C:/opscode/chef/bin/knife'
alias chef-client='/cygdrive/c/opscode/chef/embedded/bin/ruby C:/opscode/chef/bin/chef-client'
alias chef-solo='/cygdrive/c/opscode/chef/embedded/bin/ruby C:/opscode/chef/bin/chef-solo'
alias shef='/cygdrive/c/opscode/chef/embedded/bin/ruby C:/opscode/chef/bin/shef'
  4. At this point you should be able to run chef-client -v without any errors.

  5. Next, set up a repository for your cookbooks. I’d recommend just cloning the opscode repository: git clone git://

  6. Finally, configure knife:

mkdir chef-repo/.chef 
knife configure -r chef-repo

If you’re using Hosted Chef or Chef Server include your validation keys in chef-repo/.chef/ as well.

Note: when running knife cookbook site install X the process will fail to unzip the cookbook. However you can just manually unzip it to get it working.
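
The path mismatch from step 3 is easy to see in code. The sketch below converts a Cygwin-style path into the drive-letter form a native Windows Ruby expects. The cygwin_to_windows helper is hypothetical and only handles the /cygdrive/<letter> prefix; Cygwin’s own cygpath tool covers the general case.

```ruby
# Convert a Cygwin-style path such as /cygdrive/c/opscode/chef into the
# C:/opscode/chef form that a native Windows Ruby understands.
# Paths that do not start with /cygdrive/<letter> are returned unchanged.
def cygwin_to_windows(path)
  path.sub(%r{\A/cygdrive/([a-zA-Z])(/|\z)}) { "#{$1.upcase}:/" }
end

puts cygwin_to_windows('/cygdrive/c/opscode/chef/bin/knife')
```

This is why the aliases above call the embedded Ruby by its Cygwin path but hand it the script as a C:/ path.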

Now: Vagrant!

  1. If you don’t have it, get VirtualBox.

  2. Download and install the MSI installer for Vagrant

  3. I had an issue running Vagrant out of the box due to some wonky line breaks so do this:

dos2unix /cygdrive/c/bin/vagrant/vagrant/bin/vagrant
  4. Test with vagrant

  5. Vagrant defaults to creating VirtualBox VMs in your Cygwin home and not your Windows home. This causes a problem if these are two separate locations on your machine: after setting up a VM under Vagrant it will appear as if you’ve lost all VMs created outside of Vagrant.

You can fix this by creating a link in your Cygwin home before creating any Vagrant VMs. The links can be created through Windows command shell (but that’s as much as you’ll ever need to do in Windows shell for Vagrant/Chef). These commands should be run as administrator:

cd C:\Windows\system32
mklink /D C:\bin\cygwin\home\russellcardullo\.VirtualBox C:\Users\russellcardullo\.VirtualBox
mklink /D "C:\bin\cygwin\home\russellcardullo\VirtualBox VMs" "C:\Users\russellcardullo\VirtualBox VMs"

Or if you hate links you could just relocate your Cygwin home directory to match your Windows home directory. Or relocate your VirtualBox folders from your Windows home to your Cygwin home.

You should now have a fully working Vagrant and Chef setup that can be invoked from Cygwin.