Running Drupal 7 on AWS with EFS

In two previous blog posts I talked about running Drupal 7 on AWS:

Since writing part 2 of this topic AWS has finally released Elastic File System (EFS), so I had to write an update for the stack that uses EFS instead of S3.

Elastic File System (EFS)

EFS is a shared NFS file system you can attach to one or more EC2 instances. While we can store user-uploaded content in S3 using the Drupal s3fs module, getting the CSS and JS aggregation cache to work across multiple servers was still an issue with S3.

If we use EFS instead of S3, and share the sites/default/files directory over every EC2 instance, we remove that problem.

The source code for this stack is on GitHub:

  • drupal7-efs.json: A very minimal Drupal 7 stack setup
  • drupal7-efs-realistic: A more realistic Drupal 7 site with a lot of contrib modules. This one also uses a Docker Hub container image instead of building an image in the Launch Configuration.

I will continue to work on the second one, so that is probably the stack you want to use.

A short note about this stack and Docker

While this stack uses Docker, it is not a complete container management system like ECS is intended to be. Rolling out a new version of a Docker image with this stack is pretty much a manual job: you scale the Auto Scaling Group down to 0 nodes, then scale it up again to the required number. All the new instances created that way will run the new version of your Docker image. (Or you can scale up to double the normal size and then scale down again to remove the old instances.)

Docker cleanup commands

Running Docker containers also involves a little housekeeping to keep your Docker hosts running optimally and not wasting resources. This blog post provides an overview of the commands you can use.

There are currently a lot of blog posts and Stack Overflow questions covering cleanup commands for old Docker versions that are no longer very useful. In this blog post I will try to keep the commands updated for newer versions of Docker.

Current Docker version as of 2016/07/20: 1.11 (stable), 1.12 (beta)

Clean up old containers

Originally copied from this blog post: source

These commands can be dangerous! So don’t just copy/paste them without at least having a clue what they do.

# Kill all running containers:
docker kill $(docker ps -q)

# Delete all stopped containers (including data-only containers):
docker rm $(docker ps -a -q)

# Delete all exited containers
docker rm $(docker ps -q -f status=exited)

# Delete ALL images:
docker rmi $(docker images -q)

# Delete all 'untagged/dangling' (<none>) images:
docker rmi $(docker images -q -f dangling=true)

Clean up old volumes

When a container defines a VOLUME, that volume is not automatically deleted when the container is removed. Some manual cleanup is needed to get rid of these “dangling” volumes.

Originally found on Stackoverflow: source

# List all orphaned volumes:
docker volume ls -qf dangling=true

# Eliminate all of them with:
docker volume rm $(docker volume ls -qf dangling=true)

Running Drupal 7 on AWS - part 2

Update 2016/07/11: AWS has released EFS, which is a better choice for our Drupal 7 setup than using S3. Check this blog post for a newer stack that replaces S3 with EFS.

This blog post continues the Running Drupal 7 on AWS - part 1 post with an actual code example. It provides a complete CloudFormation setup to get a Drupal 7 stack running on AWS.

This is the stack we are creating (click the image for a larger version):

Drupal 7 stack on AWS

All the code referenced here is available in this GitHub repository: https://github.com/karelbemelmans/drupal7-on-aws

AWS CloudFormation

When you are creating large software stacks, building them by hand is no longer an option: it takes too long and is too error-prone. For this reason AWS created AWS CloudFormation, their infrastructure-as-code service. Check the video on that page for a short introduction.

Sidenote: AWS currently has its own container service called Elastic Container Service (ECS), which we could use since our Drupal 7 site comes in a Docker container. We are however doing it the old school way and will manage our own EC2 instances.

Creating our Drupal 7 stack with CloudFormation

Creating the stack from the drupal7.json file is quite simple:

  • Go to the CloudFormation page on your AWS account
  • Create a new stack, give it a name and select the drupal7.json file
  • Review some of the settings you can change; they should be pretty straightforward
  • Create the stack and after about 10-15 minutes everything should be up and running
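
If you prefer the command line over the console, the same stack can be created with the AWS CLI. A minimal sketch, assuming the template is saved locally as drupal7.json:

# Create the stack from the local template file
# (add --capabilities CAPABILITY_IAM if the template creates IAM resources)
aws cloudformation create-stack \
  --stack-name drupal7 \
  --template-body file://drupal7.json

# Poll the stack status until it reaches CREATE_COMPLETE
aws cloudformation describe-stacks \
  --stack-name drupal7 \
  --query 'Stacks[0].StackStatus'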

When the stack has been created you will get a value in the Outputs for the WebsiteURL parameter, which is the hostname of the Elastic Load Balancer. The last step would be to create a Route 53 ALIAS record pointing to this name to map it to your real website URL.
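
Creating that ALIAS record can be scripted as well. A sketch with placeholder values; the ELB’s DNS name and canonical hosted zone ID come from aws elb describe-load-balancers:

aws route53 change-resource-record-sets --hosted-zone-id YOUR_ZONE_ID --change-batch '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "www.example.com.",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "ELB_CANONICAL_ZONE_ID",
        "DNSName": "my-elb-123456789.eu-west-1.elb.amazonaws.com",
        "EvaluateTargetHealth": false
      }
    }
  }]
}'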

Browsing to that URL will give you an error though, as we have a valid settings file but an empty database. You can either copy your own database to the RDS server now (see the Q&A section for how to do that) or simply browse to /install.php and install a fresh copy of Drupal.

Structure of the stack

The biggest piece is the Launch Configuration resource “LaunchConfigurationForDrupalWebServer”. This contains the setup script that will be used on the web servers. It installs Docker, generates a Drupal settings.php and builds a new Drupal container that contains this settings.php file.
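
A heavily condensed sketch of what that user-data script boils down to; the real resource in the template is longer, and the credentials, endpoints and image name below are placeholders:

#!/bin/bash
# Install and start Docker (Amazon Linux)
yum install -y docker && service docker start

# Generate a settings.php; the real script injects the stack's RDS endpoint here
cat > /tmp/settings.php <<'EOF'
<?php
$databases['default']['default'] = array(
  'driver'   => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'CHANGEME',
  'host'     => 'mystack.xxxxxx.eu-west-1.rds.amazonaws.com',
);
EOF

# Build an image that contains this settings.php and run it
cat > /tmp/Dockerfile <<'EOF'
FROM drupal:7-apache
COPY settings.php /var/www/html/sites/default/settings.php
EOF
docker build -t local/drupal7 /tmp
docker run -d -p 80:80 local/drupal7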

All the rest is pretty straight-forward AWS stuff: a VPC with 2 subnets, NAT instances for the private subnets, Internet Gateways for the public subnets, a MySQL DB, a memcached instance and an EC2 setup with LC, ASG and ELB.

Some Q&A

How do you ssh into this instance now?

You can’t. You will need to create a bastion (relay) host in a public subnet and assign it a public IP. The web servers run inside the private subnet, which allows no direct connections from the outside (because the subnet’s routing table does not use an Internet Gateway). You then ssh to the bastion and from there ssh to the instances in the private subnet (or configure ssh forwarding in your local ssh config).
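
In practice that looks roughly like this, using ssh agent forwarding so the private key never has to be copied to the bastion (IPs, user and key name are placeholders):

# Load your key into the agent and forward it through the bastion
ssh-add ~/.ssh/my-aws-key.pem
ssh -A ec2-user@BASTION_PUBLIC_IP

# From the bastion, hop on to a web server in the private subnet
ssh ec2-user@10.0.1.23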

How can I copy my existing database to the RDS db?

Use the bastion host to set up an SSH tunnel. Sequel Pro for Mac can do this. Or just ssh to the bastion and cat your SQL file to the RDS MySQL instance using the hostname, username and password.
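
The “cat your SQL file” option comes down to a one-liner; a sketch with placeholder hostnames and credentials (the bastion needs the mysql client installed):

# Pipe a local dump through the bastion straight into the RDS instance
cat drupal.sql | ssh ec2-user@BASTION_PUBLIC_IP \
  "mysql -h mystack.xxxxxx.eu-west-1.rds.amazonaws.com -u drupal -pCHANGEME drupal"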

How do I get the logs from the Docker containers in a central location?

Use a remote syslog destination in your docker-compose.yml file, like Papertrail:

drupal:
  build: .
  ports:
    - "80:80"
  # docker-compose v1 logging options; the v2 format uses a "logging" section instead
  log_driver: syslog
  log_opt:
    # syslog-address needs a protocol prefix, e.g. udp://<papertrail-host>:<port>
    syslog-address: "udp://hostname:port"
    tag: "drupal"

I can’t seem to send emails, do I need to configure an SMTP server?

Yes. You should configure Amazon Simple Email Service (SES) in Drupal via your settings.php file. You can script this in the Launch Configuration too, since you build the settings.php there.

But what about drush in this setup?

Drush is not used here. We don’t want to install it inside the Dockerfile, to keep the container as clean as possible. So simply use curl and ADD to download Drupal modules and themes.
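
A sketch of what such a download step could look like, using the s3fs module mentioned earlier; the version number and target path are placeholders matching the drupal:7-apache image layout:

# Download and unpack a contrib module straight into the Drupal tree
curl -fsSL https://ftp.drupal.org/files/projects/s3fs-7.x-2.7.tar.gz \
  | tar -xz -C /var/www/html/sites/all/modules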

For an actual Drupal production site you would also not use the base image FROM drupal:7-apache in your Dockerfile. You would use your own Drupal Docker image that contains your full Drupal stack (core, modules, themes, config…) and just overwrite the settings.php file in the Launch Configuration (as is already being done right now).

Todo list

There are still a few things missing for this CloudFormation stack:

  • Use 2 CloudFront distributions:
    • One for the S3 static content
    • One for the ELB so anonymous users also get a cached page
  • Add Papertrail logging to the Docker containers
  • Use more CloudWatch metrics for the Auto Scaling Group adjustments
  • Configure SES so Drupal can send emails

I might add those in the future, but right now these are left as an exercise for you to implement.

Problems with this stack

There are still some problems with running this setup on AWS though:

  • CSS and JS aggregation does not work with the s3fs module
  • Question: Is the session fixation on the ELB the right way to go?

This stack is still a theoretical one; I don’t really use it in production. I’m sure more problems will show up when you actually start using it for a production setup, so feel free to use the comments section to point them out and I’ll see if I can find a decent solution for them.

Further reading

While writing this blog post I did a lot of research on writing CloudFormation stacks and, as it usually goes, I found a lot of examples better than the ones I was writing. Looking back on my blog post now, most of the code in my CF templates comes from the official AWS examples below, so make sure to check them out too. They have examples for many common stacks, with or without Multi-AZ support, and you can pretty much copy/paste entire stacks as a starting point for your own.

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-howdoesitwork.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-sample-templates.html

Running Drupal 7 on AWS - part 1

For the last 5 months I’ve been doing a lot of work on Amazon Web Services (AWS) for my new job as a Cloud Architect at Nordcloud Sweden. Learning how to build applications that take full advantage of the cloud has made me eager to revisit some of my previous projects and rebuild them for AWS. In this blog post I’ll start off with the best way to run a Drupal 7 website on AWS.

While this blog post is written with Drupal 7 as an example, it could easily be adapted for any other PHP-based application.

1. The current Drupal server setup

If you are a Drupal builder, you are most likely using one of two typical web server setups for your production sites:

  • A Shared Hosting server, where multiple websites run on the same server
  • A dedicated Virtual Private Server (VPS) per website

How you deploy your code, and whether you use Docker or not, is not relevant here; the main thing is that you have dedicated (virtual or physical) servers that run 24/7 with the exact same hardware configuration.

On these servers you probably have this software stack installed:

  • nginx or apache
  • MySQL/MariaDB database
  • A local disk where your user content gets uploaded to
  • Shell access via ssh so you can run drush and cronjobs
  • Maybe an Apache Solr server for search indexing
  • Maybe a varnish cache in front of the web server
  • Maybe a memcached instance to offload your database

All of this is managed by you, or maybe by a hosting company that does it for you, using some kind of provisioning tool like Chef or Puppet. Making changes to this setup is hard, and keeping it in sync with your development stack is probably even harder (even when you use Docker).

If you use a managed hosting provider you are already no longer responsible for the hardware, but you still run the same kind of static server setup you would have if you did it yourself.

Problems with this setup

Problem 1: There are a lot of single points of failure in this setup: a lot of non-redundant, single-instance services run on the same server. If any component crashes, your entire site is offline.

Problem 2: The CPU/RAM of this server does not scale up or down automatically with the server load; a manual intervention is always required to change the hardware configuration. If you get an unexpected traffic boost, this might cause your server to go down.

Problem 3: The whole setup is constantly running at full power, no matter what load it’s currently under. This is a waste of resources and, even worse, of your money.

2. Moving things to AWS

So let’s see now how we can move this setup to AWS, and while doing so, get rid of the problems from the previous paragraph.

When you move this web server setup to the cloud you can basically do it two ways: the wrong way and the right way.

The wrong way: lift and shift

If you just see AWS as another managed hosting provider you could go for the lift and shift solution. In this scenario you re-create your entire server just like you did in the old setup. You run a single EC2 instance (= the AWS equivalent of a virtual server) with your full stack inside of it.

This works of course, but it does not scale, it’s not redundant and it will probably cost you more than running your old setup. So it doesn’t fix any of the problems we’ve described in the previous chapter.

AWS has a tool to calculate the cost of such a move, called the TCO Calculator. Just keep in mind that if you just compare the cloud cost to your own datacenter cost using the same hardware setup you are not using the cloud the right way and you will pay a lot more than you should.

The right way: build your application for AWS

Before we continue optimizing our setup for AWS I have to explain two AWS concepts that are important to understand: High-Availability and managed AWS services.

High-Availability

High-Availability (HA) is a concept you will see pop up everywhere when using AWS. It means not having a single point of failure in your setup, by building redundancy with the tools AWS offers you.

An important part of HA setups is the concept of regions and availability zones (AZ). Each region has several availability zones, which are independent data centers that can communicate with each other as if they were a local network.

Example:

  • Region: eu-west-1 (Ireland)
  • Availability Zones (AZ): eu-west-1a, eu-west-1b, eu-west-1c

Certain things are automatically replicated across the AZs of a region (e.g. all the managed services we’ll see in the next topic), but you’re also required to use them intelligently yourself. For a web server setup using EC2 instances, for example, you would create 2 servers, each in a different AZ, and put an Elastic Load Balancer (which is also HA since it’s a managed service) in front of them. If one of the servers goes down, or even the whole AZ, the load balancer keeps working and only sends traffic to the server in the AZ that is still up.

In a Lucid Chart diagram this HA setup would look like this:

An example of a High-Availability setup on AWS

AWS Services

AWS Services are, simply put, the usual services from your software stack, but managed by Amazon. They are offered as highly available software-as-a-service where you don’t have to worry about anything other than using them.

For our Drupal 7 setup we’ll be using these AWS Services:

  • Web servers: Amazon EC2 (EC2 instances, Elastic Load Balancer, Auto Scaling Groups)
  • Database: Amazon RDS (MySQL, MariaDB or even Aurora if you want)
  • Configuration files and User uploaded content: Amazon S3
  • Key/value caching server: Amazon ElastiCache (memcached)
  • Reverse proxy content cache: Amazon CloudFront

Now that we have all the AWS tools explained, let’s go build our Drupal 7 site using them.

3. Building Drupal 7 on AWS

To deal with the problems we had when running on a Shared Hosting or VPS server we have to make sure we cover these two items:

  • Our setup needs to have High-Availability: no single point of failure
  • It has to have automatic scaling: scale in and out when needed

Scaling up and down means increasing or decreasing the amount of RAM or CPU cores in a system, while scaling in and out means adding similar servers to a setup or removing some of them. Scaling in and out obviously only works if you have a load balancer that distributes traffic among the available servers.

Look at this Lucid Chart diagram to get an idea of what the final stack will look like (click for a larger version):

Drupal 7 stack on AWS

Database: AWS RDS MySQL

The database is probably the easiest component to configure in our setup: we simply use an Amazon RDS MySQL instance. We connect to it using the something.amazonaws.com hostname and the username and password we supply.

We can make this stack HA by using the Multi-AZ option. This is not a master-master setup, but a standby instance in a different AZ that AWS fails over to in the event the main one goes down. You do not need to configure anything for this; AWS updates the IP address behind the hostname automatically.

Backups of the RDS instance are taken by using daily snapshots, which will be enabled by default for any RDS database you create.
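
If you want to double-check the Multi-AZ flag and the backup retention from the command line, something like this works (the instance identifier is a placeholder):

aws rds describe-db-instances --db-instance-identifier drupal7-db \
  --query 'DBInstances[0].[MultiAZ,BackupRetentionPeriod,Endpoint.Address]'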

Upload content: Amazon S3

Update 2016/07/11: AWS has released EFS, which is a better choice for our Drupal 7 setup than using S3. Check this blog post for a newer stack that replaces S3 with EFS.

Since our setup will include web servers running on AWS EC2 that will scale in and out depending on the usage, we cannot have any permanent data inside of them. All the content that gets uploaded by Drupal will have to be stored in a central file storage that is accessible by all web servers: Amazon S3.

Drupal cannot use S3 out of the box, but there are contributed modules available (https://drupal.org/download), such as s3fs, to achieve this. When writing this blog post I was still experimenting with which one was best suited for the task; I’ll update this post later on with my findings.

While S3 has versioning support, it’s not a bad idea to have a second AWS account copy all the files from S3 every day, hour or even when they get created.

Besides the user uploaded content we will store another type of files in S3: configuration files used by instances and load balancers. More about this later.

Memcached: AWS ElastiCache

There’s not much to say about ElastiCache. Simply create a memcached cluster and configure your instances to use it.

Caching: AWS CloudFront

CloudFront is Amazon’s CDN service, with edge locations all over the world. The most important thing to know here is that invalidating cached objects is not easy, so you should pretty much rely on your Drupal site setting the correct cache headers for each request it serves. If you need to clear your entire cache, it might be easier (and cheaper) to just create a new CloudFront distribution and delete the old one.
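
If you do need to invalidate, the CLI can do it, but keep in mind that AWS bills invalidation paths beyond a free monthly allowance. A sketch with a placeholder distribution ID:

# Invalidate everything cached for the distribution
aws cloudfront create-invalidation --distribution-id E1EXAMPLE123 --paths '/*'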

We use CloudFront like you would use any other cache: just put it in front of the web server. In this case it will be put in front of the Elastic Load Balancer (see next topic) with the DNS record for our site pointing to the CloudFront distribution.

Web servers and AutoScaling: Amazon EC2

Now we get to the core of the setup: the actual web servers. We will be using a set of AWS EC2 services to accomplish that task.

Let’s start by pointing out that our Drupal code is in a Docker container, pushed to a (public or private) repository. The EC2 instances can reach the registry and pull the images without authentication.

Configuring our EC2 instances is done by something called a Launch Configuration. A Launch Configuration can best be seen as a configuration file that an Auto Scaling Group uses to create servers. It contains the base server image to be used, the type of EC2 instance, some other things I won’t go into detail on here, and most importantly: the user-data script.

The user-data script is simply a bash shell script that we will use to install the required software on the web servers:

  • Install certain OS packages we need (e.g. aws-cli, docker)
  • Install extra packages using simple curl commands (e.g. docker-compose)
  • Configure rsyslog monitoring (if we don’t use it via docker-compose)
  • More things as you like
  • And as the last step: start the Drupal Docker container.

The user-data script will also handle the creation of a custom settings.php file for Drupal. It will overwrite the default one inside the Drupal Docker container with our values for the database, the memcached server, etc…

This Launch Configuration is then used by an Auto Scaling Group (ASG) to fire up a set of instances. The ASG can become intelligent if you connect it to AWS CloudWatch, where it will create or remove instances based on certain metrics (server load, RAM usage, …), but it can also be quite simple and just keep a single web server running in each available AZ all the time.

The third component in our web server setup is the Elastic Load Balancer (ELB). The ASG creates servers and the ELB distributes traffic between them and performs the health checks. If a server becomes unhealthy, it is removed from the rotation and killed, and the ASG creates a new one which is then picked up by the ELB again and put into the load balancing rotation.

Together these 3 services - LC, ASG and ELB - create a setup that scales in and out when needed, exactly what we wanted for our Drupal 7 setup.

If this all sounds a bit difficult to visualize, check the AWS Auto Scaling article for a longer explanation with some examples.

Route 53

Route 53 is AWS’s DNS service. While you can use any DNS service you want and just point CNAME records to AWS hostnames, I strongly recommend using Route 53. Because AWS updates IP addresses internally all the time, using a CNAME record might give you situations where DNS lookups go to the wrong IP.

To deal with this issue, AWS has created the ALIAS record type, which lets you point to an internal AWS resource (ELB, CloudFront distribution, S3 location, …) so you won’t be affected by any downtime when IP addresses change.

4. Did we solve all our problems?

Now that we’ve listed all the services we will be using to build our Drupal site, did we actually meet all the requirements we set out to achieve?

High Availability

Do we have a highly available setup with no single points of failure? Yes. We either use HA Amazon services or we create services in 2 AZs at all times.

Auto Scaling

Is this a setup that automatically scales in and out without manual interaction? Yes. The combination of Launch Configuration, Auto Scaling Groups and Elastic Load Balancers takes care of that.

5. Cost

We have managed to turn our Drupal stack into a highly available, auto scaling setup, but what will it cost to run? To get that number, we use the Simple Monthly Calculator that Amazon provides.

Before we start calculating we have to make some decisions about which instance types and data volumes we will be using.

This is a very basic setup for now. I’m not going into detail about instance types, snapshot storage space, CloudWatch monitoring, etc… Adding these will of course increase the total cost of running your site on AWS.

  • We use a db.t2.medium 10GB MySQL RDS database, with the Multi-AZ option
  • An S3 bucket that contains 500GB of files
  • A cache.t2.micro memcached instance, which is more than enough for our setup
  • A CloudFront distribution that has worldwide edge location coverage

Our EC2 setup is as follows:

  • We use 2 Availability Zones
  • In each AZ we create a t2.small (1 CPU core, 2GB RAM) web server with a 30GB EBS root disk
  • One Launch Configuration that handles creating the EC2 instances
  • One Auto Scaling Group that scales out instances in pairs, one per AZ
  • One Elastic Load balancer, created in both AZ’s

We expect about 100GB traffic per month to our site.

As you can see I’m using only small instance types for this calculation. Don’t go too big too fast: our scaling setup adds more capacity when it is needed, unlike a setup where you have only one big instance.

For the price calculation I’m taking the EU West-1 region (Ireland). Prices vary between regions, so this is not a complete picture. But still, you should go for the region that is closest to your customers and has all the services you need (e.g. in Europe the Frankfurt region does not have all the services Ireland currently offers).

Adding all of this into AWS’s Simple Monthly Calculator gives you this number: $75.74 (€66.70) a month (see the calculation details).

6. Conclusion

I hope this blog post was a good example of how to optimize your Drupal site for AWS. It can easily be applied to Drupal 8 or any other PHP application, as long as you focus on the important goals of this setup: High Availability and Auto Scaling.

7. Next steps

Even though this is a lengthy blog post, there are still a lot of topics I haven’t covered yet. There are many more AWS services you can use to monitor, scale and build your application. I also haven’t addressed how you should run cron jobs or nightly import/sync tasks in a setup like this. This is all stuff for upcoming blog posts.

Drupal 7 on AWS Part 2: CloudFormation

This blog post focused on the “how” and “why” of running Drupal on AWS. Part 2 is an actual example of such a setup, with a complete infrastructure provided as a CloudFormation stack. CloudFormation is AWS’s infrastructure-as-code tool, something you definitely should be using for any large software stack.

Getting started with Amazon Web Services (AWS)

Getting started with AWS is actually quite simple. The best way is to start learning for these 5 AWS Certification exams:

Associate (beginner) level:

Professional level:

Don’t get blinded by the names: no matter what your job title is, you should learn about all 3 facets of AWS: architect, developer and sysop. So just do all the exams.

The associate (beginner) ones you can start learning for straight away if you have some basic experience with server setups. The professional ones you should take once you have at least one or two years of experience working with AWS.

A good place to learn for these exams is A Cloud Guru. They have a bundle that gives you lifetime access to training material for all 5 exams and it only costs you $189.

A Docker Drupal 8 deployment container

Update 2016/07/06: I’ve restarted working on this project. The README on Github should have all the needed information.

I wrote a small example project that creates a deployable Drupal 8 container: https://github.com/karelbemelmans/docker-drupal8

This can of course be used by any PHP project, so this is just an example using Drupal 8.

Datadog php-fpm monitoring via nginx

It took me some time to get this set up properly, but here are the configs that finally got php-fpm monitoring working with Datadog.

1. nginx vhost config

First, make sure you override your site’s hostname to localhost. For my site this is to make sure connections don’t go out to Cloudflare but stay local on the server; /etc/hosts needs to contain this line:

127.0.0.1 www.karelbemelmans.com karelbemelmans.com

I use my port 80 vhost config for the status page. Cloudflare enforces SSL so this vhost never gets used for anything non-local on my server.

server {
  listen 80;
  listen [::]:80;
  server_name www.karelbemelmans.com karelbemelmans.com;
  server_name www.karelbemelmans.be karelbemelmans.be;
  access_log /var/log/nginx/www.karelbemelmans.com/access.log main;
  error_log /var/log/nginx/www.karelbemelmans.com/error.log error;
  location ~ ^/(status|ping) {
    access_log off;
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass php_www.karelbemelmans.com;
  }
}

I use “php_www.karelbemelmans.com” as the fastcgi_pass value. This is defined in another file in the conf.d dir and matches the socket definition in the php-fpm pool config file.

/etc/nginx/conf.d/upstream-www.karelbemelmans.com.conf:

upstream php_www.karelbemelmans.com {
 server unix:/var/run/php-fpm/www.karelbemelmans.com.sock;
}

(This setup is inspired by Mattias’ config for the Nucleus customers)

2. php-fpm pool config

Make sure these 3 lines are present in your php-fpm pool config:

pm.status_path = /status
ping.path = /ping
ping.response = pong

Normally your acl should be fine as requests will come from localhost.

3. Datadog config file

Now that nginx and php-fpm are configured, all we have left is the Datadog config file:

/etc/dd-agent/conf.d/php_fpm.yaml:

init_config:
instances:
 - # Get metrics from your FPM pool with this URL
   status_url: http://karelbemelmans.com/status
   # Get a reliable service check of your FPM pool with that one
   ping_url: http://karelbemelmans.com/ping
   # Set the expected reply to the ping.
   ping_reply: pong

Reload nginx, php-fpm and datadog-agent after that and your php-fpm tracking should now work. This tracks only 1 pool, it’s up to you to figure out how to track multiple pools now :)
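
For reference, on a systemd-based server those reloads look like this (service names differ per distribution), followed by a quick local check that the status page answers:

sudo systemctl reload nginx
sudo systemctl reload php-fpm        # or php7.0-fpm, depending on your distro
sudo systemctl restart datadog-agent

# The status page should only answer from localhost
curl -s http://www.karelbemelmans.com/status   # resolves to 127.0.0.1 via /etc/hosts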

Useful bash one liners

Disclaimer: These are extremely simplified one-liners that do not perform any form of input validation or character escaping. You should not use them in a ‘hostile’ environment where you have no idea what the input might be. You have been warned.

This is a list of some bash one-liners I use on a daily basis during development and debugging. I made this list as a “you might not know this one yet” and will continue to update it every now and then.

Latest update: 2016/06/18.

Running a command on multiple files at once

This is a basic structure we will be re-using for the other examples. Run a command on all files in a directory:

for FILE in $(ls *); do command $FILE; done

Run a command for all lines in a file:

for LINE in $(cat file.txt); do command $LINE; done

Warning: As noted in the comments, this assumes there are no spaces in the lines in your file. If they do contain spaces, you need to add proper escaping using quotes.
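
A more robust variant that handles spaces inside lines, for reference:

# Read the file line by line, preserving whitespace within each line
while IFS= read -r LINE; do command "$LINE"; done < file.txt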

Make certain things easier to read

Format a local XML file with proper indenting:

xmllint --format input.xml > output.xml

Run this script on all XML files in a directory:

for FILE in $(ls *.xml); do xmllint --format $FILE -o $FILE; done

Monitor new lines at the end of a log file and colorize the output (requires the package ccze):

tail -f /var/log/syslog | ccze

Find a specific text in a lot of files

Find a text inside a list of files and output the filename when a match occurs. Recurse and case-insensitive:

grep -irl "foo" .

Count the number of files in a directory:

cd dir; ls -1 | wc -l

Find a filename that contains the string “foo”:

find ./ -name "*foo*"

Find all files modified in the last 7 days:

find ./ -mtime -7

And similarly, all files that have been modified more than 7 days ago:

find ./ -mtime +7

Modify files

chmod 644 all files in the current directory and below:

find . -type f -exec chmod 644 {} \;

chmod 755 all directories in the current directory and below:

find . -type d -exec chmod 755 {} \;

Commandline JSON formatting and parsing

The jq command is a must-have for anything that returns JSON output on the command line:

curl url | jq '.'

I use this for finding the latest snapshot in a snapshot repository for elasticsearch:

curl -s -XGET "localhost:9200/_snapshot/my_backup/_all" | jq -r '.snapshots[-1:][].snapshot'

Actions against botnets and spammers

Find a list of all bots using a guestbook script to spam a site (that sadly has no captcha). I run this on the apache access_log file:

cat access_log | grep POST | grep guestbook | awk '{print $1}' | sort | uniq > ips.txt

The ips.txt file will now contain a list of unique ip addresses I want to ban with iptables:

for IP in $(cat ips.txt ); do iptables -I INPUT -s $IP -j REJECT; done

Cleanup stuff

Delete all but the 5 most recent items in a directory. I use this in Bamboo build scripts to clean up old releases during a deployment:

ls -1 --sort=time | tail -n +6 | xargs rm -rf --

That’s all for now!

Automated Drupal 7 deployments with Atlassian Bamboo

Update 2016/06/18: I finally fixed the markup of this post after migrating from WordPress to Hugo. I also fixed some typos and updated the post with some current information. Even though almost a year has passed since writing it, this post is still relevant for the current 5.12 version of Bamboo.

If you are reading this post you probably already know what automated deployment is and why it’s important. I’ll probably write a blog post about that subject in the near future but first I’m going to write this one about doing automated Drupal 7 deployments with Atlassian Bamboo.

This blog post is not going to be a discussion about the best deployment system. Some people like Capistrano, some people like Jenkins, some people like Bamboo. For us at Nascom, Bamboo works pretty well because it integrates perfectly with JIRA and Bitbucket, allowing us to view linked JIRA issues when making a build and to see on which environment a JIRA issue has been deployed.

This is a pretty big blog post, so take your time to go through it.

Bamboo in action

Before I dive into the details of the setup I’ll first show you a 4 minute screencast with some minimal comments on how the whole setup works. This gives a nice global picture so you have a better understanding of the steps that come next.

Prerequisites

For this blog post I’m going to make some assumptions about your development setup and Drupal site structure:

  • You have a running Bamboo server, either as a dedicated server or just running on your own computer. The version I’m using for this blog is 5.9.3 build 5918 - 28 Jul 15. You could also use the Bamboo On Demand service from Atlassian if you don’t want to set up your own server.
  • The Bamboo server can access your deployment target environment over an ssh connection.
  • Your Drupal source code is in a git repository that can be accessed by the Bamboo server. In my case it’s hosted on Bitbucket.
  • Your Drupal site uses the env.settings.php structure I described in a blog post on the site
  • Drush has been installed on the target environment so we can run drush cc all and drush updatedb if needed.

Right, now let’s dive into the setup.

Bamboo concepts

First you need to get familiar with some Bamboo concepts (also see the official Atlassian Bamboo documentation):

Build plans and artifacts

A build plan is the process that generates a Bamboo artifact. An artifact is something that can be deployed later: most of the time an executable or a jar file for software that compiles, but for our Drupal site it will simply be a compressed tar file called drupal.tar.gz that contains the Drupal source code.

Build plans are composed of three pieces: Stages, Jobs and Tasks. If you look at the graph below it should be clear how those three fit inside each other:

Bamboo Build Plan Anatomy

  • Stages execute sequentially (e.g. a Testing stage, a Package Building stage). If stage x fails, the build process halts and the stages after x will not be executed.
  • A stage consists of jobs that can be executed in parallel (e.g. multiple types of tests in a testing stage that can run at the same time).
  • A job consists of multiple tasks that run sequentially. The first task will always be a source code checkout, and the next tasks use this checked-out code to do some magic.

As I wrote above, the result of a build plan will be an artifact that we can use for deployment later on.

Releases, deployment plans & environments

Now that we have a build plan that produces an artifact, our drupal.tar.gz file, we need to get that deployed to our servers. We can use releases and deployments plans to achieve that:

  • Releases are simply tagged successful build plans. E.g. build #65 has been tagged as release-2.2.0.
  • A deployment plan is simply a list of environments.
  • An environment has a list of tasks that will be executed sequentially to deploy a release’s artifact to that environment.

A real life example project

I’m going to take my own Narfum IT Services website as an example deployment project. It’s a Drupal 7 site that will be deployed to a staging and a production environment.

Update 2016/06/18: This narfum.eu site is now offline, but the example is still valid.

A Drupal site can always be split into three pieces:

  • Drupal PHP source (Drupal core + all contrib and custom modules and themes)
  • Database
  • User uploaded content (sites/default/files)

If you follow my env.settings.php setup structure for Drupal 7, it’s easy to keep these 3 separated.

Our Bamboo deployment plan will only handle the first item, the Drupal PHP source. This codebase will be stored in our version control system.

The database will most of the time be deployed once and then updated via update hooks in Drupal. These update hooks are run by our deployment plan (via drush updatedb), so there is no need to include an automated database deployment.

The user uploaded content is located outside of the Drupal PHP directory, so we can just leave it alone during deployment and create a new symlink to it. Sidenote: you never commit this content to your version control system!

The build plan

What a build plan comes down to in practice is simply put:

  1. Download the source code
  2. Do some local modification to those files
  3. Package the result as an artifact.

This means that whatever you put in git is not necessarily what ends up on your deployment environment. For a Drupal website this means we can do a lot of handy things during the build phase:

  • Remove unwanted files developers forgot to remove from git (e.g. remove CHANGELOG.txt and the other .txt files)
  • Compile SASS code to CSS in production mode instead of development mode
  • Remove sources and development files (SASS code, maybe PSDs you added in git, development modules like devel and coder)
  • Add modules from another repository you need to make sure exist on production sites (e.g. modules like prod_check which is nice to have on a production environment)
  • Or of course remove development modules that should not be deployed to production (e.g. devel, coder, …)
  • And maybe other things specific to your project

It will take you some time to set all of this up and make it error-proof, but after that you have a fully automated build system that will never forget a single thing!

Creating a build plan

Ok, let’s start by creating a build plan. From the “Create” menu at the top choose “Create new plan” and fill in the fields like in the screenshot below. Do not choose a version control system here yet, we’ll add that later. (If you add it here, it will be a global repository and we don’t want that.)

(Click the image for a larger version)

Create a new Bamboo build plan

On the next screen just check “Yes please!” to enable the plan and click Create. We will add the tasks later; we just want an empty build plan for now. When we have our empty build plan, go to “Actions” on the right side and choose “Configure plan”. You will get the screen below (click the image for a larger version):

Configure a Bamboo build plan

As you can see Bamboo has created some default items for us: a stage called “Default stage” with one job called “Default Job”. We will use these defaults for this example and just add tasks inside this one job.

Connect our git repository

As we need to do git checkouts in more than one task, we will add our Drupal git repository as a local repository for this project. On the build plan configuration page go to the “Repositories” tab and click “Add repository”:

(Click the image for a larger version)

Add a source code repository

It should be pretty obvious what you need to fill in here.

The easiest way to connect to Bitbucket is with the “Bitbucket” option, but that requires entering a password and I don’t like that. So I always choose “Git”, enter the ssh location for the Bitbucket repository and use an ssh private key to authenticate. But choose whatever method works for you.

It’s important that you choose the “master” branch here as that will be the main branch for our builds. Master should always be the code that goes to production, so try to keep that best practice for your projects too.

If you want to read about a proper git branching model for your development, be sure to check out the Git branching model.

Add build tasks

The last thing we have to do now is add tasks that will actually do things for us. Below is a screenshot of the real Narfum project (which has 2 stages instead of 1, but we will ignore the test stage for now), showing the “Package Drupal” job.

(Click the image for a larger version)

Package Drupal job

There are 3 tasks:

  • A “Source Code Checkout” task: Checkout code from Bitbucket to a local directory on the build server
  • A “Script” task: Do some magic (in this case simply compile sass code to css)
  • A “Script” task: Make a tar.gz file

The first step will always be a source code checkout. Remember that jobs can run in parallel and are sandboxed in their own directory. One job does not know about another job’s files, which means you always have to check out files (or import an artifact) as the first task in a job.

The 3 tasks in detail:

Task 1: Checkout source code from our repository.

These files will be downloaded in the root directory of our job and will be available to be modified for the remaining tasks.

(Click the image for a larger version)

Task 1: Source Code Checkout

Task 2: Magic

Once we have the Drupal code, we can do a lot of things to modify this code. We keep it simple here and just do a production compile of the SASS files for our theme:

(Click the image for a larger version)

Task 2: Magic

Notice the “Working sub directory” at the bottom! This points to the main theme directory.

Task 3: Create Drupal tarball

The last task is always creating a Drupal tarball of our files now that we’re done with modifying them:

(Click the image for a larger version)

Task 3: Create tarball

I prefer to exclude files like the “node_modules” directory rather than removing them, so a new build won’t have to download them all again. (I know they are cached in the bamboo user’s homedir, but it’s the idea that counts here: we don’t want to redo too many things for new builds.)
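
For reference, a sketch of what such a tarball script task could look like; the exclude list is an assumption based on a typical project layout:

# Package the modified checkout, leaving out build-only artifacts
tar -czf drupal.tar.gz \
  --exclude='./drupal.tar.gz' \
  --exclude='./node_modules' \
  --exclude='./.git' \
  .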

After those 3 tasks are done, we will have a drupal.tar.gz file in our root directory. We now need to make this available as a shared artifact so our deployment plan can use it.

The last build step: create the artifact

In the “Package Drupal” stage, go to “Artifacts” and add a new artifact definition:

Artifact overview page:

(Click the image for a larger version)

Artifact overview

Artifact detail page:

Artifact definition

Make sure the “Shared” box is checked, otherwise it will not be available to our deployment project!

And that’s all there is to do for a Drupal build plan. If you run this build now from the “Run” menu and then “Run plan” you should get a green page saying the build was successful. You will also be able to download the artifact manually at the bottom of the page.

This next screenshot is an example build result page from the Narfum website project. You can ignore the upper right box for now; in your project that will be empty as you don’t have a linked deployment project yet:

(Click the image for a larger version)

A successful Bamboo build

The deployment plan

Still with me after the build plan setup? Good. Because now it’s time to deploy our code to an actual environment.

Create a deployment plan

A deployment plan is nothing more than a container for multiple deployment environments. From the “Create” menu choose “Create deployment project” and fill in the screen like in the screenshot below. Make sure you select the right build project to attach to this deployment plan:

(Click the image for a larger version)

Create a deployment plan

After that you will see the configuration page of our empty deployment plan.

Creating environments

Now choose “Add environment” and simply give it a clear name. I always go for the structure “$hostingprovider - $env_type”, so this could be “AWS - Production”. Click on “Create and configure tasks”.

This will be our production environment, but you can of course add a staging environment and a testing environment too. Using the “Clone environment” option once the production environment is finished, this is very fast to set up.

You should now see the screen below. This is a similar list of tasks you saw on the job configuration page for a build plan:

Create deployment tasks

There are 2 tasks created for you already, which you should always leave as the first two tasks of your deployment project: the clean working directory and the artifact download. These tasks make sure you have an empty work directory with just our drupal.tar.gz artifact file.

The next tasks will then use this drupal.tar.gz file and get it onto our target environment. The exact tasks in our deployment are:

  • Copy the tarball to the target environment via scp
  • Extract the tarball to a new release directory
  • Add the needed symbolic links
  • Run database updates (if needed)
  • Set this new release as the new live version
  • Run cache clears
  • Clean up older releases

This is what it will look like when we’ve set up all these tasks (I’ve switched to my Narfum website deployment plan again now for these screenshots):

Completed deployment environment tasks

Environment variables

Before we continue with the tasks, we first need to set up some variables. End your task setup process and go back to the deployment plan page. You will get an incomplete task warning, but just ignore that for now.

Unfinished deployment environment

Click the “Variables” button at the bottom and add the variables “deploy.hostname” and “deploy.username” with the values needed for your server:

Deployment environment variables

We can now go back to configure our environment tasks.

Environment tasks

Remember that tasks inside a job can halt the deployment process if they fail? That’s the main reason we split all these things up into separate tasks.

Task 1: Copy the artifact to the remote server

This is an “SCP Task” where you simply copy the artifact to the remote server. We can use the variable “deploy.hostname” as “${bamboo.deploy.hostname}” inside tasks, and the same goes for “deploy.username”.

I’m also not using a password but ssh keys to log in to the remote server. Sadly you have to upload the private key in every task; this is one of the few shortcomings Bamboo still has.

Task 1: Copy the artifact to the remote server

Task 2: Extract the tarball on the remote server

This task uses the “SSH Task” type that we will be using for the rest of the tasks. It simply allows you to enter shell commands that will be executed on the remote server over an SSH connection.

This task makes a new release directory inside the “releases” directory on the server, extracts the tarball there and then deletes the tarball again.

Task 2: Extract the tarball on the remote server
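
The shell commands behind that task could look roughly like this, assuming the SCP task dropped the artifact in the remote user’s home directory (the releases path is a placeholder):

# Create a timestamped release directory, unpack the artifact into it, clean up
RELEASE=/var/www/mysite/releases/$(date +%Y%m%d%H%M%S)
mkdir -p "$RELEASE"
tar -xzf ~/drupal.tar.gz -C "$RELEASE"
rm -f ~/drupal.tar.gz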

Task 3: Update symbolic links

In this task we add the symbolic links to our env.settings.php file and our sites/default/files content. See this blog post for how and why we do this.

Task 4: Database updates

This task is currently not present for my project, but you can easily add it here yourself. Create the same kind of SSH script as above and use whatever drush commands you would like.

The idea for this task is:

  • Your Drupal code has all the needed hook_update_xxx() to upgrade your database schema, enable modules, set variables etc
  • Bamboo runs a simple drush updatedb command and all those update hooks get executed

Task 5: Set this new version as the live version

This simply makes the “www” folder, which is the Apache or nginx document root, a symbolic link to the newly uploaded release folder:

Task 5: Set the new uploaded version as the live version
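
Behind that screenshot it is a single symlink update; a sketch with placeholder paths:

# Point the document root at the newly uploaded release
ln -sfn /var/www/mysite/releases/20160618120000 /var/www/mysite/www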

Task 6: Cache clear

Because this is always a good thing to do.

For most of my projects I also do a php-fpm reload (via sudo) here, to make sure the PHP opcache is cleared, but the permission to execute that command needs to be set up on your server first and is outside the scope of this blog post.

Task 6: Cache clear
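
The script for this task is short; a sketch assuming drush is available on the target server (as listed in the prerequisites) and a placeholder document root:

cd /var/www/mysite/www && drush cc all
# Optionally also reload php-fpm to clear the opcache, if sudo rights are set up:
# sudo service php-fpm reload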

Task 7: Clean up older releases

This is a nice-to-have task. For production environments we mostly do this manually when the server raises a disk space warning, but for testing and staging environments this can be automated.

This script only keeps the 5 most recent deployments (determined by the timestamp of the release folder) and deletes the rest. The chmod command is needed because Drupal removes the write flag from the sites/default folder:

Task 7: Clean up older releases
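
A sketch of what such a cleanup script could look like, with a placeholder releases path:

cd /var/www/mysite/releases
# Drupal removes the write flag from sites/default, so restore it before deleting
chmod -R u+w .
# Keep the 5 newest release directories, delete the rest
ls -1 --sort=time | tail -n +6 | xargs -r rm -rf --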

And that’s all for the environment tasks.

Running a deployment

Now that we have a working build plan with a linked deployment plan we can run a deployment. The steps we have to do are always:

  • Push your code to the master branch.
  • Run your build plan.
  • If the build is successful create a release on the build result page. Otherwise fix your code and go back to step 1.
  • Deploy this release to an environment.
  • Check if everything is working

You probably made some errors in your config along the way. Luckily Bamboo will show you a nice big log file where you can debug your problem, so go ahead and test with your own projects now. Your automated Drupal deployment setup is now finished!

Sidenote: Using triggers it’s possible to automate deployments whenever a build runs successfully. That might be a good thing to do for automated deployment to a dev or test environment, but for a production environment you still want to keep that a manual step.

Room for improvement

This blog post of course only shows a very simple deployment setup. To keep it somewhat short I only covered the basic steps in creating the whole deployment setup. It’s up to you to extend these build and deployment plans for your own project.

Here are a few pointers what can still be improved:

  • Use build scripts (e.g. Ant or Maven tasks with a build file) instead of Bamboo SSH scripts for your tasks. This makes re-use of deployment scripts easier and also keeps them in your version control system instead of hardcoded in Bamboo. Bamboo has special tasks for running these build scripts.
  • Add more tests in the build and deployment phase. Make them proper Unit tests and Bamboo will display them in a special tab in a build, making it easy to see how many of your tests failed.
  • While it’s not possible to run actual tests during the deployment phase, you can write deployment tasks that return a fail status (when the script exit code is anything other than 0) to halt a deployment that didn’t go as expected.
  • Almost every step of a build or deployment can have triggers and notifications. You can use these to schedule builds, automate deployments on a successful build and to send out mails or Hipchat/Slack notices when a build or deployment has succeeded.
  • Add more branches. A build plan can have multiple branches so you can build your project from other branches than the master branch. Bamboo can even auto-detect and auto-build these branches using triggers.

There is also a big marketplace of plugins for Bamboo, free and commercial ones, that make your life easier.

Newer versions of Bamboo will most likely add more useful features, so make sure you keep upgrading your Bamboo installation to the latest version.

The end.

That’s all folks, I hope you learned something useful from this post. Use the comments section if you have any questions or remarks!

A better alternative for using phpMyAdmin

Almost every week I run into at least one production site that has phpMyAdmin installed in the document root of the site, or as a separate vhost on the server. While this was pretty much required in 2005 to make changes to the database in production, now in 2015 we have better ways to do that.

The problem

The reason phpMyAdmin gets installed on the website is that the MySQL server only listens on localhost via a UNIX socket, or on the loopback interface 127.0.0.1 via TCP. That way it’s impossible to connect to it from a remote address.

The bad thing about this is that we have an extra web application on our site that we need to take care of. These phpMyAdmin installs are often never updated and might contain security issues that allow attackers to gain access to your production database.

A better alternative: SSH tunnels

If your server is reachable via SSH (even via a VPN connection) we can use a better method: SSH tunnels.

How this works is pretty simple:

  • We connect to our server via an SSH connection
  • Over this SSH connection we set up a tunnel with a port forward that allows a SQL client on our own computer to use the remote database as if it was a local connection

This might sound complicated, but there are a lot of SQL clients available that do this SSH tunneling for you. Below is a screenshot from Sequel Pro for OSX:

Sequel Pro for Mac OSX

You can see 2 things here:

  • the MySQL connection (which always connects to 127.0.0.1)
  • the SSH connection (which is your normal SSH login)

Once this connection has been set up, the SQL client works just as it would on a local connection.
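
For reference, the same tunnel can be set up by hand from a terminal; a sketch assuming MySQL listens on 127.0.0.1:3306 on the server:

# Forward local port 3307 to the server's local MySQL port over SSH
ssh -N -L 3307:127.0.0.1:3306 user@example.com

# In a second terminal, connect through the tunnel as if the database were local
mysql -h 127.0.0.1 -P 3307 -u dbuser -p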

SQL clients that support tunneling

These are the clients I use on a daily basis:

  • OSX: Sequel Pro
  • Linux, Windows & OSX: MySQL Workbench
  • And of course using the mysql command line program in an SSH connection (mostly via drush sql-cli when it’s a Drupal website)