Running Drupal 7 on AWS - part 2

Update 2016/07/11: AWS has released EFS, which is a better choice for our Drupal 7 setup than using S3. Check this blog post for a newer stack that replaces S3 with EFS.

This blog post follows up on Running Drupal 7 on AWS - part 1 with an actual code example. It provides a complete CloudFormation setup that gets a full Drupal 7 stack running on AWS.

This is the stack we are creating (click the image for a larger version):

Drupal 7 stack on AWS

All the code referenced here is available in this GitHub repository: https://github.com/karelbemelmans/drupal7-on-aws

AWS CloudFormation

When you are creating large software stacks, creating them by hand is not an option anymore: it takes too long to set up and is too prone to errors. For this reason AWS created AWS CloudFormation, their infrastructure-as-code service. Check the video on that page for a short introduction.

Sidenote: AWS currently has its own container service, Elastic Container Service (ECS), which we could use since our Drupal 7 site comes in a Docker container. We are, however, doing it the old-school way and will manage our own EC2 instances.

Creating our Drupal 7 stack with CloudFormation

Creating the stack from the drupal7.json file is quite simple:

  • Go to the CloudFormation page on your AWS account
  • Create a new stack, give it a name and select the drupal7.json file
  • Review some of the settings you can change; they should be pretty straightforward
  • Create the stack and after about 10-15 minutes everything should be up and running

When the stack has been created you will get a WebsiteURL value in the Outputs section, which is the hostname of the Elastic Load Balancer. The last step would be to create a Route 53 ALIAS record pointing to this name to map it to your real website URL.

Browsing to the URL will give you an error though, as we have a valid settings file but an empty database. You can either copy your own database to the RDS server now (see the Q&A section for how to do that) or simply browse to /install.php and install a fresh copy of Drupal.

Structure of the stack

The biggest piece is the Launch Configuration resource “LaunchConfigurationForDrupalWebServer”. This contains the setup script that will be used on the web servers. It installs Docker, generates a Drupal settings.php and builds a new Drupal container that contains this settings.php file.
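
To make that concrete, here is a minimal, simplified sketch of what such a user-data script does. It is not the exact script from the repository (check drupal7.json for that); it assumes an Amazon Linux AMI, and all hostnames and credentials are placeholders:

#!/bin/bash
# Install and start Docker (assumes an Amazon Linux base image)
yum install -y docker
service docker start

# Generate a settings.php with this stack's endpoints (placeholder values)
mkdir -p /tmp/build && cd /tmp/build
cat > settings.php <<'EOF'
<?php
$databases['default']['default'] = array(
  'driver'   => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'CHANGEME',
  'host'     => 'mydb.something.eu-west-1.rds.amazonaws.com',
);
EOF

# Build a local image that bakes in the settings.php, then run it
cat > Dockerfile <<'EOF'
FROM drupal:7-apache
COPY settings.php /var/www/html/sites/default/settings.php
EOF
docker build -t drupal7-web .
docker run -d -p 80:80 drupal7-web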

All the rest is pretty straightforward AWS stuff: a VPC with 2 subnets, NAT instances for the private subnets, an Internet Gateway for the public subnets, a MySQL database, a memcached instance and an EC2 setup with a Launch Configuration (LC), Auto Scaling Group (ASG) and Elastic Load Balancer (ELB).

Some Q&A

How do you ssh into the web servers now?

You can’t. You will need to create a bastion (relay) host in a public subnet and assign it a public IP. The web servers run inside the private subnet, which allows no direct connection from the outside (its routing table does not use an Internet Gateway). You then ssh to the bastion and from there ssh to the instances in the private subnet (or configure ssh forwarding in your local ssh config).
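
Once the bastion is up, reaching a web server is a one-liner. A sketch, with placeholder hostnames and the default ec2-user account:

# Jump through the bastion to a web server in the private subnet
ssh -o ProxyCommand="ssh -W %h:%p ec2-user@bastion.example.com" ec2-user@10.0.1.23

# Or make the forwarding permanent in ~/.ssh/config:
#   Host 10.0.*
#     ProxyCommand ssh -W %h:%p ec2-user@bastion.example.com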

How can I copy my existing database to the RDS db?

Use the bastion host to set up an ssh tunnel; Sequel Pro for Mac can do this. Or just ssh to the bastion and pipe your SQL file into the RDS MySQL instance using the hostname, username and password.
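
For the command-line route, something like this works (hostnames and credentials are placeholders):

# Terminal 1: forward a local port to the RDS endpoint through the bastion
ssh -N -L 3307:mydb.something.eu-west-1.rds.amazonaws.com:3306 ec2-user@bastion.example.com

# Terminal 2: import your dump through the tunnel
mysql -h 127.0.0.1 -P 3307 -u drupal -p drupal < drupal-dump.sql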

How do I get the logs from the Docker containers in a central location?

Configure the syslog log driver in your docker-compose.yml file and point it to a remote syslog service, like Papertrail:

drupal:
  build: .
  ports:
    - "80:80"
  log_driver: syslog
  log_opt:
    # the address needs a scheme, e.g. udp://logsN.papertrailapp.com:XXXXX
    syslog-address: "udp://hostname:port"
    tag: "drupal"

I can’t seem to send emails, do I need to configure an SMTP server?

Yes. You should configure Amazon Simple Email Service (SES) as Drupal’s outgoing mail server in your settings.php file. You can script this in the Launch Configuration too, as you build the settings.php there.
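
Assuming you use the Drupal SMTP contrib module, the Launch Configuration could append something like this to the generated settings.php (the SES SMTP endpoint and credentials are placeholders):

# In the user-data script, while generating settings.php:
cat >> settings.php <<'EOF'
$conf['smtp_on']       = TRUE;
$conf['smtp_host']     = 'email-smtp.eu-west-1.amazonaws.com';
$conf['smtp_port']     = 587;
$conf['smtp_protocol'] = 'tls';
$conf['smtp_username'] = 'YOUR_SES_SMTP_USER';
$conf['smtp_password'] = 'YOUR_SES_SMTP_PASSWORD';
EOF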

But what about drush in this setup?

Drush is not used here. We don’t want to install it inside the Dockerfile, to keep the container as clean as possible. So simply use curl or ADD in the Dockerfile to download Drupal modules and themes.

In an actual Drupal production site you would also not use the base image drupal:7-apache in your Dockerfile. You would use your own Drupal Docker image that contains your full Drupal stack (core, modules, themes, config…) and just overwrite the settings.php file in the Launch Configuration (as is already being done right now).
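
To illustrate the curl/ADD approach from the previous answer, here is a minimal Dockerfile sketch that bakes a contrib module into your own image (module and version are just examples):

FROM drupal:7-apache

# ADD only downloads remote URLs without unpacking them, so use curl + tar
# to fetch and extract a module in one step
RUN curl -fsSL https://ftp.drupal.org/files/projects/views-7.x-3.14.tar.gz \
    | tar -xz -C /var/www/html/sites/all/modules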

Todo list

There are still a few things missing for this CloudFormation stack:

  • Use 2 CloudFront distributions:
    • One for the S3 static content
    • One for the ELB so anonymous users also get a cached page
  • Add Papertrail logging to the Docker containers
  • Use more CloudWatch metrics for the Auto Scaling Group adjustments
  • Configure SES so Drupal can send emails

I might add those in the future, but right now these are left as an exercise for you to implement.

Problems with this stack

There are still some problems with running this setup on AWS though:

  • CSS and JS aggregation does not work with the s3fs module
  • Question: is session stickiness on the ELB the right way to go?

This stack is still a theoretical one, I don’t really use this in production. I’m sure more problems will show up when you actually start using it for a production setup; feel free to use the comments section to point them out and I’ll see if I can find a decent solution for them.

Further reading

While writing this blog post I did a lot of research on writing CloudFormation stacks, and as it usually goes, I found a lot of examples better than the ones I was writing. Looking back on my blog post now, most of the code in my CF templates comes from the official AWS examples below, so make sure to check them out too. They have examples for many common stacks, with or without Multi-AZ support, and you can pretty much copy/paste entire stacks as a starting point for your own.

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-howdoesitwork.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-sample-templates.html

Running Drupal 7 on AWS - part 1

For the last 5 months I’ve been doing a lot of work on Amazon Web Services (AWS) for my new job as a Cloud Architect at Nordcloud Sweden. Learning how to build applications that take full advantage of The Cloud has made me very eager to redo some of my previous projects and rebuild them for AWS. In this blog post I’ll start off with the best way to run a Drupal 7 website on AWS.

While this blog post is written with Drupal 7 as an example, it could easily be adapted for any other PHP-based application.

1. The current Drupal server setup

If you are a Drupal builder, you are most likely using one of (or a combination of) two typical web server setups for your production sites:

  • A Shared Hosting server, where multiple websites run on the same server
  • A dedicated Virtual Private Server (VPS) per website

How you deploy your code, and whether you use Docker or not, is currently not relevant; the main thing is that you have dedicated (virtual or physical) servers that run 24/7 with the exact same hardware configuration.

On these servers you probably have this software stack installed:

  • nginx or apache
  • MySQL/MariaDB database
  • A local disk where your user content gets uploaded to
  • Shell access via ssh so you can run drush and cron jobs
  • Maybe an Apache Solr server for search indexing
  • Maybe a Varnish cache in front of the web server
  • Maybe a memcached instance to offload your database

All of this is managed by you, or maybe a hosting company that does it for you, using some kind of provisioning tool like Chef or Puppet. Making changes to this setup is hard, and keeping the setup in sync with your development stack is probably even harder (even when you use Docker).

If you use a managed hosting provider you have already gotten rid of being responsible for the hardware, but you still run the same kind of static server setup that you would have if you did it yourself.

Problems with this setup

Problem 1: There are a lot of single points of failure in this setup: a lot of non-redundant single-instance services run on the same server. If any component crashes, your entire site is offline.

Problem 2: The CPU/RAM of this server does not scale up or down automatically with the server load; manual intervention is always required to change the hardware configuration. If you get an unexpected traffic boost, this might cause your server to go down.

Problem 3: The whole setup is constantly running at full power, no matter what load it is currently handling. This is a waste of resources and, even worse, of your money.

2. Moving things to AWS

So let’s see now how we can move this setup to AWS, and while doing so, get rid of the problems from the previous section.

When you move this web server setup to the cloud you can basically do it two ways: the wrong way and the right way.

The wrong way: lift and shift

If you just see AWS as another managed hosting provider you could go for the lift and shift solution. In this scenario you re-create your entire server just like you did in the old setup. You run a single EC2 instance (= the AWS equivalent of a virtual server) with your full stack inside of it.

This works of course, but it does not scale, it’s not redundant and it will probably cost you more than running your old setup. So it doesn’t fix any of the problems we’ve described in the previous chapter.

AWS has a tool to calculate the cost of such a move, called the TCO Calculator. Just keep in mind that if you compare the cloud cost to your own datacenter cost using the same hardware setup, you are not using the cloud the right way and you will pay a lot more than you should.

The right way: build your application for AWS

Before we continue to optimize our setup for AWS I have to explain two AWS concepts that are important to understand: High-Availability and AWS Services (managed services).

High-Availability

High-Availability (HA) is a concept that you will see pop up everywhere when using AWS. It’s about not having a single point of failure in your setup, which you achieve by building redundant setups with the tools AWS offers you.

An important part of HA setups is the concept of regions and availability zones (AZ). Each region has several availability zones, which are independent data centers that can communicate with each other as if they were a local network.

Example:

  • Region: eu-west-1 (Ireland)
  • Availability Zones (AZ): eu-west-1a, eu-west-1b, eu-west-1c

Certain things are automatically replicated across the AZs of a region (e.g. all the managed services we’ll see in the next topic), but you’re also required to use them intelligently yourself. For a web server setup using EC2 instances, for example, you would create 2 servers, each in a different AZ, and put an Elastic Load Balancer (which is also HA since it’s a managed service) in front of them. If one of the servers goes down, or even the whole AZ, the load balancer will keep working and only send traffic to the server in the AZ that is still up.

In a Lucid Chart diagram this HA setup would look like this:

An example of a High-Availability setup on AWS

AWS Services

AWS Services are, simply put, the usual services from your software stack, but managed by Amazon. They are offered as highly available software-as-a-service where you don’t have to worry about anything other than using them.

For our Drupal 7 setup we’ll be using these AWS Services:

  • Web servers: Amazon EC2 (EC2 instances, Elastic Load Balancer, Auto Scaling Groups)
  • Database: Amazon RDS (MySQL, MariaDB or even Aurora if you want)
  • Configuration files and User uploaded content: Amazon S3
  • Key/value caching server: Amazon ElastiCache (memcached)
  • Reverse proxy content cache: Amazon CloudFront

Now that we have all the AWS tools explained, let’s go build our Drupal 7 site using them.

3. Building Drupal 7 on AWS

To deal with the problems we had when running on a Shared Hosting or VPS server we have to make sure we cover these two items:

  • Our setup needs to have High-Availability: no single point of failure
  • It has to have automatic scaling: scale in and out when needed

Scaling up and down means increasing or decreasing the amount of RAM or the number of CPU cores in a system, while scaling in and out means adding more similar servers to a setup or removing some of them. Scaling in and out obviously only works if you have a load balancer that distributes traffic among the available servers.

Look at this Lucid Chart diagram to get an idea of what the final stack will look like (click for a larger version):

Drupal 7 stack on AWS

Database: AWS RDS MySQL

The database is probably the easiest component of our setup to configure: we simply use an Amazon RDS MySQL instance and connect to it using the something.amazonaws.com hostname and the username and password we supply.

We can make this component HA by using the Multi-AZ option. This is not a master-master setup, but a standby instance in a different AZ that AWS fails over to in the event the main one goes down. You do not need to configure anything for this; AWS updates the IP address behind the hostname automatically.

Backups of the RDS instance are taken using daily snapshots, which are enabled by default for any RDS database you create.
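
For reference, this is roughly what creating such an instance looks like with the AWS CLI (the CloudFormation stack in part 2 does the equivalent declaratively). This is only a sketch: identifiers and credentials are placeholders, and the network and security group options a real setup needs are omitted:

# Create a Multi-AZ MySQL instance with daily backups kept for 7 days
aws rds create-db-instance \
  --db-instance-identifier drupal7-db \
  --db-instance-class db.t2.medium \
  --engine mysql \
  --allocated-storage 10 \
  --master-username drupal \
  --master-user-password CHANGEME \
  --backup-retention-period 7 \
  --multi-az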

Upload content: Amazon S3

Update 2016/07/11: AWS has released EFS, which is a better choice for our Drupal 7 setup than using S3. Check this blog post for a newer stack that replaces S3 with EFS.

Since our setup will include web servers running on AWS EC2 that will scale in and out depending on the usage, we cannot have any permanent data inside of them. All the content that gets uploaded by Drupal will have to be stored in a central file storage that is accessible by all web servers: Amazon S3.

Drupal cannot use S3 out of the box, but there are contributed modules available (such as s3fs) to achieve this. When writing this blog post I was still experimenting with which one was best suited for the task; I’ll update this post later on with my findings.

While S3 has versioning support, it’s not a bad idea to have a second AWS account copy all the files from S3 every day, every hour, or even as they get created.
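
A simple way to implement that copy is an aws s3 sync run from cron (bucket names are placeholders; the destination bucket can live in the second account):

# Nightly cron job: copy new and changed files to the backup bucket
aws s3 sync s3://my-drupal-files s3://my-drupal-files-backup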

Besides the user-uploaded content we will store another type of file in S3: configuration files used by instances and load balancers. More about this later.

Memcached: AWS ElastiCache

There’s not much to say about ElastiCache. Simply create a memcached server and configure your instances to use it.
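
On the Drupal side, assuming you use the memcache contrib module, configuring it comes down to a few lines appended to the settings.php we generate on the instances (see the EC2 section below; the endpoint is a placeholder):

# In the user-data script, while generating settings.php:
cat >> settings.php <<'EOF'
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
$conf['memcache_servers'] = array(
  'mycache.something.cache.amazonaws.com:11211' => 'default',
);
EOF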

Caching: AWS CloudFront

CloudFront is Amazon’s CDN service, with edge locations all over the world. The most important thing to know here is that invalidating requests is not easy, and you should pretty much rely on your Drupal site setting the correct cache headers for each request it serves. If you need to clear your entire cache, it might be easier (and cheaper) to just create a new CloudFront distribution and delete the old one.
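
If you do need to invalidate something, the AWS CLI can do it; just keep in mind that invalidation requests beyond the monthly free allotment are billed per path. A sketch with a placeholder distribution ID:

# Invalidate every cached object in a distribution
aws cloudfront create-invalidation --distribution-id E1234EXAMPLE --paths "/*"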

We use CloudFront like you would use any other cache: just put it in front of the web server. In this case it will be put in front of the Elastic Load Balancer (see next topic) with the DNS record for our site pointing to the CloudFront distribution.

Web servers and AutoScaling: Amazon EC2

Now we get to the core of the setup: the actual web servers. We will be using a set of AWS EC2 services to accomplish that task.

Let’s start by pointing out that our Drupal code lives in a Docker container, pushed to a (public or private) repository. The EC2 instances can reach the registry and pull the images without authentication.

Configuring our EC2 instances is done by something called a Launch Configuration. A Launch Configuration can best be seen as a configuration file that will be used by an Auto Scaling Group to create servers. The Launch Configuration contains the base server image to be used, the type of EC2 instance to be used, some other things I will not go into detail about here, and most important: the user-data script.

The user-data script is simply a bash shell script that we will use to install the required software on the web servers:

  • Install certain OS packages we need (e.g. aws-cli, docker)
  • Install extra packages using simple curl commands (e.g. docker-compose)
  • Configure rsyslog monitoring (if we don’t use it via docker-compose)
  • More things as you like
  • And as a last step: start the Drupal Docker container.

The user-data script will also handle the creation of a custom settings.php file for Drupal. It will overwrite the default one inside the Drupal Docker container with our values for the database, the memcached server, etc.
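
One lightweight way to do that overwrite, instead of baking the file into the image, is a bind mount when the container starts (paths and image name are placeholders):

# Mount the generated settings.php over the container's default one
docker run -d -p 80:80 \
  -v /etc/drupal/settings.php:/var/www/html/sites/default/settings.php \
  your-registry/drupal7:latest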

This Launch Configuration will then be used by an Auto Scaling Group (ASG) to fire up a set of instances. This ASG can become intelligent if you connect it to AWS CloudWatch, where it will create or remove instances based on certain metrics (server load, RAM usage,…), but it can also be quite simple and just keep a single web server running in each available AZ all the time.
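
Wiring the ASG to CloudWatch comes down to a scaling policy plus an alarm that triggers it. A sketch with the AWS CLI (names and thresholds are placeholders):

# A scale-out policy that adds 2 instances (one per AZ)...
POLICY_ARN=$(aws autoscaling put-scaling-policy \
  --auto-scaling-group-name drupal-asg \
  --policy-name drupal-scale-out \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 2 \
  --query PolicyARN --output text)

# ...triggered when average CPU stays above 70% for 10 minutes
aws cloudwatch put-metric-alarm \
  --alarm-name drupal-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=drupal-asg \
  --statistic Average --period 300 \
  --evaluation-periods 2 --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions "$POLICY_ARN"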

The third component in our web server setup is the Elastic Load Balancer (ELB). The ASG creates servers, and the ELB distributes traffic between them and performs the health checks. If a server becomes unhealthy, the ELB removes it from the rotation and it gets terminated. The ASG will create a new one, which is then picked up by the ELB again and put into the load-balancing rotation.

Together these 3 services - LC, ASG and ELB - create a setup that scales in and out when needed, exactly what we wanted for our Drupal 7 setup.

If this all sounds a bit difficult to visualize, check the AWS Auto Scaling article for a longer explanation with some examples.

Route 53

Route 53 is AWS’s DNS service. While you can use any DNS service you want and just point CNAME records to AWS hostnames, I strongly recommend using Route 53. Because AWS internally updates IP addresses all the time, using a CNAME record might give you situations where DNS lookups go to the wrong IP.

To deal with this issue, AWS has created the ALIAS record, which can point to an internal AWS resource (ELB, CloudFront distribution, S3 location, …) and won’t be affected by any downtime when IP addresses change.
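
Creating such a record from the CLI looks roughly like this (zone IDs and hostnames are placeholders; note that the AliasTarget HostedZoneId is the ELB service's own zone ID for your region, which you look up, not the ID of your hosted zone):

# Point www.example.com at the ELB with an ALIAS (type A) record
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z32O12XQLNTSW2",
          "DNSName": "my-elb-123456789.eu-west-1.elb.amazonaws.com.",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'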

4. Did we solve all our problems?

Now that we’ve listed all the services we will be using to build our Drupal site, did we actually meet all the requirements we set out to achieve?

High Availability

Do we have a highly available setup with no single points of failure? Yes. We either use HA managed services from Amazon or we create resources in 2 AZs at all times.

Auto Scaling

Is this a setup that automatically scales in and out without manual intervention? Yes. The combination of Launch Configuration, Auto Scaling Group and Elastic Load Balancer takes care of that.

5. Cost

We have managed to turn our Drupal stack into a highly available, auto-scaling setup, but what will it cost us to run? To get that number, we use the Simple Monthly Calculator that Amazon provides.

Before we start calculating we have to make some decisions about the instance types and data usage we will be working with.

This is a very basic setup for now. I’m not going into detail about instance types, snapshot storage space, CloudWatch monitoring, etc… Adding these will of course increase the total cost of running your site on AWS.

  • We use a db.t2.medium 10GB MySQL RDS database, with the Multi-AZ option
  • An S3 bucket that contains 500GB of files
  • A cache.t2.micro memcached instance, which is more than enough for our setup
  • A CloudFront distribution with worldwide edge location coverage

Our EC2 setup is as follows:

  • We use 2 Availability Zones
  • In each AZ we create a t2.small (1 CPU core, 2GB RAM) web server with a 30GB EBS root disk
  • One Launch Configuration that handles creating the EC2 instances
  • One Auto Scaling Group that scales out instances in pairs, one per AZ
  • One Elastic Load Balancer, created in both AZs

We expect about 100GB traffic per month to our site.

As you can see I’m using only small instance types for this calculation. Don’t go too big too fast: our scaling setup adds more capacity when needed, unlike a setup where you would have only one big instance.

For the price calculation I’m taking the EU West 1 region (Ireland). Prices vary between regions, so this is not a complete picture. Still, you should go for the region that is closest to your customers and has all the services you need (e.g. in Europe the Frankfurt region does not have all the services Ireland currently offers).

Adding all of this into AWS’s Simple Monthly Calculator gives you this number: $75.74 (€66.70) a month (see the calculation details).

6. Conclusion

I hope this blog post was a good example to show you how to optimize your Drupal site for AWS. It can easily be applied to Drupal 8 or any other PHP application, as long as you focus on the important goals of this setup: High Availability and Auto Scaling.

7. Next steps

Even though this is a lengthy blog post, there are still a lot of topics I haven’t covered yet. There are many more AWS services you can use to monitor, scale and build your application. I also haven’t addressed how you should run cron jobs or nightly import/sync tasks in a setup like this. This is all stuff for upcoming blog posts.

Drupal 7 on AWS Part 2: CloudFormation

This blog post focused on the “how?” and “why?” of running Drupal on AWS. Part 2 is an actual example of such a setup, with a complete infrastructure provided as a CloudFormation stack. CloudFormation is AWS’s infrastructure-as-code tool, something you should definitely be using for any large software stack.

Getting started with Amazon Web Services (AWS)

Getting started with AWS is actually quite simple. The best way is to start studying for these 5 AWS certification exams:

Associate (beginner) level:

  • AWS Certified Solutions Architect – Associate
  • AWS Certified Developer – Associate
  • AWS Certified SysOps Administrator – Associate

Professional level:

  • AWS Certified Solutions Architect – Professional
  • AWS Certified DevOps Engineer – Professional

Don’t get blinded by the names: no matter what your job title is, you should learn about all 3 facets of AWS: architect, developer and sysop. So just do all the exams.

You can start studying for the associate (beginner) ones straight away if you have some basic experience with server setups. The professional ones you should take once you have at least one or two years of experience working with AWS.

A good place to study for these exams is A Cloud Guru. They have a bundle that gives you lifetime access to training material for all 5 exams, and it only costs $189.

My home network & server park, update April 2016

A little over a year ago I wrote a blog post about what my home network and server setup looked like. Today I’ll give you an update on how it looks in my apartment here in Sweden.

Previous setup

It was a pretty heavy setup for a home network, but still pretty simple thanks to virtualisation. It came down to this hardware:

  • A Synology NAS
  • A Dell rackmount server running VMware ESXi
  • A bunch of gigE switches
  • A wifi access point
  • A cable modem

Since I moved to Stockholm in January 2016 I was forced to leave most of that hardware behind in Belgium and just use my laptop for everything. And guess what? I don’t really miss it.

The current setup

At home I have a 100 Mbit/s down, 20 Mbit/s up connection from ComHem. A fiber line comes into the building and gets distributed to each apartment as a normal cable modem connection. It’s a very decent connection with no download limit at all. I guess download limits really only exist in Belgium?

There are basically 2 things that have replaced everything else: Docker and AWS.

All the virtual servers I had running have now been replaced by Docker containers running on my laptop. I spin them up and down using Docker Compose when I need them. Some of them connect to services on my AWS account, which they can do over a VPN connection running on my laptop.

My code is on GitHub (public code) and Bitbucket (private code).

The files on my NAS have been moved to a combination of Apple iCloud, Google Drive and Dropbox. I’m currently moving stuff away from Dropbox simply to drop that $10 a month.

The big media collection on my NAS is probably the only thing I miss, but Netflix here in Sweden has a pretty decent library so I get by with that. I have a lot less free time here (work, gym), so I don’t have a lot of time to watch stuff anyway.

Todo list

Right now I have only 2 things on my mind that I would change:

  1. Get my Synology NAS back with all my media stuff and cancel Netflix
  2. Install a Raspberry Pi 3 Docker cluster setup

A Raspberry Pi 3 cluster setup? Why would I do that? You already know the answer: Because I can.

A Docker Drupal 8 deployment container

Update 2016/07/06: I’ve restarted working on this project. The README on Github should have all the needed information.

I wrote a small example project that creates a deployable Drupal 8 container: https://github.com/karelbemelmans/docker-drupal8

This can of course be adapted for any PHP project; Drupal 8 is just the example used here.