Another year is almost gone... this was my 2018

I haven’t posted a single blog post yet this year, so it’s time for a catch-up post. It’s not that 2018 was a boring year; I’ve just been too caught up in work and real life to take time to write things down.

nginx

If you asked me to summarize the year in one word, it would be “nginx”. Nginx has become our main tool at work for building our global load balancing service, where it takes on the roles of CDN, edge load balancer (via an anycast setup), API gateway, reverse proxy and sidecar proxy for applications.

We used a combination of open source nginx and the licensed nginx-plus version (which has a ton more features than the open source one) to replace old, monolithic hardware load balancers (mostly F5 BigIPs), and it worked out great. With nginx we can now treat load balancers as software solutions that run on everyday hardware and scale out horizontally, rather than as exotic powerhouses that need specialised hardware (and support contracts) to keep running.

Expect a big blog post in 2019 where I go a bit more in depth about our setup; for now I’ll leave you hungry for the details.

Public Cloud

I’m still a big fan of AWS (I renewed my AWS Solutions Architect - Associate certification), but I haven’t worked with it quite as much as I’d like. Google and Azure are still somewhere at the bottom of my to-do list; I just haven’t had the time or a project to dig deeper into them.

The public cloud is an awesome tool for a lot of problems nowadays, but there are still plenty of cases where a public cloud is not the best solution. In my own experience:

  • If you already have datacenters all over the world, you’d better use them all the way and not go hybrid.
  • The cloud only works if you go all-in instead of treating it as “yet another virtualisation platform” with a cloud-agnostic approach. You really need to embrace the polycloud or stick to on-prem.
  • But maybe you can go all-in for just a small part of your infrastructure? I’ll give you a big tip: Big Data, data processing, machine learning and A.I. really shine on the public cloud.

Company values

Oh boy, have I learned a lot about what a big multinational company thinks about its values and how every employee has to make sure they live up to them.

My job is no longer just about coming up with the right technical solution; I also have to learn how to deal with all kinds of people on a daily basis, and get them to accept that my solution is also in their best interest.

It’s quite a challenge but I’d like to think I’ve gotten a lot better at it the past year.

Data

Big Data, Machine Learning and A.I. have been important topics for the past few years, but in 2019 their adoption will really explode, even among non-tech companies. I will definitely post some things about that next year.

Podcasts

I still haven’t started with them, I’m afraid; reading things at my own pace is still my preferred way of getting to know new things. I’ve also read more (non-fiction) books than in any year before.

Personal life

I stopped sharing personal things online, with the exception of maybe a few home network updates. Posting vague pictures with cheesy comments on Instagram for a handful of close friends is as far as I’m willing to go nowadays.

Golf

Let me make an exception to that previous paragraph right away: in 2018 I started playing golf and I absolutely love it. It will be a long journey before I’m any good at it, but I’m convinced it’s going to be fun and worth the frustrations I’m going through right now.

Docker nginx container with GeoIp database

This is a Docker nginx container that includes the MaxMind GeoIP Country database. It injects X-Origin-Country-Code and X-Origin-Country-Name headers, containing the requester’s country code and name, into the HTTP requests it proxies to the backend.

Note: This container needs some more configuration before it actually runs, like the backend “backend” needs to exist. But you are most likely to copy/paste parts of this code into your own project anyway :)
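
A minimal sketch of how you might build and run the container (the image tag and port mapping are just examples; the real Dockerfile and nginx config live in the repo):

# Build the image from the repository and start it
docker build -t nginx-geoip .
docker run -d -p 8080:80 --name nginx-geoip nginx-geoip

# The GeoIP headers are added to the proxied request, so check the
# backend's access logs (not the client response) to see them.
curl -I http://localhost:8080/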

Docker PHP container with New Relic

I created a small example Docker container that runs PHP with the New Relic agent installed. The New Relic agent on the Docker host cannot monitor things inside the containers, so we need to install the agent inside each Docker container.

The Dockerfile looks like this:

# My PHP base container: https://github.com/karelbemelmans/docker-php-base
FROM karelbemelmans/php-base:7.1
MAINTAINER mail@karelbemelmans.com

# You should override this when you run the container!
# It will get appended to the New Relic appname in the entrypoint script, e.g. my-php-container-local
ENV environment local

# Install New Relic
RUN set -x && export DEBIAN_FRONTEND=noninteractive \
  && wget -O - https://download.newrelic.com/548C16BF.gpg | apt-key add - \
  && echo "deb http://apt.newrelic.com/debian/ newrelic non-free" > /etc/apt/sources.list.d/newrelic.list \
  && apt-get update \
  && apt-get install -y newrelic-php5 \
  && newrelic-install install \
  && rm -rf /var/lib/apt/lists/*

# We need to copy the New Relic config AFTER we installed the PHP extension
# or we get warnings everywhere about the missing PHP extension.
COPY config/newrelic.ini /usr/local/etc/php/conf.d/newrelic.ini

# Generate an example PHP file in the webroot
RUN echo '<?php phpinfo();' > /var/www/html/index.php

# Our entrypoint script that also modifies the New Relic config file
COPY entrypoint.sh /
CMD ["/entrypoint.sh"]

The entrypoint.sh script:

#!/bin/bash -e

# Update the New Relic config for this environment
echo "newrelic.appname=my-php-container-${environment}" >> /usr/local/etc/php/conf.d/newrelic.ini

# Proceed with normal container startup
exec apache2-foreground

Besides that, you also need to create the config/newrelic.ini file with your license key (and probably more options):

; First of all enable the extension
extension=newrelic.so

; Our license key is required
newrelic.license="REPLACEME"

; Enable this if you configured your account for High Security
;newrelic.high_security=true
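
Once the image is built, the environment variable from the Dockerfile can be overridden at run time. A hypothetical example (the image name is just a placeholder):

# Run the container for the "prod" environment; the entrypoint then sets
# newrelic.appname=my-php-container-prod before starting Apache.
docker run -d -p 80:80 -e environment=prod my-php-newrelic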

Source code on Github: https://github.com/karelbemelmans/docker-php-newrelic

Note: This container is built on top of my PHP 7 Base Docker container, which you might also find useful.

Common misconceptions about the public cloud

I’m currently a big fan of public clouds, mostly because of the Infrastructure as Code tools they offer: by uploading a simple JSON or YAML template I can create infrastructure and network services that scale across multiple physical and even geographical locations, and automatically install all the applications that run on top of them.

But when I talk to people at other companies I still hear a lot of misconceptions about how all of this works and what it costs. I will try to debunk the biggest ones in this blog post.

Disclaimer: While most of the things I’m going to address are not unique to a specific public cloud, I will be using AWS as a reference since that’s the public cloud I’m currently most familiar with.

Sidenote: Have a look at my CloudFormation templates on Github for some examples to use on Amazon Web Services.

Misconception: You have no idea about costs on a public cloud

False.

1. Set limits

While the public cloud offers you a virtually endless amount of resources, you can, and MUST, set limits on everything. E.g. when you create an Auto Scaling Group (a service that creates and destroys instances depending on the resources needed) you always set an upper and a lower limit on the number of instances it can create when it executes a scaling action.
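
With the AWS CLI, for example, those limits are explicit parameters when you create the group (names and sizes below are purely illustrative):

# Create an Auto Scaling Group that can never shrink below 2
# or grow beyond 6 instances, no matter what the scaling policies do.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-web-asg \
  --launch-configuration-name my-web-lc \
  --min-size 2 \
  --max-size 6 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"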

2. Set warnings

It’s pretty trivial to point out, but you can track your costs on a daily basis, with warnings when a certain threshold is reached. It is your job, however, to monitor those costs and act on them if they are not what you expected them to be.

A big aid in this is using tags for your resources. Tags allow you to easily group resources in the billing overview. Tags could include Environment (e.g. prod, staging, test, …), CostCenter (a commonly used tag for grouping resources per department), Service (e.g. Network, VPC, Webservers) and whatever other tags you want to use. The key really is “more is better” when it comes to tags, since that allows you to break costs down to a very fine-grained level.
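
Tagging can be done in the console, in your templates, or with a one-liner like this (the resource IDs and tag values are just examples):

# Tag an EC2 instance and an EBS volume so they show up grouped
# in the cost allocation reports.
aws ec2 create-tags \
  --resources i-0123456789abcdef0 vol-0123456789abcdef0 \
  --tags Key=Environment,Value=prod Key=CostCenter,Value=webshop Key=Service,Value=Webservers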

3. Simulate your costs well

Before moving to the public cloud it’s perfectly doable to create a cost simulation of what the setup will cost. AWS offers you the Simple Monthly Calculator and the TCO Calculator. It is, however, YOUR job to do this as detailed as possible, taking storage and bandwidth usage into account to make it a good estimate.

4. Don’t keep things running 24/7 if they don’t need to be

On AWS you pay per hour you use a resource, e.g. a virtual server. On Google Compute Engine you even pay per minute, so destroying resources when you don’t need them is a must to keep costs down.

Using Infrastructure as Code you can create templates that build your infrastructure, networking and application setups, as I’ve stated above already. But this also allows you to create identical stacks for a staging, QA or development environment whenever you need them, and destroy them again when you are done using them.

A simple example would be a QA environment, identical to the production environment, that only runs during office hours, since nobody will be using it outside of those.
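
One way to implement that schedule is with scheduled scaling actions on the QA Auto Scaling Group; a sketch with assumed group names and times (the recurrence is in UTC):

# Scale the QA environment down to zero every evening...
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name qa-web-asg \
  --scheduled-action-name qa-scale-down \
  --recurrence "0 18 * * 1-5" \
  --min-size 0 --max-size 0 --desired-capacity 0

# ...and bring it back up before office hours.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name qa-web-asg \
  --scheduled-action-name qa-scale-up \
  --recurrence "0 6 * * 1-5" \
  --min-size 2 --max-size 2 --desired-capacity 2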

If you provide enough input parameters for your IaC templates you can optimize costs even more: production runs in two physical locations, while QA could run in just one, since it does not require the same level of high availability.

Misconception: But AWS crashes all the time :(

One of AWS’s slogans is actually “Design for failure”.

Hardware crashes; there is no hosting company that will give you a 100% uptime guarantee on any single piece of hardware. Even with the most redundant setup there are still weak links in the chain that can break. What you need to do on the public cloud is make sure that your application can handle failure.

Examples:

  • run Auto Scaling instances behind a load balancer instead of a single host (yes, you will need to redesign your application for this)
  • run in multiple datacenters and even in multiple geographical regions
  • get a good understanding of which services the public cloud offers as highly available, and what you still have to take care of yourself

Designing your application for the public cloud will be a challenge, but there are enough example cases already that show you how to do this. And with container technology becoming more mature every month, this has suddenly become a whole lot easier to achieve.

Misconception: We don’t want vendor lock-in to public cloud provider X

In all the applications I have created or moved to the cloud there was very little vendor lock-in besides the IaC tool we used (and you can even use something like Terraform to remove that lock-in). Things that all applications use but that are not specific to any public cloud:

  • Linux virtual servers
  • Docker containers
  • MySQL/PostgreSQL databases
  • memcache and/or Redis
  • NFS storage
  • DNS

The core of the application is still the Docker container, where your actual code runs. On AWS this will run on ECS with ALBs, but you can just as well run the containers on Google Compute Engine or Microsoft Azure with the equivalents of those systems; it will not require any change in your application code at all.

But… if you want to make the most of public cloud provider X you will need to develop for it. On AWS you would e.g. make your application run on ECS and use S3, SNS and SQS to glue things together. But once you do this you will realise how powerful and virtually limitless the public cloud is.

I hope you found this blog post useful; feel free to leave a comment below.

My favorite developer tools for OSX

Having the right tools on your computer is the key to working fast and efficiently, without wasting much time on repetitive and boring actions. Here’s a list of the tools I currently use on my Macbook:

iTerm 2

The standard Terminal application in OSX is quite limited so we need a better one:

iTerm2 is a replacement for Terminal and the successor to iTerm. It works on Macs with macOS 10.8 or newer. iTerm2 brings the terminal into the modern age with features you never knew you always wanted.

iTerm 2 is simply the standard terminal application for OSX. You can split windows with simple commands (cmd-d and cmd-shift-d), easily copy/paste text (just select it, no copy command needed), click on hyperlinks without having to copy the text first (hold down CMD while you hover over the link), and enjoy many more handy features.

Website: https://www.iterm2.com/

Homebrew

Homebrew installs the stuff you need that Apple didn’t.

Homebrew is literally the first thing you install on a new Mac after installing iTerm2. It’s a command line package manager that you can best compare to yum or apt-get and has about every little piece of GNU and open source software available.
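
A typical first session with Homebrew looks something like this (the packages are just examples):

# Update the package index and install some everyday command line tools
brew update
brew install wget git htop

# GUI applications are installed through Homebrew Cask
brew cask install shiftit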

Some of the other tools in the blog post will also be installed using Homebrew.

Website: http://brew.sh/

ShiftIt

ShiftIt is an application for OSX that allows you to quickly manipulate window position and size using keyboard shortcuts. It intends to become a full featured window organizer for OSX.

I hate windows that don’t use the full width or half the width of my screen. ShiftIt offers me a few simple keyboard shortcuts to maximize windows or move them to fill the whole or part of my screen, without ever having to use my mouse.

  • ctrl-alt-cmd-m: maximize the current window
  • ctrl-alt-cmd-ARROWKEY: scale the window to half the screen, attached to the side matching the arrow key you press. Then use ctrl-alt-cmd-EQUAL and ctrl-alt-cmd-MINUS to stretch or shrink it a bit.
  • ctrl-alt-cmd-NUMBERKEY: scale the window to a quarter of the screen in a corner; which corner depends on the number you press

Website: https://github.com/fikovnik/ShiftIt

Homebrew install: brew cask install shiftit

1Password

I never use the same password for more than one site, and neither should you. But since remembering tons of passwords is nearly impossible, we need a password manager. For me that password manager is 1Password: it stores encrypted passwords in a file that you can sync via iCloud, Dropbox or a network share, or just copy around manually.

1Password has native apps for almost all platforms (Windows, OSX, iOS, Android) and a browser extension for all the popular browsers making filling in password forms easy. The latest version of 1Password even includes a 2FA system.

1Password is not free software, but it’s worth every cent.

Website: https://1password.com/

Deploying a Hugo website to Amazon S3 using AWS CodeBuild

A month ago I blogged about using Bitbucket Pipelines as a deployment tool to deploy my Hugo website to AWS S3. It was a fully automated setup that deployed a new version of the site every time I pushed a commit to the master branch of the git repo.

Lately I’ve been moving more things to AWS, as having everything on AWS makes it easier to integrate stuff, including my Hugo blog. Let me show you how I set up the build process on AWS.

CodeCommit

Firstly I moved my git repo from the public, free Bitbucket server to AWS CodeCommit. There really is nothing special to say about that: CodeCommit is simply git on AWS (details on pricing).

The only thing I want to stress, again, is that you should not use your admin user to push code, but create a new IAM user with limited access so it can only push code and nothing more. The CodeCommit page will guide you through that, up to the point of creating SSH keys.

The AWS Managed Policy AWSCodeCommitFullAccess should be all the access needed, there is no need to write your own policy.
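
With the AWS CLI, setting up such a limited user could look like this (the user name is just an example):

# Create a dedicated push user and attach the managed CodeCommit policy
aws iam create-user --user-name codecommit-push
aws iam attach-user-policy \
  --user-name codecommit-push \
  --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitFullAccess

# Upload a public SSH key for that user so it can push over SSH
aws iam upload-ssh-public-key \
  --user-name codecommit-push \
  --ssh-public-key-body file://$HOME/.ssh/id_rsa.pub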

CodeBuild

Secondly, I needed a replacement for Bitbucket Pipelines: AWS CodeBuild. Launched in December 2016, CodeBuild is almost exactly the same kind of build system as Bitbucket Pipelines (and Travis CI, and GitLab CI, and so many other Docker-driven build systems), and there is just one thing you need to create yourself: a build template.

Here’s what I used as buildspec.yml for building and deploying my Hugo blog:

version: 0.1

environment_variables:
  plaintext:
    AWS_DEFAULT_REGION: "YOUR_AWS_REGION_CODE"
    HUGO_VERSION: "0.17"
    HUGO_SHA256: "f1467e204cc469b9ca6f17c0dc4da4a620643b6d9a50cb7dce2508aaf8fbc1ea"

phases:
  install:
    commands:
      - curl -Ls https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz -o /tmp/hugo.tar.gz
      - echo "${HUGO_SHA256}  /tmp/hugo.tar.gz" | sha256sum -c -
      - tar xf /tmp/hugo.tar.gz -C /tmp
      - mv /tmp/hugo_${HUGO_VERSION}_linux_amd64/hugo_${HUGO_VERSION}_linux_amd64 /usr/bin/hugo
      - rm -rf /tmp/hugo*
  build:
    commands:
      - hugo
  post_build:
    commands:
      - aws s3 sync --delete public s3://BUCKETNAME --cache-control max-age=3600

The Docker image I used was the standard Ubuntu Linux 14.04 one since I don’t require any custom software during my build plan.

For more complex jobs you can provide your own Docker image to run the build process in. Make sure it includes glibc, otherwise AWS will not be able to run it. Sadly this excludes most Alpine-based images, but for a build process that probably isn’t a big issue.

Instead of using an IAM user and providing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in my build template, I used the CodeBuild IAM role to grant access to the S3 bucket. CodeBuild will generate this role for you when you create a build plan; just add this custom IAM policy to that role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:List*",
                "s3:Put*",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKETNAME",
                "arn:aws:s3:::BUCKETNAME/*"
            ],
            "Effect": "Allow"
        }
    ]
}

Replace BUCKETNAME with the name of your S3 bucket.

Some remarks

Right now deployment is a manual action: I log into the AWS CodeBuild site and push the Run build button. CodeBuild has no easy “Build on new commits” option, but you can of course use AWS Lambda to build that yourself. I will do that soon for my blog, and then I’ll update this post with the Lambda I used.

If you are looking for a complete pipeline system like GoCD, AWS CodePipeline is what you need.

Advanced networking on Amazon Web Services video

Watching talks like this one from AWS re:Invent 2016 makes me an even bigger fan of public clouds than I already am. It starts with a simple VPC setup and scales out to a multi-region network setup using a Direct Connect connection, managed AWS services and a few Cisco EC2 instances.

You can basically create a multi-region network setup as a single engineer, by running a few templates and connecting a few services. The license cost for the Cisco instances in the lowest-bandwidth setup is a whopping $3000 per month, but that’s still peanuts compared to running your own hardware and managing that.

I admit, it’s not a simple setup and you need a deep understanding of networking, infrastructure and application development, but isn’t that a nice goal to achieve? :)

CloudFormation template for automated EBS volume backups using AWS Lambda and CloudWatch

This is a CloudFormation template that creates a small stack that uses AWS Lambda and CloudWatch to take daily backups of EBS volumes and deletes the snapshots when their retention period has been reached.

It simply looks for EC2 instances with the tag Backup:True and then creates snapshots of all the EBS volumes attached to them.
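
The Lambda does this through the EC2 API; the same logic expressed with the AWS CLI looks roughly like this (a sketch of the idea, not the actual Lambda code):

# Find all instances tagged Backup:True...
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:Backup,Values=True" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text)

# ...and create a snapshot of every EBS volume attached to them.
for INSTANCE_ID in ${INSTANCE_IDS}; do
  VOLUME_IDS=$(aws ec2 describe-volumes \
    --filters "Name=attachment.instance-id,Values=${INSTANCE_ID}" \
    --query "Volumes[].VolumeId" \
    --output text)
  for VOLUME_ID in ${VOLUME_IDS}; do
    aws ec2 create-snapshot \
      --volume-id "${VOLUME_ID}" \
      --description "Automated backup of ${VOLUME_ID} (${INSTANCE_ID})"
  done
done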

The original idea for this setup came from these two blog posts; I’ve simply created a CloudFormation YAML template to automate that setup:

You can find the YAML template in my Github CloudFormation templates repository.

EC2 UserData script that waits for volumes to be properly attached before proceeding

When creating EC2 instances with extra volumes, you might need to format those volumes in the UserData script when creating the instance. With big volumes you can run into the issue that the volume is not yet attached to the instance when you try to format it, so you need to add a wait condition in the UserData to deal with this.

The UserData script below does just that: wait for a volume /dev/sdh to be attached properly before trying to format and mount it.

(This is a code snippet from a CloudFormation stack in YAML format)

UserData:
  "Fn::Base64": !Sub |
    #!/bin/bash -xe
    #
    # See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
    #
    # Make sure the volume has been created AND attached to this instance!
    #
    # We do not need a loop counter in the "until" statements below because
    # there is a 5 minute limit on the CreationPolicy for this EC2 instance already.

    EC2_INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)

    ######################################################################
    # Volume /dev/sdh (which will get created as /dev/xvdh on Amazon Linux)

    DATA_STATE="unknown"
    until [ "${!DATA_STATE}" == "attached" ]; do
      DATA_STATE=$(aws ec2 describe-volumes \
        --region ${AWS::Region} \
        --filters \
            Name=attachment.instance-id,Values=${!EC2_INSTANCE_ID} \
            Name=attachment.device,Values=/dev/sdh \
        --query Volumes[].Attachments[].State \
        --output text)

      sleep 5
    done

    # Format /dev/xvdh if it does not contain a partition yet
    if [ "$(file -b -s /dev/xvdh)" == "data" ]; then
      mkfs -t ext4 /dev/xvdh
    fi

    mkdir -p /data
    mount /dev/xvdh /data

    # Persist the volume in /etc/fstab so it gets mounted again
    echo '/dev/xvdh /data ext4 defaults,nofail 0 2' >> /etc/fstab

That’s all.

Deploying a Hugo website to Amazon S3 using Bitbucket Pipelines

Atlassian recently released a new feature for their hosted Bitbucket product called “Pipelines”. It’s basically their version of Travis CI, which can do simple building, testing and deployment.

In this blog post I’ll show you how I use Pipelines to deploy my Hugo site to AWS S3. It’s short and to the point; if you know AWS this should tell you enough to set up your own deployment in about 5 minutes.

Create an AWS user for Pipelines

You need an AWS user that can deploy to your bucket. Do NOT use your admin user for this! Simply create a new user called “pipelines” and give it access to your blog bucket only.

This inline policy should be enough access to do these deployments (replace BUCKETNAME with the name of your bucket):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:List*",
                "s3:Put*",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKETNAME",
                "arn:aws:s3:::BUCKETNAME/*"
            ],
            "Effect": "Allow",
            "Sid": "AllowPipelinesDeployAccess"
        }
    ]
}
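
If you prefer the CLI over the console, creating the user and attaching the inline policy could look something like this (assuming the policy above is saved as pipelines-policy.json):

# Create the deploy user, attach the inline policy and generate its access keys
aws iam create-user --user-name pipelines
aws iam put-user-policy \
  --user-name pipelines \
  --policy-name AllowPipelinesDeployAccess \
  --policy-document file://pipelines-policy.json
aws iam create-access-key --user-name pipelines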

Configure Pipelines with your AWS credentials

Generate an access and secret key for this new user and add these 3 variables in the environment variables settings page in Bitbucket:

AWS Variables:

AWS_ACCESS_KEY_ID: xxx
AWS_SECRET_ACCESS_KEY: xxx
AWS_DEFAULT_REGION: (your bucket's region)

(Screenshot: the Bitbucket Pipelines environment variables settings page)

Create the Pipelines build config

I’m assuming your Hugo site lives in the root of your git repository. In my case my repository looks like this:

karel:Hostile ~/KarelBemelmans/karelbemelmans-hugo$ tree -L 2
.
├── README.md
├── bitbucket-pipelines.yml
├── config.toml
├── content
│   ├── about-me.md
│   └── post
├── public
│   ├── 2015
│   ├── 2016
│   ├── 404.html
│   ├── CNAME
│   ├── about-me
│   ├── categories
│   ├── css
│   ├── favicon.png
│   ├── goals
│   ├── images
│   ├── index.html
│   ├── index.xml
│   ├── js
│   ├── page
│   ├── post
│   ├── sitemap.xml
│   ├── touch-icon-144-precomposed.png
│   └── wp-content
├── static
│   ├── CNAME
│   ├── css
│   ├── images
│   └── wp-content
└── themes
    └── hyde-x

Then create the file bitbucket-pipelines.yml in the root of your repository, replace BUCKETNAME with the name of your blog’s bucket:

image: karelbemelmans/pipelines-hugo

pipelines:
  default:
    - step:
        script:
          - hugo
          - aws s3 sync --delete public s3://BUCKETNAME

Docker Hub and Github links for this Docker image (feel free to fork and modify):

That’s all.

One single remark though

As you can see, I use the aws s3 sync method to upload to S3. When I do this from my laptop, where files persist between deployments, that actually makes sense and saves me some upload traffic.

Doing this on Pipelines, where the Hugo site is always completely regenerated from scratch inside a Docker container, is actually useless, as it will always upload the entire site because every file is “new”.
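
One possible mitigation, if the upload traffic ever bothers you, is to let sync compare file sizes instead of timestamps; the caveat is that a file whose content changes but whose size stays the same would then be skipped, so treat this as a trade-off rather than a fix:

# Compare only file sizes, not modification times
aws s3 sync --delete --size-only public s3://BUCKETNAME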