Enforce SSL for an Apache vhost

Here’s a proper way to enforce SSL with the apache2 web server, without hardcoding the ServerName in the config. You can also use this in a .htaccess file:

<IfModule mod_rewrite.c>
  <IfModule mod_ssl.c>
    RewriteEngine on
    # Redirect any request that did not come in over HTTPS to the
    # same host and URI on https://
    RewriteCond %{HTTPS} !^on$ [NC]
    RewriteRule . https://%{HTTP_HOST}%{REQUEST_URI} [L]
  </IfModule>
</IfModule>

You need the rewrite and ssl modules for this to work (obviously), and both can be enabled with the a2enmod command:

a2enmod ssl
a2enmod rewrite
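
After enabling the modules, Apache needs a restart to actually load them. The command below assumes a Debian/Ubuntu-style setup, so adjust it for your distribution:

# Restart Apache so the newly enabled modules are loaded
service apache2 restart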

Drupal watchdog logging: dblog vs syslog

Sometimes logging events to your Drupal database is not the best way to go and can be quite a performance killer. This is when you want to start using a syslog server.

Database logging (dblog)

A standard Drupal 7 installation comes with database logging enabled, so your watchdog() calls get logged to permanent storage. You can filter these logs in the admin UI or view them from the console with “drush ws”, and get valuable feedback from them. So it’s a pretty good thing to have.

You can fine-tune this logging on the Drupal log settings page (admin/config/development/logging) as shown below:

Drupal logging settings page

For a development site you mostly turn on the display of warnings and errors, while for a production site it should be turned off. You can also configure this via your settings.php file:

// Hide all error messages on screen (ERROR_REPORTING_HIDE)
$conf['error_level'] = 0;

// Number of watchdog entries to keep in the database
$conf['dblog_row_limit'] = 10000;
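
If you prefer the command line, drush can read and set the same variables (a quick sketch; keep in mind that anything hardcoded in settings.php via $conf overrides whatever is stored in the database):

# Check the current value of the dblog row limit
drush vget dblog_row_limit

# Change it in the variable table
drush vset dblog_row_limit 10000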

If you have a big site where a lot of things are logged, like node edit messages, user login messages, API calls, order checkouts etc., you want the “Database log messages to keep” setting to be as big as possible. Only the last log items are kept (cron clears older records for you), but if this number is too low, you might end up losing important data as it’s being pushed out by newer log entries.

For one of our sites this setting somehow got raised to 1000000 (one million), but that had quite a dramatic impact on our server performance. Below is a New Relic performance graph that shows the watchdog table is a huge bottleneck for database queries:

New Relic MySQL queries performance report of a Drupal site that has the watchdog log limit set to 1 million.

As a lot of pages on this site trigger some kind of log action when used, this had quite a negative impact on the overall performance of the site.
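
If you suspect the same problem on one of your sites, a quick way to see how big the watchdog table has become is a simple count query via drush (assuming the default table name without a prefix):

# Count the number of watchdog entries currently in the database
drush sql-query "SELECT COUNT(*) FROM watchdog;"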

Syslog

A better way to handle logs for such a big site is to use an external tool like syslog.

Syslog is a UNIX logging service, installed on every Linux server, that listens for log entries (locally, or on a TCP or UDP port) and then does something with them. Most of the time that’s writing the logs to a file (which then gets rotated on a daily basis by logrotate), but you can also send them to a remote server, or do both.

If you have a lot of servers you want to monitor from one central place, a remote syslog server that collects all the logs is a good choice. You can set it up yourself using a Linux server, or you can use an external service like Papertrail.

Most consumer NAS solutions also have a syslog server installed, like the Synology I’m using for my home network.

What we ended up doing for the website I mentioned before was (see the drush example after this list):

  • enable the Drupal syslog module
  • disable the Drupal dblog module
  • configure the Linux server to use a remote Papertrail syslog server
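
The first two steps can be done straight from the console with drush (a minimal sketch, assuming a standard Drupal 7 install):

# Enable syslog and disable dblog in one go
drush en syslog -y
drush dis dblog -y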

The basic configuration of the Drupal syslog module is quite OK. We’re using the LOCAL0 facility to log here, and that’s also what we are going to send to Papertrail:

Drupal syslog configuration

The server’s syslog entry looks like this (Papertrail will guide you through this setup process):

# Drupal sites log to LOCAL0 and get their logs sent to Papertrail
local0.* @XXXX.papertrailapp.com:XXXXX
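
After adding that line (on most modern distributions syslog is handled by rsyslog, so it typically goes in a file under /etc/rsyslog.d/, but check your own setup), restart the daemon so it picks up the change:

# Reload the syslog daemon so the new LOCAL0 rule becomes active
service rsyslog restart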

We actually did this for 5 servers for this client, so we can now monitor all their logs via one central website on Papertrail.

Configure Telenet IPv6 on pfSense 2.2.x

It took me some time to figure this out, but thanks to input from @iworx I managed to get IPv6 up and running on my Telenet business modem (the modem-only, non-wifi version) with my pfSense router.

The pfSense version I used was 2.2.2.

Here are 2 screenshots of the interface pages. The one thing that fixed it was disabling the “Block bogon networks” option on the WAN interface page, so do not forget to uncheck it!

Screenshot 1 Screenshot 2

That’s all.

Reverse proxy configuration for Drupal 7 sites

Update 13/7/2015: If you’re doing this for your Drupal 7 site, you should probably also read this blog post about updating your Varnish and Apache/nginx configuration for properly logging the real IP address of visitors.

A common mistake I often see from developers who have a Varnish server (or another type of content cache) in front of their Drupal 7 site is that they forget to add these lines to their settings.php:

// reverse proxy support to make sure the real ip gets logged by Drupal
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
$conf['reverse_proxy_header'] = 'HTTP_X_FORWARDED_FOR';

The default.settings.php does contain a very clear comment about why and when to use it:

/**
 * Reverse Proxy Configuration:
 *
 * Reverse proxy servers are often used to enhance the performance
 * of heavily visited sites and may also provide other site caching,
 * security, or encryption benefits. In an environment where Drupal
 * is behind a reverse proxy, the real IP address of the client should
 * be determined such that the correct client IP address is available
 * to Drupal's logging, statistics, and access management systems. In
 * the most simple scenario, the proxy server will add an
 * X-Forwarded-For header to the request that contains the client IP
 * address. However, HTTP headers are vulnerable to spoofing, where a
 * malicious client could bypass restrictions by setting the
 * X-Forwarded-For header directly. Therefore, Drupal's proxy
 * configuration requires the IP addresses of all remote proxies to be
 * specified in $conf['reverse_proxy_addresses'] to work correctly.
 *
 * Enable this setting to get Drupal to determine the client IP from
 * the X-Forwarded-For header (or $conf['reverse_proxy_header'] if set).
 * If you are unsure about this setting, do not have a reverse proxy,
 * or Drupal operates in a shared hosting environment, this setting
 * should remain commented out.
 *
 * In order for this setting to be used you must specify every possible
 * reverse proxy IP address in $conf['reverse_proxy_addresses'].
 * If a complete list of reverse proxies is not available in your
 * environment (for example, if you use a CDN) you may set the
 * $_SERVER['REMOTE_ADDR'] variable directly in settings.php.
 * Be aware, however, that it is likely that this would allow IP
 * address spoofing unless more advanced precautions are taken.
 */
# $conf['reverse_proxy'] = TRUE;

If you don’t configure this when you have Varnish in front of your site, all your Drupal requests will have 127.0.0.1 (= the IP address of the Varnish server) as the source IP address. You can easily see this in the webserver and watchdog logs.

This might not seem like a big deal, but Drupal also has something called ‘flood protection’. This protection blocks users by IP address if they have made too many failed login attempts within a period of time (the default is 50 failed logins over 1 hour).

And what do you think happens when all your users come from the same ip and the flood protection gets triggered? Yup, everyone gets banned.
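
If you ever end up in that situation, the flood limits themselves are ordinary Drupal 7 variables, so you can tune them or clear the flood table from the console. A sketch with illustrative values (double-check the variable names against your Drupal version):

# Allow more failed logins per IP before blocking (core default is 50 per 3600 seconds)
drush vset user_failed_login_ip_limit 200
drush vset user_failed_login_ip_window 3600

# Clear all current flood entries, which lifts any active ban immediately
drush sql-query "DELETE FROM flood;"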

Elasticsearch backup script with snapshot rotation

Edit 2015/10/16: Added the example restore script.

Edit 2015/3/31: It seems there is also a python script called curator that is intended to be a housekeeping tool for Elasticsearch. While curator is the more complete tool, my script below works just as well and doesn’t need python installed. Use whatever tool you prefer.

Elasticsearch 1.4 has an easy way to make backups of an index: snapshot and restore. If you use the filesystem repository type, you can just make a snapshot, rsync/scp/NFS-export the files to another host and restore them from those files.

Set up the snapshot repository

Set up the snapshot repository location:

curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup",
    "compress": true
  }
}'
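
You can check that the repository was registered correctly by fetching its settings back with a plain GET:

# Show the registered repository and its settings
curl -XGET 'http://localhost:9200/_snapshot/my_backup?pretty'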

Take snapshots

A backup script you can run on cron would be as simple as this:

#!/bin/bash
SNAPSHOT=`date +%Y%m%d-%H%M%S`
curl -XPUT "localhost:9200/_snapshot/my_backup/$SNAPSHOT?wait_for_completion=true"

While it’s very easy to set up this backup, there is currently no built-in rotation to remove old snapshots. I wrote a small script using the jq program that keeps the last 30 snapshots and deletes anything older:

#!/bin/bash
#
# Clean up script for old elasticsearch snapshots.
# 23/2/2014 karel@narfum.eu
#
# You need the jq binary:
# - yum install jq
# - apt-get install jq
# - or download from http://stedolan.github.io/jq/

# The amount of snapshots we want to keep.
LIMIT=30

# Name of our snapshot repository
REPO=my_backup

# Get a list of snapshots that we want to delete
SNAPSHOTS=`curl -s -XGET "localhost:9200/_snapshot/$REPO/_all" \
  | jq -r ".snapshots[:-${LIMIT}][].snapshot"`

# Loop over the results and delete each snapshot
for SNAPSHOT in $SNAPSHOTS
do
  echo "Deleting snapshot: $SNAPSHOT"
  curl -s -XDELETE "localhost:9200/_snapshot/$REPO/$SNAPSHOT?pretty"
done
echo "Done!"

Restore snapshots

Get a list of all the snapshots in the snapshot repository:

curl -s -XGET "localhost:9200/_snapshot/my_backup/_all?pretty"

From that list, pick the snapshot name you want to restore and then make a script like this:

#!/bin/bash
#
# Restore a snapshot from our repository
SNAPSHOT=123

# We need to close the index first
curl -XPOST "localhost:9200/my_index/_close"

# Restore the snapshot we want
curl -XPOST "http://localhost:9200/_snapshot/my_backup/$SNAPSHOT/_restore" -d '{
 "indices": "my_index"
}'

# Re-open the index
curl -XPOST 'localhost:9200/my_index/_open'
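
Depending on the index size the restore can take a while. You can follow its progress through the indices recovery API, or simply watch the cluster health go back to green:

# Show the recovery status of the restored index
curl -s -XGET 'localhost:9200/my_index/_recovery?pretty'

# Or check the overall cluster health
curl -s -XGET 'localhost:9200/_cluster/health?pretty'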