Channel: Chris – cenolan.com

How-To: Automated incremental daily backups to Amazon S3 using Duplicity

This guide shows how to use Amazon S3 with duplicity to make secure GPG encrypted automated daily incremental backups (snapshots) of a Linux server or desktop. I have been using this method on various servers for several months and it has proved to be a reliable, secure, cheap, and robust method to create automated backups.

I have used this method on Fedora, YDL, and CentOS, but the instructions should apply equally to other Linux distributions, including Debian and Ubuntu. It will even work on OS X using the MacPorts version of duplicity.

Aims of this guide

This guide explains how to create a simple wrapper script for duplicity that allows you to automatically create GPG encrypted incremental backups that are saved to an Amazon S3 bucket. The script is designed to be executed as a daily cron job so that incremental snapshot backups are created each day. The script creates a full backup set on the 1st day of each month (or when an appropriate full backup cannot be found) and then creates incremental backups on subsequent days.
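The full-versus-incremental decision described above can be sketched as a small shell function. This is illustrative only – the real wrapper script below keys directly off `date +%d` – and `backup_type` is a hypothetical name, not part of duplicity:

```shell
# Sketch of the scheduling logic: a full backup on day 1 of the
# month, an incremental backup on every other day. The argument is
# the day of the month (1-31).
backup_type() {
    if [ "$1" -eq 1 ]; then
        echo "full"
    else
        echo "incremental"
    fi
}

backup_type 1    # prints "full"
backup_type 15   # prints "incremental"
```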

This guide provides a walk-through of how to create the GPG encryption key, and provides full scripts and example usage for both backup and restore. You could easily adapt the backup script so that it makes full backups each week, or otherwise adjust it to suit your individual needs.

This guide is written with the general Linux user in mind, but you do need some understanding of basic Linux concepts such as cron, permissions, and directory structures.

What is duplicity?

From the duplicity home page:

Duplicity backs [up] directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

I think that says it all much more concisely than I could manage.

One thing to note is that, in my experience and on certain machines, duplicity can cause a lot of overhead and take a long time to complete, so it is not always a viable option for backing up huge amounts of data. That said, for backing up the critical data from a standard web server it can be a great solution. Remember that if you’re backing up databases then you need to dump them into SQL files first; for MySQL databases I recommend automysqlbackup for this. As always, YMMV.
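As a hedged sketch of that database-dump step (the path and credentials here are hypothetical – adapt them to your setup, or use automysqlbackup as suggested above):

```shell
# Write a dated SQL dump before the nightly duplicity run, so the
# backup captures consistent SQL files rather than live database files.
# The backup user, password, and output directory are illustrative.
mysqldump --user=backup --password=secret --all-databases \
    > /var/backups/mysql/all-databases-$(date +%Y-%m-%d).sql
```

You would then add `/var/backups/mysql` to the list of included directories in the backup script below.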

Before we start

You need to install duplicity (version >= 0.4.3 for S3 support). This how-to doesn’t cover that aspect, but suffice it to say that duplicity is available as a package for most major distros, so crack open your package manager (be it yum, apt, Synaptic or whatever) and install duplicity along with all its dependencies.

You also need GnuPG and librsync, but they should both be installed automatically as dependencies of duplicity.

Step 1 – Generate a new GPG key

If you already have a GPG key that you want to use then skip this bit – you’ll just need to know your key ID, which you can get with “gpg --list-keys” – it is the part after the / in the “pub” line. Otherwise, read on…
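If you’re unsure which part of the listing is the key ID, this sketch pulls it out of a sample “pub” line using plain shell parameter expansion (the line and key ID shown are examples, not real keys):

```shell
# A "pub" line from `gpg --list-keys` looks like this:
pub_line='pub   1024D/BE9274BD 2008-11-30'

# The key ID is the part after the "/" and before the date.
key_id=${pub_line#*/}      # strip everything up to and including "/"
key_id=${key_id%% *}       # strip everything from the first space on
echo "$key_id"             # prints "BE9274BD"
```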

I am going to presume that you’ll be running your backup jobs as root, so open a terminal and become root. If you’re going to run them as a different user then become that user instead but ensure that the user you have chosen has sufficient permissions to backup the data you require.

Now run “gpg --gen-key” to generate your key and follow the prompts:

# gpg --gen-key
gpg (GnuPG) 1.4.9; Copyright (C) 2008 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection?

Accept the default (Enter) or press 1 for DSA and Elgamal.

DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 

Again, the default (2048) is fine. Just hit Enter.

Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 

I don’t want my key to expire, so I just hit Enter again to accept the default. Do whatever you want.

Key does not expire at all
Is this correct? (y/N) 

Sure is. Hit y and then Enter.

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Duplicity Backup
Email address: duplicity@mydomain.com
Comment: Key for duplicity
You selected this USER-ID:
    "Duplicity Backup (Key for duplicity) <duplicity@mydomain.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?

Enter the requested details and then press O for Okay.

You need a Passphrase to protect your secret key.

Enter Passphrase:

Enter a passphrase here. It should be something long and complex – anything will do, but make sure you remember it because you’ll need it later. Press Enter when finished, then re-enter your passphrase when prompted and press Enter again.

At this stage you may have to help generate some entropy by doing some other task – I find that running “updatedb” in another shell is pretty good, or just randomly tapping the keyboard can do the trick too.

Once it has finished you should get a message like this:

gpg: key BE9274BD marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   1024D/BE9274BD 2008-11-30
      Key fingerprint = 2FB4 A20E 57BA 80BA 9576  3ABD F79F D430 BE92 74BD
uid                  Duplicity Backup (Key for duplicity) <duplicity@mydomain.com>
sub   2048g/F8F35AD8 2008-11-30

Make a note of the key ID (BE9274BD in this case) as you’ll need that later too.

Important: Remember to back up your GPG key pair somewhere safe and off the current machine. Without this key pair your backups are totally useless to you, so if you lose it and need to restore a backup then you’re up a creek without a paddle. This article shows the proper way to export (and import) your GPG key pair.
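As a sketch, exporting and re-importing a key pair with GnuPG looks like this (substitute your own key ID for BE9274BD, and keep the private key file somewhere secure):

```shell
# Export the public and secret keys in ASCII-armored form
gpg --export -a BE9274BD > duplicity-public.key
gpg --export-secret-keys -a BE9274BD > duplicity-private.key

# ...and on a recovery machine, import them again:
gpg --import duplicity-public.key
gpg --import duplicity-private.key
```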

Step 2 – The backup wrapper script

This bash wrapper script does a full backup on the 1st day of each month followed by incremental backups on subsequent days. It will also delete old backup sets after X months have passed and it also emails a log report each day giving some valuable statistics about your backup and reporting any errors.

You will need to have the following information handy to edit this backup script for your needs:

  • Your Amazon S3 Access Key ID
  • Your Amazon S3 Secret Access Key
  • Your GPG key
  • Your GPG key passphrase
  • A list of directories you want to back up
  • An email address to send the logs to
  • A unique name for an Amazon S3 bucket (the bucket will be created if it doesn’t yet exist)

The script is as follows. At a minimum you need to change the placeholder values (the parts in capitals such as YOUR_ACCESS_KEY_ID), but pay attention to all the variables as you may want to tweak them to suit your needs.

Note that includes/excludes work on a ‘first match’ basis, so if you want to exclude something inside a directory you are including, the exclude for the file/subdirectory must come before the include for the directory. For more info see the duplicity man page.
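For example, to back up /home while skipping a hypothetical /home/tmp, the exclude must come first, because the first matching rule wins (paths and bucket name here are illustrative):

```shell
# First-match ordering: /home/tmp is tested against --exclude=/home/tmp
# before --include=/home, so it is skipped; everything else under /home
# is backed up, and the final --exclude='/**' drops the rest.
duplicity \
    --exclude=/home/tmp \
    --include=/home \
    --exclude='/**' \
    / s3+http://your_s3_bucket_name
```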

#!/bin/bash

# Set up some variables for logging
LOGFILE="/var/log/backup.log"
DAILYLOGFILE="/var/log/backup.daily.log"
HOST=`hostname`
DATE=`date +%Y-%m-%d`
MAILADDR="sysadmin@mydomain.com"

# Clear the old daily log file
cat /dev/null > ${DAILYLOGFILE}

# Trace function for logging, don't change this
trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> ${DAILYLOGFILE}
}

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

# Your GPG key
GPG_KEY=YOUR_GPG_KEY

# How long to keep backups for
OLDER_THAN="3M"

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://your_s3_bucket_name"

FULL=
if [ $(date +%d) -eq 1 ]; then
        FULL=full
fi;

trace "Backup for local filesystem started"

trace "... removing old backups"

# Note: without --force, remove-older-than only lists what would be deleted
duplicity remove-older-than ${OLDER_THAN} --force ${DEST} >> ${DAILYLOGFILE} 2>&1

trace "... backing up filesystem"

duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --volsize=250 \
    --include=/vhosts \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/** \
    ${SOURCE} ${DEST} >> ${DAILYLOGFILE} 2>&1

trace "Backup for local filesystem complete"
trace "------------------------------------"

# Send the daily log file by email
cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR

# Append the daily log file to the main log file
cat "$DAILYLOGFILE" >> $LOGFILE

# Clear the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE

Save the script somewhere and give it an appropriate name; I saved it as /usr/bin/duplicity-backup. Make sure to chmod the script to 700 – it contains sensitive information, so we don’t want non-privileged users to have read access to it. Run the script as a test, then set it up as a daily cron job to run at a time of night when the server isn’t doing much else.
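One way to schedule it is a system crontab entry (this uses the /etc/crontab format, which includes a user field; the time and the /usr/bin/duplicity-backup path are just the choices from this guide):

```shell
# /etc/crontab entry: run the backup nightly at 03:15 as root
# fields: minute hour day-of-month month day-of-week user command
15 3 * * * root /usr/bin/duplicity-backup
```

If you prefer a per-user crontab (`crontab -e` as root), drop the `root` user field from the line.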

Step 3 – The restore wrapper script

Clearly we need a way to restore from a backup, so use the following script to do just that:

#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

# Your GPG key
GPG_KEY=YOUR_GPG_KEY

# The destination
DEST="s3+http://your_s3_bucket_name"

if [ $# -lt 3 ]; then echo "Usage: $0 <date> <file> <restore-to>"; exit 1; fi

duplicity \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --file-to-restore $2 \
    --restore-time $1 \
    ${DEST} $3

# Clear the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE

Again, save this file with a sensible name and chmod it to 700 to keep out prying eyes. I saved it as /usr/bin/duplicity-restore, but feel free to put it wherever you like.

To do a restore simply invoke the script as follows:

duplicity-restore <date> <file> <restore-to>

Some notes on usage: paths are relative, not absolute, so /home/username is backed up as home/username. You can restore whole directories, but the parent of the destination path needs to exist first. For example, to restore /home/username from 20 November 2008 to a local directory ‘restore’, the following would not work because ./home does not exist:

cd ~
mkdir restore
cd restore
duplicity-restore "2008-11-20" home/username home/username

However, the following would work and would restore the directory to ./username:

duplicity-restore "2008-11-20" home/username username

That’s all there is to it. As mentioned I’ve been using this method for several months to back up a variety of servers and it works very nicely. I hope it works just as well for you too!

Credits

This solution is a combination of tips and tricks I found while trawling the web, notably from this howto at randys.org and this post over at the linode.com forums. Credit and thanks go to the original authors – I have merely hacked their ideas together and added a few touches of my own.

If you find this useful or have any comments or questions then please respond below!

