Two Easy Scripts to Automate Amazon EC2 EBS Volume Backups
I was faced with the conundrum of finding an automated way to back up essential data stored on EBS volumes, both the storage volumes and the root volume of the instance itself. If a MySQL database runs on the instance, there are special steps to take around the scripts, mainly stopping the MySQL server before they start and re-enabling it afterwards. Googling for that circumstance turns up a script by Eric Hammond, described at this blog (http://alestic.com/2009/09/ec2-consistent-snapshot).
I am using RDS with Multi-AZ for my MySQL databases, and managing the automated backups and retention period there is easy, so I had no need for a more advanced script. I wondered what scripts were already out there, or whether I should write my own. Googling turned up a few dirty examples, but nothing worthwhile. I did run across this example (http://blog.taggesell.de/index.php?/archives/86-Amazon-EC2-how-to-automatically-create-snapshots-of-attached-EBS-volumes.html) and used it in the first script I ended up writing. The second script was the main thing, though: I needed something to clean up the EBS snapshots because I didn’t want them piling up. It applies a retention period and deletes any snapshot older than that. I wrote both in bash using the Amazon EC2 API tools installed on my Ubuntu server.
Of course, this means you have to have the tools set up correctly for your specific environment; see (http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/setting-up-your-tools.html) and (http://blog.taggesell.de/index.php?/archives/66-Managing-Amazon-EC2-virtual-machines-101-part-1-creating-AMI-images.html). If you’re wondering what was in my ./.awsapicfg file, it was the environment variables the tools need.
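For anyone who has not set the tools up before, a config file along these lines is one possibility. This is only a sketch of what such a file might contain, not the exact file used here: every path is a placeholder, and it assumes the classic Java-based EC2 API tools, which read JAVA_HOME, EC2_HOME, EC2_PRIVATE_KEY and EC2_CERT from the environment.

```shell
# Hypothetical contents of .awsapicfg; every path below is a placeholder.
export JAVA_HOME=/usr/lib/jvm/default-java      # the API tools are Java-based
export EC2_HOME="$HOME/ec2-api-tools"           # directory the tools were unpacked into
export PATH="$PATH:$EC2_HOME/bin"               # makes the ec2-* commands callable
export EC2_PRIVATE_KEY="$HOME/.ec2/pk-X.pem"    # X.509 private key file
export EC2_CERT="$HOME/.ec2/cert-X.pem"         # X.509 certificate file
```

Since the scripts in this post pass the key and certificate explicitly with -K and -C, the last two lines are optional in that setup.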
Ready to see the code? First, a backup script that I can call from a scheduled cron job.
#EBS Backup Script - Tyler Whitney
. ./.awsapicfg
MY_CERT=cert-X.pem
MY_KEY=pk-X.pem
# fetching the instance-id from the metadata repository
MY_INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
# get list of locally attached volumes via EC2 API:
VOLUME_LIST=$(ec2-describe-volumes -C $MY_CERT -K $MY_KEY \
| grep ${MY_INSTANCE_ID} | awk '{ print $2 }')
sync
#actually creating the snapshots
for volume in $VOLUME_LIST; do
# the braces are required: "$volume_" would expand an unset variable named "volume_"
ec2-create-snapshot -C $MY_CERT -K $MY_KEY -d "Nightly Backup_${volume}_$(date +%m-%d-%Y)" $volume
done
As you can see, I am listing ALL the volumes attached to the instance I am running the script from. Then I loop through them and create a snapshot for each one, giving each the “Nightly Backup” name to grep on later. If you need to do this for multiple servers, you may want a more specific name like “ServerA Nightly”, because the cleanup script greps on that name and would otherwise match the snapshots of every server. That is the simplest approach; otherwise you would have to filter on something else in the cleanup script.
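If you go the per-server route, the description could be built by a small helper like the one below. This is a minimal sketch: snapshot_description is a made-up function name, “ServerA Nightly” and the volume ID are sample values, and the optional third argument exists only so the date can be fixed for testing.

```shell
#!/usr/bin/env bash
# snapshot_description is a hypothetical helper, not part of the EC2 API tools.
# $1: per-server prefix, $2: volume ID, $3: optional date (defaults to today)
snapshot_description() {
    local prefix="$1" volume="$2"
    local stamp="${3:-$(date +%m-%d-%Y)}"
    # Braces matter here: "$volume_" would be read as a variable named "volume_".
    echo "${prefix}_${volume}_${stamp}"
}

snapshot_description "ServerA Nightly" "vol-1234abcd" "03-15-2010"
# prints: ServerA Nightly_vol-1234abcd_03-15-2010
```

The result would be passed to ec2-create-snapshot with -d, and the cleanup script would then need to grep for the same prefix.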
Now, I needed to write the script to delete all items older than a specific time period. Let’s jump right into it.
#Backup cleanup script -- Tyler Whitney
. ./.awsapicfg
MY_CERT=cert-X.pem
MY_KEY=pk-X.pem
# fetching the instance-id from the metadata repository
MY_INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
# get list of snapshots created by the backup script via EC2 API:
SNAP_LIST=$(ec2-describe-snapshots -C $MY_CERT -K $MY_KEY \
| grep "Nightly Backup" | awk '{ print $2"_"$5 }')
sync
#actually checking and deleting the snapshots
OLD="$(date --date='3 minutes ago' +%s)"
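for entry in $SNAP_LIST; do
# each entry looks like <snapshot-id>_<start-time>
check="${entry#*_}"
# the first 19 characters are the date and time without the timezone offset
SNAP="$(date --date="${check:0:19}" +%s)"
if (( OLD > SNAP )); then
ec2-delete-snapshot -C $MY_CERT -K $MY_KEY ${entry%_*}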
fi
done
As you can see, this is very similar to the first script, except that I am looping through the EBS snapshots rather than the volumes of the current instance. Nothing ties a snapshot to the instance, so I grep on the same name I used before, hence my comments above. From there, I convert each snapshot’s creation date to seconds since the Unix epoch and do the same with “3 minutes ago”, so the script deletes anything older than 3 minutes. Obviously you will want to set this to something higher. The retention period on my production box is 8 days, so I have “8 days ago” at that location. Inside the loop, an if statement checks whether the snapshot is older than the cutoff, and if it is, I delete the snapshot. Easy as that!
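To try the age check without touching any real snapshots, the comparison can be pulled out into a function and fed a sample entry. This is a sketch under a few assumptions: is_expired is a made-up helper name, the entry is invented sample data in the same id_timestamp form the script builds, and GNU date (as on Ubuntu) is required for --date.

```shell
#!/usr/bin/env bash
# is_expired is a hypothetical helper; it relies on GNU date.
# $1: snapshot start time, e.g. 2010-03-15T12:34:56+0000
# $2: cutoff in GNU date syntax, e.g. "8 days ago"
# Returns 0 if the snapshot is older than the cutoff, 1 if not, 2 on a parse error.
is_expired() {
    local ts started cutoff
    # Keep the first 19 characters (date and time) and swap the "T" for a space.
    # The timezone offset is dropped, so the time is read as local time.
    ts=$(printf '%.19s' "$1" | tr 'T' ' ')
    started=$(date --date="$ts" +%s) || return 2
    cutoff=$(date --date="$2" +%s) || return 2
    [ "$cutoff" -gt "$started" ]
}

entry="snap-1234abcd_2010-03-15T12:34:56+0000"   # invented sample entry
snap_id="${entry%_*}"      # everything before the last underscore
snap_time="${entry#*_}"    # everything after the first underscore

if is_expired "$snap_time" "8 days ago"; then
    echo "would delete $snap_id"
fi
```

Echoing instead of calling ec2-delete-snapshot makes this a dry run, which is a cheap way to check a new retention period before letting cron loose on it.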
What do you think? Please comment, I will try to respond to any questions I see!

