This website is Dockerized and runs on AWS.
As a blog, SLA is not my biggest concern, so I set up the snapshot lifecycle policy to back up 3 times a day and to keep daily backups for 3 days. So at most there should be 7 snapshots for my blog (3 + 3 + 1 for the original snapshot from the AMI).
But on the AWS dashboard I found many more than that. The list contained extra snapshots that were taken daily at about 10pm UTC.
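A quick way to spot a pattern like this is to group the snapshots by the UTC hour they started. The following is only a minimal sketch: it assumes a list of dicts shaped like the `Snapshots` entries returned by `describe_snapshots` (with a timezone-aware `StartTime`), and the helper name `count_by_utc_hour` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def count_by_utc_hour(snapshots):
    """Count snapshots per UTC hour of their StartTime."""
    hours = Counter()
    for snapshot in snapshots:
        # Normalize to UTC so the hour is comparable across snapshots.
        start = snapshot["StartTime"].astimezone(timezone.utc)
        hours[start.hour] += 1
    return hours


# Hypothetical sample data, not taken from my account.
sample = [
    {"SnapshotId": "snap-a", "StartTime": datetime(2021, 1, 1, 22, 3, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-b", "StartTime": datetime(2021, 1, 2, 22, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-c", "StartTime": datetime(2021, 1, 2, 8, 0, tzinfo=timezone.utc)},
]

# The most common hour hints at a separate daily job.
print(count_by_utc_hour(sample).most_common(1))
```

An hour that shows up once per day and does not match the lifecycle policy schedule is a good candidate for "something else is creating snapshots".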
How to solve it: my original thought
OK, if it behaves weirdly for no apparent reason, I may just solve it with brute force.
- Simply build a Lambda function to delete untagged snapshots every day (triggered by a CloudWatch Events / EventBridge cron rule)
- My guess was that it might be some kind of default backup from AWS. (Actually, it is not.)
from datetime import datetime, timedelta, timezone

import boto3


class Ec2Instances(object):
    def __init__(self, region):
        print("region " + region)
        self.ec2 = boto3.client('ec2', region_name=region)

    def delete_snapshots(self, older_days=1):
        # Delete matching snapshots that started more than `older_days` days ago.
        delete_snapshots_num = 0
        cutoff = self.get_delete_data(older_days)
        snapshots = self.get_nimesa_created_snapshots()
        for snapshot in snapshots['Snapshots']:
            # StartTime is a timezone-aware datetime, so it compares with the UTC cutoff.
            if snapshot['StartTime'] < cutoff:
                self.delete_snapshot(snapshot['SnapshotId'])
                delete_snapshots_num += 1  # the original `+1` discarded the result
        return delete_snapshots_num

    def get_nimesa_created_snapshots(self):
        # Only snapshots whose description matches this filter are returned.
        snapshots = self.ec2.describe_snapshots(
            Filters=[{'Name': 'description', 'Values': ['Created by Nimesa']}])
        return snapshots

    def get_delete_data(self, older_days):
        # Cutoff timestamp: now (UTC) minus the retention period.
        delete_time = datetime.now(tz=timezone.utc) - timedelta(days=older_days)
        return delete_time

    def delete_snapshot(self, snapshot_id):
        self.ec2.delete_snapshot(SnapshotId=snapshot_id)


def lambda_handler(event, context):
    print("event " + str(event))
    print("context " + str(context))
    ec2_reg = boto3.client('ec2')
    regions = ec2_reg.describe_regions()
    # Run the cleanup once per region.
    for region in regions['Regions']:
        region_name = region['RegionName']
        instances = Ec2Instances(region_name)
        deleted_counts = instances.delete_snapshots(1)
        print("deleted_counts for region " + str(region_name) + " is " + str(deleted_counts))
    return 'completed'
This simple Python script brutally deletes every snapshot that matches the description filter and is older than one day, across all AWS regions.
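The script above filters by description, while my plan in the list spoke of untagged snapshots. Below is a minimal sketch of that untagged variant, written as a pure function so it can be checked without touching AWS. It assumes each snapshot dict looks like an entry from `describe_snapshots` and that an untagged snapshot has a missing or empty `Tags` key; the name `select_expired_untagged` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_expired_untagged(snapshots, older_days=1, now=None):
    """Return IDs of untagged snapshots started more than `older_days` days ago."""
    now = now or datetime.now(tz=timezone.utc)
    cutoff = now - timedelta(days=older_days)
    expired = []
    for snapshot in snapshots:
        # Assumption: no "Tags" key (or an empty list) means the snapshot is untagged.
        has_tags = bool(snapshot.get("Tags"))
        if not has_tags and snapshot["StartTime"] < cutoff:
            expired.append(snapshot["SnapshotId"])
    return expired
```

Printing the returned IDs first, and only then passing them to `delete_snapshot`, gives a cheap dry run before anything is actually removed.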
What is the real problem?
After I deployed and tested my Lambda function and went to CloudWatch Events (EventBridge), I realized where the weird snapshots came from.
There was a CloudWatch event rule that I had configured myself for snapshot backups a long time ago.
Solution and result
So the solution is simply to delete that event rule in CloudWatch.
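I removed the rule by hand, but the same cleanup can be sketched with boto3. This is only an illustration under assumptions: the helper names are hypothetical, the `events_client` argument is expected to be a `boto3.client("events")` object, and pagination of the API responses is ignored. A rule that still has targets cannot be deleted, so the targets are removed first.

```python
def find_scheduled_rules(rules):
    """Return names of rules that run on a schedule (cron or rate expression)."""
    return [rule["Name"] for rule in rules if rule.get("ScheduleExpression")]


def delete_rule_with_targets(events_client, rule_name):
    """Remove all targets of a rule, then delete the rule itself."""
    targets = events_client.list_targets_by_rule(Rule=rule_name)["Targets"]
    target_ids = [target["Id"] for target in targets]
    if target_ids:
        events_client.remove_targets(Rule=rule_name, Ids=target_ids)
    events_client.delete_rule(Name=rule_name)
    return target_ids


# Possible usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   events = boto3.client("events")
#   print(find_scheduled_rules(events.list_rules()["Rules"]))
#   delete_rule_with_targets(events, "name-of-the-old-backup-rule")  # hypothetical name
```

Listing the scheduled rules before deleting anything is the useful part: it is the step that would have shown me the forgotten backup rule right away.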
Summary
- Always use AWS services properly. (In this case, don't use a CloudWatch event to trigger backups; use an EBS snapshot lifecycle policy or AWS Backup instead.)
- Use AWS Config to record what you changed before; it makes it much easier to trace where a wrong config came from.
- Write a blog as an external logging system :). If I had written it down back when I used CloudWatch Events to trigger the snapshot backup, I might have figured out the root cause faster.
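To make the first point concrete, here is a rough sketch of what a snapshot lifecycle policy like mine could look like as the `PolicyDetails` argument of the Data Lifecycle Manager `create_lifecycle_policy` call in boto3. It is my reading of "3 times a day plus daily for 3 days", not an export of the real policy: the schedule names, start times, and the tag used to select volumes are all assumptions.

```python
def build_policy_details(tag_key, tag_value):
    """Build DLM PolicyDetails with an 8-hourly and a daily schedule."""
    return {
        "ResourceTypes": ["VOLUME"],
        # Assumption: volumes to back up are selected by this tag.
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "every-8-hours",  # hypothetical name
                "CreateRule": {"Interval": 8, "IntervalUnit": "HOURS", "Times": ["00:00"]},
                "RetainRule": {"Count": 3},
                "CopyTags": True,
            },
            {
                "Name": "daily",  # hypothetical name
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": 3},
                "CopyTags": True,
            },
        ],
    }


details = build_policy_details("Backup", "blog")  # hypothetical tag
retained = sum(s["RetainRule"]["Count"] for s in details["Schedules"])
print(retained + 1)  # policy snapshots plus the original AMI snapshot
```

Keeping the retention counts in one place like this also documents the expected ceiling, so any snapshot beyond it is immediately suspicious.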