
S3 Monitoring Step #1 – Bucket Size and Number of Objects

The first step in Amazon S3 monitoring is to check the current state of your S3 buckets and how fast they grow. You can easily get this information from the CloudWatch console, by running an AWS CLI command, or with an AWS SDK script (see the boto3 sketch below).

Bucket Size

Here is an example of an AWS CLI command that gets the size of a bucket for every day within the --start-time and --end-time date range:

aws cloudwatch get-metric-statistics \
  --metric-name BucketSizeBytes --namespace AWS/S3 \
  --start-time 2018-10-01T00:00:00Z --end-time 2018-10-08T00:00:00Z \
  --statistics Maximum --unit Bytes --region us-east-1 \
  --dimensions Name=BucketName,Value=cloudsqale Name=StorageType,Value=StandardStorage \
  --period 86400 --query 'Datapoints[*].[Timestamp, Maximum]' \
  --output text | sort  | python cloudwatch_s3_metrics.py

I use a simple Python script cloudwatch_s3_metrics.py to format the data and calculate the bucket growth per day:

import sys

prev = None
print("Date\tBucket Size\tGrowth per Day")
for i in sys.stdin:
    # Each input line: <timestamp>\t<maximum bucket size in bytes>
    line = i.split('\t')
    # Using 1000 instead of 1024 to match CloudWatch metrics
    cur = float(line[1]) / (1000 * 1000 * 1000 * 1000)   # bytes -> TB
    diff = (cur - prev) if prev is not None else float('nan')
    prev = cur
    print(line[0][0:10] + '\t' + "%.3f TB" % cur + '\t' + "%.3f TB" % diff)

The sample output is as follows:

Date            Bucket Size    Growth per Day
2018-10-01      2087.301 TB    nan
2018-10-02      2099.817 TB    12.516 TB
2018-10-03      2117.809 TB    17.992 TB
2018-10-04      2138.358 TB    20.549 TB
2018-10-05      2158.940 TB    20.582 TB
2018-10-06      2179.499 TB    20.559 TB
2018-10-07      2203.798 TB    24.299 TB
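
If you prefer the AWS SDK option mentioned above, the same report can be produced directly from Python with boto3. Below is a minimal sketch that mirrors the CLI pipeline; the bucket name, region and date range are taken from the example above, and the script itself is not part of the original setup:

import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/S3',
    MetricName='BucketSizeBytes',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'cloudsqale'},
        {'Name': 'StorageType', 'Value': 'StandardStorage'},
    ],
    StartTime=datetime(2018, 10, 1),
    EndTime=datetime(2018, 10, 8),
    Period=86400,              # one data point per day
    Statistics=['Maximum'],
    Unit='Bytes')

prev = None
print("Date\tBucket Size\tGrowth per Day")
# Datapoints are returned in no particular order, so sort by timestamp first
for dp in sorted(response['Datapoints'], key=lambda d: d['Timestamp']):
    cur = dp['Maximum'] / (1000.0 ** 4)   # bytes -> TB (decimal, matching CloudWatch)
    diff = (cur - prev) if prev is not None else float('nan')
    prev = cur
    print(dp['Timestamp'].strftime('%Y-%m-%d') + '\t' + "%.3f TB" % cur + '\t' + "%.3f TB" % diff)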

Number of Objects

Now let’s check the number of objects in the bucket:

aws cloudwatch get-metric-statistics \
  --metric-name NumberOfObjects --namespace AWS/S3 \
  --start-time 2018-10-01T00:00:00Z --end-time 2018-10-08T00:00:00Z \
  --statistics Maximum --unit Count --region us-east-1 \
  --dimensions Name=BucketName,Value=cloudsqale Name=StorageType,Value=AllStorageTypes \
  --period 86400 --query 'Datapoints[*].[Timestamp, Maximum]' \
  --output text | sort | python cloudwatch_s3_metrics_obj.py

Again, I use a simple Python script cloudwatch_s3_metrics_obj.py to format the data and calculate the growth in the number of objects per day:

import sys

prev = None
print("Date\tNumber of Objects\tGrowth per Day")
for i in sys.stdin:
    # Each input line: <timestamp>\t<maximum number of objects>
    line = i.split('\t')
    cur = float(line[1])
    diff = (cur - prev) if prev is not None else float('nan')
    prev = cur
    print(line[0][0:10] + '\t' + "%.0f" % cur + '\t' + "%.0f" % diff)

Here is my sample result:

Date            Number of Objects   Growth per Day
2018-10-01      8851954             nan
2018-10-02      8912936             60982
2018-10-03      8975252             62315
2018-10-04      9031277             56025
2018-10-05      9078046             46768
2018-10-06      9129067             51020
2018-10-07      9170534             41467
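
Since both reports cover the same dates, you can also divide the two series to track the average object size, which gives a first hint whether the growth comes from a few large files or from many small ones. Here is a small sketch using the sample figures above; the numbers are copied from the two reports, and the helper itself is just an illustration:

# Hypothetical helper: combine the two metrics to estimate average object size.
size_tb = {'2018-10-01': 2087.301, '2018-10-07': 2203.798}   # from the bucket size report
objects = {'2018-10-01': 8851954, '2018-10-07': 9170534}     # from the object count report

for day in sorted(size_tb):
    avg_mb = size_tb[day] * 1000 * 1000 / objects[day]   # TB -> MB (decimal units)
    print(day + '\t' + "%.1f MB avg object size" % avg_mb)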

Once you know how your S3 storage behaves in general, it is time to see what actually drives this growth.