
S3 Multipart Upload – S3 Access Log Messages

Most applications that write data to S3 use the S3 multipart upload API to upload the data in parts. First you initiate the upload, then you upload the parts, and finally you complete the multipart upload.
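The three steps can be sketched in Python as follows. This is a minimal illustration, not the code of the application discussed here: it assumes a boto3 S3 client is passed in as `client`, and the helper name `multipart_upload` and its parameters are hypothetical. The comments map each call to the operation name that appears in the S3 access log.

```python
from typing import Iterable


def multipart_upload(client, bucket: str, key: str, chunks: Iterable[bytes]) -> dict:
    """Upload `chunks` as a single object using the three multipart steps.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Step 1: initiate the upload (logged as REST.POST.UPLOADS).
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        # Step 2: upload the parts (logged as REST.PUT.PART); part numbers start at 1.
        for number, chunk in enumerate(chunks, start=1):
            response = client.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=number,
                UploadId=upload_id,
                Body=chunk,
            )
            parts.append({"ETag": response["ETag"], "PartNumber": number})
        # Step 3: complete the upload (logged as REST.POST.UPLOAD).
        return client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already uploaded parts do not linger in the bucket.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

With boto3 installed and credentials configured, a call might look like `multipart_upload(boto3.client("s3"), "my-bucket", "data.gz", chunks)`, where the bucket name is a placeholder. Keep in mind that S3 requires every part except the last one to be at least 5 MiB.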

Let’s see how this operation is reflected in the S3 access log. My application uploaded the file data.gz into S3, and I can view it as follows:

$ aws s3 ls s3://cloudsqale/
2020-04-17 15:52:55  700156671 data.gz

And here is what I see in the S3 access log configured for my bucket (I have omitted some columns for clarity):

Request Timestamp      Operation          Key      Size       Total Time (ms)  Turnaround Time (ms)
-----------------      ---------          ---      ----       ---------------  --------------------
17/Apr/2020:15:52:54   REST.POST.UPLOADS  data.gz  NULL       16               9
17/Apr/2020:15:53:47   REST.PUT.PART      data.gz  134237279  1653             168
17/Apr/2020:15:54:56   REST.PUT.PART      data.gz  135973725  1764             21
17/Apr/2020:15:56:06   REST.PUT.PART      data.gz  136060019  2440             833
17/Apr/2020:15:57:13   REST.PUT.PART      data.gz  136033896  1916             297
17/Apr/2020:15:57:13   REST.PUT.PART      data.gz  134786426  1600             126
17/Apr/2020:15:57:25   REST.PUT.PART      data.gz  23065326   372              91
17/Apr/2020:15:59:48   REST.POST.UPLOAD   data.gz  700156671  114              74

REST.POST.UPLOADS starts the multipart upload. Note that the file creation time reported by aws s3 ls (15:52:55) matches the start of the upload to within a second, not its completion (15:59:48). Six parts were then uploaded by REST.PUT.PART operations. Each part except the last one is roughly 134 to 136 million bytes (about 128 to 130 MiB). If you sum the sizes of all parts, you get 700156671 bytes, i.e. the size of the uploaded file.

Finally, REST.POST.UPLOAD commits the multipart upload and logs the total size of the file.
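The arithmetic can be checked with a few lines of Python. The rows below are copied by hand from the log excerpt above. The names `LOG_ROWS`, `part_sizes` and `completed_size` are made up for this sketch, which does not parse the real access log format.

```python
# (operation, size in bytes) pairs copied from the access log excerpt.
LOG_ROWS = [
    ("REST.POST.UPLOADS", None),
    ("REST.PUT.PART", 134237279),
    ("REST.PUT.PART", 135973725),
    ("REST.PUT.PART", 136060019),
    ("REST.PUT.PART", 136033896),
    ("REST.PUT.PART", 134786426),
    ("REST.PUT.PART", 23065326),
    ("REST.POST.UPLOAD", 700156671),
]


def part_sizes(rows):
    """Return the sizes of all uploaded parts."""
    return [size for op, size in rows if op == "REST.PUT.PART"]


def completed_size(rows):
    """Return the size logged by the completing REST.POST.UPLOAD request."""
    return next(size for op, size in rows if op == "REST.POST.UPLOAD")


parts = part_sizes(LOG_ROWS)
total = sum(parts)
# Number of parts, their total size, and whether it matches the logged object size.
print(len(parts), total, total == completed_size(LOG_ROWS))
# prints: 6 700156671 True
```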