Most applications writing data into S3 use the S3 multipart upload API to upload data in parts. First, you initiate the upload, then upload the parts, and finally complete the multipart upload.
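As a rough illustration of these three steps, here is a minimal sketch using the boto3 client calls `create_multipart_upload`, `upload_part` and `complete_multipart_upload`. The function name `multipart_upload`, the 128 MiB default part size and the error handling are my assumptions for the example, not what any particular application does:

```python
PART_SIZE = 128 * 1024 * 1024  # assumed part size for this sketch


def multipart_upload(s3, bucket, key, stream, part_size=PART_SIZE):
    """Upload a binary stream in parts; returns the number of parts uploaded.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Step 1: initiate the multipart upload and get its upload ID.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        # Step 2: upload the data part by part (part numbers start at 1).
        part_number = 1
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            resp = s3.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=part_number,
                UploadId=upload_id,
                Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
        if not parts:
            raise ValueError("nothing to upload: the stream is empty")
        # Step 3: complete the upload by listing every part and its ETag.
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already uploaded parts do not linger in the bucket.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   with open("data.gz", "rb") as f:
#       multipart_upload(boto3.client("s3"), "cloudsqale", "data.gz", f)
```

In practice, higher-level helpers usually hide these calls, but the sequence of requests sent to S3 follows the same pattern.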
Let’s see how this operation is reflected in the S3 access log. My application uploaded the file data.gz into S3, and I can view it as follows:
$ aws s3 ls s3://cloudsqale/
2020-04-17 15:52:55  700156671 data.gz
Here is what I can see in the S3 access log configured for my bucket (I skipped a few columns for clarity):
Request Timestamp     Operation          Key      Size       Total Time (ms)  Turnaround Time (ms)
--------------------  -----------------  -------  ---------  ---------------  --------------------
17/Apr/2020:15:52:54  REST.POST.UPLOADS  data.gz  NULL       16               9
17/Apr/2020:15:53:47  REST.PUT.PART      data.gz  134237279  1653             168
17/Apr/2020:15:54:56  REST.PUT.PART      data.gz  135973725  1764             21
17/Apr/2020:15:56:06  REST.PUT.PART      data.gz  136060019  2440             833
17/Apr/2020:15:57:13  REST.PUT.PART      data.gz  136033896  1916             297
17/Apr/2020:15:57:13  REST.PUT.PART      data.gz  134786426  1600             126
17/Apr/2020:15:57:25  REST.PUT.PART      data.gz  23065326   372              91
17/Apr/2020:15:59:48  REST.POST.UPLOAD   data.gz  700156671  114              74
REST.POST.UPLOADS starts a multipart upload. Note that the file creation time reported by aws s3 ls (15:52:55) corresponds to the start of the upload (15:52:54), not to its completion (15:59:48). Then 6 parts were uploaded by REST.PUT.PART operations. Each part except the last one is about 130 MB. If you sum up the sizes of all parts, you get 700156671, i.e. the size of the uploaded file.
Finally, REST.POST.UPLOAD commits the multipart upload and logs the total size of the file.
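The size check is easy to reproduce. In this small sketch the part sizes are copied by hand from the REST.PUT.PART rows of the access log above; the variable names are mine:

```python
# Sizes in bytes of the 6 parts, taken from the REST.PUT.PART log rows.
part_sizes = [
    134237279,
    135973725,
    136060019,
    136033896,
    134786426,
    23065326,
]

# Size logged by REST.POST.UPLOAD and reported by `aws s3 ls`.
object_size = 700156671

total = sum(part_sizes)
print(total, total == object_size)  # prints: 700156671 True
```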