April 2020 – Large-Scale Data Engineering in Cloud

Flink, JVM, Memory, YARN

Flink 1.9 – Off-Heap Memory on YARN – Troubleshooting Container is Running Beyond Physical Memory Limits Errors

April 29, 2020

On one of my clusters I got my favorite YARN error, although now it was in a Flink application:

Container is running beyond physical memory limits. Current usage: 99.5 GB of 99.5 GB physical memory used; 105.1 GB of 227.8 GB virtual memory used. Killing container.

Why did the container take so much physical memory and fail? Let’s investigate in detail.


dmtolpeko
AWS, I/O, S3

S3 Multipart Upload – S3 Access Log Messages

April 17, 2020

Most applications that write data to S3 use the S3 multipart upload API to upload data in parts. First, you initiate the upload, then you upload the parts, and finally you complete the multipart upload.
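The three steps above can be sketched in Python. This is a minimal illustration, not the code of the application discussed in this post: it assumes an S3 client object with the boto3 method names `create_multipart_upload`, `upload_part`, `complete_multipart_upload` and `abort_multipart_upload`, and the function name `multipart_upload` and the 8 MiB default part size are my own choices.

```python
import io
from typing import BinaryIO, Dict, List

# S3 requires every part except the last one to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def multipart_upload(s3, bucket: str, key: str, stream: BinaryIO,
                     part_size: int = 8 * 1024 * 1024) -> List[Dict]:
    """Upload `stream` to bucket/key in parts and return the list of parts.

    `s3` is assumed to behave like a boto3 S3 client. `stream` is assumed
    to be a buffered binary stream whose read(n) returns n bytes until EOF.
    """
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")

    # Step 1: initiate the upload and remember its UploadId.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts: List[Dict] = []
    try:
        # Step 2: upload the parts; part numbers start at 1.
        part_number = 1
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=part_number, Body=chunk)
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1

        # Step 3: complete the upload by listing every part with its ETag.
        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Abort on any failure so the already uploaded parts are discarded.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return parts
```

With boto3 installed, a call could look like `multipart_upload(boto3.client("s3"), "my-bucket", "data.gz", open("data.gz", "rb"))`, where `my-bucket` is a placeholder. Passing the client in as a parameter keeps the sketch free of network dependencies, so it can be exercised with a stub object.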

Let’s see how this operation is reflected in the S3 access log. My application uploaded the file data.gz into S3, and I can view it as follows:


dmtolpeko
AWS, Flink, I/O, S3

Flink – Tuning Writes to S3 Sink – fs.s3a.threads.max

April 12, 2020

One of our Flink streaming jobs showed significant variance in the time the same Task Manager process spent writing files to S3.

What settings do you need to check first?


dmtolpeko