In the previous article about YARN container memory (see Tez Memory Tuning – Container is Running Beyond Physical Memory Limits) I wrote about physical memory. Now I would like to look at virtual memory in YARN.
A typical YARN memory error may look like this:
Container is running beyond virtual memory limits. Current usage: 1.0 GB of 1.1 GB physical memory used; 2.9 GB of 2.4 GB virtual memory used. Killing container.
So what is virtual memory, how do you solve such errors, and why is the virtual memory size often so large?
Let’s find a YARN container and investigate its memory usage:
$ ssh -i private_key [email protected]

$ sudo jps -l -v
45117 org.apache.tez.runtime.task.TezChild -Xmx1152m ...
...
Process PID 45117 is a YARN container for a Tez task (a task of an Apache Hive query in my case). Using the top command, we can check its virtual memory usage:
$ sudo top -p 45117

  PID USER   PR  NI   VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
45117 yarn   20   0 3.010m  832m   41m S 157.7  0.7  0:31.38 java
You can see that the process virtual memory is 3.0 GB, while the Java process was launched with a maximum heap size of only 1152 MB (-Xmx1152m). So what takes up the other ~2 GB?
Using the pmap command, we can see the details of the memory map for the process:
$ sudo pmap 45117
...
0000000001ec4000   19432K rw---   [ anon ]
00000000b8000000  670208K rw---   [ anon ]
00000000e0e80000  116224K -----   [ anon ]
00000000e8000000  392704K rw---   [ anon ]
00000000fff80000     512K -----   [ anon ]
0000000100000000    4784K rw---   [ anon ]
00000001004ac000 1043792K -----   [ anon ]
00007fa01d121000     512K rw---   [ anon ]
00007fa01d1a1000    1536K -----   [ anon ]
00007fa01d321000      20K r-x-- /usr/lib/hadoop/lib/native/libsnappy.so.1.1.3
00007fa01d326000    2044K ----- /usr/lib/hadoop/lib/native/libsnappy.so.1.1.3
...
00007fa02218e000      40K r--s- /usr/lib/hadoop-yarn/lib/jersey-core-1.9.jar
00007fa022198000      20K r--s- /usr/lib/hadoop-yarn/lib/commons-lang-2.6.jar
00007fa02219d000      76K r--s- /usr/lib/hadoop-yarn/lib/zookeeper-3.4.10.jar
00007fa0221b0000      12K r--s- /usr/lib/hadoop-yarn/lib/jsr305-3.0.0.jar
00007fa0221b3000       8K r--s- /usr/lib/hadoop-yarn/lib/stax-api-1.0-2.jar
...
00007fa022acc000      16K r--s- /usr/lib/hadoop/lib/commons-io-2.4.jar
00007fa022ad0000      36K r--s- /usr/lib/hadoop/lib/jets3t-0.9.0.jar
00007fa022ad9000      20K r--s- /usr/lib/hadoop/lib/commons-net-3.1.jar
00007fa022ade000       8K r--s- /usr/lib/hadoop/lib/commons-codec-1.4.jar
00007fa022ae0000       8K r--s- /usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar
00007fa022ae2000       8K r--s- /usr/lib/hadoop/lib/curator-client-2.7.1.jar
00007fa022ae4000      24K r--s- /usr/lib/hadoop/lib/commons-httpclient-3.1.jar
...
00007fa023ab7000      12K -----   [ anon ]
00007fa023aba000    1016K rw---   [ anon ]
00007fa023bb8000      12K -----   [ anon ]
00007fa023bbb000    1016K rw---   [ anon ]
00007fa023cb9000      12K -----   [ anon ]
00007fa023cbc000    1016K rw---   [ anon ]
00007fa023dba000      12K -----   [ anon ]
00007fa023dbd000    1016K rw---   [ anon ]
...
00007fa04d9e3000      28K r-x-- /lib64/librt-2.17.so
00007fa04d9ea000    2044K ----- /lib64/librt-2.17.so
00007fa04dbe9000       4K r---- /lib64/librt-2.17.so
00007fa04dbea000       4K rw--- /lib64/librt-2.17.so
00007fa04dbeb000      84K r-x-- /lib64/libgcc_s-4.8.3-20140911.so.1
00007fa04dc00000    2048K ----- /lib64/libgcc_s-4.8.3-20140911.so.1
00007fa04de00000       4K rw--- /lib64/libgcc_s-4.8.3-20140911.so.1
00007fa04de01000    1028K r-x-- /lib64/libm-2.17.so
00007fa04df02000    2044K ----- /lib64/libm-2.17.so
00007fa04e101000       4K r---- /lib64/libm-2.17.so
00007fa04e102000       4K rw--- /lib64/libm-2.17.so
00007fa04e103000     920K r-x-- /usr/lib64/libstdc++.so.6.0.19
00007fa04e1e9000    2044K ----- /usr/lib64/libstdc++.so.6.0.19
...
00007fa050144000       4K rw---   [ anon ]
00007ffd9b59b000     136K rw---   [ stack ]
00007ffd9b5cc000       8K r----   [ anon ]
00007ffd9b5ce000       8K r-x--   [ anon ]
ffffffffff600000       4K r-x--   [ anon ]
 total          3083100K
Although the JVM does not immediately allocate the maximum heap size (-Xmx) specified for the process, it reserves the full amount (1152 MB in my example) in the virtual memory.
But besides the JVM heap areas (marked as [ anon ] in the output above) and various other I/O and system areas, there are a lot of .so shared libraries and .jar files mapped into the virtual address space of the process. In my case there are about 200 .so and 400 .jar files, and that is why the virtual memory takes ~3 GB.
In YARN, the option yarn.nodemanager.vmem-pmem-ratio (2.1 by default) defines the virtual memory limit of a container as its physical memory allocation multiplied by this ratio. If you allocate relatively small containers of ~1 GB, this limit is quite low (the 2.4 GB in the error above is simply the container's physical memory allocation multiplied by 2.1), and you may often face "Container is running beyond virtual memory limits" errors.
It is recommended to set this ratio to a higher value, for example, 5, since the virtual address space of a YARN container may be crowded by a large number of .so and .jar files.
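In yarn-site.xml on the NodeManager hosts this could look as follows (a minimal sketch; how exactly you manage this file and restart the NodeManagers depends on your Hadoop distribution):

<!-- yarn-site.xml: allow 5 units of virtual memory per unit of physical memory -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>

Note that this is a NodeManager setting, so it applies to all containers running on the node.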
Another, less recommended, solution is to disable the virtual memory check completely by setting yarn.nodemanager.vmem-check-enabled to false.
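For reference, the corresponding yarn-site.xml entry might look like this (again only a sketch; with the check disabled, YARN no longer protects the node from containers with runaway virtual memory usage):

<!-- yarn-site.xml: disable the virtual memory check in the NodeManager -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>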