Hive on Tez – Shuffle Failed with Too Many Fetch Failures and Insufficient Progress

February 26, 2020

On one of the clusters I noticed an increased rate of shuffle errors. Restarting a failed job did not help; it still failed with the same error.

The error was as follows:

 Error: Error while running task ( failure ) : 
  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: 
    error in shuffle in Fetcher 
 at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)

Caused by: java.io.IOException: 
  Shuffle failed with too many fetch failures and insufficient progress!failureCounts=1,
    pendingInputs=1, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
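
The flags at the end of the message are the part worth comparing across failed tasks. Below is a minimal Python sketch for pulling them out of a log line, assuming the message keeps the `key=value` layout shown above. The helper name `parse_shuffle_diagnostics` is hypothetical and is not part of Tez.

```python
import re


def parse_shuffle_diagnostics(message: str) -> dict:
    """Extract key=value pairs from a Tez shuffle failure message.

    Hypothetical helper: it assumes the flags are printed as
    comma-separated key=value pairs, as in the error above.
    """
    fields = {}
    for key, value in re.findall(r"(\w+)=(\w+)", message):
        if value in ("true", "false"):
            fields[key] = value == "true"   # boolean flag
        elif value.isdigit():
            fields[key] = int(value)        # counter
        else:
            fields[key] = value             # keep anything else as text
    return fields


message = (
    "Shuffle failed with too many fetch failures and insufficient progress!"
    "failureCounts=1, pendingInputs=1, fetcherHealthy=false, "
    "reducerProgressedEnough=true, reducerStalled=true"
)
diagnostics = parse_shuffle_diagnostics(message)
```

For the message above, `diagnostics["fetcherHealthy"]` is `False` and `diagnostics["reducerStalled"]` is `True`, which makes it easy to group failures by the combination of flags.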

Hadoop, JVM, Memory, YARN

Hadoop YARN – Container Virtual Memory – Understanding and Solving “Container is running beyond virtual memory limits” Errors

February 19, 2020
In the previous article about YARN container memory (see Tez Memory Tuning – Container is Running Beyond Physical Memory Limits), I wrote about physical memory. Now I would like to turn to virtual memory in YARN.

A typical YARN memory error may look like this:
```
Container is running beyond virtual memory limits. Current usage: 1.0 GB of 1.1 GB physical memory used; 2.9 GB of 2.4 GB virtual memory used. Killing container.
```
So what is virtual memory, how can such errors be solved, and why is the virtual memory size often so large?
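
As a rough illustration of where the limit in such a message comes from: the NodeManager multiplies the container's physical memory allocation by `yarn.nodemanager.vmem-pmem-ratio`, which defaults to 2.1. The Python sketch below assumes a container size of 1152 MB, chosen only because it is consistent with the rounded "1.1 GB" and "2.4 GB" figures in the message; the actual container size is not stated, and the function names are hypothetical.

```python
def vmem_limit_mb(pmem_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory limit for a container, in MB.

    2.1 is the default value of yarn.nodemanager.vmem-pmem-ratio.
    """
    return pmem_mb * vmem_pmem_ratio


def exceeds_vmem_limit(vmem_used_mb: float, pmem_mb: float,
                       vmem_pmem_ratio: float = 2.1) -> bool:
    """True if the container would fail the virtual memory check."""
    return vmem_used_mb > vmem_limit_mb(pmem_mb, vmem_pmem_ratio)


pmem_mb = 1152               # assumed container size (shown as 1.1 GB)
vmem_used_mb = 2.9 * 1024    # 2.9 GB of virtual memory in use

limit_mb = vmem_limit_mb(pmem_mb)             # about 2419 MB, shown as 2.4 GB
killed = exceeds_vmem_limit(vmem_used_mb, pmem_mb)
```

With these assumed numbers `killed` is `True`: 2.9 GB of virtual memory is above the limit of roughly 2.4 GB, even though physical memory usage is still within its own limit.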

dmtolpeko
Hadoop, YARN

Hadoop YARN Cluster Idle Time

February 14, 2020

In the previous article, Calculating Utilization of Cluster using Resource Manager Logs, I showed how to estimate per-second utilization for a Hadoop cluster.

This information can be used to calculate idle time statistics for a cluster, i.e. the time when no containers are running.
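
A minimal Python sketch of the idea, assuming the per-second utilization is already available as a list with one running-container count per second. That input layout, the sample values, and the function name `idle_periods` are assumptions for illustration only.

```python
from itertools import groupby


def idle_periods(containers_per_second):
    """Return (start_second, length) for each run of seconds with no containers."""
    periods = []
    position = 0
    # Group consecutive seconds by whether the cluster was idle in them.
    for is_idle, run in groupby(containers_per_second, key=lambda n: n == 0):
        length = sum(1 for _ in run)
        if is_idle:
            periods.append((position, length))
        position += length
    return periods


# Hypothetical sample: running containers for 10 consecutive seconds.
samples = [3, 2, 0, 0, 0, 1, 4, 0, 0, 5]

periods = idle_periods(samples)
total_idle_seconds = sum(length for _, length in periods)
idle_fraction = total_idle_seconds / len(samples)
```

Keeping the individual idle periods, rather than only their sum, makes it possible to look at how long the gaps are and when they occur.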


dmtolpeko