September 2023 – Large-Scale Data Engineering in Cloud

Spark

Spark – LIMIT on Large Datasets – CollectLimit, GlobalLimit, LocalLimit, spark.sql.limit.scaleUpFactor

September 17, 2023

You use the LIMIT clause to quickly browse and review data samples, so you expect such queries to complete in less than a second. But consider how Spark’s LIMIT behaves on very large data sets and what performance issues it can cause.


dmtolpeko