ETL – Large-Scale Data Engineering in Cloud

ETL, Hive, Presto

Presto vs Hive – SLA Risks for Long Running ETL – Failures and Retries Due to Node Loss

December 4, 2019

Presto is an extremely powerful distributed SQL query engine, so at some point you may consider using it to replace SQL-based ETL processes that you currently run on Apache Hive.

Although this is entirely possible, you should be aware of some limitations that may affect your SLAs.

dmtolpeko

ETL, Snowflake

Snowflake – Reloading Data from Stage – TRUNCATE, DELETE, COPY and Transactions

May 6, 2019

Sometimes you need to reload an entire data set from source storage into Snowflake. For example, you may want to fully refresh a fairly large lookup table (2 GB compressed) without keeping its history. Let’s see how to do this in Snowflake and what issues you need to take into account.

dmtolpeko