🧠 Second Brain




Last updated Feb 21, 2024

ETL (Extract, Transform, and Load) and ELT (Extract, Load, and Transform) are key processes in data handling, with ETL typically associated with a Data Warehouse and ELT with a Data Lake (refer to Data Warehouse vs Data Lake for more).

ETL is traditionally the approach for transferring data from source systems to a Data Warehouse. This involves loading data into the staging layer of your DWH, cleansing and transforming it according to the Dimensional Model (encompassing Facts and Dimensions), and finally loading it into a Data Mart or Cube for subsequent Data Visualizations.
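The defining trait of ETL is that cleansing happens *before* the data reaches the warehouse. A minimal sketch, using an in-memory `sqlite3` database as a stand-in for the DWH (all table and column names here are illustrative assumptions, not a reference implementation):

```python
import sqlite3

# Extract: raw rows as they might arrive from a source system.
raw_orders = [
    ("1001", "  alice ", "250.00"),
    ("1002", "BOB", "99.90"),
    ("1003", "  carol", "0.00"),  # zero-value order, filtered out below
]

# Transform: cleanse and conform *before* loading -- the ETL hallmark.
clean_orders = [
    (int(order_id), customer.strip().title(), float(amount))
    for order_id, customer, amount in raw_orders
    if float(amount) > 0
]

# Load: only conformed rows ever land in the warehouse fact table.
conn = sqlite3.connect(":memory:")  # stand-in for the staging/DWH layer
conn.execute("CREATE TABLE fact_orders (order_id INT, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean_orders)

print(conn.execute("SELECT customer, amount FROM fact_orders ORDER BY order_id").fetchall())
# → [('Alice', 250.0), ('Bob', 99.9)]
```

Note that the warehouse never sees the messy source values: by the time rows are loaded, types are cast, names are normalized, and invalid records are gone.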

ELT, on the other hand, is suited for constructing a Data Lake. It entails extracting data—typically in file form—loading it into your Data Lake stored on Cloud Storage, and then beginning the transformation and cleansing processes. This approach naturally leads to data exploration and analysis, which often reveals the need for data cleansing.
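In ELT the order flips: raw data lands first, and cleansing becomes a query over the raw layer. A minimal sketch of that inversion, again using `sqlite3` as a stand-in for the lake's query engine (table and column names are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records untouched, messy values and all.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1001", "  alice ", "250.00"), ("1002", "BOB", "99.90"), ("1003", "carol", "0.00")],
)

# Transform: cleansing happens *after* loading, as SQL over the raw layer.
conn.execute("""
    CREATE TABLE clean_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           TRIM(LOWER(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE CAST(amount AS REAL) > 0
""")

print(conn.execute("SELECT customer, amount FROM clean_orders").fetchall())
# → [('alice', 250.0), ('bob', 99.9)]
```

Because the raw table is preserved, the transformation can be rewritten and re-run at any time without re-extracting from the source, which is exactly what makes exploration-driven cleansing practical in a Data Lake.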

As noted by Mehdi Ouazza on LinkedIn:

A #data stack that integrates a cloud #datawarehouse (e.g., Snowflake, BigQuery) greatly benefits from the presence of a #datalake (or delta lake) due to:

  1. Cost-efficiency at scale: Utilizing object storage like AWS S3 Glacier for archiving can significantly reduce storage costs. Only data currently in use should be loaded into the data warehouse.

  2. Increased flexibility: Choose any storage and compute engine based on your use case (OLAP vs. OLTP for low latency). The data format is not restrictive.

  3. Avoidance of vendor lock-in: Using an open format like parquet/delta lake simplifies data migration.

While many cloud data warehouses are incorporating features from data lakes, maintaining both offers the best advantages.

For a comprehensive comparison, consider exploring the detailed list of Top ETL tools or the spreadsheet comparing various tools in the market.