🧠 Second Brain

ELT

Last updated Feb 27, 2024

ELT (Extract, Load, and Transform) is a Data Integration approach in which data is first extracted (E) from source systems, then loaded (L) as raw data into a target system, and finally transformed (T) inside that target. Because the transformation runs within the destination Data Warehouse, ELT contrasts with traditional ETL, where data is transformed before it reaches its destination. For a detailed comparison, see ETL vs ELT.
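The ordering of the three steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: an in-memory SQLite database stands in for the Data Warehouse, and the source records, the table names `raw_orders` and `customer_revenue`, and the function names are all made up for this example. The point it tries to show is that the load step lands untyped, uncleaned data, and that all cleaning and modeling happens afterwards as SQL inside the target.

```python
import sqlite3

# Hypothetical source records; in practice these would come from an API,
# a file export, or an operational database.
SOURCE_ROWS = [
    {"order_id": "1", "customer": "alice", "amount": "19.90", "status": "paid"},
    {"order_id": "2", "customer": "bob", "amount": "5.00", "status": "refunded"},
    {"order_id": "3", "customer": "alice", "amount": "30.10", "status": "paid"},
]


def extract():
    """E: pull records from the source without changing them."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """L: land the records as-is. Every column stays TEXT, nothing is filtered."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount, :status)",
        rows,
    )


def transform(conn):
    """T: build a modeled table with SQL that runs inside the target."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )


def run_elt():
    """Run the steps in ELT order and return the connection to the target."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    load(conn, extract())
    transform(conn)
    conn.commit()
    return conn


if __name__ == "__main__":
    target = run_elt()
    for row in target.execute("SELECT customer, revenue FROM customer_revenue"):
        print(row)
```

In an ETL version of the same sketch, the casting and filtering would happen in Python before `load` is called, and `raw_orders` would never exist in the target. Keeping the raw table is what lets new transformations be written later without re-extracting from the source.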

The evolution from ETL to ELT has been propelled by the decreasing costs of cloud computing and storage, alongside the emergence of cloud-based data warehouses like Redshift, BigQuery, and Snowflake.

ELT is widely used in Data Lake environments. Airbyte emerged in 2020 as a reference point for open-source ELT, while Fivetran, the pioneer in this space, is closed-source.

# Connectors

For an extensive list of connectors, visit the Connector Catalog by Whaly.

# History

ELT has become increasingly popular due to a number of factors. Data is being generated in ever-larger volumes, often without human input. Storage is getting cheaper, both on-prem and in the cloud. Compute costs have also decreased over time with the proliferation of open-source tools (e.g., Apache Spark, Apache Hadoop, Apache Beam) and cloud offerings (e.g., AWS, Microsoft Azure, and Google Cloud).

Modern cloud data platforms offer low-cost solutions to analyze heterogeneous, remote, and distributed data sources in a single environment. In combination with real-time data integration that allows transforming and processing streaming data in flight, the data can be ready for analysis the moment it arrives at the target platform. A definite shift to ELT happened when enterprises started moving from on-prem data warehouses built on relational databases to MapReduce deployments (with Hadoop being the most popular at the time), NoSQL environments, and streaming data platforms (e.g., Kafka, Apache Flink, Apache Storm). The shift accelerated as all of these increased their footprint in the cloud.

More on this in The History, Present, and Future of ETL Technology.


Origin: Data Warehouse vs Data Lake | ETL vs ELT | ssp.sh
Created 2022-07-31