ETL (Extract, Transform, Load)

ETL stands for Extract, Transform, Load: the pipeline that moves data out of operational source systems, cleans and reshapes it to fit a target schema, and loads it into an analytical store such as a data warehouse. It exists because operational data is scattered across many systems, in many formats, and is not directly fit for analysis.

The reason ETL is a separate stage is the same reason warehouses exist. Microsoft’s architecture guidance observes that transactional source data is “highly normalized,” so analytical queries against it “tend to be complex, because most queries need to denormalize the data by using joins,” and the common practice is to “offload historical data to other systems, such as a data mart or data warehouse.” Transforming and loading the data into a purpose-built store is what makes later analysis fast.

A more recent variation is ELT, or Extract, Load, Transform, which loads raw data into the warehouse first and then transforms it using the warehouse’s own compute. Cloud warehouses made this practical: Snowflake’s documentation describes loading data and, separately, “transforming data while loading it into a table using the COPY INTO command,” as well as continuous ingestion through tools like Snowpipe. With cheap elastic compute available inside the warehouse, transformation can happen after the load rather than before it.

The ETL-versus-ELT choice is mostly about where the transformation work runs. ETL transforms data in a dedicated pipeline before loading; ELT loads first and transforms in place. Both serve the same end: turning raw operational records into clean, integrated data ready for analytical queries.

ETL (Extract, Transform, Load)

Sources

Related