Data Transformation
Data transformation is the process of converting data from one format into a different format. Reasons for doing this could be to optimize the data for a different use case than it was originally intended for, or to meet the requirements for storing data in a different system. Data transformation may involve steps such as cleansing, normalizing, structuring, validation, sorting, joining, or enriching data.
# How is data transformation done
Data is often transformed as part of an ETL or ELT approach to Data Integration.
See ETL vs ELT for a comparison of these two approaches.
Additionally, a hybrid approach has recently emerged which is known as EtLT. This combines aspects of both ETL and ELT.
# Benefits of data transformation
When used correctly, data transformation can provide the following benefits:
- Improved query-time efficiency and speed.
- Conversion of data into a format that is required by a target system.
- Enrichment of data with additional information that allows insights to be more easily extracted.
- Improved data quality by validating and fixing data, and removal of duplicates.
# Examples of data transformation
Below are some examples of how data may be transformed to achieve some of the benefits mentioned above.
# Improved efficiency and speed
One kind of transformation could be the extraction of structured data from data that is stored in a string. Imagine data that looks as follows:
|
|
In order to efficiently process this data in the future, it may preferable to transform this data into additional/new fields, and store it as:
|
|
Storing the data in this manner makes it much more efficient to analyze with operations such as:
|
|
# Enriching data
Data enrichment is a data transformation that adds additional information to the data that makes new kinds of queries possible.
Origin: Airbyte
References: Data Integration