Idempotency
Idempotency is the property of an operation that can be applied multiple times with the same inputs without changing the result beyond the first application. It is a core idea in Functional Programming and is the foundation of Functional Data Engineering.
Idempotence can be summarized mathematically as f(f(x)) = f(x).
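To make the formula concrete, here is a minimal Python sketch. Both functions (`normalize` and `append_suffix`) are hypothetical names invented for illustration; the first satisfies f(f(x)) = f(x), the second does not.

```python
def normalize(text: str) -> str:
    """Hypothetical cleanup step: trim whitespace and lowercase."""
    return text.strip().lower()


def append_suffix(text: str) -> str:
    """Hypothetical non-idempotent step: each call changes the value again."""
    return text + "_v2"


raw = "  Hello World  "

# Idempotent: applying the function to its own output changes nothing.
assert normalize(normalize(raw)) == normalize(raw)

# Not idempotent: a second application produces a different result.
assert append_suffix(append_suffix("a")) != append_suffix("a")
```

A retry or rerun of `normalize` is harmless, while a retry of `append_suffix` silently corrupts the value.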
Idempotent pipelines go hand-in-hand with reproducibility. If you can’t reproduce bugs, you’ll have a painful time debugging data quality errors!
Here are some signs your pipeline isn’t idempotent:
- it uses INSERT INTO instead of INSERT OVERWRITE or MERGE
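- its date filters have a lower bound (date > start) but no upper bound (date < end). The output then depends on when the run happens, and every backfilled run rescans all data after its start date, so backfill costs grow rapidly as the date range widens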
- the source tables you pull from are always “latest” and not daily snapshots. The only exception here is properly modeled slowly changing dimension tables.
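The first two signs can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not a reference implementation: the tables (`events`, `daily_totals`), the function names, and the half-open window `[start, end)` are all made up for the example, and SQLite has no `INSERT OVERWRITE`, so the overwrite is emulated with a `DELETE` plus `INSERT` in a single transaction.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE daily_totals (event_date TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)],
    )
    conn.commit()
    return conn


# Aggregation bounded on both sides: start <= event_date < end.
AGGREGATE_SQL = (
    "INSERT INTO daily_totals "
    "SELECT event_date, SUM(amount) FROM events "
    "WHERE event_date >= ? AND event_date < ? "
    "GROUP BY event_date"
)


def load_append(conn: sqlite3.Connection, start: str, end: str) -> None:
    """Not idempotent: a plain INSERT INTO adds rows again on every run."""
    with conn:
        conn.execute(AGGREGATE_SQL, (start, end))


def load_overwrite(conn: sqlite3.Connection, start: str, end: str) -> None:
    """Idempotent: clear the target window, then insert, in one transaction."""
    with conn:
        conn.execute(
            "DELETE FROM daily_totals WHERE event_date >= ? AND event_date < ?",
            (start, end),
        )
        conn.execute(AGGREGATE_SQL, (start, end))


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]


# Running the same day twice with the append version duplicates the row.
appended = make_db()
load_append(appended, "2024-01-01", "2024-01-02")
load_append(appended, "2024-01-01", "2024-01-02")
assert count_rows(appended) == 2

# The overwrite version leaves exactly one row however often it reruns.
overwritten = make_db()
load_overwrite(overwritten, "2024-01-01", "2024-01-02")
load_overwrite(overwritten, "2024-01-01", "2024-01-02")
assert count_rows(overwritten) == 1
```

Because the window has both a start and an end, rerunning a past day touches only that day's data, which keeps a backfill's cost proportional to the range being backfilled.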
# Further Reads
- Wikipedia.
- Idempotency Is Easy Until the Second Request Is Different | Dochia CLI Blog