Idempotency

Last updated: by Simon Späti · Created:

Idempotency is the property of an operation that can be applied multiple times with the same inputs without changing the result beyond the first application. It is a core concept in Functional Programming and the foundation of Functional Data Engineering.

Idempotence can be mathematically summarized as f(f(x)) = f(x).
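As a minimal sketch of that identity, the snippet below contrasts an idempotent function with one that is not. The function names `normalize` and `increment` are hypothetical and chosen only for illustration.

```python
def normalize(text: str) -> str:
    """Trim surrounding whitespace and lowercase the text (idempotent)."""
    return text.strip().lower()


def increment(n: int) -> int:
    """Add one (not idempotent: every application changes the result)."""
    return n + 1


once = normalize("  Hello World ")
twice = normalize(once)

# f(f(x)) == f(x): applying normalize a second time changes nothing.
assert once == twice == "hello world"

# increment breaks the identity: f(f(1)) is 3, f(1) is 2.
assert increment(increment(1)) != increment(1)
```

The same check, applying an operation twice and comparing the results, is a cheap way to test whether a transformation step is safe to re-run.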

Zach Wilson says:

Idempotent pipelines go hand-in-hand with reproducibility. If you can’t reproduce bugs, you’ll have a painful time debugging data quality errors!

Here are some signs your pipeline isn’t idempotent:

  • it uses INSERT INTO instead of INSERT OVERWRITE or MERGE
  • when you filter on dates you have date > start but no date < end. This bug will cause backfill costs to grow exponentially
  • the source tables you pull from are always “latest” and not daily snapshots. The only exception here is properly modeled slowly changing dimension tables.
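The first two signs can be illustrated with a small, self-contained sketch using Python's built-in `sqlite3` module. The tables `events` and `daily_totals` and the helper functions are hypothetical. SQLite has no `INSERT OVERWRITE`, so this sketch assumes that a `DELETE` of the target window followed by an `INSERT` in one transaction is an acceptable stand-in. Both loaders bound the date filter on both sides (`>= start` and `< end`).

```python
import sqlite3

# Aggregation over a window bounded on BOTH sides: start inclusive, end exclusive.
AGGREGATE = """
    SELECT event_date, SUM(amount) FROM events
    WHERE event_date >= ? AND event_date < ?
    GROUP BY event_date
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical source rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE daily_totals (event_date TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7), ("2024-01-03", 1)],
    )
    conn.commit()
    return conn


def load_append(conn: sqlite3.Connection, start: str, end: str) -> None:
    """Non-idempotent: a plain INSERT INTO adds the same rows on every run."""
    conn.execute("INSERT INTO daily_totals " + AGGREGATE, (start, end))
    conn.commit()


def load_overwrite(conn: sqlite3.Connection, start: str, end: str) -> None:
    """Idempotent: clear the target window, then rewrite it, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM daily_totals WHERE event_date >= ? AND event_date < ?",
            (start, end),
        )
        conn.execute("INSERT INTO daily_totals " + AGGREGATE, (start, end))


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]


appended = make_db()
overwritten = make_db()
for _ in range(2):  # simulate running the same backfill twice
    load_append(appended, "2024-01-01", "2024-01-03")
    load_overwrite(overwritten, "2024-01-01", "2024-01-03")

print(row_count(appended))     # 4: each day is duplicated
print(row_count(overwritten))  # 2: same result as a single run
```

On engines that support them, `INSERT OVERWRITE` on a partition or a `MERGE` keyed on the date would play the role of the delete-then-insert step here.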

# Further Reads


Origin: