🧠 Second Brain



Data Engineering Approaches

Last updated Feb 9, 2024

Because I have run into these bottlenecks myself, and more and more often lately, I asked myself how we can:

I’m aware that a lot is happening right now, especially around open-source tools and frameworks, DataOps, and deployments on container-orchestration systems and the like.

Nevertheless, I have collected some approaches that helped me open up this complex construct and ease the overall experience. Some may seem more complicated in the short term, but they become significantly leaner and less complex over time. You can apply each of them separately, yet the more of them you use, the more clearly the overall flow emerges.

Origin: Business Intelligence meets Data Engineering with Emerging Technologies
References:
- Use a data lake
- Use transactional processing
- Use less of surrogate keys, instead go back to business keys that everyone understands
- Notebooks
- Use python (and SQL if possible)
- Use open-source tools
- Load incremental and Idempotency
- Don’t do structure changes (ALTER) in traditional DDL manner
- Use a container-orchestration system
- Use declarative pipelining instead of imperative
- Use data catalogs to have a central metadata store
- Use closed-source if you don’t have the developers or the time
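The approaches above are only named here. As a rough illustration of how two of them can combine, incremental and idempotent loading keyed on a business key instead of a surrogate key, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes a hypothetical `customers` table with `customer_id` as the business key and an ISO-8601 `updated_at` column used as a high-water mark; the table, columns, and function names are made up for illustration and are not the method described in the linked notes.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical source row: (customer_id business key, name, updated_at ISO-8601 timestamp)
Row = Tuple[str, str, str]


def create_target(conn: sqlite3.Connection) -> None:
    # The business key itself is the primary key, so no surrogate key is generated
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS customers (
            customer_id TEXT PRIMARY KEY,
            name        TEXT NOT NULL,
            updated_at  TEXT NOT NULL
        )
        """
    )


def high_water_mark(conn: sqlite3.Connection) -> str:
    # Latest timestamp already loaded; '' if the table is still empty
    (mark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM customers"
    ).fetchone()
    return mark


def load_incremental(conn: sqlite3.Connection, source: Iterable[Row]) -> int:
    """Upsert only rows newer than the high-water mark; safe to re-run."""
    mark = high_water_mark(conn)
    # ISO-8601 strings compare correctly as plain strings
    new_rows = [row for row in source if row[2] > mark]
    with conn:  # one transaction: the whole batch commits, or none of it does
        conn.executemany(
            """
            INSERT INTO customers (customer_id, name, updated_at)
            VALUES (?, ?, ?)
            ON CONFLICT(customer_id) DO UPDATE SET
                name = excluded.name,
                updated_at = excluded.updated_at
            """,
            new_rows,
        )
    return len(new_rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_target(conn)
    batch = [
        ("C-001", "Alice", "2024-02-01T10:00:00"),
        ("C-002", "Bob", "2024-02-01T11:00:00"),
    ]
    print(load_incremental(conn, batch))  # 2
    print(load_incremental(conn, batch))  # 0, re-running the same batch changes nothing
```

Re-running a batch is a no-op: rows at or below the high-water mark are skipped, and even if they were not, the upsert on the business key would overwrite a row instead of duplicating it. A strict `>` comparison can miss late-arriving rows that share the last loaded timestamp, so pipelines often reload a small overlap window and rely on the upsert to stay idempotent.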