🧠 Second Brain


DAG: Directed Acyclic Graphs

Last updated Feb 9, 2024

A DAG, or Directed Acyclic Graph, is a conceptual or mathematical model of a data pipeline: a series of activities arranged so that each one runs only after the activities it depends on.

In a DAG, data flows through a finite set of nodes connected by directed edges. A DAG need not have a single designated start or end node (it may have several entry and exit points), and, crucially, data can never loop back to a node it has already passed through.

Their popularity in data engineering stems from their clear depiction of Data Lineage and their suitability for functional approaches: when each task is idempotent, a pipeline can be restarted without side effects.
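To make the restart property concrete, here is a minimal sketch of an idempotent load step. It assumes a hypothetical in-memory `warehouse` dict keyed by run date, and the `transform` and `load_partition` helpers are illustrative names, not part of any real library. The point is only that the task overwrites its own partition instead of appending to it.

```python
def transform(rows):
    """Pure function: the same input always yields the same output."""
    return [{**row, "amount_cents": round(row["amount"] * 100)} for row in rows]


def load_partition(warehouse, run_date, rows):
    """Idempotent load: overwrite the partition for run_date, never append."""
    warehouse[run_date] = transform(rows)
    return warehouse


# Hypothetical in-memory "warehouse": partition key -> list of rows.
warehouse = {}
raw = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 5.0}]

load_partition(warehouse, "2024-02-09", raw)
first_run = {key: list(rows) for key, rows in warehouse.items()}

# Simulate restarting the pipeline for the same date.
load_partition(warehouse, "2024-02-09", raw)
rerun_matches = warehouse == first_run  # no duplicated rows after the rerun
```

Had `load_partition` appended rows instead, the rerun would have doubled the partition, which is exactly the side effect that idempotent tasks avoid.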

Example:

```mermaid
graph LR
    A[Start] -->|process 1| B[Middle]
    B -->|process 2| C[End]
    A -->|process 3| C
```

This example illustrates a DAG that starts at A, diverges there into two paths (one through B, one straight to C), and converges at C. The absence of cycles means no path leads back to its starting point.
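The "no path leads back" property can be checked mechanically. The following is a small sketch, not a production implementation: it encodes the three-node example as an adjacency list and runs a depth-first search that flags any edge pointing back into the path currently being explored. The `has_cycle` helper and the extra C -> A edge are assumptions made purely for illustration.

```python
# The example graph as an adjacency list: node -> nodes it points to.
example = {"A": ["B", "C"], "B": ["C"], "C": []}


def has_cycle(graph):
    """Return True if the directed graph contains a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for successor in graph[node]:
            if colour[successor] == GREY:  # edge back into the current path
                return True
            if colour[successor] == WHITE and visit(successor):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


print(has_cycle(example))  # False

# A hypothetical extra edge C -> A would close the loop A -> B -> C -> A.
with_loop = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(has_cycle(with_loop))  # True
```

The sketch assumes every node appears as a key in the dictionary, including nodes with no outgoing edges.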

# Why Directed Acyclic Graphs Matter

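The combination of directed edges and the absence of cycles may seem arbitrary, yet it is instrumental in crafting efficient ETL pipelines: it guarantees that every task has a well-defined set of upstream dependencies and that a run eventually finishes.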

From a mathematical perspective, a directed acyclic graph (DAG) is a directed graph without any directed cycles. It consists of vertices and edges, with each edge directed from one vertex to another, such that following those directions never forms a closed loop.

A directed graph is a DAG if and only if it can be topologically ordered, that is, its vertices can be arranged in a linear sequence in which every edge points from an earlier vertex to a later one. This graph type finds applications across many fields, from biology and sociology to computation.
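As an illustration of topological ordering, here is a minimal sketch of Kahn's algorithm applied to the three-node example from earlier in this note. The `topological_order` function name and the adjacency-list representation are assumptions for the sketch. An orchestrator would do considerably more (scheduling, retries, parallelism), but the ordering idea is the same.

```python
from collections import deque


def topological_order(graph):
    """Kahn's algorithm. `graph` maps each node to the nodes it points to."""
    # Count incoming edges for every node.
    in_degree = {node: 0 for node in graph}
    for successors in graph.values():
        for successor in successors:
            in_degree[successor] += 1

    # Nodes with no incoming edges have no unmet dependencies.
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for successor in graph[node]:
            in_degree[successor] -= 1
            if in_degree[successor] == 0:
                ready.append(successor)

    # If some nodes were never released, they sit on a cycle.
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle, so no topological order exists")
    return order


pipeline = {"A": ["B", "C"], "B": ["C"], "C": []}
print(topological_order(pipeline))  # ['A', 'B', 'C']
```

Python's standard library also provides `graphlib.TopologicalSorter` (Python 3.9+) for the same purpose. The hand-written version is shown here only to make the mechanics visible.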

To understand the role of DAGs in data orchestration, refer to the Why you need an Orchestrator section.

Created 2022-05-19