Airflow DAG Factory Pattern

Last updated by Simon Späti

The DAG factory pattern dynamically generates Apache Airflow DAGs from YAML configuration files, giving you a declarative way to use Airflow.

# Benefits

  • Construct DAGs without knowing Python
  • Construct DAGs without learning Airflow primitives
  • Avoid duplicated code
  • Define everything in YAML

# Quickstart Example

Source

The following example demonstrates how to create a simple DAG using dag-factory. We will be generating a DAG with three tasks, where task_2 and task_3 depend on task_1. These tasks will be leveraging the BashOperator to execute simple bash commands.

  1. To install dag-factory, run the following pip command in your Apache Airflow® environment:
pip install dag-factory
  2. Create a YAML configuration file called config_file.yml and save it in your dags folder:
example_dag1:
  default_args:
    owner: 'example_owner'
    retries: 1
    start_date: '2024-01-01'
  schedule_interval: '0 3 * * *'
  catchup: False
  description: 'this is an example dag!'
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 1'
    task_2:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 2'
      dependencies: [task_1]
    task_3:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 3'
      dependencies: [task_1]

The dependencies key sets the execution order of the tasks: task_2 and task_3 each list task_1, so both run only after task_1 has completed.
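
To make the effect of the dependencies key concrete, here is a minimal sketch that derives an execution order from the same task layout. It uses Python's standard graphlib module and a hand-written dictionary that mirrors the YAML above. The execution_waves helper is hypothetical and only illustrates the ordering. It is not how Airflow or dag-factory schedule tasks internally.

```python
from graphlib import TopologicalSorter

# Mirrors the `tasks` section of config_file.yml:
# task name -> list of upstream tasks (the `dependencies` key).
dependencies = {
    "task_1": [],
    "task_2": ["task_1"],
    "task_3": ["task_1"],
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave have no ordering between them."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dependencies))
# → [['task_1'], ['task_2', 'task_3']]
```

task_1 forms the first wave on its own, and task_2 and task_3 land together in the second wave because neither depends on the other.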

  3. In the same folder, create a Python file called generate_dags.py. This file generates the DAGs from the configuration file and is a one-time setup. You won’t need to modify it unless you want to add more configuration files or change the configuration file name.
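from airflow import DAG  # required by default so that Airflow's DagBag parses this file
import dagfactory
from pathlib import Path

# Resolve the config file relative to this file, since both live in the dags folder.
config_file = Path(__file__).resolve().parent / "config_file.yml"
dag_factory = dagfactory.DagFactory(config_file)

# Remove DAGs that are no longer in the config, then register the current ones.
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())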

After a few moments, the DAG will be generated and ready to run in Airflow. Unpause the DAG in the Apache Airflow® UI and watch the tasks execute!
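
If you later add more configuration files, one option is to discover them instead of listing each one by hand. The following is a rough sketch under stated assumptions: find_config_files and generate_all are hypothetical helpers, not part of dag-factory, and the only dag-factory calls used are DagFactory(...) and generate_dags(...) as shown in the file above. clean_dags is left out on purpose, because how it behaves across several config files should be checked against the dag-factory documentation first.

```python
from pathlib import Path


def find_config_files(folder):
    """Return all .yml/.yaml files directly inside `folder`, sorted by path."""
    folder = Path(folder)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )


def generate_all(folder, namespace):
    """Generate DAGs for every config file found in `folder` (needs dag-factory)."""
    import dagfactory  # imported lazily so find_config_files works without it

    for config_file in find_config_files(folder):
        # Same calls as in generate_dags.py, repeated once per config file.
        dagfactory.DagFactory(config_file).generate_dags(namespace)


# Hypothetical usage inside generate_dags.py (keep `from airflow import DAG` there):
# generate_all(Path(__file__).resolve().parent, globals())
```

With this layout, adding a DAG means dropping another YAML file into the dags folder, and generate_dags.py stays unchanged.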

Read more in the astronomer/dag-factory repository on GitHub, which dynamically generates Apache Airflow DAGs from YAML configuration files.

# Further Reads


Origin: Anna Geller on LinkedIn
References: Factory Pattern
Created 2024-08-13