Kestra
Kestra is a universal open-source orchestrator that makes both scheduled and event-driven workflows easy. It brings Infrastructure as Code (IaC) best practices to data, process, and microservice orchestration, covering everything from mission-critical operations, business processes, and data pipelines to simple Zapier-style automation, so that you can build reliable workflows and manage them with confidence.
In just a few lines of code, you can create a flow directly from the UI. Thanks to the declarative YAML interface for defining orchestration logic, business stakeholders can participate in the workflow creation process.
Kestra offers language-agnostic developer tooling through an extensive YAML-based DSL (Domain-Specific Language), while also providing an intuitive user interface tailored to business professionals.
The YAML definition gets automatically adjusted any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is always managed declaratively in code, even if some workflow components are modified in other ways (UI, CI/CD, Terraform, API calls).
# Kestra API-first Philosophy
Built with an API-first philosophy, Kestra enables users to define and manage data pipelines through a simple YAML configuration file. This approach frees you from being tied to a specific client implementation, allowing for greater flexibility and easier integration with various tools and services.
More on Kestra Docs.
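To make the declarative, API-first idea concrete, here is a minimal sketch that wraps a small YAML flow in an HTTP request. It only builds the request and does not send it. The host and port, the `/api/v1/flows` path, the `application/x-yaml` content type, the task type `io.kestra.plugin.core.log.Log`, and all ids are assumptions for illustration; check the API reference for your Kestra version before relying on them.

```python
import urllib.request

# Assumption: a local Kestra instance; host and port are placeholders.
KESTRA_URL = "http://localhost:8080"

# A minimal flow definition. The ids, namespace, and task type are
# illustrative assumptions, not taken from the note above.
FLOW_YAML = """\
id: hello_world
namespace: company.team

tasks:
  - id: say_hello
    type: io.kestra.plugin.core.log.Log
    message: Hello from a declarative flow
"""


def build_create_flow_request(base_url: str, flow_yaml: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that submits a flow as YAML."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/flows",  # assumed endpoint path
        data=flow_yaml.encode("utf-8"),
        headers={"Content-Type": "application/x-yaml"},  # assumed content type
        method="POST",
    )


request = build_create_flow_request(KESTRA_URL, FLOW_YAML)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit the flow to a running instance. Because the flow is plain YAML sent over HTTP, any other client (curl, CI/CD, Terraform) could send the same body.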
# History
The first public release was announced on 2022-02-01 in the blog post "Introducing Kestra first public release", which listed these main features:
- an orchestrator: Build a complex pipeline in a couple of minutes.
- a scheduler: Launch your flows whatever your needs!
- a rich UI: Create, run, and monitor all your flows with a real-time user interface.
- a data orchestrator: With its many plugins, build your data orchestration directly.
- cloud native & scalable: Scale to millions of executions without stress or hassle.
- an all-in-one platform: No need to use multiple tools to deliver a complete pipeline.
- a pluggable platform with the option to choose from several plugins or to build your own.
According to that first blog post, Kestra started in 2019 with an initial commit by Ludovic Dehon. At that time, Kestra was at the proof-of-concept stage. Leroy Merlin had rejected Apache Airflow for its cloud-based data platform because of instability, performance issues, and missing features.
Challenged by a co-worker, the author decided to create a new open-source workflow management system. Over 30 months, they built Kestra, choosing Kafka, Elasticsearch, and Vue.js as core technologies.
Kestra was released as open-source under the Apache License. The author, drawing from experience with another open-source project, AKHQ, created a company to support Kestra’s development.
Kestra offers deep integration with tools and databases through plugins, simplifying complex tasks compared to bash commands. Despite being a first public release, Kestra is production-ready. It’s been used at Leroy Merlin since August 2020, managing thousands of flows and millions of tasks monthly.
# Company Behind
The company behind Kestra is French; it raised a $3 million funding round on 2023-10-05 (Article).
# Architecture
- Kestra’s architecture has been designed to offer a transparent separation between the orchestration and data processing capabilities.
- Kestra’s Executor is responsible for executing tasks and workflows without directly interacting with the user’s infrastructure.
- The Executor relies on Workers, which are stateless processes that carry out the computation of runnable tasks and polling triggers.
- For privacy reasons, workers are the only components that interact with the user’s infrastructure, including the internal storage and external services.
- Kestra’s internal storage:
  - Data is stored in the user’s private bucket, not in Kestra’s internal database. The KV Store is built on top of internal storage, so KV data is kept in that same storage.
# Java
Kestra is written in Java.
A comparison by Julien Hurault, using Docker Compose, with two Python orchestrators:
# Concepts
- Flowable Tasks: Control your orchestration logic.
- Runnable Tasks: Data processing tasks handled by the workers.
- Revision: Manage versions of flows.
- Secret: Store sensitive information securely.
- Key Value (KV) Store: Build stateful workflows with the KV Store.
- Pebble Templating Engine: Dynamically render variables, inputs and outputs.
- Blueprints: Ready-to-use examples designed to kickstart your workflow.
- Backfill: Backfills are replays of missed schedule intervals between a defined start and end date.
- Task Runners: Task Runners is an extensible, pluggable system capable of executing your tasks in arbitrary remote environments.
- Replay: Replay allows you to re-run a workflow execution from any chosen task run.
- Expression: Expressions to dynamically render various flow and task properties.
More on Concepts.
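The Backfill concept above can be sketched as a small date calculation: given a start date, an end date, and a schedule, list the schedule dates that were missed and would be replayed. This is only an illustration under stated assumptions: it uses a fixed interval rather than a cron expression, it treats the end date as inclusive, and the function name `missed_intervals` is hypothetical and not part of Kestra.

```python
from datetime import datetime, timedelta


def missed_intervals(start: datetime, end: datetime, every: timedelta) -> list[datetime]:
    """Return the schedule dates in [start, end] that a backfill would replay.

    Assumption: a fixed-interval schedule with an inclusive end date.
    """
    if every <= timedelta(0):
        raise ValueError("every must be a positive duration")
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += every
    return dates


# Hypothetical example: a daily schedule that missed three days.
for run_date in missed_intervals(datetime(2024, 1, 1), datetime(2024, 1, 3), timedelta(days=1)):
    print("replay execution for", run_date.date().isoformat())
```

Each returned date stands for one execution that the scheduler would have started had it been running at that time.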
# Features
- Kestra’s Realtime Triggers (video: "Kestra Become the First Real-Time Orchestration Platform", YouTube):
  - React to events as they happen with millisecond latency. As soon as you add a Realtime Trigger to your workflow, Kestra starts an always-on thread that listens to the external system for new events. When a new event occurs, Kestra starts a workflow execution to process it.
- KV Store
  - Kestra is stateless by default, but with the KV Store you can save data beyond the input/output data that is stored in Kestra’s internal storage.
  - The KV Store allows you to persist any data produced in your workflows in a key-value format.
  - It is built on top of Kestra’s internal storage (which can be any cloud storage service such as S3 or GCS):
    - There is no size limit, and you can set a time-to-live (TTL) on entries.
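The key-value-with-TTL behaviour described above can be modelled with a toy in-memory store. This is a sketch of the semantics only, not Kestra's implementation (which, per the note, sits on top of internal storage); the class name `TtlKvStore`, its methods, and the injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class TtlKvStore:
    """Toy in-memory key-value store with an optional per-key time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries: dict[str, tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        """Store a value; without a TTL the entry never expires."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._entries[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        """Return the stored value, or default if the key is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it lazily on read
            return default
        return value


# Hypothetical usage with a fake clock that is advanced by hand.
now = [0.0]
store = TtlKvStore(clock=lambda: now[0])
store.set("last_run", "2024-01-12")        # no TTL: kept indefinitely
store.set("token", "abc", ttl_seconds=60)  # expires after 60 seconds
now[0] = 61.0
print(store.get("last_run"), store.get("token"))
```

The entry without a TTL survives, while the entry with a 60-second TTL is gone once the clock passes its expiry. That is the kind of state a workflow could carry from one execution to the next.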
# Releases
# v0.18.0
Kestra v0.18.0:
- an embedded Key-Value Store
- a new, improved way to manage your workflow execution Outputs
- new `ForEach` task
- new `SELECT` and `MULTISELECT` input types
- new tasks to upload, download or delete namespace files
- improved `Purge` mechanism along with a more flexible way of deleting Executions and related logs, metrics and files
- human-readable second-level `Schedule` trigger
- improved JSON and ION handling, along with new plugins to transform data with `JSONata` and `Grok`
- SCIM Directory Sync
- many enhancements to Secrets
- a more powerful Audit Logs interface
- new capabilities in Task Runners (now in GA!)
- improved Namespace Management (now available in OSS!)
- …and a bunch of new plugins!
Additionally, SQL Server is now available in preview as a Kestra EE backend database.
See the release blog post to learn more about all enhancements.
# Other Data Orchestrators
See Data Orchestrator and Kestra vs Dagster.
Origin: Data Orchestrators
References:
Kestra, Open Source Declarative Data Orchestration, Kestra Inc
Created 2024-01-12