Software-defined assets were first introduced by Dagster. They represent a novel, declarative approach to managing data and orchestrating its maintenance: you use code to define the data assets you want to exist. These asset definitions are version-controlled through Git and inspectable via tooling. This transparency lets anyone in your organization understand the canonical set of data assets and reproduce them at any time, and it lays the groundwork for asset-based orchestration.
“Software-defined assets as additional microservices to a data asset discussed here”
The key to software-defined assets is that you can declare a data asset or data product before runtime. The software-defined function in Dagster is like a microservice or, more simply, a function that defines a single asset that can live independently.
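To make "declaring an asset before runtime" concrete, here is a minimal sketch in plain Python. It is a toy, not Dagster's implementation: the `asset` decorator, the `ASSETS` registry, the `materialize` helper, and the asset names are all hypothetical, loosely modelled on the idea behind the `@asset` decorator in the `dagster` package. The sketch assumes that an asset's upstream dependencies can be read from its function's parameter names.

```python
import inspect

# Hypothetical in-memory registry: asset name -> (function, upstream asset names).
ASSETS = {}


def asset(fn):
    """Toy decorator: register fn as the definition of the asset named after it.

    Upstream assets are read from the parameter names, so the dependency
    graph is known as soon as the module is imported, before anything runs.
    """
    upstream = tuple(inspect.signature(fn).parameters)
    ASSETS[fn.__name__] = (fn, upstream)
    return fn


@asset
def raw_orders():
    # Stand-in for reading from a source system.
    return [{"id": 1, "amount": 20.0}, {"id": 2, "amount": -5.0}]


@asset
def cleaned_orders(raw_orders):
    # Keep only orders with a positive amount.
    return [o for o in raw_orders if o["amount"] > 0]


@asset
def revenue(cleaned_orders):
    # Total amount across the cleaned orders.
    return sum(o["amount"] for o in cleaned_orders)


def materialize(name, cache=None):
    """Compute one asset, materializing its upstream assets first."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn, upstream = ASSETS[name]
        cache[name] = fn(*(materialize(dep, cache) for dep in upstream))
    return cache[name]


# The declarations can be inspected without executing any asset function.
declared = {name: upstream for name, (_, upstream) in ASSETS.items()}
```

Each function here describes one asset and nothing else. Asking for `materialize("revenue")` is enough for the resolver to work out that `raw_orders` and `cleaned_orders` must be computed first, so no pipeline needs to be wired by hand.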
The declarative model provides comprehensive code-based information, assisting Data Orchestrators in understanding asset lineage, operational requirements, and more. Traditional DAGs for jobs, tasks, and operations remain relevant for scheduling purposes. However, in cases like an ML model generating a BigQuery table, you can define upstream datasets without relying on an orchestrator. This shift towards a single function or a software-defined asset is transformative.
One of the most significant outcomes is accurate data lineage of the physical assets themselves, as opposed to an arbitrary lineage of tasks. Task lineage tends to matter more to engineers than to data consumers.
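As a rough illustration of asset-level lineage, the sketch below works directly on a declared dependency graph. The asset names and the `ASSET_DEPS` mapping are made up for the example. It uses only the standard library (`graphlib` needs Python 3.9 or later) and does not reflect how any particular orchestrator stores lineage.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the assets it is built from.
ASSET_DEPS = {
    "raw_events": set(),
    "raw_users": set(),
    "sessions": {"raw_events"},
    "user_features": {"sessions", "raw_users"},
    "churn_predictions": {"user_features"},
}


def upstream_lineage(asset, deps):
    """Return every asset that `asset` depends on, directly or transitively."""
    seen = set()
    stack = list(deps[asset])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def materialization_order(deps):
    """Return an order in which every asset comes after its upstream assets."""
    return list(TopologicalSorter(deps).static_order())


lineage = upstream_lineage("churn_predictions", ASSET_DEPS)
order = materialization_order(ASSET_DEPS)
```

Because the graph is expressed in terms of assets rather than tasks, the question a data consumer actually asks ("what is this table built from?") is answered by a plain graph traversal. A valid execution order also falls out of the same declarations.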
During the Dagster Community Day, Nick Schrock, the founder of Dagster, made an analogy: “Think of an iPhone: It feels like one device, but it’s inherently complex and heterogeneous. This complexity mirrors that of orchestration, which might be the future of bundling the Open Data Stack into a unified data stack.” He also posited the possibility of vertical integration with major vendors.
# A bridge between tasks and jobs
Software-defined assets essentially act as a bridge between tasks, jobs, graphs, and the assets themselves. While ops, jobs, and graphs form Dagster’s foundation, Nick Schrock anticipates a future where more code is written using software-defined assets.
This shift is significant because the declarative style minimizes the need for extensive boilerplate code. It focuses on defining what an asset should be and contain, rather than on its operational mechanics within Dagster. This trend is further discussed in Data Orchestration Trends: The Shift From Data Pipelines to Data Products, where I recently elaborated on moving from an imperative pipeline of ops, jobs, and graphs to asset-centric software-defined assets.
Showcase by Dagster: Dagster Data Orchestration 10 min walkthrough.
In response to Benn Stancil's article, I discussed the essence of software-defined assets. The primary advantage is the ability to declare a data asset or data product before runtime. The software-defined function, as conceptualized in Dagster, resembles a microservice: it focuses on a single asset that can exist independently.
Introducing Software-Defined Assets | Dagster Blog