Last updated Jan 22, 2023

# DataFusion

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

DataFusion supports both a SQL and a DataFrame API for building logical query plans. It also includes a query optimizer and an execution engine capable of parallel execution against partitioned data sources (CSV and Parquet) using threads.
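To make "parallel execution against partitioned data sources using threads" concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion's implementation or API: the `Partition` type and the `parallel_filtered_sum` function are hypothetical names invented for illustration. Each partition is scanned on its own thread (a filter plus a partial sum), and the partial results are then merged, roughly what a query like `SELECT SUM(v) FROM t WHERE v > threshold` would need to do over partitioned input.

```rust
use std::thread;

/// Hypothetical stand-in for one partition of a single-column table
/// (for example, the rows read from one CSV or Parquet file).
type Partition = Vec<i64>;

/// Sums all values greater than `threshold`.
/// Each partition is filtered and summed on its own scoped thread,
/// then the per-partition partial sums are merged on the calling thread.
fn parallel_filtered_sum(partitions: &[Partition], threshold: i64) -> i64 {
    thread::scope(|scope| {
        // Spawn one worker per partition; scoped threads may borrow `partitions`.
        let handles: Vec<_> = partitions
            .iter()
            .map(|part| {
                scope.spawn(move || {
                    part.iter().filter(|&&v| v > threshold).sum::<i64>()
                })
            })
            .collect();

        // Merge step: combine the partial aggregates.
        handles
            .into_iter()
            .map(|h| h.join().expect("partition worker panicked"))
            .sum::<i64>()
    })
}
```

Calling `parallel_filtered_sum(&partitions, 4)` on a `Vec<Partition>` returns the merged sum. A real engine would typically bound the number of worker threads and stream data in batches rather than spawn one thread per partition; this sketch leaves that out to keep the partition-then-merge shape visible.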

DataFusion also supports distributed query execution via the Ballista (Arrow) crate.

# Use Cases

DataFusion can be used without modification as an embedded SQL engine or can be customized and used as a foundation for building new systems. Here are some examples of systems built using DataFusion:

By using DataFusion, the projects are freed to focus on their specific features, and avoid reimplementing general (but still necessary) features such as an expression representation, standard optimizations, execution plans, file format support, etc.

# Why DataFusion?

# Comparisons with other projects

Here is a comparison with similar projects that may help you understand when DataFusion might be suitable or unsuitable for your needs:


Origin:
References: ROAPI Apache Arrow Rust
Created: 2021-10-14