Dremio
Dremio is a self-service analytics platform for data teams. Data analysts can explore and visualize data with fast query response times, while data engineers can ingest and transform data directly in the data lake, with full support for DML operations.
A standout feature is that analysts can join data in the lake with data held in external databases, so data does not have to be moved into object storage before it can be queried. Dremio’s open Lakehouse platform is built on community-driven standards such as Apache Iceberg and Apache Arrow, which lets organizations use best-of-breed processing engines while avoiding vendor lock-in.
Dremio primarily relies on In-Memory Formats to execute queries across heterogeneous data sources, which is what makes querying and joining those sources fast. Dremio fits within the Data Virtualization category.
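As a rough illustration of joining lake data with external database data, the sketch below composes one SQL statement that references both sources by fully qualified path. The source names `lake` and `pg`, the table paths, and the column names are hypothetical; real paths depend on how the sources are configured in Dremio.

```python
def qualify(*parts: str) -> str:
    # Double-quote each path component and join the components with dots.
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def cross_source_join(lake_table: str, db_table: str, key: str) -> str:
    # Build a SELECT that joins a lake table to an external database table.
    return (
        f"SELECT o.*, c.region\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {db_table} AS c ON o.{key} = c.{key}"
    )


orders = qualify("lake", "sales", "orders")       # hypothetical table in object storage
customers = qualify("pg", "public", "customers")  # hypothetical PostgreSQL table
sql = cross_source_join(orders, customers, "customer_id")
print(sql)
```

The point of the sketch is that both tables appear in the same query; neither has to be copied into the other system first.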
Fun Fact
The name Dremio takes the "Drem" from Demel and adds "IO": Drem + IO.
# Older Illustrations
Comparison: traditional approach versus Dremio.
Efficiency in data processing with Apache Arrow:
Advantages of a unified data layer:
# Dremio 2.0
This major release brings a suite of new features, performance enhancements, and stability improvements. The highlights:
- Starflake Reflections: Automatic acceleration of queries on datasets with joins is more efficient. A reflection on a dataset that joins a fact table with multiple dimension tables can now accelerate queries that use any subset of those joins, which simplifies reflection creation.
- New Reflection Management Engine: The new engine improves scalability, debuggability, and resilience. It also optimizes reflection maintenance, reducing overhead and resource utilization.
- Expanded REST APIs: Perform most operations via REST APIs, including querying, data catalog management, reflections management, manual reflection refreshing, and job status checks.
- External Reflections: Use external summary tables or digests (e.g., Parquet files produced by Spark or Hive) within Dremio’s reflection framework to accelerate queries. External reflections are defined using SQL commands.
- Crowd-sourced Dataset Acceleration Votes: Admins can now view all datasets with user acceleration votes, aiding in understanding demand and popularity. This feature is exclusive to Dremio Enterprise Edition.
- Source Configuration Change Warnings: Alerts for configuration changes impacting reflection, format, and sharing settings.
- Enhanced Performance on Amazon S3: Notable improvements in upload performance and memory usage for reflections on Amazon S3.
- Optimized IN Clause Performance: Significant enhancements in SQL IN clause execution and planning for push-downs.
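To make the REST workflow above more concrete, here is a minimal sketch that builds (but does not send) the HTTP requests for submitting a SQL query and checking the resulting job. The base URL, the `/api/v3/sql` and `/api/v3/job/{id}` paths, and the `_dremio` token prefix are assumptions based on Dremio's v3 REST API as commonly documented; verify them against the documentation for your version. The helper names and the job id are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9047"  # assumption: a local Dremio coordinator


def auth_headers(token: str) -> dict:
    # Assumption: the token from the login endpoint is sent with a "_dremio" prefix.
    return {"Authorization": f"_dremio{token}"}


def sql_request(sql: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Build a POST request that submits a SQL statement as a job.
    headers = {"Content-Type": "application/json", **auth_headers(token)}
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v3/sql", data=body, headers=headers, method="POST"
    )


def job_status_request(job_id: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Build a GET request that fetches the status of a previously submitted job.
    return urllib.request.Request(
        f"{base_url}/api/v3/job/{job_id}", headers=auth_headers(token), method="GET"
    )


submit = sql_request("SELECT 1", token="<token>")
status = job_status_request("<job-id>", token="<token>")
print(submit.get_method(), submit.full_url)
print(status.get_method(), status.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out so the sketch can be read and run without a live server.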
Discover more in the release notes: Dremio 2.0 Release Notes.
References: My Dremio Setup