Open Table Format Catalogs

Last updated by Simon Späti

Open catalogs play the role the Hive Metastore played before them: they are an index of the tables you have in your data lake.

In a relational database, the equivalent is INFORMATION_SCHEMA, which most databases support: SELECT * FROM INFORMATION_SCHEMA.tables; lists the tables the database knows about.
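To make the analogy concrete, here is a minimal sketch of asking a database for its own table index. It uses Python's built-in sqlite3 module so it runs without a server. SQLite does not implement INFORMATION_SCHEMA, so the sketch queries sqlite_master, SQLite's own equivalent. The table names and the list_tables helper are made up for illustration.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all user tables the database knows about."""
    # sqlite_master is SQLite's built-in catalog table; on engines that
    # implement the SQL standard you would query INFORMATION_SCHEMA.tables.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical tables, created only so the catalog has something to list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

tables = list_tables(conn)
print(tables)
```

An open catalog answers the same question ("which tables exist, and where is their metadata?"), but for files in object storage and for several engines at once instead of one database.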

A great overview from YouTube Discussion:

Engine Unity Catalog Glue Catalog Snowflake Horizon Polaris Catalog BigQuery Metastore
Databricks ~
AWS ~ ~
Fabric
Snowflake ~
OSS Iceberg Clients ~ ~
BigQuery

Legend: ✓ Full Support | ~ Partial Support | ✗ No Access — Image inspired by The Whys of Managed Iceberg with Databricks (see orig)

# Different Catalogs

Open Source Catalogs:

  • Apache Polaris Catalog: Fully open source, designed for broad compatibility with Iceberg clients
  • Iceberg Catalog: Reference implementation, lightweight and standards-compliant
  • DuckLake: Catalog + Table Format in one by DuckDB Labs
  • Apache Gravitino: Open data catalog for building a high-performance, geo-distributed and federated metadata lake

Vendor-Managed Catalogs:

  • Unity Catalog (Databricks): Advanced governance features, strong integration with Databricks ecosystem
  • AWS Glue Catalog: Deep AWS integration, serverless metadata management
  • Snowflake Horizon Catalog: Native Snowflake integration with governance capabilities
  • BigQuery Metastore: Google Cloud native, designed for multi-engine support
  • R2 Data Catalog: Cloudflare service that manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance

Lightweight Alternatives:

  • File-based catalogs: Solutions like boring-catalog that use simple JSON files for basic catalog functionality

# Utilities

  • GitHub - boringdata/boring-catalog: A lightweight, file-based Iceberg catalog implementation using a single JSON file (e.g., on S3, local disk, or any fsspec-compatible storage).
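The idea of a catalog kept in a single JSON file can be sketched in a few lines of Python. This is not boring-catalog's actual API or file layout. The class name, the method names, the JSON structure, and the S3 path below are all hypothetical and only show the principle: the catalog maps a table identifier to the location of that table's current metadata file.

```python
import json
import tempfile
from pathlib import Path


class JsonFileCatalog:
    """Hypothetical single-file catalog: table identifier -> metadata location."""

    def __init__(self, path):
        self.path = Path(path)
        if not self.path.exists():
            self._write({"tables": {}})

    def _read(self):
        return json.loads(self.path.read_text())

    def _write(self, state):
        # Write to a temporary file and rename it over the catalog file.
        # Assumption: a local filesystem where rename is atomic; object
        # stores such as S3 would need a different commit mechanism.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
        tmp.replace(self.path)

    def register_table(self, identifier, metadata_location):
        state = self._read()
        if identifier in state["tables"]:
            raise ValueError(f"table already registered: {identifier}")
        state["tables"][identifier] = {"metadata_location": metadata_location}
        self._write(state)

    def list_tables(self):
        return sorted(self._read()["tables"])

    def load_table(self, identifier):
        # Returns only the pointer; an engine would read the metadata file next.
        return self._read()["tables"][identifier]["metadata_location"]


# Usage with made-up identifiers and a made-up metadata path.
workdir = Path(tempfile.mkdtemp())
catalog = JsonFileCatalog(workdir / "catalog.json")
catalog.register_table(
    "sales.orders",
    "s3://example-bucket/warehouse/sales/orders/metadata/v1.metadata.json",
)
catalog.register_table(
    "sales.customers",
    "s3://example-bucket/warehouse/sales/customers/metadata/v1.metadata.json",
)
print(catalog.list_tables())
print(catalog.load_table("sales.orders"))
```

The sketch leaves out what makes a real catalog hard: concurrent writers, atomic swaps of the metadata pointer on commit, namespaces, and access control. Those are the areas where the managed catalogs listed above mainly differ from a single file.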

# Further Reads


Origin: Data Lake Table Format
References:
Created 2025-04-30