Open Table Format Catalogs

By Simon Späti

Open catalogs play the role the Hive Metastore used to play: an index of the tables you have in your data lake.

In a relational database, the equivalent is the INFORMATION_SCHEMA, which most databases support and which you can query with SELECT * FROM INFORMATION_SCHEMA.tables;
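As a minimal sketch of that analogy, the snippet below lists the tables a database knows about using Python's built-in sqlite3 module. SQLite is an assumption, chosen only because it runs without a server; it does not implement INFORMATION_SCHEMA, so the sketch queries sqlite_master, its analogous built-in index of tables. The table names are hypothetical.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables known to the database."""
    # sqlite_master plays the role that INFORMATION_SCHEMA.tables plays
    # in engines that support the standard schema.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical tables, created in an in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

tables = list_tables(conn)
print(tables)
```

A catalog for a data lake answers the same question ("which tables exist, and where?"), only for files in object storage rather than for tables inside one database.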

A great overview from a YouTube discussion:

Engine | Unity Catalog | Glue Catalog | Snowflake Horizon | Polaris Catalog | BigQuery Metastore
Databricks ~
AWS ~ ~
Fabric
Snowflake ~
OSS Iceberg Clients ~ ~
BigQuery

Legend: ✓ Full Support | ~ Partial Support | ✗ No Access — Image inspired by The Whys of Managed Iceberg with Databricks (see orig)

# Different Catalogs

Open Source Catalogs:

  • Apache Polaris Catalog: Fully open source, designed for broad compatibility with Iceberg clients
  • Iceberg Catalog: Reference implementation, lightweight and standards-compliant
  • DuckLake: Catalog + Table Format in one by DuckDB Labs
  • Apache Gravitino: Open data catalog for building a high-performance, geo-distributed and federated metadata lake
  • Lakekeeper: Secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
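Several of the catalogs above expose the Iceberg REST catalog interface, so a client can ask any of them the same question in the same way. The sketch below shows roughly what listing the tables of a namespace looks like. It assumes the list-tables path (`/v1/{prefix}/namespaces/{namespace}/tables`) and the `identifiers` response shape as I read them in the Iceberg REST OpenAPI specification; the host name, namespace, and the `fake_fetch` stand-in are hypothetical, and a real server will usually also require authentication and may require a prefix.

```python
import json
from typing import Callable
from urllib.parse import quote


def tables_url(base_uri: str, namespace: str, prefix: str = "") -> str:
    """Build the list-tables URL for a single-level namespace."""
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables"]
    return "/".join(parts)


def list_tables(
    fetch: Callable[[str], str],
    base_uri: str,
    namespace: str,
    prefix: str = "",
) -> list[str]:
    """Return 'namespace.table' names parsed from a list-tables response."""
    body = json.loads(fetch(tables_url(base_uri, namespace, prefix)))
    return [
        ".".join(ident["namespace"] + [ident["name"]])
        for ident in body.get("identifiers", [])
    ]


requested: list[str] = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET so the sketch runs offline (hypothetical data)."""
    requested.append(url)
    return json.dumps({
        "identifiers": [
            {"namespace": ["sales"], "name": "orders"},
            {"namespace": ["sales"], "name": "customers"},
        ]
    })


names = list_tables(fake_fetch, "https://catalog.example.com", "sales")
print(names)
```

Passing the fetch function in as a parameter keeps the sketch independent of any HTTP library; in practice an Iceberg client library would handle the request, authentication, and multi-level namespaces.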

Vendor-Managed Catalogs:

  • Unity Catalog (Databricks): Advanced governance features, strong integration with Databricks ecosystem
  • AWS Glue Catalog: Deep AWS integration, serverless metadata management
  • Snowflake Horizon Catalog: Native Snowflake integration with governance capabilities
  • BigQuery Metastore: Google Cloud native, designed for multi-engine support
  • R2 Data Catalog: Cloudflare service that manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance.

Lightweight Alternatives:

  • File-based catalogs: Solutions like boring-catalog that use simple JSON files for basic catalog functionality

# Utilities

  • GitHub - boringdata/boring-catalog: A lightweight, file-based Iceberg catalog implementation using a single JSON file (e.g., on S3, local disk, or any fsspec-compatible storage).
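To make the idea of a single-JSON-file catalog concrete, here is a toy sketch of one. It is not boring-catalog's actual file format or API; the `JsonFileCatalog` class, its method names, and the S3 paths are all hypothetical. It only illustrates the core job of a catalog: mapping a table identifier to the location of that table's current metadata file.

```python
import json
import os
import tempfile


class JsonFileCatalog:
    """Toy catalog: one JSON file mapping table identifiers to metadata locations."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> dict[str, str]:
        # An absent file is treated as an empty catalog.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def register_table(self, identifier: str, metadata_location: str) -> None:
        """Add a table, or point an existing one at a new metadata file."""
        tables = self._load()
        tables[identifier] = metadata_location
        # Write to a temporary file and rename it over the catalog, so a
        # reader on local disk never sees a half-written file.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(tables, f, indent=2, sort_keys=True)
        os.replace(tmp_path, self.path)

    def list_tables(self) -> list[str]:
        return sorted(self._load())

    def metadata_location(self, identifier: str) -> str:
        return self._load()[identifier]


# Hypothetical usage with a catalog file in a temporary directory.
catalog_path = os.path.join(tempfile.mkdtemp(), "catalog.json")
catalog = JsonFileCatalog(catalog_path)
catalog.register_table(
    "sales.orders", "s3://example-bucket/sales/orders/metadata/v1.metadata.json"
)
catalog.register_table(
    "sales.customers", "s3://example-bucket/sales/customers/metadata/v1.metadata.json"
)
print(catalog.list_tables())
```

The sketch does not handle concurrent writers: two processes registering tables at the same time can overwrite each other's changes, which is one of the gaps a full catalog service closes.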

# Limitations

See some bottlenecks and limitations of using proprietary open table catalogs in Data Lakehouse or in the History of General Architecture in Data.

# Further Reads


Origin: Data Lake Table Format