Open Table Format Catalogs
Open catalogs play the role the Hive Metastore used to play: an index of which tables exist in your data lake.
In a relational database, the equivalent is the INFORMATION_SCHEMA, which most databases support, e.g. `SELECT * FROM INFORMATION_SCHEMA.tables;`.
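To make the "index of tables" idea concrete, here is a minimal in-memory sketch. `InMemoryCatalog` and its methods are hypothetical and not the API of any real catalog. It only records, per namespace and table name, where the table's current metadata file lives, and `list_tables` plays roughly the role of `INFORMATION_SCHEMA.tables`. The `s3://example-bucket/...` paths are made-up placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryCatalog:
    """Hypothetical toy catalog: an index of tables, nothing more.

    Maps (namespace, table name) to the location of the table's current
    metadata file, which is what an engine needs to find and read the table.
    """

    tables: dict[tuple[str, str], str] = field(default_factory=dict)

    def register_table(self, namespace: str, name: str, metadata_location: str) -> None:
        key = (namespace, name)
        if key in self.tables:
            raise ValueError(f"table {namespace}.{name} already exists")
        self.tables[key] = metadata_location

    def load_table(self, namespace: str, name: str) -> str:
        # Raises KeyError for unknown tables, standing in for "table not found"
        return self.tables[(namespace, name)]

    def list_tables(self, namespace: str | None = None) -> list[str]:
        # Roughly what SELECT table_schema, table_name FROM INFORMATION_SCHEMA.tables gives you
        return sorted(
            f"{ns}.{name}"
            for (ns, name) in self.tables
            if namespace is None or ns == namespace
        )


catalog = InMemoryCatalog()
catalog.register_table("sales", "orders", "s3://example-bucket/sales/orders/metadata/v1.metadata.json")
catalog.register_table("hr", "employees", "s3://example-bucket/hr/employees/metadata/v1.metadata.json")
print(catalog.list_tables())  # ['hr.employees', 'sales.orders']
```

Real catalogs store far more (schemas, permissions, snapshots), but answering "which tables exist and where is their metadata" is the common core.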
A great overview from
YouTube Discussion:
Which engines can access which catalogs:
| Engine | Unity Catalog | Glue Catalog | Snowflake Horizon | Polaris Catalog | BigQuery Metastore |
|---|---|---|---|---|---|
| Databricks | ✓ | ~ | ✗ | ✗ | ✗ |
| AWS | ~ | ✓ | ~ | ✗ | ✗ |
| Fabric | ✗ | ✗ | ✗ | ✗ | ✗ |
| Snowflake | ✓ | ~ | ✓ | ✗ | ✗ |
| OSS Iceberg Clients | ✓ | ~ | ~ | ✓ | ✗ |
| BigQuery | ✗ | ✗ | ✗ | ✗ | ✓ |
Legend: ✓ full support | ~ partial support | ✗ no access. Image inspired by The Whys of Managed Iceberg with Databricks (see original).
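The matrix can also be encoded as data to answer questions like "which catalog can all my engines read?". This sketch copies the symbols from the table as-is; `SUPPORT`, `engines_with_access` and `shared_catalogs` are hypothetical helpers, and the values only reflect the matrix at the time it was drawn, so actual vendor support may have changed since.

```python
from enum import Enum


class Support(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


SYMBOLS = {"✓": Support.FULL, "~": Support.PARTIAL, "✗": Support.NONE}
CATALOGS = ["Unity Catalog", "Glue Catalog", "Snowflake Horizon", "Polaris Catalog", "BigQuery Metastore"]

# One row per engine, columns in the same order as CATALOGS (copied from the matrix)
_ROWS = {
    "Databricks":          "✓ ~ ✗ ✗ ✗",
    "AWS":                 "~ ✓ ~ ✗ ✗",
    "Fabric":              "✗ ✗ ✗ ✗ ✗",
    "Snowflake":           "✓ ~ ✓ ✗ ✗",
    "OSS Iceberg Clients": "✓ ~ ~ ✓ ✗",
    "BigQuery":            "✗ ✗ ✗ ✗ ✓",
}

# SUPPORT[engine][catalog] -> Support level
SUPPORT = {
    engine: {cat: SYMBOLS[sym] for cat, sym in zip(CATALOGS, row.split())}
    for engine, row in _ROWS.items()
}


def engines_with_access(catalog: str, minimum: Support = Support.PARTIAL) -> list[str]:
    """Engines whose support for `catalog` is at least `minimum`."""
    return [e for e, cats in SUPPORT.items() if cats[catalog].value >= minimum.value]


def shared_catalogs(engines: list[str], minimum: Support = Support.PARTIAL) -> list[str]:
    """Catalogs that every engine in `engines` supports at least at `minimum` level."""
    return [
        cat
        for cat in CATALOGS
        if all(SUPPORT[e][cat].value >= minimum.value for e in engines)
    ]
```

For example, `shared_catalogs(["Databricks", "Snowflake"])` returns `["Unity Catalog", "Glue Catalog"]`; requiring `Support.FULL` narrows it to `["Unity Catalog"]`.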
# Different Catalogs
Open Source Catalogs:
- Apache Polaris Catalog: Fully open source, designed for broad compatibility with Iceberg clients
- Iceberg Catalog: Reference implementation, lightweight and standards-compliant
- DuckLake: Catalog + Table Format in one by DuckDB Labs
- Apache Gravitino: Open data catalog for building a high-performance, geo-distributed and federated metadata lake
Vendor-Managed Catalogs:
- Unity Catalog (Databricks): Advanced governance features, strong integration with Databricks ecosystem
- AWS Glue Catalog: Deep AWS integration, serverless metadata management
- Snowflake Horizon Catalog: Native Snowflake integration with governance capabilities
- BigQuery Metastore: Google Cloud native, designed for multi-engine support
- R2 Data Catalog: Cloudflare service that manages the Iceberg metadata and now also performs ongoing maintenance, including compaction, to improve query performance.
Lightweight Alternatives:
- File-based catalogs: Solutions like boring-catalog that use simple JSON files for basic catalog functionality
# Utilities
- GitHub - boringdata/boring-catalog: A lightweight, file-based Iceberg catalog implementation using a single JSON file (e.g., on S3, local disk, or any fsspec-compatible storage).
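The single-JSON-file idea can be sketched in a few lines. This is not boring-catalog's actual code or API: `JsonFileCatalog` is a hypothetical class that keeps `{"tables": {"namespace.table": metadata_location}}` in one file on a local disk and rewrites it atomically via a temp file and `os.replace`.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


class JsonFileCatalog:
    """Hypothetical file-based catalog: the whole catalog is one JSON document.

    Layout: {"tables": {"namespace.table": "<metadata file location>"}}.
    Not the boring-catalog API; it only illustrates the single-file idea.
    """

    def __init__(self, path: str | os.PathLike[str]) -> None:
        self.path = Path(path)
        if not self.path.exists():
            self._write({"tables": {}})

    def _read(self) -> dict:
        with self.path.open("r", encoding="utf-8") as f:
            return json.load(f)

    def _write(self, data: dict) -> None:
        # Write to a temp file next to the catalog, then atomically swap it in,
        # so a reader never sees a half-written catalog (local filesystem assumed).
        fd, tmp_path = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True)
            os.replace(tmp_path, self.path)
        except BaseException:
            # Clean up the temp file if anything failed before the swap
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise

    def register_table(self, identifier: str, metadata_location: str) -> None:
        data = self._read()
        if identifier in data["tables"]:
            raise ValueError(f"table {identifier} already exists")
        data["tables"][identifier] = metadata_location
        self._write(data)

    def load_table(self, identifier: str) -> str:
        # Raises KeyError if the table is not registered
        return self._read()["tables"][identifier]

    def list_tables(self) -> list[str]:
        return sorted(self._read()["tables"])


with tempfile.TemporaryDirectory() as warehouse:
    demo = JsonFileCatalog(os.path.join(warehouse, "catalog.json"))
    demo.register_table("sales.orders", f"{warehouse}/sales/orders/metadata/v1.metadata.json")
    print(demo.list_tables())  # ['sales.orders']
```

The read-modify-write in `register_table` is only safe with a single writer. On object storage such as S3, `os.replace` does not apply, and concurrent writers would need another mechanism, such as conditional writes.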
# Further Reads
Origin: Data Lake Table Format
References:
Created 2025-04-30