Lakebase (Managed Postgres)

Last updated by Simon Späti

Announced at the Databricks Data & AI Summit 2025 on 2025-06-11.

Essentially an OLTP database (Postgres) for the Lakehouse, powered by [Neon]. Although it may only be the compute layer: a fully managed Postgres, not something based on object storage.

It’s just a managed, scalable Postgres inside the Databricks environment.

This will likely get better, or is already the result of the recent acquisition of Neon. See Data Engineering Acquisitions.

“Databricks Lakebase is a fully-managed PostgreSQL OLTP engine that lives inside the Databricks Data Intelligence Platform. You provision it as a database instance (a new compute type) and get Postgres semantics—row-level transactions, indexes, JDBC/psql access—while the storage and scaling are handled for you by Databricks.” -  docs


Image from Lakebase | Databricks

# Architecture


Data + AI Summit 2025 - Keynote Recap - YouTube

# Key Capabilities

From Daniel Beach's Lakebase from Databricks:

  • Postgres-compatible: standard drivers, psql, extensions roadmap.
  • Managed change-data-capture into Delta Lake so OLTP data stays in sync with BI models.
  • Unified governance via Unity Catalog roles & privileges.
  • Lakehouse hooks: can feed Feature Engineering & Serving, SQL Warehouses, Databricks Apps, and RAG pipelines out of the same rows.  docs.databricks.com
  • Elastic scale: separate storage and compute lets you grow read/write throughput without dumping and re-importing data.

# The Idea


Source Lakebase from Databricks. - by Daniel Beach

# My Comment

This is a solution for Databricks only, and it doesn’t matter whether the engine is Postgres, Spark, the Photon Engine, Modern OLAP Systems, or anything else: as a user of Databricks, working through their UI or clusters, it doesn’t look any different. It’s the separation of compute and storage.

It only makes a difference if you self-host your SQL Query Engine, but that’s not what Lakebase is. It’s a closed-source Postgres as far as I can see. It’s an abstraction that means less complexity for Lakehouse users, but more for Databricks.

It really goes to show how Declarative Data Stacks are the future by abstracting complexity away. The Databricks lakehouse is a declarative data stack as well, but a closed-source one. And as we learned, with declarative data stacks we can exchange the compute, the engine. Lakebase is just another compute for your Lakehouse, if you will, one that is much less complex, as it’s built on Postgres.

Actually Daniel Beach agrees with me on that one:

It’s clear that Databricks, as per normal, has integrated this well into their Platform. Right now a “Databricks instance” is just a new type of compute you can select.

Essentially a managed Postgres with data retention, high availability (HA) and other features out of the box.

# In the End

It’s just a managed Postgres as part of Databricks? Is it different from running a Postgres service on Azure?

I guess the users of Databricks don’t know/mind which compute they use (Photon, Spark, Postgres), as long as it’s cheap and fast :)

But I’m not sure why they didn’t call it “Managed Postgres”.

It’s not open-source, despite their stated focus on “Openness”. Yes, Postgres is open-source, but running and integrating it is proprietary to Databricks; hence, a “managed Postgres”. Am I missing something? Bsky

Please let me know what I’m missing; this is what I have observed at first glance.

# Limitations

As of 2025-06-13, per Databricks:

  • A workspace allows a maximum of ten instances.
  • Each instance supports up to 1000 concurrent connections.
  • The logical size limit across all databases in an instance is 2 TB.
  • Database instances are scoped to a single workspace. Users in other workspaces attached to the same metastore can see these tables in Catalog Explorer if they have the required Unity Catalog permissions, but they cannot access the table contents.
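A quick sanity check against these limits can be sketched as plain arithmetic. The limit constants below come straight from the list above; the workload numbers in the example are hypothetical:

```python
# Published Lakebase limits as of 2025-06-13 (from the list above)
MAX_INSTANCES_PER_WORKSPACE = 10
MAX_CONNECTIONS_PER_INSTANCE = 1000
MAX_LOGICAL_SIZE_TB = 2


def fits_in_one_workspace(instances: int, peak_connections: int, logical_size_tb: float) -> list[str]:
    """Return a list of limit violations for a planned deployment (empty list = fits)."""
    violations = []
    if instances > MAX_INSTANCES_PER_WORKSPACE:
        violations.append(f"{instances} instances exceeds the {MAX_INSTANCES_PER_WORKSPACE}-instance cap")
    if peak_connections > MAX_CONNECTIONS_PER_INSTANCE:
        violations.append(f"{peak_connections} connections exceeds the {MAX_CONNECTIONS_PER_INSTANCE}-connection cap per instance")
    if logical_size_tb > MAX_LOGICAL_SIZE_TB:
        violations.append(f"{logical_size_tb} TB exceeds the {MAX_LOGICAL_SIZE_TB} TB logical size cap")
    return violations


# Hypothetical workload: 3 instances, 1,500 peak connections on one instance, 1.5 TB of data
print(fits_in_one_workspace(3, 1500, 1.5))
```

Note that the connection cap is per instance, so a workload above 1000 concurrent connections would need to be spread across instances (or pooled, e.g. with PgBouncer).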

# Integration

# Python

import uuid

import psycopg2
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Name of the Lakebase database instance to connect to
instance_name = "<YOUR INSTANCE NAME>"
instance = w.database.get_database_instance(name=instance_name)

# Generate a short-lived credential to use as the Postgres password
cred = w.database.generate_database_credential(request_id=str(uuid.uuid4()), instance_names=[instance_name])

# Connection parameters
conn = psycopg2.connect(
    host=instance.read_write_dns,
    dbname="databricks_postgres",
    user="<YOUR USER>",
    password=cred.token,
    sslmode="require",
)

# Execute query
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    version = cur.fetchone()[0]
    print(version)
conn.close()

Source
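If you prefer SQLAlchemy or another tool that takes a connection URL rather than keyword arguments, the same credential can be packed into a standard Postgres DSN. This is a sketch under the assumption that Lakebase accepts any libpq-compatible client on the default Postgres port 5432; the host, user, and token values are placeholders (in practice they come from `instance.read_write_dns` and `cred.token` as in the snippet above):

```python
from urllib.parse import quote


def build_postgres_url(host: str, user: str, token: str, dbname: str = "databricks_postgres") -> str:
    """Build a standard postgresql:// URL; the generated token goes in the password slot."""
    # URL-encode user and token, since OAuth tokens and emails may contain reserved characters
    return (
        f"postgresql://{quote(user, safe='')}:{quote(token, safe='')}"
        f"@{host}:5432/{dbname}?sslmode=require"
    )


# Hypothetical values for illustration only
url = build_postgres_url("example.database.cloud.databricks.com", "me@example.com", "token123")
print(url)
# With SQLAlchemy installed, this URL would be passed to create_engine(url)
```

Since the token is short-lived, a long-running application would regenerate the credential and rebuild the URL periodically rather than caching it.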

# References

It sounds similar to the recently announced DuckLake, where a relational database, such as Postgres, is used to manage the metadata of the lake and the catalog. But looking at it more closely, it really isn’t.


Origin: Lakebase from Databricks. - by Daniel Beach
References:
Created 2025-06-13