Search

Search IconIcon to open search

Time-Travel

Last updated by Simon Späti

Time travel is a powerful feature in Data Lake Table Format that enables versioning of big data stored in your data lake. This capability allows you to access any historical version of your data, providing robust data management through:

  • Auditable history tracking
  • Data rollback capabilities for bad writes or deletes
  • Reproducible queries across different versions simultaneously

# Key Features

Features of that Data Lake Table Format bring:

  • Audit data changes
    • You can look at the history of table changes using the DESCRIBE HISTORY command or through the UI.
  • Time-travel is not really a history if you have huge data, otherwise it’s too expensive to save all data duplicates - If you have small data, you can hold them all
    • Increase your retention interval to higher, the default is only 7 days. That would only be a debugging feature

# Example: Delta Time Travel Overview

Based on Delta Time Travel for Data Lakes , Delta Time Travel is a data versioning feature in Databricks Delta Lake that automatically versions data stored in data lakes. This capability addresses three key challenges in modern data management:

  1. Data Auditing: Enables tracking and reviewing historical changes for compliance and debugging
  2. Experiment Reproducibility: Allows data scientists to access specific data versions for reproducing models and experiments
  3. Error Recovery: Facilitates easy rollback of bad writes or accidental deletions

# Key Functions

  • Access historical data using timestamps or version numbers
  • Query different versions simultaneously
  • Track changes through DESCRIBE HISTORY command
  • Integrate with MLflow for machine learning reproducibility
  • Maintain consistent views across downstream jobs

# Implementation Methods

Data can be accessed through:

  • Timestamp queries (e.g., TIMESTAMP AS OF "2019-01-01")
  • Version numbers (e.g., VERSION AS OF 5238)

This feature significantly improves developer productivity while helping organizations maintain a clean, centralized, and versioned data repository in cloud storage.


Origin: Delta Lake
References: Open Table Formats, Transaction Log (Delta Lake), ACID Transactions
Created