Storage Layer (Object-Store)

Last updated Mar 25, 2024

The Storage Layer consists of the object storage services from the three major cloud providers: AWS S3, Azure Blob Storage, and Google Cloud Storage. It is an integral part of modern data infrastructure and the foundation on which Data Lake File Formats and Data Lake Table Formats are built.

These services are highly configurable, secure, and reliable, and they offer user-friendly web interfaces and flexible storage options.

However, their utility extends beyond basic storage. In a Data Lakehouse, they are the storage backend that engines such as Apache Spark, Trino, Druid/ClickHouse, and various Python libraries read from and write to, which is what enables advanced data processing directly on top of the object store.
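Engines and libraries usually address data in these stores through a URI that names the bucket (or container) and the object key. The following is a minimal sketch of that addressing, not any particular engine's API. The `ObjectLocation` class, the bucket, container, and account names, and the sample paths are all hypothetical. The Azure form shown is the `abfss://` convention used for ADLS Gen2 on top of Blob Storage; individual tools may expect other variants, such as `s3a://` in Hadoop-based engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectLocation:
    """Hypothetical helper: one object in a cloud object store."""

    provider: str        # "aws", "azure", or "gcp"
    container: str       # bucket (S3, GCS) or container (Azure)
    key: str             # object path inside the bucket/container
    account: str = ""    # Azure storage account name; unused elsewhere

    def uri(self) -> str:
        """Build a URI in a scheme commonly used for the provider."""
        key = self.key.lstrip("/")
        if self.provider == "aws":
            return f"s3://{self.container}/{key}"
        if self.provider == "gcp":
            return f"gs://{self.container}/{key}"
        if self.provider == "azure":
            if not self.account:
                raise ValueError("Azure locations need a storage account name")
            # ABFS-style URI (assumes an ADLS Gen2 enabled account).
            return (
                f"abfss://{self.container}@{self.account}"
                f".dfs.core.windows.net/{key}"
            )
        raise ValueError(f"unknown provider: {self.provider!r}")


# Illustrative names only; none of these refer to real resources.
locations = [
    ObjectLocation("aws", "example-lake", "sales/2024/part-0.parquet"),
    ObjectLocation("gcp", "example-lake", "sales/2024/part-0.parquet"),
    ObjectLocation("azure", "lake", "sales/2024/part-0.parquet", account="exampleacct"),
]

for loc in locations:
    print(loc.uri())
```

The same logical path maps to a different URI per provider, which is why table formats and engines can often stay provider-agnostic and leave the scheme to configuration.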

According to insights from Emerging Architectures for Modern Data Infrastructure - a16z:

“The storage layer is evolving significantly. Technologies such as Delta Lake, Apache Iceberg, and Apache Hudi may not be new, but their adoption is accelerating, and they are increasingly integrated into commercial products. Technologies like Iceberg, in particular, offer interoperability with cloud data warehouses like Snowflake. This interoperability is crucial in a heterogeneous data environment, likely becoming a central component of the multimodal data stack.”

# Open-Source Object Storage Tools / Self-Hosted S3

