
Apache Hive

Last updated: Jun 12, 2026 by Simon Späti

Apache Hive is a data warehouse software project built on top of Apache Hadoop for data query and analysis. [2] Hive provides an SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop. Without it, SQL-style queries over distributed data have to be implemented by hand against the MapReduce Java API. Hive supplies that SQL abstraction instead: queries written in its SQL-like language (HiveQL) are compiled into jobs for the underlying Java execution framework, so they never need to be written against the low-level Java API.
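To make the abstraction concrete, here is a toy, single-process sketch of the map, shuffle, and reduce steps that a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` boils down to. The table, the `dept` column, the input splits, and all function names are hypothetical, and this is not how Hive's compiler actually plans queries; it only illustrates the kind of code HiveQL spares you from writing.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical input: rows of an "employees" table, split across two mappers.
SPLITS = [
    [{"dept": "sales"}, {"dept": "eng"}],  # split handled by mapper 1
    [{"dept": "sales"}],                   # split handled by mapper 2
]

def map_phase(rows: Iterable[dict]) -> Iterator[tuple[str, int]]:
    # Emit (group key, 1) for every input row.
    for row in rows:
        yield row["dept"], 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Collect all emitted values by key, as the MapReduce shuffle would.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Sum the ones per key, which yields COUNT(*) per group.
    return {key: sum(values) for key, values in groups.items()}

def count_by_dept(splits: list[list[dict]]) -> dict[str, int]:
    # Each split is mapped independently; outputs are merged before the shuffle.
    pairs = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(pairs))

if __name__ == "__main__":
    print(count_by_dept(SPLITS))  # {'sales': 2, 'eng': 1}
```

In the one-line HiveQL version, all three phases (and the distribution across machines) are implied by the `GROUP BY`.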

Apache Hive™ is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale and facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL.


The original table format was Apache Hive. In Hive, a table is defined as all the files in one or more particular directories. While this enabled SQL expressions and other analytics to run on a data lake, it couldn’t scale effectively to the data volumes and analytical complexity that today’s workloads demand. Newer table formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, were developed to provide the required scalability.
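A minimal sketch of the "table = all files under a directory" model, assuming Hive's `key=value` partition-directory naming (e.g. `dt=2024-01-01`). The table name `events`, the file names, and the helper functions are hypothetical; this models the idea, not Hive's implementation.

```python
import tempfile
from pathlib import Path

def list_table_files(table_root: Path) -> list[Path]:
    # In the directory-based model, the table *is* every data file under its root,
    # so answering "what is in this table?" requires a full recursive listing.
    return sorted(p for p in table_root.rglob("*") if p.is_file())

def partition_values(file: Path, table_root: Path) -> dict[str, str]:
    # Partition values are encoded in directory names of the form key=value.
    values: dict[str, str] = {}
    for part in file.relative_to(table_root).parent.parts:
        key, sep, value = part.partition("=")
        if sep:
            values[key] = value
    return values

def demo() -> list[tuple[str, dict[str, str]]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "events"  # hypothetical table root
        for dt in ("2024-01-01", "2024-01-02"):
            part_dir = root / f"dt={dt}"
            part_dir.mkdir(parents=True)
            # Empty placeholder standing in for a real data file.
            (part_dir / "part-00000.parquet").write_bytes(b"")
        # The listing is computed before the temporary directory is cleaned up.
        return [(f.name, partition_values(f, root)) for f in list_table_files(root)]

if __name__ == "__main__":
    for name, values in demo():
        print(name, values)
```

Because the directory listing *is* the table, every query has to enumerate files (costly on object stores holding many of them), and a reader listing while a writer is still adding files can see a partial result. Roughly speaking, newer table formats address this by tracking the table's files in their own metadata rather than deriving them from directory listings.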

# Components

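It consists of the Hive Catalog and the Hive Metastore (HMS), which stores the metadata for tables and partitions, such as column schemas and storage locations.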
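As a rough picture of the metastore's role: a query engine asks it which columns a table has and which directories hold which partitions, then reads those files directly. The toy in-memory catalog below sketches that lookup, including partition pruning. `ToyMetastore`, `TableMeta`, their methods, and the `/warehouse/events` path are all hypothetical; the real Hive Metastore is a Thrift service backed by a relational database and has a different API.

```python
from dataclasses import dataclass, field

@dataclass
class TableMeta:
    # Minimal stand-in for the metadata a metastore keeps per table.
    name: str
    columns: dict[str, str]       # column name -> type
    location: str                 # root directory of the table's files
    partition_keys: list[str]
    partitions: dict[tuple[str, ...], str] = field(default_factory=dict)  # values -> dir

class ToyMetastore:
    """Hypothetical in-memory catalog; not the Hive Metastore API."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMeta] = {}

    def create_table(self, meta: TableMeta) -> None:
        self._tables[meta.name] = meta

    def add_partition(self, table: str, values: tuple[str, ...]) -> str:
        meta = self._tables[table]
        # Build a Hive-style key=value path under the table location.
        subdir = "/".join(f"{k}={v}" for k, v in zip(meta.partition_keys, values))
        path = f"{meta.location}/{subdir}"
        meta.partitions[values] = path
        return path

    def partition_locations(self, table: str, **filters: str) -> list[str]:
        # Partition pruning: keep only directories whose values match every filter.
        meta = self._tables[table]
        result = []
        for values, path in meta.partitions.items():
            spec = dict(zip(meta.partition_keys, values))
            if all(spec.get(k) == v for k, v in filters.items()):
                result.append(path)
        return result

if __name__ == "__main__":
    ms = ToyMetastore()
    ms.create_table(TableMeta("events", {"id": "bigint", "payload": "string"},
                              "/warehouse/events", ["dt"]))
    ms.add_partition("events", ("2024-01-01",))
    ms.add_partition("events", ("2024-01-02",))
    print(ms.partition_locations("events", dt="2024-01-02"))
    # ['/warehouse/events/dt=2024-01-02']
```

The engine only lists the directories the metastore returns, which is what lets a filter on a partition column skip whole directories.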



