INFORMATION_SCHEMA provides access to database metadata, information about the MySQL server such as the name of a database or table, the data type of a column, or access privileges. Other terms that are sometimes used for this information are data dictionary and system catalog.
These are the Open Table Format Catalogs of today’s world, where you query what tables are in a relational database.
Similar to the Hive Metastore before, an index for what tables you have in your Data Lake.
# Queries for Database Exploration
Here are some of the most commonly used queries.
1
2
3
4
5
|
-- List all tables in the database
SELECT table_schema, table_name, table_type
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'sys')
ORDER BY table_schema, table_name;
|
1
2
3
4
5
6
|
-- Get column details for a specific table
SELECT column_name, data_type, character_maximum_length,
is_nullable, column_default
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'your_schema' AND table_name = 'your_table'
ORDER BY ordinal_position;
|
# Primary Keys
1
2
3
4
5
6
7
8
9
10
|
-- Find primary keys
SELECT
tc.table_schema,
tc.table_name,
kc.column_name
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kc
ON tc.constraint_name = kc.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
ORDER BY tc.table_schema, tc.table_name;
|
# Foreign Keys
Foreign Key exploration:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
-- Find foreign key relationships
SELECT
fk.table_schema,
fk.table_name,
fk.column_name,
fk.constraint_name,
rc.referenced_table_schema,
rc.referenced_table_name,
rc.referenced_column_name
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE fk
JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
ON fk.constraint_name = rc.constraint_name
WHERE rc.referenced_table_name IS NOT NULL
ORDER BY fk.table_schema, fk.table_name;
|
# Indexes
1
2
3
4
5
6
7
8
9
10
11
12
|
-- List indexes (implementation varies by database system)
-- For PostgreSQL:
SELECT
t.relname AS table_name,
i.relname AS index_name,
a.attname AS column_name
FROM pg_class t
JOIN pg_index ix ON t.oid = ix.indrelid
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN pg_attribute a ON a.attnum = ANY(ix.indkey) AND a.attrelid = t.oid
WHERE t.relkind = 'r'
ORDER BY t.relname, i.relname;
|
# Database Size and Growth
1
2
3
4
5
6
|
-- Database size (PostgreSQL example)
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
|
# Table Size
1
2
3
4
5
6
7
|
-- Table sizes (PostgreSQL example)
SELECT
table_name,
pg_size_pretty(pg_total_relation_size(table_name)) AS total_size
FROM information_schema.tables
WHERE table_schema = 'your_schema'
ORDER BY pg_total_relation_size(table_name) DESC;
|
# Supported Databases
Full Support:
- MySQL/MariaDB: Extensive implementation of INFORMATION_SCHEMA
- Microsoft SQL Server: Comprehensive support with Microsoft extensions
- PostgreSQL: Supports INFORMATION_SCHEMA plus additional system catalogs
- Oracle Database: Has INFORMATION_SCHEMA views (but traditionally relies more on its own data dictionary views like ALL_TABLES, DBA_TABLES)
- IBM Db2: Supports INFORMATION_SCHEMA
Partial/Alternative Implementations:
- SQLite: Does not have INFORMATION_SCHEMA but provides similar functionality through SQLite system tables and PRAGMA statements
- DuckDB: Has INFORMATION_SCHEMA
- H2 Database: Supports INFORMATION_SCHEMA
- Snowflake: Supports INFORMATION_SCHEMA with cloud data warehouse adaptations
- Amazon Redshift: Provides INFORMATION_SCHEMA views
- Google BigQuery: Has INFORMATION_SCHEMA with cloud-specific adaptations
Origin: Open Table Format Catalogs