SQL - SQL Index Internals (B-Tree, Hash, and Bitmap Indexing)

Indexes are one of the most important performance optimization tools in SQL databases. Internally, an index is a data structure that allows the database engine to locate and retrieve rows much faster than scanning the entire table. Understanding how different types of indexes work internally helps in choosing the right one for specific use cases and avoiding performance issues.

What Happens Internally in an Index

When you create an index on a column, the database builds a separate structure that stores the indexed column values along with pointers (row identifiers) to the actual table rows. Instead of scanning the full table, the database uses this structure to quickly locate matching records.

Think of it like a book index: instead of reading every page, you look up a keyword and jump directly to the relevant page.

B-Tree Index (Balanced Tree Index)

The B-Tree index is the most commonly used indexing structure in relational databases.

Internal Structure

A B-Tree is a balanced tree consisting of:

Root node
Intermediate (branch) nodes
Leaf nodes

The data is stored in a sorted manner. Each node contains keys and pointers to child nodes. The leaf nodes contain the actual indexed values and references to table rows.

How It Works

When a query searches for a value:

The database starts at the root node
It compares the search value with keys in the node
It follows the appropriate branch
This continues until it reaches the leaf node
The matching row pointer is retrieved

Because the tree is balanced, the number of steps required is very small, even for large datasets.

Use Cases

Equality searches (WHERE id = 10)
Range queries (WHERE salary BETWEEN 50000 AND 100000)
Sorting (ORDER BY)

Advantages

Efficient for both exact matches and range queries
Automatically keeps data sorted
Scales well with large datasets

Limitations

Slight overhead for insert, update, and delete operations
Not as fast as hash indexes for exact matches

Hash Index

Hash indexing is based on a hash function that transforms a key into a fixed-size value called a hash code.

Internal Structure

A hash function computes a bucket location
Data is stored in buckets based on hash values
Each bucket contains pointers to rows with matching hash codes

How It Works

When searching for a value:

The database applies the hash function to the search key
It finds the corresponding bucket
It retrieves matching rows from that bucket

This makes lookup operations extremely fast.

Use Cases

Exact match queries (WHERE username = 'John')
High-speed key-value lookups

Advantages

Very fast for equality comparisons
Constant-time lookup in ideal conditions

Limitations

Cannot be used for range queries
Performance depends on hash function quality
Collisions (different values mapping to same bucket) can slow performance

Bitmap Index

Bitmap indexes are designed for columns with a limited number of distinct values (low cardinality).

Internal Structure

Instead of storing values directly, a bitmap index uses binary vectors (bitmaps):

Each unique value has a bitmap
Each bit represents a row in the table
A bit is set to 1 if the row contains that value, otherwise 0

How It Works

For a query:

The database retrieves the bitmap for the requested value
It performs bitwise operations (AND, OR) for multiple conditions
The resulting bitmap identifies matching rows

Use Cases

Columns with few distinct values (gender, status, category)
Data warehousing and analytical queries
Complex filtering with multiple conditions

Advantages

Extremely efficient for read-heavy queries
Fast combination of multiple conditions using bit operations
Reduces I/O in large analytical datasets

Limitations

Not suitable for high-cardinality columns (like unique IDs)
Expensive to maintain during frequent updates
Best suited for read-mostly environments

Comparison of Index Types

Feature	B-Tree Index	Hash Index	Bitmap Index
Best For	Range + equality queries	Exact matches	Low-cardinality data
Sorting Support	Yes	No	No
Range Queries	Yes	No	Limited
Insert/Update Cost	Moderate	Low to moderate	High
Query Speed	Balanced	Very fast (exact match)	Very fast (analytics)

Choosing the Right Index

Use B-Tree indexes for most general-purpose queries
Use Hash indexes when queries are strictly equality-based
Use Bitmap indexes in data warehouses with low-cardinality columns

Conclusion

Understanding the internal workings of B-Tree, Hash, and Bitmap indexes is essential for effective database design and performance tuning. Each index type is optimized for different query patterns, and selecting the right one can significantly reduce query execution time. In real-world systems, a combination of these indexing strategies is often used to balance performance across transactional and analytical workloads.