SQL - SQL Index Internals (B-Tree, Hash, and Bitmap Indexing)

Indexes are one of the most important performance optimization tools in SQL databases. Internally, an index is a data structure that allows the database engine to locate and retrieve rows much faster than scanning the entire table. Understanding how different types of indexes work internally helps in choosing the right one for specific use cases and avoiding performance issues.


What Happens Internally in an Index

When you create an index on a column, the database builds a separate structure that stores the indexed column values along with pointers (row identifiers) to the actual table rows. Instead of scanning the full table, the database uses this structure to quickly locate matching records.

Think of it like a book index: instead of reading every page, you look up a keyword and jump directly to the relevant page.


B-Tree Index (Balanced Tree Index)

The B-Tree index is the most commonly used indexing structure in relational databases.

Internal Structure

A B-Tree is a balanced tree consisting of:

  • Root node

  • Intermediate (branch) nodes

  • Leaf nodes

The data is stored in a sorted manner. Each node contains keys and pointers to child nodes. The leaf nodes contain the actual indexed values and references to table rows.

How It Works

When a query searches for a value:

  1. The database starts at the root node

  2. It compares the search value with keys in the node

  3. It follows the appropriate branch

  4. This continues until it reaches the leaf node

  5. The matching row pointer is retrieved

Because the tree is balanced, the number of steps required is very small, even for large datasets.

Use Cases

  • Equality searches (WHERE id = 10)

  • Range queries (WHERE salary BETWEEN 50000 AND 100000)

  • Sorting (ORDER BY)

Advantages

  • Efficient for both exact matches and range queries

  • Automatically keeps data sorted

  • Scales well with large datasets

Limitations

  • Slight overhead for insert, update, and delete operations

  • Not as fast as hash indexes for exact matches


Hash Index

Hash indexing is based on a hash function that transforms a key into a fixed-size value called a hash code.

Internal Structure

  • A hash function computes a bucket location

  • Data is stored in buckets based on hash values

  • Each bucket contains pointers to rows with matching hash codes

How It Works

When searching for a value:

  1. The database applies the hash function to the search key

  2. It finds the corresponding bucket

  3. It retrieves matching rows from that bucket

This makes lookup operations extremely fast.

Use Cases

  • Exact match queries (WHERE username = 'John')

  • High-speed key-value lookups

Advantages

  • Very fast for equality comparisons

  • Constant-time lookup in ideal conditions

Limitations

  • Cannot be used for range queries

  • Performance depends on hash function quality

  • Collisions (different values mapping to same bucket) can slow performance


Bitmap Index

Bitmap indexes are designed for columns with a limited number of distinct values (low cardinality).

Internal Structure

Instead of storing values directly, a bitmap index uses binary vectors (bitmaps):

  • Each unique value has a bitmap

  • Each bit represents a row in the table

  • A bit is set to 1 if the row contains that value, otherwise 0

How It Works

For a query:

  1. The database retrieves the bitmap for the requested value

  2. It performs bitwise operations (AND, OR) for multiple conditions

  3. The resulting bitmap identifies matching rows

Use Cases

  • Columns with few distinct values (gender, status, category)

  • Data warehousing and analytical queries

  • Complex filtering with multiple conditions

Advantages

  • Extremely efficient for read-heavy queries

  • Fast combination of multiple conditions using bit operations

  • Reduces I/O in large analytical datasets

Limitations

  • Not suitable for high-cardinality columns (like unique IDs)

  • Expensive to maintain during frequent updates

  • Best suited for read-mostly environments


Comparison of Index Types

Feature B-Tree Index Hash Index Bitmap Index
Best For Range + equality queries Exact matches Low-cardinality data
Sorting Support Yes No No
Range Queries Yes No Limited
Insert/Update Cost Moderate Low to moderate High
Query Speed Balanced Very fast (exact match) Very fast (analytics)

Choosing the Right Index

  • Use B-Tree indexes for most general-purpose queries

  • Use Hash indexes when queries are strictly equality-based

  • Use Bitmap indexes in data warehouses with low-cardinality columns


Conclusion

Understanding the internal workings of B-Tree, Hash, and Bitmap indexes is essential for effective database design and performance tuning. Each index type is optimized for different query patterns, and selecting the right one can significantly reduce query execution time. In real-world systems, a combination of these indexing strategies is often used to balance performance across transactional and analytical workloads.