SQL - SQL Index Internals (B-Tree, Hash, and Bitmap Indexing)
Indexes are one of the most important performance optimization tools in SQL databases. Internally, an index is a data structure that allows the database engine to locate and retrieve rows much faster than scanning the entire table. Understanding how different types of indexes work internally helps in choosing the right one for specific use cases and avoiding performance issues.
What Happens Internally in an Index
When you create an index on a column, the database builds a separate structure that stores the indexed column values along with pointers (row identifiers) to the actual table rows. Instead of scanning the full table, the database uses this structure to quickly locate matching records.
Think of it like a book index: instead of reading every page, you look up a keyword and jump directly to the relevant page.
B-Tree Index (Balanced Tree Index)
The B-Tree index is the most commonly used indexing structure in relational databases.
Internal Structure
A B-Tree is a balanced tree consisting of:
-
Root node
-
Intermediate (branch) nodes
-
Leaf nodes
The data is stored in a sorted manner. Each node contains keys and pointers to child nodes. The leaf nodes contain the actual indexed values and references to table rows.
How It Works
When a query searches for a value:
-
The database starts at the root node
-
It compares the search value with keys in the node
-
It follows the appropriate branch
-
This continues until it reaches the leaf node
-
The matching row pointer is retrieved
Because the tree is balanced, the number of steps required is very small, even for large datasets.
Use Cases
-
Equality searches (WHERE id = 10)
-
Range queries (WHERE salary BETWEEN 50000 AND 100000)
-
Sorting (ORDER BY)
Advantages
-
Efficient for both exact matches and range queries
-
Automatically keeps data sorted
-
Scales well with large datasets
Limitations
-
Slight overhead for insert, update, and delete operations
-
Not as fast as hash indexes for exact matches
Hash Index
Hash indexing is based on a hash function that transforms a key into a fixed-size value called a hash code.
Internal Structure
-
A hash function computes a bucket location
-
Data is stored in buckets based on hash values
-
Each bucket contains pointers to rows with matching hash codes
How It Works
When searching for a value:
-
The database applies the hash function to the search key
-
It finds the corresponding bucket
-
It retrieves matching rows from that bucket
This makes lookup operations extremely fast.
Use Cases
-
Exact match queries (WHERE username = 'John')
-
High-speed key-value lookups
Advantages
-
Very fast for equality comparisons
-
Constant-time lookup in ideal conditions
Limitations
-
Cannot be used for range queries
-
Performance depends on hash function quality
-
Collisions (different values mapping to same bucket) can slow performance
Bitmap Index
Bitmap indexes are designed for columns with a limited number of distinct values (low cardinality).
Internal Structure
Instead of storing values directly, a bitmap index uses binary vectors (bitmaps):
-
Each unique value has a bitmap
-
Each bit represents a row in the table
-
A bit is set to 1 if the row contains that value, otherwise 0
How It Works
For a query:
-
The database retrieves the bitmap for the requested value
-
It performs bitwise operations (AND, OR) for multiple conditions
-
The resulting bitmap identifies matching rows
Use Cases
-
Columns with few distinct values (gender, status, category)
-
Data warehousing and analytical queries
-
Complex filtering with multiple conditions
Advantages
-
Extremely efficient for read-heavy queries
-
Fast combination of multiple conditions using bit operations
-
Reduces I/O in large analytical datasets
Limitations
-
Not suitable for high-cardinality columns (like unique IDs)
-
Expensive to maintain during frequent updates
-
Best suited for read-mostly environments
Comparison of Index Types
| Feature | B-Tree Index | Hash Index | Bitmap Index |
|---|---|---|---|
| Best For | Range + equality queries | Exact matches | Low-cardinality data |
| Sorting Support | Yes | No | No |
| Range Queries | Yes | No | Limited |
| Insert/Update Cost | Moderate | Low to moderate | High |
| Query Speed | Balanced | Very fast (exact match) | Very fast (analytics) |
Choosing the Right Index
-
Use B-Tree indexes for most general-purpose queries
-
Use Hash indexes when queries are strictly equality-based
-
Use Bitmap indexes in data warehouses with low-cardinality columns
Conclusion
Understanding the internal workings of B-Tree, Hash, and Bitmap indexes is essential for effective database design and performance tuning. Each index type is optimized for different query patterns, and selecting the right one can significantly reduce query execution time. In real-world systems, a combination of these indexing strategies is often used to balance performance across transactional and analytical workloads.