MongoDB - Storage Engines in MongoDB (WiredTiger)
A storage engine is the internal component of a database that manages how data is stored on disk and how it is retrieved. In simple terms, it decides:
- How data is written to disk
- How memory (RAM) is used
- How concurrency (multiple users accessing data at the same time) is handled
- How crash recovery works
In MongoDB, the default storage engine is WiredTiger. It has been the default since MongoDB 3.2.
Why Storage Engines Matter
When you build real-world applications (e-commerce, banking apps, student portals), the database must:
- Handle many users simultaneously
- Avoid data corruption during crashes
- Maintain high performance
- Use disk space efficiently
The storage engine directly affects all of these.
WiredTiger – Detailed Explanation
1. Document-Level Concurrency
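Older storage engines used database-level or collection-level locking. While one operation was writing, every other write to the same database or collection had to wait.
WiredTiger uses document-level concurrency control for writes.
This means:
- Multiple users can modify different documents in the same collection at the same time.
- Write throughput holds up much better under heavy concurrent load.
If two operations do target the same document, one of them hits a write conflict and MongoDB retries it transparently.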
Example:
If 100 users update different profiles simultaneously, they won’t block each other.
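To make the granularity concrete, here is a minimal Python sketch of the 100-users example. It is a toy model, not WiredTiger's implementation: WiredTiger relies on optimistic concurrency control rather than one explicit lock per document, and the `Collection` class, `increment_visits` method, and `visits` field are hypothetical names chosen for illustration.

```python
import threading


class Collection:
    """Toy in-memory collection with one lock per document (illustrative only)."""

    def __init__(self, doc_ids):
        self.docs = {doc_id: {"_id": doc_id, "visits": 0} for doc_id in doc_ids}
        # One lock per document instead of a single lock for the whole collection.
        self.locks = {doc_id: threading.Lock() for doc_id in doc_ids}

    def increment_visits(self, doc_id):
        # Only writers touching the same document contend for this lock.
        with self.locks[doc_id]:
            self.docs[doc_id]["visits"] += 1


def run_concurrent_updates(num_users=100, updates_per_user=50):
    """Each simulated user updates only their own profile document."""
    profiles = Collection(range(num_users))

    def worker(user_id):
        for _ in range(updates_per_user):
            profiles.increment_visits(user_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return profiles


if __name__ == "__main__":
    profiles = run_concurrent_updates()
    # Every profile should have received all of its own updates.
    print(all(doc["visits"] == 50 for doc in profiles.docs.values()))
```

With a single collection-wide lock, every call to `increment_visits` would queue behind every other call. With per-document granularity, users working on different profiles never wait on each other.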
2. Compression
WiredTiger compresses data by default.
It compresses:
- Data files (block compression, using snappy by default)
- Indexes (prefix compression)
Benefits:
- Reduces disk usage
- Reduces I/O (Input/Output) operations
- Improves performance on disk-heavy systems, at the cost of some extra CPU work
This is very important in production environments where storage cost matters.
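The effect of block compression on repetitive documents can be sketched in a few lines of Python. This is only an approximation under stated assumptions: WiredTiger compresses its own on-disk blocks of BSON data and uses snappy by default, while the sketch compresses JSON text with `zlib` because that ships with Python. The `compression_report` function and the sample documents are hypothetical.

```python
import json
import zlib


def compression_report(docs):
    """Serialize documents, compress them as one block, and report the sizes."""
    raw = json.dumps(docs).encode("utf-8")
    compressed = zlib.compress(raw)
    return {
        "raw_bytes": len(raw),
        "compressed_bytes": len(compressed),
        "ratio": len(raw) / len(compressed),
        # Decompressing must give back exactly the original documents.
        "round_trip_ok": json.loads(zlib.decompress(compressed)) == docs,
    }


# Hypothetical sample data: documents that repeat the same field names and values,
# which is typical for a collection and is what makes compression effective.
sample_docs = [
    {"user_id": i, "status": "active", "country": "IN", "plan": "free"}
    for i in range(1000)
]

if __name__ == "__main__":
    print(compression_report(sample_docs))
```

Because every document repeats the same field names, the compressed block is much smaller than the raw one. The same property is what lets WiredTiger save disk space and I/O on real collections.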
3. Journaling (Crash Recovery)
Journaling protects recent writes from being lost if the system crashes. The journal is a write-ahead log.
When a write operation happens:
- It is first recorded in a journal file.
- The change is written to the main data files later.
If the server crashes:
- MongoDB replays the journal on restart.
- The data is restored to a consistent state.
This provides durability, which is one of the ACID properties.
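The journal-then-apply order described above can be sketched as a tiny write-ahead log in Python. This is a simplified illustration, not MongoDB's journal format: the `JournaledStore` class, the one-JSON-object-per-line file layout, and the `order:` keys are all assumptions made for the example, and it does not handle a half-written final line.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key-value store with a write-ahead journal (illustrative only)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state from the journal, if one exists.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "r", encoding="utf-8") as journal:
            for line in journal:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Step 1: record the write in the journal and force it to disk.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # Step 2: only then apply it to the in-memory data.
        self.data[key] = value


def demo_crash_recovery():
    """Write two entries, drop the store to simulate a crash, then recover."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.log")
        store = JournaledStore(path)
        store.put("order:1", "paid")
        store.put("order:2", "shipped")
        del store  # simulated crash: the in-memory state is gone
        recovered = JournaledStore(path)
        return recovered.data


if __name__ == "__main__":
    print(demo_crash_recovery())
```

The key design point is the ordering inside `put`: a write counts as durable once it is in the journal, so a crash at any later moment can be repaired by replaying the file.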
4. Checkpointing
WiredTiger periodically creates checkpoints (every 60 seconds by default).
A checkpoint:
- Saves a consistent snapshot of the data to disk.
- Reduces recovery time after crashes.
This means the system does not need to replay the entire history during recovery; it only replays the journal entries written since the last checkpoint.
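The following Python sketch shows why a checkpoint shortens recovery. It is a toy model under stated assumptions: the `CheckpointedStore` class, the file names, and the JSON layout are hypothetical, checkpoints are triggered by hand rather than on a timer, and the journal is not fsynced, to keep the example short.

```python
import json
import os
import tempfile


class CheckpointedStore:
    """Toy store that recovers from a snapshot plus a short journal (illustrative only)."""

    def __init__(self, directory):
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.journal_path = os.path.join(directory, "journal.log")
        self.data = {}
        self.replayed_entries = 0
        self._recover()

    def _recover(self):
        # Start from the last checkpoint, if there is one.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, "r", encoding="utf-8") as snapshot:
                self.data = json.load(snapshot)
        # Then replay only the journal entries written after that checkpoint.
        if os.path.exists(self.journal_path):
            with open(self.journal_path, "r", encoding="utf-8") as journal:
                for line in journal:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]
                    self.replayed_entries += 1

    def put(self, key, value):
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def checkpoint(self):
        # Write the snapshot to a temporary file, then swap it in atomically.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as snapshot:
            json.dump(self.data, snapshot)
            snapshot.flush()
            os.fsync(snapshot.fileno())
        os.replace(tmp_path, self.checkpoint_path)
        # Entries already covered by the snapshot are no longer needed.
        open(self.journal_path, "w", encoding="utf-8").close()


def demo_recovery():
    """Return (number of recovered keys, number of journal entries replayed)."""
    with tempfile.TemporaryDirectory() as tmp:
        store = CheckpointedStore(tmp)
        for i in range(100):
            store.put(f"k{i}", i)
        store.checkpoint()
        for i in range(100, 103):
            store.put(f"k{i}", i)
        del store  # simulated crash
        recovered = CheckpointedStore(tmp)
        return len(recovered.data), recovered.replayed_entries


if __name__ == "__main__":
    print(demo_recovery())
```

After the simulated crash, all 103 keys come back, but only the 3 writes made after the checkpoint are replayed from the journal.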
5. Memory Usage (Cache System)
WiredTiger uses an internal cache system.
- Frequently accessed data is kept in RAM.
- Rarely accessed data stays on disk.
By default, the cache size is the larger of 50% of (RAM minus 1 GB) or 256 MB.
This improves read performance significantly.
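As a quick worked example of the default sizing, MongoDB's documentation gives the default internal cache as the larger of 50% of (RAM minus 1 GB) or 256 MB. The small Python helper below computes that value; the function name `default_wiredtiger_cache_gb` is hypothetical and is not part of any MongoDB API.

```python
def default_wiredtiger_cache_gb(total_ram_gb):
    """Return the documented default WiredTiger cache size, in GB.

    The default is the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB).
    """
    return max(0.5 * (total_ram_gb - 1), 0.25)


if __name__ == "__main__":
    for ram_gb in (1, 4, 16):
        print(ram_gb, "GB RAM ->", default_wiredtiger_cache_gb(ram_gb), "GB cache")
```

A server with 16 GB of RAM therefore gets a 7.5 GB cache by default, while a 1 GB machine falls back to the 256 MB floor. The default can be overridden with the `storage.wiredTiger.engineConfig.cacheSizeGB` configuration setting.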
Other Storage Engines (Brief Mention)
Earlier versions of MongoDB used MMAPv1, which was deprecated in MongoDB 4.0 and removed in 4.2.
MMAPv1 had:
- Collection-level locking
- Less efficient concurrency
- No compression
That’s why WiredTiger replaced it.